Title | Multi-Scale Features Are Effective for Multi-Modal Classification: An Architecture Search Viewpoint |
Authors | |
Publication date | 2024 |
DOI | |
Journal | |
ISSN | 1558-2205 |
Volume | PP, Issue: 99 |
Abstract | Multi-modal neural architecture search (MNAS) is an effective approach to obtaining task-adaptive multi-modal classification models. Deep neural networks, the current mainstream feature extractors, can provide hierarchical features for each modality. Existing MNAS methods have difficulty exploiting such hierarchical features because they coexist in different forms, such as tensorial multi-scale features and vectorized penultimate features. Moreover, existing methods focus only on evolving the fusion operators or the vectorized features of all modalities, which constrains the search space. In this paper, a novel two-stage method called multi-modal multi-scale evolutionary neural architecture search (MM-ENAS) is proposed. The first stage unifies the representation form of the hierarchical features through the proposed evolutionary statistics strategy. The second stage uses an evolutionary algorithm to identify the optimal combination of basic fusion operations for all unified hierarchical features. MM-ENAS enlarges the search space by simultaneously searching over feature statistical extraction methods, basic fusion operators, and the feature representation set consisting of tensorial multi-scale features and vectorized penultimate features. Experimental results on three multi-modal tasks demonstrate that the proposed method achieves competitive performance in terms of accuracy, search time, and number of parameters compared with existing representative MNAS methods. Additionally, the method adapts quickly to various multi-modal tasks. |
Related links | [IEEE record] |
University attribution | Other |
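Citation statistics | |
Document type | Journal article |
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/840351 |
Collection | College of Engineering_Department of Computer Science and Engineering |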
Author affiliations | 1. Institute of Big Data Science and Industry, Shanxi University, Taiyuan, Shanxi, China 2. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan, Shanxi, China 3. Department of Computer Science and Engineering, Shenzhen Key Laboratory of Computational Intelligence, Southern University of Science and Technology, Shenzhen, Guangdong, China |
Recommended citation (GB/T 7714) |
Pinhan Fu,Xinyan Liang,Yuhua Qian,et al. Multi-Scale Features Are Effective for Multi-Modal Classification: An Architecture Search Viewpoint[J]. IEEE Transactions on Circuits and Systems for Video Technology,2024,PP(99).
|
APA |
Pinhan Fu.,Xinyan Liang.,Yuhua Qian.,Qian Guo.,Yayu Zhang.,...&Ke Tang.(2024).Multi-Scale Features Are Effective for Multi-Modal Classification: An Architecture Search Viewpoint.IEEE Transactions on Circuits and Systems for Video Technology,PP(99).
|
MLA |
Pinhan Fu,et al."Multi-Scale Features Are Effective for Multi-Modal Classification: An Architecture Search Viewpoint".IEEE Transactions on Circuits and Systems for Video Technology PP.99(2024).
|
Files in this item | No files are associated with this item. |
|
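The two stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the statistics (`mean`, `max`, `std`), the fusion operators (`add`, `mul`, `max`), the genome encoding, the mutation scheme, and every function name below are hypothetical choices made for illustration. The sketch also assumes that all features share the same channel count, so that unified vectors can be fused element-wise, and that the caller supplies a `fitness` function (in practice this would likely be the validation accuracy of a classifier trained on the fused features).

```python
import random

import numpy as np

# Hypothetical statistics: each reduces a (C, H, W) feature map to a length-C vector.
STATISTICS = {
    "mean": lambda t: t.mean(axis=(1, 2)),
    "max": lambda t: t.max(axis=(1, 2)),
    "std": lambda t: t.std(axis=(1, 2)),
}

# Hypothetical basic fusion operators on equal-length vectors.
FUSIONS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "max": np.maximum,
}


def unify(feature, stat_name):
    """Stage 1 (sketch): vectorize a tensorial feature; vectors pass through unchanged."""
    if feature.ndim == 1:
        return feature
    return STATISTICS[stat_name](feature)


def fuse(features, genome):
    """Stage 2 (sketch): fold the unified vectors left to right with the chosen operators."""
    stats, ops = genome
    vectors = [unify(f, s) for f, s in zip(features, stats)]
    fused = vectors[0]
    for vec, op in zip(vectors[1:], ops):
        fused = FUSIONS[op](fused, vec)
    return fused


def random_genome(n_features, rng):
    """A genome is one statistic per feature plus one operator per fusion step."""
    stats = tuple(rng.choice(list(STATISTICS)) for _ in range(n_features))
    ops = tuple(rng.choice(list(FUSIONS)) for _ in range(n_features - 1))
    return stats, ops


def mutate(genome, rng):
    """Resample one gene: either a fusion operator or a statistic."""
    stats, ops = list(genome[0]), list(genome[1])
    if ops and rng.random() < 0.5:
        ops[rng.randrange(len(ops))] = rng.choice(list(FUSIONS))
    else:
        stats[rng.randrange(len(stats))] = rng.choice(list(STATISTICS))
    return tuple(stats), tuple(ops)


def evolve(features, fitness, generations=20, population=8, seed=0):
    """Elitist evolutionary loop: keep the best genomes among parents and children."""
    rng = random.Random(seed)
    pop = [random_genome(len(features), rng) for _ in range(population)]
    for _ in range(generations):
        children = [mutate(g, rng) for g in pop]
        ranked = sorted(pop + children,
                        key=lambda g: fitness(fuse(features, g)),
                        reverse=True)
        pop = ranked[:population]
    return pop[0]


if __name__ == "__main__":
    data_rng = np.random.default_rng(0)
    # Two multi-scale feature maps and one penultimate vector, all with 4 channels.
    feats = [data_rng.normal(size=(4, 8, 8)),
             data_rng.normal(size=(4, 4, 4)),
             data_rng.normal(size=4)]
    # Placeholder fitness for demonstration only.
    best = evolve(feats, fitness=lambda v: -float(np.abs(v).sum()))
    print(best)
```

Searching the statistics and the fusion operators in a single genome mirrors the abstract's point that enlarging the search space beyond fusion operators alone is the goal; a real system would also need to handle features with differing channel counts.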