Southern University of Science and Technology institutional repository (SUSTech KC): Strip-MLP: Efficient Token Interaction for Vision MLP

Title	Strip-MLP: Efficient Token Interaction for Vision MLP
Authors	Guiping Cao1,2 ; Shengda Luo1 ; Wenjian Huang1 ; Xiangyuan Lan2 ; Dongmei Jiang2 ; Yaowei Wang2 ; Jianguo Zhang1,2
Corresponding authors	Xiangyuan Lan ; Jianguo Zhang
DOI	arxiv-2307.11458
Publication date	2023-10-01
Conference name	IEEE International Conference on Computer Vision 2023
ISSN	1550-5499
ISBN	979-8-3503-0719-1
Proceedings title	2023 IEEE/CVF International Conference on Computer Vision (ICCV)
Pages	1494-1504
Conference date	2023.10
Conference location	Paris
Place of publication	10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720-1264 USA
Publisher	IEEE COMPUTER SOC
Abstract	Token interaction operation is one of the core modules in MLP-based models to exchange and aggregate information between different spatial locations. However, the power of token interaction on the spatial dimension is highly dependent on the spatial resolution of the feature maps, which limits the model's expressive ability, especially in deep layers where the features are down-sampled to a small spatial size. To address this issue, we present a novel method called Strip-MLP to enrich the token interaction power in three ways. Firstly, we introduce a new MLP paradigm called Strip MLP layer that allows the token to interact with other tokens in a cross-strip manner, enabling the tokens in a row (or column) to contribute to the information aggregations in adjacent but different strips of rows (or columns). Secondly, a Cascade Group Strip Mixing Module (CGSMM) is proposed to overcome the performance degradation caused by small spatial feature size. The module allows tokens to interact more effectively in within-patch and cross-patch manners, independent of the spatial size of the features. Finally, based on the Strip MLP layer, we propose a novel Local Strip Mixing Module (LSMM) to boost the token interaction power in the local region. Extensive experiments demonstrate that Strip-MLP significantly improves the performance of MLP-based models on small datasets and obtains comparable or even better results on ImageNet. In particular, Strip-MLP models achieve higher average Top-1 accuracy than existing MLP-based models by +2.44% on Caltech-101 and +2.16% on CIFAR-100. The source codes will be available at https://github.com/MedProcess/Strip MLP.
Keywords	Degradation ; Strips ; Computer vision ; Analytical models ; Source coding ; Computational modeling ; Task analysis
University attribution	First ; Corresponding
Language	English
Related links	[IEEE record]
Indexed by	CPCI-S
Funding	National Key Research and Development Program of China[2021YFF1200800] ; Peng Cheng Laboratory Research Project[PCL2023AS6-1] ; Guangdong Basic and Applied Basic Research Foundation[2022A1515110573]
WOS research areas	Computer Science ; Imaging Science & Photographic Technology
WOS categories	Computer Science, Artificial Intelligence ; Computer Science, Theory & Methods ; Imaging Science & Photographic Technology
WOS accession number	WOS:001159644301070
Source database	Manual submission
Full-text link	https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10376581
Publication status	Formally published
Citation statistics	Times cited [WOS]: 1
Output type	Conference paper
Item identifier	http://sustech.caswiz.com/handle/2SGJ60CL/646886
Collection	College of Engineering, Southern University of Science and Technology_Department of Computer Science and Engineering
Author affiliations	1. Southern University of Science and Technology, Shenzhen, China ; 2. Peng Cheng Laboratory, Shenzhen, China
First author affiliation	Southern University of Science and Technology
Corresponding author affiliation	Southern University of Science and Technology
First affiliation of the first author	Southern University of Science and Technology
Recommended citation (GB/T 7714)	Guiping Cao,Shengda Luo,Wenjian Huang,et al. Strip-MLP: Efficient Token Interaction for Vision MLP[C]. 10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720-1264 USA:IEEE COMPUTER SOC,2023:1494-1504.

Files in this item
File name/size	Document type	Version type	Access type	License	Action
Cao_Strip-MLP_Effici (1976 KB)	--	--	Restricted access	--