南方科技大学知识苑(SUSTech KC): Multi-modal recursive prompt learning with mixup embedding for generalization recognition

Title	Multi-modal recursive prompt learning with mixup embedding for generalization recognition
Authors	Jia, Yunpeng 1; Ye, Xiufen 1; Liu, Yusong 2; Guo, Shuxiang 3
Corresponding author	Ye, Xiufen
Publication date	2024-06-21
DOI	10.1016/j.knosys.2024.111726
Journal	Knowledge-Based Systems
ISSN	0950-7051
Volume	294
Abstract	The contrastive language-image pretraining (CLIP) model has shown promise in generalization recognition by combining visual and textual embeddings. However, the pretrained CLIP model requires further fine-tuning for downstream tasks. Existing prompt learning (PL) approaches, while effective, neglect the cross-hierarchical fusion of multimodal features. To address this issue, we introduce multi-modal recursive PL (MmRPL) for different generalization image recognition tasks. Recursive connections between prompts at different layers facilitate hierarchy-aware vision-text fusion. To the best of our knowledge, we are the first to introduce a mixup embedding technique to enhance the feature representations of PL. Extensive experiments across popular generalization recognition settings reveal that MmRPL outperforms existing methods. Furthermore, when combined with the embedding technique, MmRPL further enhances recognition in base-to-novel generalization, generalized zero-shot, domain generalization, and domain adaptive zero-shot scenarios. Notably, the proposed methods extend to applications such as zero-shot recognition of side-scan sonar (SSS) image targets via domain generalization adaptation. The proposed methods also achieve a mean accuracy of more than 80% across the seafloor, airplane, and ship categories in SSS images.
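Keywords	Domain adaptation; Generalization recognition; Mixup technique; Prompt learning
Related links	[Scopus record]
Indexed by	SCI; EI
Language	English
Institutional attribution	Other
ESI subject category	COMPUTER SCIENCE
Scopus record ID	2-s2.0-85189756837
Source database	Scopus
Citation statistics	Times cited [WOS]: 1
Output type	Journal article
Item identifier	http://sustech.caswiz.com/handle/2SGJ60CL/741082
Collection	College of Engineering_Department of Electronic and Electrical Engineering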
Author affiliations	1. College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, 150001, China; 2. School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150081, China; 3. Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
Recommended citation GB/T 7714	Jia, Yunpeng, Ye, Xiufen, Liu, Yusong, et al. Multi-modal recursive prompt learning with mixup embedding for generalization recognition[J]. Knowledge-Based Systems, 2024, 294.
APA	Jia, Yunpeng, Ye, Xiufen, Liu, Yusong, & Guo, Shuxiang. (2024). Multi-modal recursive prompt learning with mixup embedding for generalization recognition. Knowledge-Based Systems, 294.
MLA	Jia, Yunpeng, et al. "Multi-modal recursive prompt learning with mixup embedding for generalization recognition". Knowledge-Based Systems 294 (2024).