Title | Multi-modal recursive prompt learning with mixup embedding for generalization recognition |
Authors | Jia, Yunpeng; Ye, Xiufen; Liu, Yusong; Guo, Shuxiang |
Corresponding author | Ye, Xiufen |
Publication date | 2024-06-21 |
DOI | |
Journal | Knowledge-Based Systems |
ISSN | 0950-7051 |
Volume | 294 |
Abstract | The contrastive language-image pretraining (CLIP) model has shown promise in generalization recognition by combining visual and textual embeddings. However, the pretrained CLIP model requires further fine-tuning for downstream tasks. Existing prompt learning (PL) approaches, while effective, neglect the cross-hierarchical fusion of multimodal features. To address this issue, we introduce multi-modal recursive PL (MmRPL) for different generalization image recognition tasks. Recursive connections between prompts at different layers facilitate hierarchy-aware vision-text fusion. To the best of our knowledge, we are also the first to introduce a mixup embedding technique to enhance the feature representations of PL. Extensive experiments across popular generalization recognition settings show that MmRPL outperforms existing methods. Furthermore, when combined with the mixup embedding technique, MmRPL further improves recognition in base-to-novel generalization, generalized zero-shot, domain generalization, and domain-adaptive zero-shot scenarios. Notably, the proposed methods extend to applications such as zero-shot recognition of side-scan sonar (SSS) image targets via domain generalization adaptation. The proposed methods also achieve a mean accuracy of more than 80% across the seafloor, airplane, and ship categories in SSS images. |
Keywords | |
Related links | [Scopus record] |
Indexed in | |
Language | English |
Institutional authorship | Other |
ESI subject category | COMPUTER SCIENCE |
Scopus record ID | 2-s2.0-85189756837 |
Source database | Scopus |
Citation statistics | Times cited [WoS]: 1 |
Output type | Journal article |
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/741082 |
Collection | College of Engineering / Department of Electronic and Electrical Engineering |
Affiliations | 1. College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China; 2. School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin 150081, China; 3. Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen 518055, China |
Recommended citation |
GB/T 7714: Jia, Yunpeng, Ye, Xiufen, Liu, Yusong, et al. Multi-modal recursive prompt learning with mixup embedding for generalization recognition[J]. Knowledge-Based Systems, 2024, 294.
APA: Jia, Yunpeng, Ye, Xiufen, Liu, Yusong, & Guo, Shuxiang. (2024). Multi-modal recursive prompt learning with mixup embedding for generalization recognition. Knowledge-Based Systems, 294.
MLA: Jia, Yunpeng, et al. "Multi-modal recursive prompt learning with mixup embedding for generalization recognition." Knowledge-Based Systems 294 (2024).
|
Files in this item | No files are associated with this item. |