Title

Concept-Guided Prompt Learning for Generalization in Vision-Language Models

Authors
He, Zhihai (corresponding author)
Publication date
2024-03-25
Conference name
38th AAAI Conference on Artificial Intelligence, AAAI 2024
ISSN
2159-5399
EISSN
2374-3468
ISBN
9781577358879
Volume
38
Pages
7377-7386
Conference dates
February 20, 2024 - February 27, 2024
Conference location
Vancouver, BC, Canada
Proceedings editor / Conference organizer
Association for the Advancement of Artificial Intelligence
Place of publication
2275 E BAYSHORE RD, STE 160, PALO ALTO, CA 94303 USA
Abstract
The Contrastive Language-Image Pretraining (CLIP) model has shown remarkable efficacy in establishing cross-modal connections between text and images, and fine-tuning it yields strong performance across a broad range of downstream applications. However, on generalization tasks, current fine-tuning methods for CLIP, such as CoOp and CoCoOp, perform relatively poorly on some fine-grained datasets. We attribute this to the fact that these methods project only global features into the prompt, neglecting diverse visual concepts such as colors, shapes, and sizes, which transfer naturally across domains and play a crucial role in generalization. To address this issue, we propose Concept-Guided Prompt Learning (CPL) for vision-language models. Specifically, we leverage the knowledge already learned by CLIP to build a visual concept cache that enables concept-guided prompting. To refine the text features, we further develop a projector that transforms multi-level visual features into text features. We observe that this concept-guided prompt learning approach achieves better consistency between the visual and linguistic modalities. Extensive experiments demonstrate that CPL significantly improves generalization compared with current state-of-the-art methods.
Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
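The abstract names two components: a visual concept cache built from CLIP features and a projector that maps multi-level visual features into the text-feature space. The sketch below is a minimal, hypothetical illustration of those two ideas under our own assumptions; it is not the authors' CPL implementation. The concept names, the toy 3-dimensional features, cosine-similarity top-k retrieval from the cache, and the residual update weighted by `alpha` are all illustrative choices. In practice the features would come from CLIP's image and text encoders and the projector weights would be learned.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


class ConceptCache:
    """Hypothetical visual concept cache: concept name -> unit-norm visual feature."""

    def __init__(self) -> None:
        self.entries: Dict[str, List[float]] = {}

    def add(self, concept: str, feature: Sequence[float]) -> None:
        self.entries[concept] = l2_normalize(feature)

    def top_k(self, query: Sequence[float], k: int = 2) -> List[Tuple[str, float]]:
        """Return the k cached concepts with the highest cosine similarity to query."""
        q = l2_normalize(query)
        scored = [(name, sum(a * b for a, b in zip(q, feat)))
                  for name, feat in self.entries.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def project(weight: Sequence[Sequence[float]], feature: Sequence[float]) -> List[float]:
    """Hypothetical linear projector: each row of weight yields one text-space dimension."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weight]


def refine_text_feature(text_feat, visual_feats, weights, alpha=0.1):
    """Add alpha * projected feature for every visual level, then renormalize."""
    refined = list(text_feat)
    for weight, visual in zip(weights, visual_feats):
        delta = project(weight, visual)
        refined = [t + alpha * d for t, d in zip(refined, delta)]
    return l2_normalize(refined)


# Toy usage with made-up concepts and features (assumptions, not data from the paper).
cache = ConceptCache()
cache.add("red", [1.0, 0.0, 0.0])
cache.add("round", [0.0, 1.0, 0.0])
cache.add("large", [0.0, 0.0, 1.0])
image_feature = [0.9, 0.4, 0.0]
print(cache.top_k(image_feature, k=2))  # "red" ranks first, then "round"

# Two hypothetical feature levels (e.g. an intermediate layer and the final layer),
# each with its own projector; identity matrices stand in for learned weights.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
text_feature = l2_normalize([0.2, 0.9, 0.1])
refined = refine_text_feature(text_feature, [[0.5, 0.5, 0.0], image_feature],
                              [identity, identity])
print(refined)
```

Keeping the update residual and renormalizing afterwards keeps the refined feature on the unit sphere, so it remains directly comparable to other features by cosine similarity; a small `alpha` keeps it close to the original text feature.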
SUSTech authorship role
Corresponding author's institution
Language
English
WOS research area
Computer Science
WOS categories
Computer Science, Artificial Intelligence ; Computer Science, Theory & Methods
WOS accession number
WOS:001239937300098
EI accession number
20241515870432
EI subject terms
Artificial intelligence ; Computational linguistics ; Learning systems
EI classification codes
Computer Theory, Includes Formal Logic, Automata Theory, Switching Theory, Programming Theory:721.1 ; Artificial Intelligence:723.4 ; Computer Applications:723.5 ; Information Retrieval and Use:903.3
Source database
EV Compendex
Document type
Conference paper
Item identifier
http://sustech.caswiz.com/handle/2SGJ60CL/794405
Collection
Southern University of Science and Technology
Author affiliations
1.Harbin Institute of Technology, China
2.Southern University of Science and Technology, China
3.Carnegie Mellon University, United States
4.Pengcheng Laboratory, China
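First author's affiliation
Southern University of Science and Technology
Corresponding author's affiliation
Southern University of Science and Technology
Recommended citation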
GB/T 7714
Zhang, Yi,Zhang, Ce,Yu, Ke,et al. Concept-Guided Prompt Learning for Generalization in Vision-Language Models[C]//Association for the Advancement of Artificial Intelligence. 2275 E BAYSHORE RD, STE 160, PALO ALTO, CA 94303 USA:Association for the Advancement of Artificial Intelligence,2024:7377-7386.
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.