Title

Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples

Author(s)
DOI
Publication Date
2023
Conference Name
IEEE/CVF International Conference on Computer Vision (ICCV)
ISSN
1550-5499
ISBN
979-8-3503-0719-1
Proceedings Title
Pages
2684-2693
Conference Dates
1-6 Oct. 2023
Conference Location
Paris, France
Place of Publication
10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720-1264 USA
Publisher
Abstract
Referring video object segmentation (RVOS), as a supervised learning task, relies on sufficient annotated data for a given scene. However, in more realistic scenarios, only minimal annotations are available for a new scene, which poses significant challenges to existing RVOS methods. With this in mind, we propose a simple yet effective model with a newly designed cross-modal affinity (CMA) module based on a Transformer architecture. The CMA module builds multimodal affinity from only a few samples, thus quickly learning new semantic information and enabling the model to adapt to different scenarios. Since the proposed method targets limited samples for new scenes, we generalize the problem as few-shot referring video object segmentation (FS-RVOS). To foster research in this direction, we build a new FS-RVOS benchmark based on currently available datasets. The benchmark covers a wide range of situations, which can maximally simulate real-world scenarios. Extensive experiments show that our model adapts well to different scenarios with only a few samples, reaching state-of-the-art performance on the benchmark. On Mini-Ref-YouTube-VOS, our model achieves an average performance of 53.1 and 54.8, which are 10% better than the baselines. Furthermore, we show impressive results of 77.7 and 74.8 on Mini-Ref-SAIL-VOS, which are significantly better than the baselines. Code is publicly available at https://github.com/hengliusky/Few_shot_RVOS.
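Note: the abstract above describes a cross-modal affinity (CMA) module built on a Transformer. As a rough, hedged illustration only (not the authors' implementation; the class name, dimensions, and the single cross-attention fusion step below are assumptions, not taken from the paper or its repository), a minimal PyTorch sketch of fusing flattened frame features with referring-expression features via multi-head cross-attention might look like this:

import torch
import torch.nn as nn

class CrossModalAffinity(nn.Module):
    """Illustrative sketch: fuse visual tokens with text tokens via cross-attention."""
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.ReLU(inplace=True), nn.Linear(4 * dim, dim)
        )

    def forward(self, visual: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # visual: (B, N_pixels, C) flattened per-frame features
        # text:   (B, N_words, C) encoded referring expression
        attended, _ = self.cross_attn(query=visual, key=text, value=text)
        fused = self.norm1(visual + attended)        # residual connection + layer norm
        return self.norm2(fused + self.ffn(fused))   # position-wise feed-forward block

# Toy shapes only; the real model's feature dimensions are not specified here.
vis = torch.randn(2, 1024, 256)   # e.g. a 32x32 feature map per frame
txt = torch.randn(2, 12, 256)     # e.g. a 12-token expression embedding
print(CrossModalAffinity()(vis, txt).shape)  # torch.Size([2, 1024, 256])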
Keywords
Institutional Authorship
Other
Language
English
Related Link [IEEE Record]
Indexed By
Funding Projects
National Key R&D Program of China [2022YFF1202903]; National Natural Science Foundation of China [61971004, 62122035]; Natural Science Foundation of Anhui Province, China [2008085MF190]; Equipment Advanced Research Sharing Technology Project, China [80912020104]
WOS Research Areas
Computer Science ; Imaging Science & Photographic Technology
WOS Categories
Computer Science, Artificial Intelligence ; Computer Science, Theory & Methods ; Imaging Science & Photographic Technology
WOS Accession Number
WOS:001159644302087
Source Database
IEEE
Full-Text Link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10377326
Citation Statistics
Document Type: Conference Paper
Item Identifier: http://sustech.caswiz.com/handle/2SGJ60CL/719105
Collection: Southern University of Science and Technology
Author Affiliations
1.Anhui University of Technology
2.Southern University of Science and Technology
3.United Imaging
Recommended Citation
GB/T 7714
Guanghui Li, Mingqi Gao, Heng Liu, et al. Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples[C]. Los Alamitos, CA, USA: IEEE Computer Society, 2023: 2684-2693.
Files in This Item
No files are associated with this item.
