Title

Fast parameter adaptation for few-shot image captioning and visual question answering

Authors
Corresponding author: Yang, Yi
DOI
Publication date
2018
Proceedings title
Pages
54-62
Conference location
Seoul, Korea, Republic of
Place of publication
1515 BROADWAY, NEW YORK, NY 10036-9998 USA
Publisher
Abstract
Given only a few image-text pairs, humans can learn to detect semantic concepts and describe the content. For machine learning algorithms, they usually require a lot of data to train a deep neural network to solve the problem. However, it is challenging for the existing systems to generalize well to the few-shot multi-modal scenario, because the learner should understand not only images and texts but also their relationships from only a few examples. In this paper, we tackle two multi-modal problems, i.e., image captioning and visual question answering (VQA), in the few-shot setting. We propose Fast Parameter Adaptation for Image-Text Modeling (FPAIT) that learns to learn jointly understanding image and text data by a few examples. In practice, FPAIT has two benefits. (1) Fast learning ability. FPAIT learns proper initial parameters for the joint image-text learner from a large number of different tasks. When a new task comes, FPAIT can use a small number of gradient steps to achieve a good performance. (2) Robust to few examples. In few-shot tasks, the small training data will introduce large biases in Convolutional Neural Networks (CNN) and damage the learner's performance. FPAIT leverages dynamic linear transformations to alleviate the side effects of the small training set. In this way, FPAIT flexibly normalizes the features and thus reduces the biases during training. Quantitatively, FPAIT achieves superior performance on both few-shot image captioning and VQA benchmarks.
© 2018 Association for Computing Machinery.
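The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that "fast parameter adaptation" means taking a few gradient steps from a learned initialization, and that a "dynamic linear transformation" is a per-channel scale and shift of features. The record specifies neither, and the names `adapt`, `modulate`, `theta0`, `gamma`, and `beta` are hypothetical. A toy linear regression stands in for the image-text learner.

```python
import numpy as np


def mse(theta, x, y):
    """Mean-squared error of the linear model x @ theta against y."""
    return float(np.mean((x @ theta - y) ** 2))


def adapt(theta0, x, y, lr=0.1, steps=5):
    """Take a few gradient steps on the MSE loss, starting from theta0.

    theta0 stands in for a meta-learned initialization (an assumption).
    """
    theta = theta0.copy()
    for _ in range(steps):
        grad = 2.0 * x.T @ (x @ theta - y) / len(y)
        theta = theta - lr * grad
    return theta


def modulate(features, gamma, beta):
    """Per-channel scale and shift of features (assumed form of the
    'dynamic linear transformation'; gamma and beta are hypothetical)."""
    return gamma * features + beta


# A toy "few-shot" task: three support examples, two parameters.
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = x @ np.array([1.0, -2.0])

theta0 = np.zeros(2)  # placeholder for a learned initialization
theta = adapt(theta0, x, y)

loss_before = mse(theta0, x, y)
loss_after = mse(theta, x, y)
print(loss_before, loss_after)
```

In this toy setting the loss after adaptation is lower than at the starting point. In a meta-learning setup the starting point itself would be trained across many tasks so that these few steps are enough, and that outer loop is omitted here.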
Keywords
University attribution
First ; Corresponding
Language
English
Related links: [Source record]
Indexed in
Funding
Data to Decisions Cooperative Research Centres
WOS research areas
Computer Science ; Engineering
WOS categories
Computer Science, Theory & Methods ; Engineering, Electrical & Electronic
WOS accession number
WOS:000509665700007
EI accession number
20185006246260
EI subject terms
Benchmarking ; Deep neural networks ; Linear transformations ; Mathematical transformations ; Natural language processing systems ; Neural networks ; Semantics
EI classification codes
Data Processing and Image Processing:723.2 ; Mathematical Transformations:921.3
Source database
EV Compendex
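Citation statistics
Times cited [WOS]: 35
Document type: Conference paper
Item identifier: http://sustech.caswiz.com/handle/2SGJ60CL/50959
Collection: Southern University of Science and Technology
Author affiliations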
1.SUSTech-UTS Joint Centre of CIS, Southern University of Science and Technology, China
2.CAI, University of Technology Sydney, Australia
3.Information Science Academy, CETC, China
4.CCST, Zhejiang University, China
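First author affiliation: Southern University of Science and Technology
Corresponding author affiliation: Southern University of Science and Technology
First author's primary affiliation: Southern University of Science and Technology
Recommended citation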
GB/T 7714
Dong, Xuanyi,Zhu, Linchao,Zhang, De,et al. Fast parameter adaptation for few-shot image captioning and visual question answering[C]. 1515 BROADWAY, NEW YORK, NY 10036-9998 USA:Association for Computing Machinery, Inc,2018:54-62.