Title

End-to-End Dense Video Captioning with Parallel Decoding

Authors
Corresponding author: Feng Zheng
DOI: https://openaccess.thecvf.com/content/ICCV2021/papers/Wang_End-to-End_Dense_Video_Captioning_With_Parallel_Decoding_ICCV_2021_paper.pdf
Publication date
2021
Conference name
ICCV
ISSN
1550-5499
ISBN
978-1-6654-2813-2
Proceedings title
Pages
6827-6837
Conference date
2021
Conference location
Virtual-only Conference
Abstract
Dense video captioning aims to generate multiple associated captions with their temporal locations from the video. Previous methods follow a sophisticated "localize-then-describe" scheme, which heavily relies on numerous hand-crafted components. In this paper, we propose a simple yet effective framework for end-to-end dense video captioning with parallel decoding (PDVC), by formulating the dense caption generation as a set prediction task. In practice, through stacking a newly proposed event counter on the top of a transformer decoder, the PDVC precisely segments the video into a number of event pieces under the holistic understanding of the video content, which effectively increases the coherence and readability of predicted captions. Compared with prior arts, the PDVC has several appealing advantages: (1) Without relying on heuristic non-maximum suppression or a recurrent event sequence selection network to remove redundancy, PDVC directly produces an event set with an appropriate size; (2) In contrast to adopting the two-stage scheme, we feed the enhanced representations of event queries into the localization head and caption head in parallel, making these two sub-tasks deeply interrelated and mutually promoted through the optimization; (3) Without bells and whistles, extensive experiments on ActivityNet Captions and YouCook2 show that PDVC is capable of producing high-quality captioning results, surpassing the state-of-the-art two-stage methods when its localization accuracy is on par with them. Code is available at https://github.com/ttengwang/PDVC.
Keywords
University affiliation
Corresponding
Related links: [IEEE record]
Indexed by
EI accession number
20221511951890
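Source database
Manual submission
Full-text link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9710686
Citation statistics
Times cited [WOS]: 0
Output type: Conference paper
Item identifier: http://sustech.caswiz.com/handle/2SGJ60CL/257589
Collection: Southern University of Science and Technology
College of Engineering_Department of Computer Science and Engineering
Author affiliations
1. The University of Hong Kong
2. Southern University of Science and Technology
3. The Chinese University of Hong Kong (Shenzhen)
4. Shenzhen Research Institute of Big Data
First author affiliation: Southern University of Science and Technology
Corresponding author affiliation: Southern University of Science and Technology
Recommended citation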
GB/T 7714
Teng Wang, Ruimao Zhang, Zhichao Lu, et al. End-to-End Dense Video Captioning with Parallel Decoding[C], 2021: 6827-6837.
Files in this item
File name/size | Document type | Version type | Access type | License
Wang_End-to-End_Dens (1690KB) | -- | -- | Restricted access | --