Title | Accelerating Vision-Language Pretraining with Free Language Modeling |
Authors | |
DOI | |
Publication Date | 2023 |
ISSN | 1063-6919 |
ISBN | 979-8-3503-0130-4 |
Proceedings Title | |
Volume | 2023-June |
Pages | 23161-23170 |
Conference Dates | 17-24 June 2023 |
Conference Location | Vancouver, BC, Canada |
Abstract | The state of the art in vision-language pretraining (VLP) achieves exemplary performance but suffers from high training costs resulting from slow convergence and long training time, especially on large-scale web datasets. An essential obstacle to training efficiency lies in the entangled prediction rate (percentage of tokens for reconstruction) and corruption rate (percentage of corrupted tokens) in masked language modeling (MLM); that is, a proper corruption rate is achieved at the cost of a large portion of output tokens being excluded from the prediction loss. To accelerate the convergence of VLP, we propose a new pretraining task, namely free language modeling (FLM), which enables a 100% prediction rate with arbitrary corruption rates. FLM frees the prediction rate from its tie-up with the corruption rate while allowing the corruption spans to be customized for each token to be predicted. FLM-trained models are encouraged to learn better and faster within the same GPU time by exploiting bidirectional contexts more flexibly. Extensive experiments show that FLM achieves an impressive 2.5× pretraining time reduction compared with MLM-based methods, while keeping competitive performance on both vision-language understanding and generation tasks. Code will be made public at https://github.com/TencentARC/FLM. |
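Note: to make the abstract's decoupling of prediction rate from corruption rate concrete, below is a minimal toy sketch in PyTorch. It is an illustrative assumption about one plausible realization (a per-target visibility mask), not the authors' implementation; see the TencentARC/FLM repository for the actual code. The names `visible`, `span`, and `corruption_rate` are hypothetical.

```python
# Toy sketch (hypothetical, not the paper's code): contrast MLM's tied
# prediction/corruption rates with an FLM-style per-target corruption span.
import torch

L = 12                   # toy sequence length
corruption_rate = 0.25   # fraction of context corrupted per prediction

# --- MLM: one shared corruption; only the masked tokens enter the loss ---
mlm_mask = torch.rand(L) < corruption_rate             # True = corrupted
mlm_prediction_rate = mlm_mask.float().mean().item()   # ~= corruption rate

# --- FLM-style: each target i hides its own corruption span, so every
# token can be predicted (100% prediction rate) at the same corruption rate.
span = max(1, int(corruption_rate * L))
visible = torch.ones(L, L, dtype=torch.bool)  # visible[i, j]: target i sees j
for i in range(L):
    start = max(0, i - span // 2)              # span chosen around target i
    visible[i, start:start + span] = False     # hide i's corruption span
visible[torch.arange(L), torch.arange(L)] = False  # a target never sees itself

print(f"MLM prediction rate ~ {mlm_prediction_rate:.0%} (tied to corruption)")
print(f"FLM-style prediction rate = 100% at corruption rate {corruption_rate:.0%}")
```

The point of the sketch is the shape of `visible`: MLM shares one corrupted view, so only the masked ~25% of positions produce a loss, whereas a per-target mask lets every position be scored while each prediction still sees a ~25%-corrupted context.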
Keywords | |
University Authorship | First |
Related Links | [IEEE Record] |
Indexed By | |
WOS Record No. | WOS:001062531307047 |
EI Accession No. | 20234114867548 |
Source Database | IEEE |
Full-Text Link | https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10204651 |
Citation Statistics | Times Cited [WOS]: 1 |
Document Type | Conference Paper |
Identifier | http://sustech.caswiz.com/handle/2SGJ60CL/559186 |
Collection | Southern University of Science and Technology |
Author Affiliations | 1. Southern University of Science and Technology; 2. ARC Lab; 3. Tencent PCG; 4. The University of Hong Kong |
First Author's Affiliation | Southern University of Science and Technology |
First Author's First Affiliation | Southern University of Science and Technology |
Recommended Citation (GB/T 7714) | Teng Wang, Yixiao Ge, Feng Zheng, et al. Accelerating Vision-Language Pretraining with Free Language Modeling[C], 2023: 23161-23170. |
Files in This Item | No files are associated with this item. |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.