Title

LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

Authors
Corresponding Author: Wei, Zhihua
DOI
Publication Date
2022
Conference Name
Interspeech Conference
ISSN
2308-457X
EISSN
1990-9772
Proceedings Title
Volume
2022-September
Pages
1686-1690
Conference Date
SEP 18-22, 2022
Conference Venue
Incheon, South Korea
Place of Publication
C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE
Publisher
Abstract
Self-supervised speech representation learning has shown promising results in various speech processing tasks. However, the pre-trained models, e.g., HuBERT, are storage-intensive Transformers, limiting their scope of applications under low-resource settings. To this end, we propose LightHuBERT, a once-for-all Transformer compression framework, to find the desired architectures automatically by pruning structured parameters. More precisely, we create a Transformer-based supernet that is nested with thousands of weight-sharing subnets and design a two-stage distillation strategy to leverage the contextualized latent representations from HuBERT. Experiments on automatic speech recognition (ASR) and the SUPERB benchmark show the proposed LightHuBERT enables over 10^9 architectures concerning the embedding dimension, attention dimension, head number, feed-forward network ratio, and network depth. LightHuBERT outperforms the original HuBERT on ASR and five SUPERB tasks with the HuBERT size, achieves comparable performance to the teacher model in most tasks with a reduction of 29% parameters, and obtains a 3.5× compression ratio in three SUPERB tasks, e.g., automatic speaker verification, keyword spotting, and intent classification, with a slight accuracy loss. The code and pre-trained models are available at https://github.com/mechanicalsea/lighthubert.
Keywords
University Affiliation
Other
Language
English
Related Link: [Scopus Record]
Indexed By
Funding Projects
National Natural Science Foundation of China (61976160, 61906137, 61976158, 62076184, 62076182)
WOS Research Areas
Acoustics ; Audiology & Speech-Language Pathology ; Computer Science ; Engineering
WOS Categories
Acoustics ; Audiology & Speech-Language Pathology ; Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic
WOS Accession Number
WOS:000900724501174
Scopus ID
2-s2.0-85140048392
Source Database
Scopus
Citation Statistics
Times Cited [WOS]: 10
Document Type: Conference Paper
Item Identifier: http://sustech.caswiz.com/handle/2SGJ60CL/406917
Collection: College of Engineering, Department of Computer Science and Engineering
Author Affiliations
1. Department of Computer Science and Technology, Tongji University, China
2. Department of Computer Science and Engineering, Southern University of Science and Technology, China
3. School of Data Science, The Chinese University of Hong Kong, Shenzhen, China
4. Microsoft
5. Peng Cheng Laboratory, China
6. ByteDance AI Lab
Recommended Citation
GB/T 7714
Wang, Rui, Bai, Qibing, Ao, Junyi, et al. LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT[C]. Baixas, France: ISCA-INT SPEECH COMMUNICATION ASSOC, 2022: 1686-1690.
Files in This Item
No files associated with this item.
