Title

MTI-Net: A Multi-Target Speech Intelligibility Prediction Model

Authors
DOI
Publication Date
2022
Conference Name
Interspeech Conference
ISSN
2308-457X
EISSN
1990-9772
Proceedings Title
Volume
2022-September
Pages
5463-5467
Conference Date
SEP 18-22, 2022
Conference Location
Incheon, South Korea
Place of Publication
C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE
Publisher
Abstract
Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention. Many studies report that these DL-based models yield satisfactory assessment performance and good flexibility, but their performance in unseen environments remains a challenge. Furthermore, compared to quality scores, fewer studies have developed deep learning models to estimate intelligibility scores. This study proposes a multi-task speech intelligibility prediction model, called MTI-Net, for simultaneously predicting human and machine intelligibility measures. Specifically, given a speech utterance, MTI-Net is designed to predict human subjective listening test results and word error rate (WER) scores. We also investigate several methods that can improve the prediction performance of MTI-Net. First, we compare different features (including low-level features and embeddings from self-supervised learning (SSL) models) and prediction targets of MTI-Net. Second, we explore the effect of transfer learning and multi-task learning on training MTI-Net. Finally, we examine the potential advantages of fine-tuning SSL embeddings. Experimental results demonstrate the effectiveness of using cross-domain features, multi-task learning, and fine-tuning SSL embeddings. Furthermore, it is confirmed that the intelligibility and WER scores predicted by MTI-Net are highly correlated with the ground-truth scores.
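As a rough illustration of the two ideas the abstract relies on, the sketch below combines an intelligibility loss and a WER loss into one multi-task objective, then scores predictions by their Pearson correlation with the ground truth. The record does not give the paper's actual loss or evaluation code, so the mean-squared-error form, the function names, and the weights `w_intell` and `w_wer` are all assumptions made for this example.

```python
from math import sqrt


def mse(pred, target):
    """Mean squared error between two equal-length score lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multi_task_loss(pred_intell, true_intell, pred_wer, true_wer,
                    w_intell=1.0, w_wer=1.0):
    """Weighted sum of per-task losses (assumed form, not taken from the paper).

    w_intell and w_wer are hypothetical task weights.
    """
    return (w_intell * mse(pred_intell, true_intell)
            + w_wer * mse(pred_wer, true_wer))


def pearson(xs, ys):
    """Pearson linear correlation between predicted and ground-truth scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (std_x * std_y)
```

A correlation close to 1 between predicted and reference scores is what "highly correlated with the ground-truth scores" would look like under this metric; other correlation measures could be substituted in the same place.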
Keywords
University Attribution
Other
Language
English
Related Links [Scopus record]
Indexed By
WOS Research Area
Acoustics ; Audiology & Speech-Language Pathology ; Computer Science ; Engineering
WOS Category
Acoustics ; Audiology & Speech-Language Pathology ; Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic
WOS Accession Number
WOS:000900724505130
Scopus Record Number
2-s2.0-85140047138
Source Database
Scopus
Citation Statistics
Times Cited [WOS]: 2
Document Type
Conference paper
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/406918
Collection
Southern University of Science and Technology
Author Affiliations
1. National Taiwan University, Taiwan
2. Academia Sinica
3. Microsoft Corporation
4. Southern University of Science and Technology of China, China
Recommended Citation
GB/T 7714
Zezario, Ryandhimas E., Fu, Szu Wei, Chen, Fei, et al. MTI-Net: A Multi-Target Speech Intelligibility Prediction Model[C]. C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE: ISCA-INT SPEECH COMMUNICATION ASSOC, 2022: 5463-5467.
Files in This Item
There are no files associated with this item.