Title | MTI-Net: A Multi-Target Speech Intelligibility Prediction Model |
Authors | |
DOI | |
Publication date | 2022
|
Conference name | Interspeech Conference
|
ISSN | 2308-457X
|
EISSN | 1990-9772
|
Proceedings title | |
Volume | 2022-September
|
Pages | 5463-5467
|
Conference date | September 18-22, 2022
|
Conference location | Incheon, South Korea
|
Place of publication | C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE
|
Publisher | |
Abstract | Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention. Many studies report that these DL-based models yield satisfactory assessment performance and good flexibility, but their performance in unseen environments remains a challenge. Furthermore, compared to quality scores, fewer studies elaborate deep learning models to estimate intelligibility scores. This study proposes a multi-task speech intelligibility prediction model, called MTI-Net, for simultaneously predicting human and machine intelligibility measures. Specifically, given a speech utterance, MTI-Net is designed to predict human subjective listening test results and word error rate (WER) scores. We also investigate several methods that can improve the prediction performance of MTI-Net. First, we compare different features (including low-level features and embeddings from self-supervised learning (SSL) models) and prediction targets of MTI-Net. Second, we explore the effect of transfer learning and multi-tasking learning on training MTI-Net. Finally, we examine the potential advantages of fine-tuning SSL embeddings. Experimental results demonstrate the effectiveness of using cross-domain features, multi-task learning, and fine-tuning SSL embeddings. Furthermore, it is confirmed that the intelligibility and WER scores predicted by MTI-Net are highly correlated with the ground-truth scores. |
Keywords | |
University affiliation | Other
|
Language | English
|
Related links | [Scopus record] |
Indexed by | |
WOS research areas | Acoustics; Audiology & Speech-Language Pathology; Computer Science; Engineering
|
WOS categories | Acoustics; Audiology & Speech-Language Pathology; Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
|
WOS accession number | WOS:000900724505130
|
Scopus record ID | 2-s2.0-85140047138
|
Source database | Scopus
|
Citation statistics |
Times cited [WOS]: 2
|
Document type | Conference paper |
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/406918 |
Collection | Southern University of Science and Technology |
Author affiliations | 1. National Taiwan University, Taiwan 2. Academia Sinica 3. Microsoft Corporation 4. Southern University of Science and Technology of China, China |
Recommended citation (GB/T 7714) |
Zezario, Ryandhimas E., Fu, Szu Wei, Chen, Fei, et al. MTI-Net: A Multi-Target Speech Intelligibility Prediction Model[C]. C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE: ISCA-INT SPEECH COMMUNICATION ASSOC, 2022: 5463-5467.
|
Files in this item | There are no files associated with this item. |
|