Title

Investigation of neural network approaches for unified spectral and prosodic feature enhancement

Author
DOI
Publication date
2019-11-01
ISSN
2309-9402
ISBN
978-1-7281-3249-5
Proceedings title
Pages
1179-1184
Conference date
18-21 Nov. 2019
Conference location
Lanzhou, China
Place of publication
345 E 47TH ST, NEW YORK, NY 10017 USA
Publisher
Abstract
Most speech enhancement (SE) systems focus on enhancing spectral features or the raw waveform. However, many speech-related applications rely on features other than spectral ones, such as intensity and fundamental frequency (f0). A unified enhancement approach that covers different feature types is therefore worth investigating. In this work, we train our neural network (NN)-based SE system so that it simultaneously minimizes the spectral loss and preserves the correctness of the intensity and f0 contours extracted from the enhanced speech. The idea is to introduce into the SE framework an NN-based feature extractor that imitates the feature extraction of Praat. We can then train the SE system by minimizing the combined loss of the spectral feature, intensity, and f0. We investigate three bidirectional long short-term memory (BLSTM)-based unified feature enhancement systems: fixed-concat, joint-concat, and multi-task. Experimental results on the Taiwan Mandarin Hearing in Noise Test (TMHINT) dataset demonstrate that all three systems improve intensity and f0 extraction accuracy without sacrificing perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) scores compared with the baseline SE system. Further analysis shows that the improvement comes mostly from better f0 contours under difficult conditions such as low signal-to-noise ratios and nonstationary noise. Our work demonstrates the advantage of unified feature enhancement and provides new insights for SE.
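The combined loss described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the abstract does not state the distance measure or the weighting, so the sketch uses mean squared error and two hypothetical weights, `w_int` and `w_f0`. In the paper the intensity and f0 contours come from an NN-based extractor that imitates Praat; here they are simply passed in as plain lists.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combined_loss(
    spec_pred: Sequence[float],
    spec_ref: Sequence[float],
    int_pred: Sequence[float],
    int_ref: Sequence[float],
    f0_pred: Sequence[float],
    f0_ref: Sequence[float],
    w_int: float = 1.0,  # hypothetical weight on the intensity term
    w_f0: float = 1.0,   # hypothetical weight on the f0 term
) -> float:
    """Weighted sum of spectral, intensity, and f0 errors (assumed form)."""
    return (
        mse(spec_pred, spec_ref)
        + w_int * mse(int_pred, int_ref)
        + w_f0 * mse(f0_pred, f0_ref)
    )


# Toy values only; spectral frames are flattened into a single list here.
loss = combined_loss(
    [0.0, 0.0], [1.0, 1.0],  # spectral features: MSE = 1.0
    [0.0], [2.0],            # intensity contour: MSE = 4.0
    [0.0], [3.0],            # f0 contour: MSE = 9.0
    w_int=0.5,
    w_f0=0.25,
)
print(loss)
```

In a real training setup the three terms would be computed on tensors inside an automatic-differentiation framework so that gradients flow back through the feature extractor into the SE network; the plain-Python version only shows how the terms combine.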
Keywords
University attribution
Other
Language
English
Related links [Scopus record]
Indexed in
Funding
MOST Taiwan Grants[108-2634-F-008-004][108-2634-F-001-004]
WOS research areas
Computer Science ; Engineering ; Imaging Science & Photographic Technology
WOS categories
Computer Science, Software Engineering ; Engineering, Electrical & Electronic ; Imaging Science & Photographic Technology
WOS accession number
WOS:000555696900199
EI accession number
20201308362137
EI main headings
Extraction ; Quality control ; Speech intelligibility ; Signal to noise ratio ; Audition ; Statistical tests
EI classification codes
Ergonomics and Human Factors Engineering:461.4 ; Information Theory and Signal Processing:716.1 ; Speech:751.5 ; Chemical Operations:802.3 ; Quality Assurance and Control:913.3 ; Mathematical Statistics:922.2 ; Acoustical Instruments:941.1
Scopus record ID
2-s2.0-85082398079
Source database
Scopus
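Full-text link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9023309
Citation statistics
Times cited [WOS]: 1
Output type: Conference paper
Item identifier: http://sustech.caswiz.com/handle/2SGJ60CL/106478
Collection: College of Engineering_Department of Electrical and Electronic Engineering
Author affiliations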
1. Research Center for Information Technology Innovation, Academia Sinica, Taiwan
2. Southern University of Science and Technology, Department of Electrical and Electronic Engineering, China
3. Institute of Information Science, Academia Sinica, Taiwan
Recommended citation
GB/T 7714
Lin,Wei Cheng,Tsao,Yu,Chen,Fei,et al. Investigation of neural network approaches for unified spectral and prosodic feature enhancement[C]. 345 E 47TH ST, NEW YORK, NY 10017 USA:IEEE,2019:1179-1184.
Files in this item
No files are associated with this item.
