Title | Investigation of neural network approaches for unified spectral and prosodic feature enhancement |
Authors | |
DOI | |
Publication date | 2019-11-01
|
ISSN | 2309-9402
|
ISBN | 978-1-7281-3249-5
|
Proceedings title | |
Pages | 1179-1184
|
Conference date | 18-21 Nov. 2019
|
Conference location | Lanzhou, China
|
Place of publication | 345 E 47TH ST, NEW YORK, NY 10017 USA
|
Publisher | |
Abstract | Most speech enhancement (SE) systems focus on enhancing spectral features or the raw waveform. However, many speech-related applications rely on features other than spectral ones, such as intensity and fundamental frequency (f0). Therefore, a unified feature enhancement for different types of features is worth investigating. In this work, we train our neural network (NN)-based SE system in a manner that simultaneously minimizes the spectral loss and preserves the correctness of the intensity and f0 contours extracted from the enhanced speech. The idea is to introduce an NN-based feature extractor to the SE framework that imitates the feature extraction of Praat. Then, we can train the SE system by minimizing the combined loss of the spectral feature, intensity, and f0. We investigate three bidirectional long short-term memory (BLSTM)-based unified feature enhancement systems: fixed-concat, joint-concat, and multi-task. The results of the experiments on the Taiwan Mandarin Hearing in Noise Test (TMHINT) dataset demonstrate that all three systems show improved intensity and f0 extraction accuracy without sacrificing perceptual evaluation of speech quality (PESQ) or short-time objective intelligibility (STOI) scores compared with the baseline SE system. Further analysis of the experimental results shows that the improvement mostly comes from better f0 contours under difficult conditions such as low signal-to-noise ratio and nonstationary noises. Our work demonstrates the advantage of the unified feature enhancement and provides new insights for SE. |
Keywords | |
University attribution | Other
|
Language | English
|
Related links | [Scopus record] |
Indexed by | |
Funding | MOST Taiwan Grants[108-2634-F-008-004][108-2634-F-001-004]
|
WOS research areas | Computer Science
; Engineering
; Imaging Science & Photographic Technology
|
WOS categories | Computer Science, Software Engineering
; Engineering, Electrical & Electronic
; Imaging Science & Photographic Technology
|
WOS accession number | WOS:000555696900199
|
EI accession number | 20201308362137
|
EI subject terms | Extraction
; Quality control
; Speech intelligibility
; Signal to noise ratio
; Audition
; Statistical tests
|
EI classification codes | Ergonomics and Human Factors Engineering:461.4
; Information Theory and Signal Processing:716.1
; Speech:751.5
; Chemical Operations:802.3
; Quality Assurance and Control:913.3
; Mathematical Statistics:922.2
; Acoustical Instruments:941.1
|
Scopus record ID | 2-s2.0-85082398079
|
Source database | Scopus
|
Full-text link | https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9023309 |
Citation statistics |
Times cited [WOS]: 1
|
Output type | Conference paper |
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/106478 |
Collection | College of Engineering_Department of Electrical and Electronic Engineering |
Author affiliations | 1. Research Center for Information Technology Innovation, Academia Sinica, Taiwan 2. Southern University of Science and Technology, Department of Electrical and Electronic Engineering, China 3. Institute of Information Science, Academia Sinica, Taiwan |
Recommended citation (GB/T 7714) |
Lin, Wei Cheng, Tsao, Yu, Chen, Fei, et al. Investigation of neural network approaches for unified spectral and prosodic feature enhancement[C]. 345 E 47TH ST, NEW YORK, NY 10017 USA: IEEE, 2019: 1179-1184.
|
Files in this item | No files are associated with this item. |
|