Title | Recurrent Neural Network-based Estimation and Correction of Relative Transfer Function for Preserving Spatial Cues in Speech Separation |
Authors | Feng, Zicheng; Tsao, Yu; Chen, Fei |
DOI | |
Publication Date | 2022 |
Conference Name | 30th European Signal Processing Conference (EUSIPCO) |
ISSN | 2219-5491 |
ISBN | 978-1-6654-6799-5 |
Proceedings Title | |
Volume | 2022-August |
Pages | 155-159 |
Conference Dates | 29 Aug.-2 Sept. 2022 |
Conference Venue | Belgrade, Serbia |
Place of Publication | 345 E 47TH ST, NEW YORK, NY 10017 USA |
Publisher | |
Abstract | Although deep learning-based algorithms have achieved great success in single-channel and multi-channel speech separation tasks, few studies have focused on binaural output and the preservation of spatial cues. Existing methods preserve spatial cues only indirectly, by enhancing signal-to-noise ratios (SNRs), and the accuracy of spatial cue preservation remains unsatisfactory. A framework was previously proposed to directly restore the spatial cues of the separated speech by applying relative transfer function (RTF) estimation and correction after speech separation. To further improve this framework, this study proposes a new RTF estimator based on a recurrent neural network, which estimates the RTF directly from the separated speech and the noisy mixture. The upgraded framework was evaluated on the spatialized WSJ0-2mix dataset with diffuse noise. Experimental results showed that the interaural time difference and interaural level difference errors of the separated speech were significantly reduced after RTF correction, without sacrificing SNR. The new RTF estimator further improved system performance, with a model about 5 times smaller than the previous one. As the proposed framework does not rely on any specific type of model structure, it can be incorporated with both multi-channel and single-channel speech separation models. |
Keywords | |
University Attribution | First |
Language | English |
Related Links | [Scopus Record] |
Indexing Category | |
Funding Projects | Guangdong Provincial Key Laboratory of Robotics and Intelligent Systems [ZDSYS20200810171800001] |
WOS Research Areas | Acoustics; Computer Science; Engineering; Imaging Science & Photographic Technology; Telecommunications |
WOS Categories | Acoustics; Computer Science, Software Engineering; Engineering, Electrical & Electronic; Imaging Science & Photographic Technology; Telecommunications |
WOS Accession Number | WOS:000918827600032 |
Scopus Record ID | 2-s2.0-85141011446 |
Source Database | Scopus |
Full-Text Link | https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9909636 |
Citation Statistics | Times Cited [WOS]: 0 |
Output Type | Conference Paper |
Item Identifier | http://sustech.caswiz.com/handle/2SGJ60CL/411946 |
Collection | Southern University of Science and Technology |
Author Affiliations | 1. Shenzhen Key Laboratory of Robotics Perception and Intelligence, Southern University of Science and Technology, Shenzhen, China; 2. Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan |
First Author Affiliation | Southern University of Science and Technology |
First Author's First Affiliation | Southern University of Science and Technology |
Recommended Citation (GB/T 7714) | Feng Zicheng, Tsao Yu, Chen Fei. Recurrent Neural Network-based Estimation and Correction of Relative Transfer Function for Preserving Spatial Cues in Speech Separation[C]. 345 E 47TH ST, NEW YORK, NY 10017 USA: IEEE, 2022: 155-159. |
Files in This Item | No files associated with this item. |