Title

Recurrent Neural Network-based Estimation and Correction of Relative Transfer Function for Preserving Spatial Cues in Speech Separation

Authors
DOI
Publication Date
2022
Conference Name
30th European Signal Processing Conference (EUSIPCO)
ISSN
2219-5491
ISBN
978-1-6654-6799-5
Proceedings Title
Volume
2022-August
Pages
155-159
Conference Dates
29 Aug.-2 Sept. 2022
Conference Location
Belgrade, Serbia
Place of Publication
345 E 47TH ST, NEW YORK, NY 10017 USA
Publisher
Abstract
Although deep learning-based algorithms have achieved great success in single-channel and multi-channel speech separation tasks, few studies have focused on binaural output and the preservation of spatial cues. Existing methods preserve spatial cues only indirectly, by enhancing signal-to-noise ratios (SNRs), and the accuracy of spatial cue preservation remains unsatisfactory. A framework was previously proposed to directly restore the spatial cues of separated speech by applying relative transfer function (RTF) estimation and correction after speech separation. To further improve this framework, this study proposes a new RTF estimator based on a recurrent neural network, which estimates the RTF directly from the separated speech and the noisy mixture. The upgraded framework was evaluated on the spatialized WSJ0-2mix dataset with diffuse noise. Experimental results showed that RTF correction significantly reduced the interaural time difference and interaural level difference errors of the separated speech without sacrificing its SNR. The new RTF estimator further improved system performance with a model about five times smaller than the previous one. Because the proposed framework does not rely on any specific model structure, it can be combined with both multi-channel and single-channel speech separation models.
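The RTF estimation-and-correction idea in the abstract can be illustrated with a minimal numerical sketch (not the authors' implementation, which uses a recurrent neural network): a classical per-bin least-squares RTF is estimated between a reference channel and a target channel, then applied to the reference to restore the target's interaural level and time differences. The frame-wise circular shift below stands in for an interaural delay so the FFT algebra stays exact for the demo; all names are illustrative.

```python
import numpy as np

def estimate_rtf(ref, tgt, n_fft=512):
    """Per-bin least-squares RTF: cross-PSD over auto-PSD,
    averaged across non-overlapping FFT frames (a classical
    baseline, not the paper's RNN estimator)."""
    frames = len(ref) // n_fft
    R = np.fft.rfft(ref[:frames * n_fft].reshape(frames, n_fft), axis=1)
    T = np.fft.rfft(tgt[:frames * n_fft].reshape(frames, n_fft), axis=1)
    return (T * R.conj()).mean(axis=0) / (np.abs(R) ** 2).mean(axis=0)

def apply_rtf(ref, rtf, n_fft=512):
    """Correction step: filter the reference channel with the RTF
    to reconstruct the target channel (restoring ILD/ITD cues)."""
    frames = len(ref) // n_fft
    R = np.fft.rfft(ref[:frames * n_fft].reshape(frames, n_fft), axis=1)
    return np.fft.irfft(R * rtf, n=n_fft, axis=1).ravel()

# Demo: target = reference attenuated by 0.6 (an ILD) and
# circularly delayed by 3 samples per frame (an ITD stand-in).
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)
tgt = (0.6 * np.roll(ref.reshape(-1, 512), 3, axis=1)).ravel()

rtf = estimate_rtf(ref, tgt)
rec = apply_rtf(ref, rtf)
# The corrected channel matches the target, i.e. the spatial
# cues encoded in the RTF are restored.
print(np.max(np.abs(rec - tgt)))
```

In the paper's framework, the separation model's output plays the role of `ref` and the RTF is predicted by the network rather than estimated in closed form; the correction step, however, is the same multiplication in the frequency domain.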
Keywords
Institutional Attribution
First
Language
English
Related Links [Scopus Record]
Indexed By
Funding Projects
Guangdong Provincial Key Laboratory of Robotics and Intelligent Systems[ZDSYS20200810171800001];
WOS Research Areas
Acoustics ; Computer Science ; Engineering ; Imaging Science & Photographic Technology ; Telecommunications
WOS Categories
Acoustics ; Computer Science, Software Engineering ; Engineering, Electrical & Electronic ; Imaging Science & Photographic Technology ; Telecommunications
WOS Accession Number
WOS:000918827600032
Scopus Record ID
2-s2.0-85141011446
Source Database
Scopus
Full-Text Link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9909636
Citation Statistics
Times Cited [WOS]: 0
Document Type: Conference Paper
Identifier: http://sustech.caswiz.com/handle/2SGJ60CL/411946
Collection: Southern University of Science and Technology
Author Affiliations
1. Shenzhen Key Laboratory of Robotics Perception and Intelligence, Southern University of Science and Technology, Shenzhen, China
2. Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
First Author Affiliation: Southern University of Science and Technology
First Author's First Affiliation: Southern University of Science and Technology
Recommended Citation
GB/T 7714
Feng, Zicheng, Tsao, Yu, Chen, Fei. Recurrent Neural Network-based Estimation and Correction of Relative Transfer Function for Preserving Spatial Cues in Speech Separation[C]. 345 E 47TH ST, NEW YORK, NY 10017 USA: IEEE, 2022: 155-159.
Files in This Item
No files associated with this item.

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.