SUSTech KC (南方科技大学知识苑): Adaptive Sound Source Localization at Low Frequencies Based on Convolutional Neural Networks

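Title	基于卷积神经网络的低频自适应声源定位
Alternative title	ADAPTIVE SOUND SOURCE LOCALIZATION AT LOW FREQUENCIES BASED ON CONVOLUTIONAL NEURAL NETWORKS
Author	马文博
Author (pinyin)	MA Wenbo
Student ID	12132410
Degree type	Master's
Degree discipline	0801 Mechanics
Discipline category / professional degree category	08 Engineering
Supervisor	刘轶军 (Liu Yijun)
Supervisor's affiliation	Department of Mechanics and Aerospace Engineering
Thesis defense date	2024-05-07
Thesis submission date	2024-06-24
Degree-granting institution	Southern University of Science and Technology
Place of degree conferral	Shenzhen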
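Abstract	Sound source localization plays a key role in applications such as fault diagnosis, speech separation, and vibration and noise reduction. Although beamforming algorithms are widely used for microphone-array-based sound source localization, their resolution is limited at low frequencies. In recent years, deep-learning-based sound source localization algorithms have markedly improved localization accuracy. However, some existing algorithms rely on large microphone arrays and require a separate neural network model to be trained for each frequency, which limits their range of application. To address this problem, this thesis compares different neural network hyperparameters and proposes a sound source localization method based on a convolutional neural network. The algorithm uses randomly generated data with varying numbers of sources, frequencies, and array-to-source distances as the dataset, takes the sound pressure distribution on the microphone array as the network input, and trains the model with correspondingly designed training labels and a loss function. Test data with random numbers of sources, frequencies, and distances are then used to evaluate the model's prediction accuracy and its robustness under different noise types and signal-to-noise ratios, and the model is compared with the classical beamforming algorithm, CLEAN-SC, DAMAS, and MUSIC. The test results show that the proposed neural network model significantly improves low-frequency localization accuracy, demonstrating its effectiveness and potential for sound source localization. Finally, because sound source localization algorithms usually need to be combined with a camera image to locate sources, the neural network model is integrated into sound source localization software that combines it with the camera image and provides a graphical interface built with the QT framework for ease of use.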
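The data-generation step described in the abstract (random number of sources, frequency, and array-to-source distance, with the pressure on the array as network input) can be sketched as follows. This is a minimal illustration and not the thesis's implementation: the free-field monopole model, the 8 x 8 square array, the parameter ranges, the two-channel real/imaginary input layout, and all function names (`make_array`, `random_sample`, `add_noise`, `to_network_input`) are assumptions made here for the example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C


def make_array(n_side=8, aperture=0.5):
    """Hypothetical n_side x n_side square microphone grid in the z = 0 plane."""
    axis = np.linspace(-aperture / 2, aperture / 2, n_side)
    gx, gy = np.meshgrid(axis, axis)
    return np.stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)], axis=1)


def random_sample(mics, rng, max_sources=3, freq_range=(200.0, 1000.0),
                  dist_range=(0.5, 2.0), half_width=0.5):
    """Draw random sources and return the complex pressure at each microphone."""
    n_src = int(rng.integers(1, max_sources + 1))   # random number of sources
    freq = rng.uniform(*freq_range)                 # random frequency in Hz
    dist = rng.uniform(*dist_range)                 # random array-to-source distance in m
    xy = rng.uniform(-half_width, half_width, size=(n_src, 2))
    sources = np.column_stack([xy, np.full(n_src, dist)])
    k = 2 * np.pi * freq / SPEED_OF_SOUND           # wavenumber
    # Distance from every microphone to every source: shape (n_mics, n_src).
    r = np.linalg.norm(mics[:, None, :] - sources[None, :, :], axis=2)
    # Superpose unit-strength free-field monopoles (assumed source model).
    pressure = (np.exp(-1j * k * r) / (4 * np.pi * r)).sum(axis=1)
    return pressure, sources, freq


def add_noise(pressure, snr_db, rng):
    """Add complex white Gaussian noise at the requested signal-to-noise ratio."""
    signal_power = np.mean(np.abs(pressure) ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    noise = (rng.standard_normal(pressure.shape)
             + 1j * rng.standard_normal(pressure.shape))
    return pressure + np.sqrt(noise_power / 2) * noise


def to_network_input(pressure, n_side):
    """Stack real and imaginary parts as two image-like channels."""
    grid = pressure.reshape(n_side, n_side)
    return np.stack([grid.real, grid.imag], axis=0)


rng = np.random.default_rng(0)
mics = make_array()
p, sources, freq = random_sample(mics, rng)
x = to_network_input(add_noise(p, snr_db=20.0, rng=rng), n_side=8)
print(x.shape)  # (2, 8, 8)
```

Drawing the frequency and distance per sample, rather than fixing them, is what would let a single network see the whole range during training; the label design and loss function mentioned in the abstract are not specified in this record and are therefore not sketched.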
Keywords	Sound source localization; deep learning; microphone array; QT; sound source localization software
Language	Chinese
Training category	Independent training
Year of enrollment	2021
Date of degree conferral	2024-06
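Degree evaluation subcommittee	Mechanics
Chinese Library Classification number	TN911.7
Source	Manual submission
Output type	Thesis
Item identifier	http://sustech.caswiz.com/handle/2SGJ60CL/765957
Collection	SUSTech College of Engineering_Department of Mechanics and Aerospace Engineering
Recommended citation (GB/T 7714)	马文博. 基于卷积神经网络的低频自适应声源定位[D]. 深圳: 南方科技大学, 2024.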

Files in this item
File name / size	Document type	Version type	Access type	License
12132410-马文博-力学与航空航天 (22800KB)	--	--	Restricted access	--