Multi-Channel Echo Cancellation Model for Conference Scenarios Based on CBAM-CRN
DOI: 10.12677/csa.2024.144093. Supported by the National Natural Science Foundation of China.
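Authors: Huibing Sun, Biyun Ding: School of Information Engineering, Nanchang Hangkong University, Nanchang, Jiangxi; Chengli Sun*: School of Information and Communication Engineering, Guangzhou Maritime University, Guangzhou, Guangdong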
Keywords: Deep Learning, Multi-Channel Echo Cancellation, U-Net, Mixed Domain Attention
Abstract: This paper studies deep-learning-based multi-channel acoustic echo cancellation and proposes a method that fuses the convolutional block attention module (CBAM) with a convolutional recurrent network (CRN). The method exploits the feature-extraction ability of the U-Net and the strength of LSTM networks in processing time-series signals, and it combines time-frequency masking with sparse adaptive normalization. It also integrates a joint channel-attention and spatial-attention mechanism; this mixed-domain attention effectively captures key features and suppresses irrelevant ones. Experimental results show that the CBAM-CRN method outperforms adaptive filtering and other deep learning methods in a variety of call modes and effectively improves speech quality in far-field hands-free calls.
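The joint channel and spatial attention described in the abstract can be illustrated with a minimal NumPy sketch of a CBAM-style block applied to a feature map of shape (channels, time frames, frequency bins). This is not the authors' implementation: the tensor sizes, the reduction ratio `r`, the 7x7 kernel, the random weights, and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Reweight channels of x (C, T, F) using a shared two-layer MLP
    applied to the average-pooled and max-pooled channel descriptors."""
    avg = x.mean(axis=(1, 2))                      # (C,)
    mx = x.max(axis=(1, 2))                        # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # ReLU bottleneck
    weights = sigmoid(mlp(avg) + mlp(mx))          # (C,), each in (0, 1)
    return x * weights[:, None, None]

def spatial_attention(x, kernel):
    """Reweight time-frequency positions of x (C, T, F).
    kernel has shape (2, k, k) with odd k and is applied as a
    zero-padded cross-correlation over the pooled maps."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])   # (2, T, F)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    T, F = x.shape[1:]
    attn = np.zeros((T, F))
    for i in range(T):
        for j in range(F):
            attn[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(attn)[None, :, :]

def cbam(x, w1, w2, kernel):
    """Channel attention followed by spatial attention."""
    return spatial_attention(channel_attention(x, w1, w2), kernel)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, T, F, r = 8, 10, 16, 4
x = rng.standard_normal((C, T, F))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
y = cbam(x, w1, w2, kernel)
print(y.shape)
```

Because both attention maps pass through a sigmoid, every output value is the input scaled by a factor between 0 and 1, which is how such a block emphasizes some channels and time-frequency positions relative to others. In a trained network the weights would be learned rather than drawn at random.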
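Citation: Sun, H.B., Ding, B.Y. and Sun, C.L. Multi-Channel Echo Cancellation Model for Conference Scenarios Based on CBAM-CRN [J]. Computer Science and Application, 2024, 14(4): 230-241. https://doi.org/10.12677/csa.2024.144093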
