[1] Berouti, M., Schwartz, R. and Makhoul, J. (1979) Enhancement of Speech Corrupted by Acoustic Noise. Proceedings of the 1979 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Washington, 2-4 April 1979, 208-211.

[2] Ephraim, Y. (1992) Statistical-Model-Based Speech Enhancement Systems. Proceedings of the IEEE, 80, 1526-1555.

[3] Lim, J. and Oppenheim, A. (1978) All-Pole Modeling of Degraded Speech. IEEE Transactions on Acoustics, Speech and Signal Processing, 26, 197-210.

[4] Dendrinos, M., Bakamidis, S. and Carayannis, G. (1991) Speech Enhancement from Noise: A Regenerative Approach. Speech Communication, 10, 45-57.

[5] Ephraim, Y. and Van Trees, H.L. (1995) A Signal Subspace Approach for Speech Enhancement. IEEE Transactions on Speech and Audio Processing, 3, 251-266.

[6] Pascual, S., Bonafonte, A. and Serrà, J. (2017) SEGAN: Speech Enhancement Generative Adversarial Network. Interspeech 2017, Stockholm, 20-24 August 2017, 3642-3646.

[7] Cao, R., Abdulatif, S. and Yang, B. (2022) CMGAN: Conformer-Based Metric GAN for Speech Enhancement. Interspeech 2022, Incheon, 18-22 September 2022, 936-940.

[8] Kim, M., Song, H., Cheong, S. and Shin, J.W. (2022) iDeepMMSE: An Improved Deep Learning Approach to MMSE Speech and Noise Power Spectrum Estimation for Speech Enhancement. Interspeech 2022, Incheon, 18-22 September 2022, 181-185.

[9] Hwang, S., Park, S. and Park, Y. (2022) Monoaural Speech Enhancement Using a Nested U-Net with Two-Level Skip Connections. Interspeech 2022, Incheon, 18-22 September 2022, 191-195.

[10] Fu, Y., Liu, Y., Li, J., Luo, D., Lv, S., Jv, Y., et al. (2022) Uformer: A UNet Based Dilated Complex & Real Dual-Path Conformer Network for Simultaneous Speech Enhancement and Dereverberation. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23-27 May 2022, 7417-7421.

[11] Wang, H. and Tian, B. (2025) ZipEnhancer: Dual-Path Down-Up Sampling-Based Zipformer for Monaural Speech Enhancement. ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, 6-11 April 2025, 1-5.

[12] Lee, S., Cheong, S., Han, S. and Shin, J.W. (2025) FlowSE: Flow Matching-Based Speech Enhancement. ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, 6-11 April 2025, 1-5.

[13] Gulati, A., Qin, J., Chiu, C., Parmar, N., Zhang, Y., Yu, J., et al. (2020) Conformer: Convolution-Augmented Transformer for Speech Recognition. Interspeech 2020, Shanghai, 25-29 October 2020, 5036-5040.

[14] Chen, Z., Yoshioka, T., Lu, L., Zhou, T., Meng, Z., Luo, Y., et al. (2020) Continuous Speech Separation: Dataset and Analysis. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, 4-8 May 2020, 7284-7288.

[15] Hu, C., Shen, Y., Sun, Y., et al. (2024) Conformer-Based End-to-End Speech Recognition Method. Application Research of Computers, 41, 2018-2024. (In Chinese)

[16] Koizumi, Y., Karita, S., Wisdom, S., Erdogan, H., Hershey, J.R., Jones, L., et al. (2021) DF-Conformer: Integrated Architecture of Conv-TasNet and Conformer Using Linear Complexity Self-Attention for Speech Enhancement. 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, 17-20 October 2021, 161-165.

[17] Abdulatif, S., Armanious, K., Guirguis, K., Sajeev, J.T. and Yang, B. (2021) AeGAN: Time-Frequency Speech Denoising via Generative Adversarial Networks. 2020 28th European Signal Processing Conference (EUSIPCO), Amsterdam, 18-21 January 2021, 451-455.

[18] Abdulatif, S., Armanious, K., Sajeev, J.T., Guirguis, K. and Yang, B. (2021) Investigating Cross-Domain Losses for Speech Enhancement. 2021 29th European Signal Processing Conference (EUSIPCO), Dublin, 23-27 August 2021, 411-415.

[19] Hu, Y., Liu, Y., Lv, S., Xing, M., Zhang, S., Fu, Y., et al. (2020) DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement. Interspeech 2020, Shanghai, 25-29 October 2020, 3885-3889.

[20] Défossez, A., Synnaeve, G. and Adi, Y. (2020) Real Time Speech Enhancement in the Waveform Domain. Interspeech 2020, Shanghai, 25-29 October 2020, 3291-3295.

[21] Kim, D., Chung, S., Han, H., Ji, Y. and Kang, H. (2023) HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders. Interspeech 2023, Dublin, 20-24 August 2023, 4125-4129.

[22] Wang, K., He, B. and Zhu, W. (2021) TSTNN: Two-Stage Transformer Based Neural Network for Speech Enhancement in the Time Domain. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, 6-11 June 2021, 7098-7102.

[23] Défossez, A., Usunier, N., Bottou, L. and Bach, F. (2020) Music Source Separation in the Waveform Domain. arXiv: 1911.13254.

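[24] Wu, R., Chen, X., Yu, J., Wang, L. and Zhao, H. (2022) Application of an Improved U-Net Network Combined with an Attention Mechanism to End-to-End Speech Enhancement. Acta Acustica, 47, 266-275. (In Chinese)
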
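[25] Fan, J., Yang, J., Zhang, X. and Zheng, C. (2022) Single-Channel Speech Enhancement Using a U-Net Fused with a Multi-Head Attention Mechanism. Acta Acustica, 47, 703-716. (In Chinese)
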
[26] Su, J., Ahmed, M., Lu, Y., Pan, S., Bo, W. and Liu, Y. (2024) RoFormer: Enhanced Transformer with Rotary Position Embedding. Neurocomputing, 568, Article ID: 127063.

[27] Bronstein, M.M., Bruna, J., LeCun, Y., Szlam, A. and Vandergheynst, P. (2017) Geometric Deep Learning: Going Beyond Euclidean Data. IEEE Signal Processing Magazine, 34, 18-42.

[28] Sadasivan, J., Seelamantula, C.S. and Muraka, N.R. (2020) Speech Enhancement Using a Risk Estimation Approach. Speech Communication, 116, 12-29.

[29] Cheng, J., Liang, R., Liang, Z., et al. (2023) A Deep Adaptation Network for Speech Enhancement: Combining a Relativistic Discriminator with Multi-Kernel Maximum Mean Discrepancy. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 41-53.

[30] Hsieh, T., Wang, H., Lu, X. and Tsao, Y. (2020) WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-To-End Speech Enhancement. IEEE Signal Processing Letters, 27, 2149-2153.

[31] Yu, Z., Yu, L., Zheng, W. and Wang, S. (2023) EIU-Net: Enhanced Feature Extraction and Improved Skip Connections in U-Net for Skin Lesion Segmentation. Computers in Biology and Medicine, 162, Article ID: 107081.

[32] Kipf, T.N. and Welling, M. (2017) Semi-Supervised Classification with Graph Convolutional Networks. arXiv: 1609.02907.

[33] Valentini-Botinhao, C., Wang, X., Takaki, S. and Yamagishi, J. (2016) Investigating RNN-Based Speech Enhancement Methods for Noise-Robust Text-To-Speech. 9th ISCA Speech Synthesis Workshop (SSW 9), Sunnyvale, 13-15 September 2016, 146-152.

[34] Veaux, C., Yamagishi, J. and King, S. (2013) The Voice Bank Corpus: Design, Collection and Data Analysis of a Large Regional Accent Speech Database. 2013 International Conference Oriental COCOSDA Held Jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), Gurgaon, 25-27 November 2013, 1-4.

[35] Thiemann, J., Ito, N. and Vincent, E. (2013) DEMAND: A Collection of Multi-Channel Recordings of Acoustic Noise in Diverse Environments. Proceedings of Meetings on Acoustics, Montreal, 2-7 June 2013, 1-8.

[36] Varga, A. and Steeneken, H.J.M. (1993) Assessment for Automatic Speech Recognition: II. NOISEX-92: A Database and an Experiment to Study the Effect of Additive Noise on Speech Recognition Systems. Speech Communication, 12, 247-251.

[37] Yamamoto, R., Song, E. and Kim, J. (2020) Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, 4-8 May 2020, 6199-6203.