一种稳定、精准、实时的语音信号基频的检测与提取算法
Robust, Precise and Real-Time Algorithm for Speech Signal Pitch Detection and Extraction
摘要: 针对语音基频检测与提取问题,融合了频域算法和时域算法的特点,提出了针对语音基频检测与提取的两步算法,首先基于频域算法的稳定性给出基频的一个粗估计,然后根据时域算法的精确性,再给出一个精确估计。该算法达到了稳定、精准、实时的目标。实验结果表明,该算法在汉语语音基频检测与提取方面的性能优于语音分析与处理专用软件Praat和Adobe Audition的相应功能。
Abstract: A new two-step algorithm was proposed for speech pitch detection and fundamental frequency extraction. This algorithm first estimates a guess of the pitch based on the frequency analysis, and then calculates an accurate solution for the pitch based on time-domain analysis. This algorithm realized the expectation of robust, accurate and real-time. The experimental results show that the performance of this algorithm is better than that of Praat and Adobe Audition in Chinese speech pitch detection and fundamental frequency extraction.
文章引用:章森, 曹瑞兴, 邓海刚. 一种稳定、精准、实时的语音信号基频的检测与提取算法[J]. 图像与信号处理, 2020, 9(4): 246-255. https://doi.org/10.12677/JISP.2020.94028

参考文献

[1] 张金光. 语言发音模型研究综述[J]. 计算机工程与应用, 2018, 54(12): 27-34.
[2] Zhao, H. and Gan, W.J. (2013) A New Pitch Estimation Method Based on AMDF. Journal of Multimedia, 8, 618-625.
[Google Scholar] [CrossRef
[3] Lin, Q.G. and Shao, Y.W. (2018) A Novel Normalization Method for Autocorrelation Function for Pitch Detection and for Speech Activity Detection. Interspeech, Hyderabad, 2-6 September 2018, 2097-2101.
[Google Scholar] [CrossRef
[4] Kim, J.W., Salamon, J., Li, P., et al. (2018) CREPE: A Convolutional Representation for Pitch Estimation. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, 15-20 April 2018, 161-165.
[Google Scholar] [CrossRef
[5] 陈萧, 徐波. 改进的用于口语处理的基频提取算法[J]. 清华大学学报(自然科学版), 2017, 57(1): 95-99.
[6] Kasi, K. and Zahorian, S.A. (2002) Yet Another Algorithm for Pitch Tracking. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Orlando, Vol. 1, 361-364.
[Google Scholar] [CrossRef
[7] Gonzalez, S. and Brookes, M. (2014) PEFAC-A Pitch Estimation Algorithm Robust to High Levels of Noise. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22, 518-530.
[Google Scholar] [CrossRef
[8] Hajimolahoseini, H., Amirfattahi, R., Gazor, S., et al. (2016) Robust Estimation and Tracking of Pitch Period Using an Efficient Bayesian Filter. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24, 1219-1229.
[Google Scholar] [CrossRef
[9] Huang, F. and Lee, T. (2013) Pitch Estimation in Noisy Speech Using Accumulated Peak Spectrum and Sparse Estimation Technique. IEEE Transactions on Audio, Speech, and Language Processing, 21, 99-109.
[Google Scholar] [CrossRef
[10] Zhang, X.L., Zhang, H., Nie, S., et al. (2016) A Pairwise Algorithm Using Deep Stacking Network for Speech Separation and Pitch Estimation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24, 1066-1078.
[Google Scholar] [CrossRef
[11] Miller, N. (2003) Pitch Detection by Data Reduction. IEEE Transactions on Acoustics Speech & Signal Processing, 23, 72-79.
[Google Scholar] [CrossRef
[12] Du, S.C., Sugiura, Y. and Shimamura, T. (2018) Combining Zero Replacement Speech Enhancement with Lag Window Method for Pitch Detection. 2018 IEEE 3rd International Conference on Communication and Information Systems (ICCIS), Singapore, 28-30 December 2018, 53-57.
[Google Scholar] [CrossRef
[13] Bahja, F., Martino, J.D., Elhaj, E.I., et al. (2016) A Corroborative Study on Improving Pitch Determination by Time-Frequency Cepstrum Decomposition Using Wavelets. Springerplus, 5, 564.
[Google Scholar] [CrossRef] [PubMed]
[14] Kaur, R. and Kumar, N. (2015) Review on Multi Pitch Detection in Speech. International Journal of Scientific Research in Computer Science and Engineering, 3, 6-10.
[15] Chu, W. and Alwan, A. (2012) SAFE: A Statistical Approach to F0 Estimation under Clean and Noisy Conditions. IEEE Transactions on Audio, Speech, and Language Processing, 20, 933-944.
[Google Scholar] [CrossRef
[16] Nielsen, J.K., Christensen, M.G., Jensen, S.H., et al. (2013) Default Bayesian Estimation of the Fundamental Frequency. IEEE Transactions on Audio, Speech, and Language Processing, 21, 598-610.
[Google Scholar] [CrossRef
[17] Jin, Z.Z. and Wang, D.L. (2011) HMM-Based Multipitch Tracking for Noisy and Reverberant Speech. IEEE Transactions on Audio Speech and Language Processing, 19, 1091-1102.
[Google Scholar] [CrossRef
[18] Zhang, S. and Shirai, K. (2000) Visual Approach for Automatic Pitch Period Estimation. IEEE International Conference on Acoustics, Speech, and Signal Processing, Istanbul, Vol. 3, 1339-1342.
[19] Niu, Y.D., Chen, F. and Chen, J. (2019) The Effect of F0 Contour on the Intelligibility of Mandarin Chinese for Hearing-Impaired Listeners. The Journal of the Acoustical Society of America, 146, 85-91.
[Google Scholar] [CrossRef] [PubMed]
[20] 宋黎明, 李明, 颜永红. 谐波显著度的基频提取方法[J]. 声学学报, 2015, 40(2): 294-299.