Environmental Sound Classification Based on CNN and LightGBM
Abstract: Traditional convolutional neural networks (CNNs) generalize poorly and reach only limited accuracy in environmental sound classification. To address this, a new environmental sound classification model that combines a deep CNN with LightGBM is proposed. Audio files are first preprocessed into Mel-frequency cepstral coefficient (MFCC) matrices. A deep CNN then extracts high-level features from these matrices. Finally, because LightGBM is efficient and accurate at classification, the extracted high-level features are fed into LightGBM for training and prediction, with the aim of improving classification accuracy. Comparative experiments on the public UrbanSound8K dataset show that the new model improves classification accuracy by nearly 7.7% compared with a standalone convolutional neural network.
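The MFCC preprocessing step named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the frame length, hop size, number of mel filters, number of coefficients, the Hamming window and all function names (mfcc_frame, mfcc_matrix, and so on) are assumptions chosen for illustration, since the abstract does not state these details. Each frame is windowed, converted to a power spectrum, passed through triangular mel filters, log-compressed and decorrelated with a DCT-II; stacking the per-frame vectors gives the matrix that a CNN would take as input.

```python
import math

def hz_to_mel(f):
    # Common mel-scale formula (assumed; other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hamming-window the frame and return |DFT|^2 / N for bins 0..N/2 (naive DFT)."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    mel_max = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(n_filters):
        lo, mid, hi = edges[j], edges[j + 1], edges[j + 2]
        # Weight rises linearly from lo to mid, then falls linearly from mid to hi.
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in bin_hz])
    return bank

def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]

def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small constant avoids log(0) for filters that capture no energy.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                    for row in bank]
    return dct2(log_energies, n_coeffs)

def mfcc_matrix(signal, sample_rate, frame_len=256, hop=128, **kwargs):
    """Return one MFCC vector per frame (rows = frames, columns = coefficients)."""
    return [mfcc_frame(signal[s:s + frame_len], sample_rate, **kwargs)
            for s in range(0, len(signal) - frame_len + 1, hop)]

# Synthetic 440 Hz tone at an assumed 8 kHz sample rate, standing in for an audio clip.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(1024)]
matrix = mfcc_matrix(tone, SAMPLE_RATE)
print(len(matrix), len(matrix[0]))
```

In practice a library such as librosa would normally compute the MFCC matrix; the naive DFT here is only meant to keep the sketch dependency-free. In the pipeline described above, a trained CNN would map each such matrix to a high-level feature vector, and those vectors would then serve as the training input for a LightGBM classifier.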
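Cite this article: Liao Weiping, Chen Pinghua, Zhao Cong, Zhao Liang, Chen Jianbing, Dong Mengqin. Environmental Sound Classification Based on CNN and LightGBM [J]. Computer Science and Application, 2019, 9(10): 1892-1905. https://doi.org/10.12677/CSA.2019.910212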
