A Study on the Classification of Xingshan Folk Songs Using RF-CNN-CBAM-BiLSTM
DOI: 10.12677/aam.2025.148379
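Authors: Bai Yuxin (白雨欣), Liu Yilin (刘依林), Xiao Weiwei* (肖维维): College of Science, North China University of Technology, Beijing; Lei Mengfei (雷萌非): Beijing Haidian Foreign Language Tengfei School (北京市海淀外国语藤飞学校), Beijing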
Keywords: Audio Classification, Attention Mechanism, Deep Learning
Abstract: This study proposes an audio classification algorithm based on the RF-CNN-CBAM-BiLSTM architecture for the classification and recognition of Xingshan folk songs. The model first uses Random Forest (RF) dimensionality reduction to select the training features, replacing the original empirical feature selection. The Convolutional Block Attention Module (CBAM) is then integrated into the Convolutional Neural Network (CNN) architecture to strengthen the model's ability to attend to and recognize relevant features. Finally, a Bidirectional Long Short-Term Memory (BiLSTM) network captures bidirectional contextual information in the audio sequence, improving recognition performance on Xingshan folk songs. The proposed model achieves a recognition accuracy of 91.67% with a runtime of 22 s per training epoch. Its accuracy exceeds the average accuracy of four benchmark models, including the Residual Network (ResNet50) and EfficientNetV1, by approximately 5.24%, and its runtime per epoch is 68.25 s shorter than the average runtime of the four benchmark models.
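To make the attention step in the abstract more concrete, the following is a minimal NumPy sketch of a CBAM forward pass in its commonly described form: channel attention from global average and max pooling passed through a shared two-layer MLP, followed by spatial attention from a convolution over channel-wise average and max maps. It is an illustration only, not the authors' implementation. The feature-map shape, the reduction ratio `r = 4`, the 7x7 kernel, and the random weights are all assumptions, since the abstract does not state them, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """feat: (C, H, W); w1: (C//r, C); w2: (C, C//r). Returns (C,) weights."""
    avg = feat.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))    # global max pooling -> (C,)

    def mlp(v):
        # Shared two-layer MLP with a ReLU in between.
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))


def spatial_attention(feat, kernel):
    """feat: (C, H, W); kernel: (2, k, k) with odd k. Returns (H, W) weights."""
    # Pool across channels to get a 2-channel descriptor map.
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    # All k x k windows of the padded map: shape (2, H, W, k, k).
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, (k, k), axis=(1, 2)
    )
    # "Same" convolution (cross-correlation) down to a single channel.
    conv = np.einsum("chwij,cij->hw", windows, kernel)
    return sigmoid(conv)


def cbam(feat, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    feat = feat * channel_attention(feat, w1, w2)[:, None, None]
    return feat * spatial_attention(feat, kernel)[None, :, :]


# Assumed toy setup: 8 channels over a 16 x 20 (frequency x time) feature map.
rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 20, 4
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1

out = cbam(feat, w1, w2, kernel)
print(out.shape)  # (8, 16, 20)
```

Because both attention maps pass through a sigmoid, every output value is the input value scaled by a factor between 0 and 1, so the module reweights the CNN feature map without changing its shape. In a pipeline like the one the abstract describes, the reweighted map would then be reshaped into a time sequence for the BiLSTM; that step is not shown here.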
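Citation: Bai Yuxin, Liu Yilin, Lei Mengfei, Xiao Weiwei. A Study on the Classification of Xingshan Folk Songs Using RF-CNN-CBAM-BiLSTM [J]. Advances in Applied Mathematics (应用数学进展), 2025, 14(8): 147-159. https://doi.org/10.12677/aam.2025.148379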
