CRNN-Transformer: A Music Style Classification Method Based on Hybrid Neural Networks
DOI: 10.12677/sea.2025.145088
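Authors: Xiao Kaiwen (肖凯文), Wen Hui (文惠): School of Computer Science, Sichuan University Jinjiang College, Meishan, Sichuan; Ma Wenxi (马汶溪): School of Applied Technology, Southwest University of Science and Technology, Mianyang, Sichuan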
Keywords: Deep Learning; Music Genre Classification; Transformer; Temporal Modeling
Abstract: In the digital era, music information retrieval technology is advancing rapidly. Music style (genre) classification, one of the field's core tasks, is important for improving both the performance and the user experience of music recommendation systems. To identify and classify music styles accurately from complex audio signals, this study designs and develops CRNN-Transformer, a model based on a hybrid neural network. Its technical contribution lies in three key improvements added on top of a CNN baseline: a residual network (ResNet) module, a bidirectional gated recurrent unit (GRU), and a Transformer module. First, the ResNet module strengthens the model's ability to extract spatial features from the spectrum, and its residual connections mitigate the vanishing-gradient problem in deep networks. Second, the bidirectional GRU module captures temporal information more effectively: by considering both past and future context, it further improves the model's understanding of sequential data. Finally, the integrated Transformer module uses self-attention to model long-range dependencies, enhancing the model's representational power. The audio's Mel-Frequency Cepstral Coefficients (MFCCs) serve as the input features for feature extraction and temporal modeling. Experimental results show that, compared with a conventional CNN, CRNN-Transformer improves F1-score, Precision, and Recall by 14.8%, 16%, and 13.7%, respectively, and it also achieves the best score on every metric in comparisons with other mainstream models.
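The abstract describes a pipeline of MFCC input, a residual block, a bidirectional GRU, and Transformer self-attention. The following is a minimal, untrained NumPy forward pass that only illustrates how those stages could connect; it is not the authors' implementation. Every detail below is an assumption: dense layers stand in for ResNet's convolutions, biases are omitted, the Transformer is reduced to a single self-attention head with a residual add and layer normalization (no feed-forward sub-layer), the MFCC matrix is random noise, and the sizes (130 frames, 20 coefficients, 10 classes) and helper names such as `residual_block` and `bigru` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def init(*shape):
    """Hypothetical small random weights; the sketch is never trained."""
    return rng.normal(0.0, 0.1, size=shape)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def residual_block(x, w1, w2):
    """ReLU(x + F(x)): the skip path lets gradients bypass F.
    Dense layers stand in for ResNet's convolutions (an assumption)."""
    f = np.maximum(0.0, x @ w1) @ w2
    return np.maximum(0.0, x + f)


def gru_params(d_in, h):
    p = {name: init(d_in, h) for name in ("Wz", "Wr", "Wn")}     # input weights
    p.update({name: init(h, h) for name in ("Uz", "Ur", "Un")})  # recurrent weights
    return p


def gru_pass(x, p):
    """Run one GRU direction over x of shape (T, D); returns (T, H). No biases."""
    h = np.zeros(p["Uz"].shape[0])
    states = []
    for x_t in x:
        z = sigmoid(x_t @ p["Wz"] + h @ p["Uz"])        # update gate
        r = sigmoid(x_t @ p["Wr"] + h @ p["Ur"])        # reset gate
        n = np.tanh(x_t @ p["Wn"] + (r * h) @ p["Un"])  # candidate state
        h = (1.0 - z) * n + z * h                       # blend candidate and previous state
        states.append(h)
    return np.stack(states)


def bigru(x, p_fwd, p_bwd):
    """Forward pass plus a pass over the time-reversed input, re-aligned and concatenated."""
    fwd = gru_pass(x, p_fwd)
    bwd = gru_pass(x[::-1], p_bwd)[::-1]
    return np.concatenate([fwd, bwd], axis=1)  # (T, 2H)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention, residual add, layer norm."""
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=-1)  # (T, T); each row sums to 1
    return layer_norm(x + attn @ v), attn


# Hypothetical sizes: 130 frames x 20 MFCCs, 10 style classes.
T, N_MFCC, D_RES, H, N_CLASSES = 130, 20, 32, 16, 10
mfcc = rng.normal(size=(T, N_MFCC))  # random stand-in for a real (frames, coefficients) MFCC matrix

feats = residual_block(mfcc @ init(N_MFCC, D_RES), init(D_RES, D_RES), init(D_RES, D_RES))
seq = bigru(feats, gru_params(D_RES, H), gru_params(D_RES, H))  # (T, 2H)
ctx, attn = self_attention(seq, init(2 * H, 2 * H), init(2 * H, 2 * H), init(2 * H, 2 * H))
probs = softmax(ctx.mean(axis=0) @ init(2 * H, N_CLASSES))  # mean-pool over time, then classify
print(probs.shape, int(probs.argmax()))
```

Before the output means anything, real MFCCs would have to replace the noise (for example from `librosa.feature.mfcc`, whose `(n_mfcc, frames)` output would need transposing) and the weights would have to be trained. Mean-pooling over time before the classifier is also an assumption, since the abstract does not say how the sequence is summarized.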
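Citation: Xiao Kaiwen, Wen Hui, Ma Wenxi. CRNN-Transformer: A Music Style Classification Method Based on Hybrid Neural Networks[J]. Software Engineering and Applications, 2025, 14(5): 985-997. https://doi.org/10.12677/sea.2025.145088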
