Music Classification and Recommendation Method Combining LSTM and an Attention Mechanism
DOI: 10.12677/CSA.2020.1012240. Supported by the National Natural Science Foundation of China.
Authors: Feng Pengyu, Chen Pinghua, Shen Jianfang: School of Computers, Guangdong University of Technology, Guangzhou, Guangdong
Keywords: Music Recommendation, Music Classification, Long Short-Term Memory (LSTM), Attention Mechanism (AM), Convolutional Neural Network (CNN)
Abstract: Music catalogs have grown very large, while existing music recommendation methods classify music with low accuracy, recognize user emotion only vaguely, and analyze the target data with little focus, so listeners struggle to find music they like. To address these problems, this paper proposes a music classification and recommendation method that combines a Long Short-Term Memory (LSTM) network with an attention mechanism (Attention Model, AM). The method consists of two parts: a music classification model and a music recommendation model. First, acoustic features of the audio data are captured and assembled into a sequence of multidimensional features, and the music is classified by emotion using the LSTM network and the attention mechanism. Next, the user's listening history is collected, the ten most recent songs are selected and converted to spectrograms, and a Convolutional Neural Network (CNN) identifies the user's current emotion, making recommendation more efficient. In the experiments, the proposed model is compared with other traditional music classification models in several groups of tests. The results show that, compared with models from recent years, the proposed model significantly improves the accuracy of emotion judgment and user emotion recognition, and that the accuracy of music recommendation is improved to some extent.
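The abstract does not say which form of attention is applied to the LSTM outputs. As a rough illustration only, the sketch below assumes dot-product scoring against a context vector followed by a softmax-weighted sum, which is one common way to pool a sequence of hidden states into a single vector for classification. The function names, the toy hidden states, and the fixed `context` vector are all hypothetical; in a trained model the hidden states would come from the LSTM and the context vector would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Collapse a sequence of hidden-state vectors into one vector.

    hidden_states: list of T vectors (stand-ins for per-frame LSTM outputs).
    context: query vector of the same dimension (assumed fixed here;
             it would be a learned parameter in a real model).
    Returns (weights, pooled): one weight per time step, and the
    weighted sum of the hidden states.
    """
    # Score each time step by its dot product with the context vector.
    scores = [sum(h_d * c_d for h_d, c_d in zip(h, context))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(context)
    # Weighted sum of hidden states, computed one dimension at a time.
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states))
              for d in range(dim)]
    return weights, pooled


# Toy stand-in for three time steps of 2-dimensional LSTM output.
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
context = [1.0, 1.0]
weights, pooled = attention_pool(hidden_states, context)
```

With these toy inputs the second time step scores highest, so it receives the largest weight and dominates the pooled vector that a classifier layer would then consume.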
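Citation: Feng Pengyu, Chen Pinghua, Shen Jianfang. Music Classification and Recommendation Method Combining LSTM and an Attention Mechanism [J]. Computer Science and Application, 2020, 10(12): 2280-2290. https://doi.org/10.12677/CSA.2020.1012240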
