Research on the Representation of Combined Spectrum Characteristics of Piano Music
DOI: 10.12677/CES.2022.109361. Supported by research project funding.
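Authors: Limin Hu (胡丽敏): Wuhan Conservatory of Music, Wuhan, Hubei; Hao Gui (桂浩), Jianxiong Tang (汤健雄): School of Computer Science, Wuhan University, Wuhan, Hubei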
Keywords: Artificial Intelligence; Automatic Music Transcription; Piano Education; Combined Spectrum Characteristics
Abstract: Piano education, a representative form of quality-oriented education, is becoming increasingly popular. Artificial intelligence has made new progress in the field of speech recognition, and piano education stands to benefit from it. Enabling the many piano learners to practice with guidance from artificial intelligence is a problem of real research significance. Using artificial intelligence for intelligent practice assistance in piano education amounts to converting the audio signal of a learner's performance into a digital signal and comparing it with the reference (ground-truth) digital signal; this involves pitch recognition and automatic music transcription (AMT). This paper proposes a multi-feature representation that combines frequency and periodicity as the feature representation of music data. Recognition with a multi-feature representation is often better than with a single spectral feature.
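As a rough illustration of combining a frequency feature with a periodicity feature, the sketch below estimates the pitch of a single audio frame twice: once from the peak of its magnitude spectrum and once from the peak of its autocorrelation. It accepts a note only when both estimates agree. This is a minimal sketch under stated assumptions, not the paper's method: the abstract does not specify how the two features are computed or fused, so the function names, the autocorrelation-based periodicity measure, the agreement rule, and the single-tone test signal are all hypothetical.

```python
import numpy as np

PIANO_MIN_HZ = 27.5    # A0, lowest piano key (assumed search range)
PIANO_MAX_HZ = 4186.0  # C8, highest piano key (assumed search range)


def spectrum_pitch(frame, sample_rate):
    """Frequency feature: location of the strongest magnitude-spectrum bin."""
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    in_range = (freqs >= PIANO_MIN_HZ) & (freqs <= PIANO_MAX_HZ)
    return freqs[in_range][np.argmax(magnitude[in_range])]


def periodicity_pitch(frame, sample_rate):
    """Periodicity feature: lag of the strongest autocorrelation peak."""
    windowed = frame * np.hanning(len(frame))
    # Autocorrelation via the power spectrum; zero-padding to twice the
    # frame length avoids circular wrap-around.
    power = np.abs(np.fft.rfft(windowed, n=2 * len(frame))) ** 2
    autocorr = np.fft.irfft(power)[: len(frame)]
    min_lag = int(np.ceil(sample_rate / PIANO_MAX_HZ))
    max_lag = int(np.floor(sample_rate / PIANO_MIN_HZ))
    best_lag = min_lag + np.argmax(autocorr[min_lag : max_lag + 1])
    return sample_rate / best_lag


def hz_to_midi(freq_hz):
    """Map a frequency in Hz to the nearest MIDI note number (A4 = 69)."""
    return int(round(69 + 12 * np.log2(freq_hz / 440.0)))


def combined_note(frame, sample_rate):
    """Return a MIDI note only if both features agree, else None."""
    from_spectrum = hz_to_midi(spectrum_pitch(frame, sample_rate))
    from_period = hz_to_midi(periodicity_pitch(frame, sample_rate))
    return from_spectrum if from_spectrum == from_period else None


# Synthetic 440 Hz tone (A4) as a stand-in for a recorded piano frame.
sample_rate = 16000
t = np.arange(2048) / sample_rate
frame = np.sin(2 * np.pi * 440.0 * t)
print(combined_note(frame, sample_rate))
```

Requiring agreement between the two domains is one simple way to suppress errors that affect only one of them, such as harmonics in the spectrum or sub-harmonics in the lag domain. Real piano recordings are polyphonic and would need a per-pitch salience map rather than a single peak per frame.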
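Citation: Hu, L., Gui, H. and Tang, J. (2022) Research on the Representation of Combined Spectrum Characteristics of Piano Music. Creative Education Studies (创新教育研究), 10(9), 2292-2299. https://doi.org/10.12677/CES.2022.109361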
