Application of Artificial Intelligence Semantic Segmentation Technology in Piano Education

DOI: 10.12677/AIRR.2022.114037. Supported by the National Social Science Fund of China.
Authors: Hu Limin (胡丽敏): Wuhan Conservatory of Music, Wuhan, Hubei; Gui Hao (桂浩), Chen Kaiyi (陈开一): School of Computer Science, Wuhan University, Wuhan, Hubei
Keywords: Artificial Intelligence; Automatic Music Transcription; Piano Education; Semantic Segmentation

Abstract: Quality-oriented education is receiving growing attention, and so is music education as one of its representative forms, yet music education remains heavily constrained by the availability of human teachers. From a computing perspective, AI assistance in music education is a signal conversion process. For example, when a learner plays the piano, the audio signal must be converted into a symbolic digital representation and compared against the reference score, so that wrong notes and rhythm errors can be identified and corrected in real time. This process is known as Automatic Music Transcription (AMT). This paper uses several digital feature representations of music, including the harmonic constant-Q transform and CFP, to convert the raw audio signal into spectrograms that serve as the input features of the network. It improves the semantic segmentation model DeepLabv3+ by incorporating the U-shaped structure of U-Net to transcribe multi-instrument music. The algorithm achieves good recognition performance on the MPAS piano music dataset.
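The comparison step mentioned in the abstract (checking transcribed notes against the reference score to flag wrong notes and rhythm errors) can be illustrated with a minimal Python sketch. It assumes the AMT stage has already produced note events. The `Note` type, the `compare_to_score` function, and both tolerance values are hypothetical choices made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    pitch: int    # MIDI note number, e.g. 60 = middle C
    onset: float  # onset time in seconds


def compare_to_score(played, score, onset_tol=0.05, match_window=0.25):
    """Greedily pair each score note with an unused played note and label it.

    onset_tol and match_window are illustrative values (assumptions).
    Returns (report, extras): report holds (score_note, played_note, label)
    tuples; extras holds played notes that matched nothing in the score.
    """
    report = []
    used = set()
    for ref in score:
        best = None  # (sort key, index into played)
        for i, p in enumerate(played):
            if i in used:
                continue
            dt = abs(p.onset - ref.onset)
            if dt > match_window:
                continue
            # Prefer a candidate with the right pitch, then the closest onset.
            key = (p.pitch != ref.pitch, dt)
            if best is None or key < best[0]:
                best = (key, i)
        if best is None:
            report.append((ref, None, "missing"))
            continue
        idx = best[1]
        used.add(idx)
        p = played[idx]
        if p.pitch != ref.pitch:
            label = "wrong_note"
        elif abs(p.onset - ref.onset) > onset_tol:
            label = "wrong_rhythm"
        else:
            label = "ok"
        report.append((ref, p, label))
    extras = [p for i, p in enumerate(played) if i not in used]
    return report, extras


# Toy example: four score notes, one played a semitone high, one played late,
# one skipped, plus a stray note at the end.
score = [Note(60, 0.0), Note(62, 0.5), Note(64, 1.0), Note(65, 1.5)]
played = [Note(60, 0.01), Note(63, 0.52), Note(64, 1.15), Note(72, 3.0)]
report, extras = compare_to_score(played, score)
print([label for _, _, label in report])
# → ['ok', 'wrong_note', 'wrong_rhythm', 'missing']
```

The greedy nearest-onset pairing keeps the sketch short. A real tutoring system would likely need tempo-aware alignment (for example, dynamic time warping) before labelling errors, since a learner rarely plays at exactly the notated tempo.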

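Cite this article: Hu Limin, Gui Hao, Chen Kaiyi. Application of Artificial Intelligence Semantic Segmentation Technology in Piano Education [J]. Artificial Intelligence and Robotics Research, 2022, 11(4): 348-355. https://doi.org/10.12677/AIRR.2022.114037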

