Application of Pre-Trained Language Models in Automatic Difficulty Classification of Japanese Texts
DOI: 10.12677/csa.2025.1512324. Supported by research project funding.
Author: Liu Jun (刘君), School of Foreign Languages, Guangxi University, Nanning, Guangxi
Keywords: Pre-Trained Language Model, BERT, Difficulty Classification, Japanese Language Education
Abstract: In Japanese language teaching, selecting texts of appropriate difficulty as teaching materials helps raise learners' interest and learning efficiency. Japanese has a large vocabulary and complex grammar, which makes classifying text difficulty challenging. This paper attempts to apply several neural-network-based Japanese pre-trained language models to train automatic difficulty classification models for Japanese texts, using a dataset collected from past Japanese Language Proficiency Test papers and mock exam questions. Experimental results show that pre-trained language models perform well on the task of automatically classifying the difficulty of Japanese texts. This approach can provide technical support for computer-assisted Japanese learning systems and for the development of electronic teaching materials.
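Cite this article: Liu, J. (2025) Application of Pre-Trained Language Models in Automatic Difficulty Classification of Japanese Texts. Computer Science and Application, 15(12), 91-99. https://doi.org/10.12677/csa.2025.1512324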
