First Question Generation Method via Summarization and Syntax for English Reading Tests
DOI: 10.12677/csa.2024.148178. Supported by the National Natural Science Foundation of China.
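Authors: Yi Zhang, Guosun Zeng*: Department of Computer Science and Technology, Tongji University, Shanghai; Tongji Branch of the National Engineering and Technology Center of High Performance Computers, Shanghai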
Keywords: Question Answering System, English Reading Tests, Text Summarization, Syntactic Analysis, Question Generation
Abstract: In primary and secondary school English teaching, the first question of a reading comprehension test is crucial: it serves as the entry point of the examination, leads into the subsequent questions, and thereby tests students' comprehension and logical thinking. Existing question generation methods produce questions whose content is divergent, so they rarely serve the pedagogical purpose, and they depend heavily on knowledge bases and computational resources. This paper therefore proposes a method that uses summarization and syntactic techniques to generate the first question with low computing cost and high efficiency. The method selects a key summary sentence from the reading passage to focus on its main idea and rewrites the wording of that sentence with paraphrasing rules. For the paraphrased key summary sentence, it analyzes the subject, predicate, and object components to determine the object of concern, matches the corresponding question word, and then generates the first question according to grammatical rules. Experimental results show that the proposed method reaches a BERTScore (semantic similarity) of 67.15 and a BLEU-4 (precision) of 16.07, outperforms the baseline methods in syntax, semantics, and answerability, and can effectively generate first questions suited to English teaching scenarios.
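The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the key summary sentence is scored, so a TextRank-style ranking over word overlap is assumed here, and the paraphrasing step and the parser that extracts subject, predicate, and object are omitted. All function names (`key_sentence`, `subject_question`) and the `QUESTION_WORDS` table are hypothetical.

```python
import math
import re

def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokens(sentence):
    # Lowercased alphabetic words; digits and punctuation are ignored.
    return re.findall(r"[a-z']+", sentence.lower())

def similarity(a, b):
    # TextRank-style similarity: shared words normalised by log lengths.
    # Unique-word counts are used here, a simplification of the original.
    wa, wb = set(tokens(a)), set(tokens(b))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))

def key_sentence(text, damping=0.85, iterations=30):
    # Rank sentences with weighted PageRank and return the top one.
    sents = split_sentences(text)
    if not sents:
        raise ValueError("text contains no sentences")
    n = len(sents)
    w = [[similarity(sents[i], sents[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]  # total edge weight of each sentence
    score = [1.0] * n
    for _ in range(iterations):
        score = [(1 - damping) + damping * sum(
                     w[j][i] / out[j] * score[j]
                     for j in range(n) if out[j] > 0)
                 for i in range(n)]
    return sents[max(range(n), key=score.__getitem__)]

# Hypothetical mapping from the type of the focused element to a question word.
QUESTION_WORDS = {"PERSON": "Who", "LOCATION": "Where", "TIME": "When"}

def subject_question(predicate, obj, subject_type):
    # Ask about the subject: replace it with the matching question word.
    wh = QUESTION_WORDS.get(subject_type, "What")
    return f"{wh} {predicate} {obj}?"
```

In use, `key_sentence(passage)` would pick the sentence to question, and a separate parser (not shown) would supply the predicate, object, and subject type passed to `subject_question`. Asking about the subject keeps the grammar rule trivial; questions about the object would additionally need auxiliary-verb handling.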
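Cite this article: Zhang, Y. and Zeng, G. (2024) First Question Generation Method via Summarization and Syntax for English Reading Tests. Computer Science and Application, 14(8), 207-220. https://doi.org/10.12677/csa.2024.148178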
