Optimization of QANet Model Based on Cross-Entropy and Information Entropy Constraints and Its Application to Machine Reading Comprehension
Abstract: Machine Reading Comprehension (MRC), a core task in Natural Language Processing (NLP), aims to enable machines to accurately understand human language and answer related questions. This paper focuses on optimizing the QANet model, which combines convolutional neural networks (CNNs) with self-attention to achieve precise text comprehension and answer localization. The standard QANet model is trained with a single cross-entropy loss, which can leave the predicted answer distribution highly uncertain. To address this, we propose a hybrid loss function that combines cross-entropy with an information-entropy constraint term, improving model accuracy while sharpening the confidence of the predicted probability distributions. Experimental results show that the improved model achieves higher F1 and Exact Match (EM) scores on the SQuAD dataset, offering a new direction for loss-function design in MRC tasks.
Citation: Chen, Z.S., Liu, J., Tang, Y. and Tang, S.J. (2025) Optimization of QANet Model Based on Cross-Entropy and Information Entropy Constraints and Its Application to Machine Reading Comprehension. Hans Journal of Data Mining, 15(3), 262-270. https://doi.org/10.12677/hjdm.2025.153022
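
To make the proposed objective concrete, the following is a minimal PyTorch sketch of a hybrid loss for span prediction of the kind the abstract describes. The function name hybrid_span_loss, the additive form L = L_CE + λ·H, and the weight lam are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def hybrid_span_loss(start_logits, end_logits, start_idx, end_idx, lam=0.1):
    """Cross-entropy span loss plus an information-entropy penalty
    on the predicted start/end distributions (illustrative sketch)."""
    # Standard QANet objective: negative log-likelihood of the gold
    # start and end positions of the answer span.
    ce = F.cross_entropy(start_logits, start_idx) + F.cross_entropy(end_logits, end_idx)

    # Shannon entropy of a distribution over context positions,
    # averaged over the batch; low entropy = a confident prediction.
    def entropy(logits):
        p = F.softmax(logits, dim=-1)
        return -(p * torch.log(p + 1e-12)).sum(dim=-1).mean()

    h = entropy(start_logits) + entropy(end_logits)

    # Hybrid objective (assumed form): minimizing the entropy term
    # sharpens the answer distributions; lam balances the two terms.
    return ce + lam * h

# Example usage with random logits over a 100-token context:
# batch, seq_len = 8, 100
# s, e = torch.randn(batch, seq_len), torch.randn(batch, seq_len)
# gold_s = torch.randint(0, seq_len, (batch,))
# gold_e = torch.randint(0, seq_len, (batch,))
# loss = hybrid_span_loss(s, e, gold_s, gold_e)
```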
