Named Entity Recognition in Java Domain Based on Multi-Mode Fusion
DOI: 10.12677/CSA.2022.1212275
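Authors: Li Kaiwei (李凯微): Shenyang Jianzhu University, Shenyang, Liaoning; Wang Jiaying (王佳英), Shan Jing (单菁): Shenyang Jianzhu University, Shenyang, Liaoning; Shenyang University of Technology, Shenyang, Liaoning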
Keywords: Named Entity Recognition, Multi-Mode Fusion, Entity Boundary, BiLSTM, CRF
Abstract: Named entity recognition (NER) is an important step in building a subject knowledge graph. In recent years, with the development of deep learning, NER performance in the general domain and in fields such as medicine has improved greatly. The Java subject domain has numerous and intricate knowledge points, its entities mix Chinese and English, and its entities have distinctive internal features. As a result, general-purpose models achieve low recognition accuracy in this domain and cannot identify entity boundaries effectively. This paper first proposes an improved single-model ("single-mode") structure: the embedding layer incorporates word boundary information and introduces part-of-speech information and rule information for Java domain entity recognition, in order to improve the accuracy of entity boundary identification. The encoding layer uses BiLSTM and IDCNN to extract contextual information, and the decoding layer uses CRF to obtain the globally optimal label sequence. Second, the paper proposes fusing the complementary results of multiple heterogeneous single models to improve entity recognition performance and generalization. Experimental results on a self-constructed Java domain dataset show that the new single model improves the entity recognition F1 score by about 2 percentage points over mainstream models. Entity recognition performance also improves noticeably after multi-model fusion, indicating that the approach performs better on NER in the Java domain.
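Two ideas from the abstract can be illustrated with a small sketch: attaching word boundary information to each character, and fusing the label sequences produced by several single models. The abstract does not state how boundaries are encoded or which fusion rule is used, so everything below is an assumption made for illustration: the B/M/E/S boundary scheme, character-level tokens, token-level majority voting, the BIO repair step, and all function names and the `ENT` label are hypothetical.

```python
from collections import Counter


def boundary_tags(words):
    """Map a segmented sentence to per-character word-boundary tags.

    Assumed scheme: B = word begin, M = middle, E = end, S = single-character word.
    """
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def fuse_by_vote(predictions):
    """Token-level majority vote over label sequences from several models.

    Ties go to the label proposed by the earliest model in the list.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all label sequences must have the same length")
    fused = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        fused.append(next(lab for lab in labels if counts[lab] == best))
    return fused


def repair_bio(labels):
    """Turn an I-X that does not continue an X entity into B-X."""
    fixed = []
    prev = "O"
    for lab in labels:
        if lab.startswith("I-") and prev[2:] != lab[2:]:
            lab = "B-" + lab[2:]
        fixed.append(lab)
        prev = lab
    return fixed


if __name__ == "__main__":
    # Boundary features for a mixed Chinese/English phrase (characters as tokens).
    print(boundary_tags(["Java", "虚拟机"]))

    # Three hypothetical single-model outputs for the same three tokens.
    model_a = ["B-ENT", "I-ENT", "O"]
    model_b = ["B-ENT", "O", "O"]
    model_c = ["B-ENT", "I-ENT", "I-ENT"]
    print(repair_bio(fuse_by_vote([model_a, model_b, model_c])))
```

Because voting is done per token, the fused sequence can violate the BIO scheme (for example an `I-` label directly after `O`), which is why the sketch applies `repair_bio` afterwards. Voting over whole entity spans, or weighting models by validation F1, would be alternative designs.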
Cite this article: Li, K., Wang, J. and Shan, J. (2022) Named Entity Recognition in Java Domain Based on Multi-Mode Fusion. Computer Science and Application, 12(12), 2712-2724. https://doi.org/10.12677/CSA.2022.1212275
