Subject Matter Named Entity Recognition Method Based on Knowledge-Enhanced Convolutional Neural Network
Abstract: In bidding documents, the "subject matter" is difficult to extract as a named entity with its true intended meaning, because of word segmentation errors and the juxtaposition of multiple nouns. To address this problem, a subject matter named entity recognition method based on a knowledge-enhanced Convolutional Neural Network (CNN) is proposed. First, regular expressions tailored to bidding documents are constructed to locate the phrases that contain the subject matter. Then the knowledge-enhanced CNN takes the located subject matter phrase and its context as input at the input layer, extracts features through the convolution layer, and outputs the entity annotation results through the Softmax layer. On a data set of 19,980 bidding documents from 2017 to 2020, the method achieves an average accuracy of 0.96, which is 1.2%, 0.4% and 0.3% higher than that of a Deep Neural Network (DNN), a Recurrent Neural Network (RNN) and a Hopfield Neural Network (HNN), respectively. The experimental results show that the method further improves the accuracy of subject matter named entity recognition and helps enterprises achieve better results in intelligent subject matter extraction.
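The two stages described in the abstract (regular-expression location of a candidate phrase, then convolution and Softmax over features) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the cue words in `CUE_PATTERN`, the context window size, the sample sentence, and the toy feature and kernel values are hypothetical, and the helper functions are not the authors' regular expressions or network.

```python
import math
import re

# Hypothetical cue words; the abstract does not list the actual expressions.
CUE_PATTERN = re.compile(
    r"(?:采购|招标)(?:内容|项目)?[:：]\s*(?P<phrase>[^。；\n]+)"
)


def locate_phrases(text, window=10):
    """Return (phrase, left_context, right_context) for every cue match."""
    results = []
    for match in CUE_PATTERN.finditer(text):
        start, end = match.span("phrase")
        left = text[max(0, start - window):start]
        right = text[end:end + window]
        results.append((match.group("phrase"), left, right))
    return results


def conv1d_max(features, kernel):
    """Valid 1-D cross-correlation followed by global max pooling."""
    width = len(kernel)
    outputs = [
        sum(features[i + j] * kernel[j] for j in range(width))
        for i in range(len(features) - width + 1)
    ]
    return max(outputs)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    sample = "本次采购内容：台式计算机、打印机及配套耗材。投标截止时间另行通知。"
    for phrase, left, right in locate_phrases(sample):
        # Juxtaposed nouns are split into separate candidates (an assumption).
        print(phrase.split("、"), repr(left), repr(right))

    # Toy numbers standing in for learned features and kernels.
    toy_features = [0.0, 1.0, 0.0, 2.0]
    pooled = [conv1d_max(toy_features, k) for k in ([1.0, 1.0], [1.0, -1.0])]
    print(softmax(pooled))
```

In a real system the located phrase and its context would be converted to embeddings before the convolution layer, and the kernels would be learned from the annotated bidding documents rather than fixed by hand.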
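Cite this article: Gao Zhenxiang, Jiang Jing, Chen Jian, Liu Jinshuo. Subject Matter Named Entity Recognition Method Based on Knowledge-Enhanced Convolutional Neural Network [J]. Computer Science and Application, 2021, 11(11): 2731-2741. https://doi.org/10.12677/CSA.2021.1111277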
