Dynamic Dictionary-Driven Continual Named Entity Recognition: A Dual Anti-Forgetting Mechanism Integrating Density-Aware Sampling and Adaptive Weighting
Abstract: Continual Named Entity Recognition (CNER) is an emerging research direction in natural language processing that addresses how a model can learn new entity types sequentially. Unlike traditional NER, CNER requires the model to keep extending its recognition ability while seeing only the annotated data of the current stage, without forgetting the entity types it has already learned. The central obstacle is catastrophic forgetting: when the model learns new entity types, its performance on old types drops significantly. The problem is especially pronounced in CNER because, at each incremental step, entity types from earlier steps are relabeled as “O” (non-entity). To address these challenges, this paper proposes a framework that combines a dynamic dictionary with adaptive weight adjustment, using two mechanisms to counter semantic drift and catastrophic forgetting. First, guided by a visualization analysis of the feature space, a density-sensitive sampling strategy builds a dynamic dictionary that supplements key samples during training to keep the feature space intact. Second, a dynamic weighting strategy that accounts for the entity distribution and historical learning performance balances the learning intensity of new and old knowledge. Experiments on three benchmark datasets, CoNLL2003, I2B2, and OntoNotes5, show that the method significantly improves overall performance in the continual learning setting. Across ten incremental learning tasks, it improves Macro-F1 by 8.47% on average, with especially large gains on low-frequency entities. Ablation studies confirm the contribution of each component to the final performance, and feature analysis further shows the method's advantage in keeping the feature space stable.
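The relabeling effect described in the abstract can be illustrated with a minimal sketch. It assumes BIO-style tags and a per-step set of visible entity types; the function name `mask_labels_for_step` and the example sentence are hypothetical and are not taken from the paper.

```python
from typing import List, Set


def mask_labels_for_step(tags: List[str], current_types: Set[str]) -> List[str]:
    """Keep BIO tags whose entity type is visible at this step; map the rest to "O".

    Hypothetical helper: it mimics how a CNER training set exposes only the
    entity types of the current step, so older (and future) types become "O".
    """
    masked = []
    for tag in tags:
        if tag == "O":
            masked.append("O")
            continue
        # A BIO tag looks like "B-PER" or "I-LOC"; the part after "-" is the type.
        _, _, entity_type = tag.partition("-")
        masked.append(tag if entity_type in current_types else "O")
    return masked


# Fully annotated sentence (illustrative data only).
tokens = ["John", "Smith", "visited", "Paris"]
full_tags = ["B-PER", "I-PER", "O", "B-LOC"]

# Step 1 teaches PER, step 2 teaches LOC.
step1_tags = mask_labels_for_step(full_tags, {"PER"})
step2_tags = mask_labels_for_step(full_tags, {"LOC"})

print(step1_tags)  # ['B-PER', 'I-PER', 'O', 'O']
print(step2_tags)  # ['O', 'O', 'O', 'B-LOC']
```

In step 2 the tokens "John Smith" carry the label "O" even though the model learned them as PER in step 1, so naive fine-tuning pushes their representation toward the non-entity class. This is the source of forgetting that the two mechanisms in the abstract are meant to counter.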
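Cite this article: Gu Miaomiao. Dynamic Dictionary-Driven Continual Named Entity Recognition: A Dual Anti-Forgetting Mechanism Integrating Density-Aware Sampling and Adaptive Weighting [J]. Modeling and Simulation, 2025, 14(10): 39-53. https://doi.org/10.12677/mos.2025.1410604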
