Version-Aware Knowledge-Graph-Enhanced Retrieval for Multi-Version Legal Question Answering
DOI: 10.12677/csa.2026.166218
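Authors: Chen Xiaoyu (陈晓宇)*, Hua Lei (花蕾): Shanghai Digital City Planning Research Center (上海市数字城市规划研究中心), Shanghai; Shanghai Key Laboratory of Quantum City Spatial Intelligence Innovation (上海市量子城市空间智能创新重点实验室), Shanghai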
Keywords: Knowledge Graph, Retrieval-Augmented Generation, Legal Question Answering, Multi-Version Retrieval, Local LLM Inference
Abstract: Legal statutes are amended repeatedly, so the same article number can carry different text in different versions, and articles cross-reference one another frequently. Existing retrieval-augmented generation (RAG) systems built on BM25 or dense vectors perform poorly on two practical question types in this setting: temporal applicability (which version of an article applies on a given date) and version differences (how an article has changed between versions). This paper analyzes the root causes of these failures and proposes a targeted solution. First, we build a cross-version legal knowledge graph with three edge types: references (article citations), amends (version revisions), and inherits (unchanged carry-over). Second, we classify questions into four types (precise, version difference, reasoning, and temporal applicability) and assign each type a fixed hop count and edge-type policy. Finally, when a question contains an explicit article number, we skip dense seed retrieval and run a SQL query directly on the article_no_int field to fetch every version of that article, which avoids the semantic drift that abstract keywords cause in dense retrieval. We evaluate on a 67-question benchmark built from four versions of the Chinese Urban-Rural Planning Law (1989, 2007, 2015, and 2019; 256 articles in total). Our method reaches Recall@5 = 0.963, MRR = 0.934, Hit@1 = 0.896, and citation F1 = 0.897, against 0.761/0.692/0.612/0.718 for BM25 and 0.918/0.872/0.821/0.867 for dense retrieval. On version-difference questions, the two baselines reach Recall@5 of only 0.167 and 0.500, while our method reaches 1.000, and citation F1 improves from 0.133 and 0.500 to 0.933. On temporal applicability questions, our method ties with dense retrieval at Recall@5 = 1.000 and citation F1 = 0.972. To test generalization, we run supplementary experiments on two additional laws: the Surveying and Mapping Law (32 questions, densely amended) and the Measures for the Investigation and Handling of Land Ownership Disputes (15 questions, sparsely amended). On the former, all four metrics reach 1.000; on the latter, our method falls back to the level of dense retrieval but does no worse. Two-sided Wilcoxon paired tests show that our method outperforms BM25 on all six metrics with p < 0.01 (effect size r in [0.659, 0.883]). The experimental setup is fully reproducible: embeddings come from a local Qwen3-Embedding-4B model (MLX backend), and generation uses Gemma-4-31B (Google AI Studio free tier).
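The article-number bypass described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: only the `article_no_int` field and the three edge names come from the abstract. The table name `articles`, the columns `version_year` and `text`, the regular expression, the helper names, and the hop and edge values in `POLICIES` are all assumptions made for the example.

```python
import re
import sqlite3

# Hypothetical per-type traversal policies. The abstract states that each
# question type has a fixed hop count and edge-type policy but does not give
# the values, so the numbers below are illustrative assumptions only.
POLICIES = {
    "precise": {"hops": 0, "edges": ()},
    "version_diff": {"hops": 1, "edges": ("amends", "inherits")},
    "reasoning": {"hops": 2, "edges": ("references",)},
    "temporal": {"hops": 1, "edges": ("amends", "inherits")},
}

# Assumed pattern: matches "Article 40" or "第40条" written with Arabic digits.
ARTICLE_RE = re.compile(r"(?:Article\s*|第\s*)(\d+)")


def extract_article_no(question):
    """Return the explicit article number in the question, or None."""
    match = ARTICLE_RE.search(question)
    return int(match.group(1)) if match else None


def lookup_versions(conn, article_no):
    """Fetch every stored version of one article, oldest version first."""
    cursor = conn.execute(
        "SELECT version_year, article_no_int, text FROM articles "
        "WHERE article_no_int = ? ORDER BY version_year",
        (article_no,),
    )
    return cursor.fetchall()


def seed_candidates(conn, question, dense_seed):
    """Use the SQL lookup when an article number is present; otherwise
    fall back to the supplied dense retriever callback."""
    article_no = extract_article_no(question)
    if article_no is not None:
        return lookup_versions(conn, article_no)
    return dense_seed(question)


def build_demo_db():
    """Create an in-memory table with placeholder rows (not real statute text)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles "
        "(version_year INTEGER, article_no_int INTEGER, text TEXT)"
    )
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        [
            (2007, 40, "placeholder text, 2007 version"),
            (2019, 40, "placeholder text, 2019 version"),
            (2019, 41, "placeholder text, a different article"),
        ],
    )
    return conn
```

With the demo table, a question that names Article 40 returns both stored versions in chronological order without touching the dense retriever, while a question with no article number is passed to the `dense_seed` callback unchanged. Article numbers written in Chinese numerals (for example 第四十条) would need an extra conversion step that this sketch omits.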
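Cite this article: Chen Xiaoyu, Hua Lei. Version-Aware Knowledge-Graph-Enhanced Retrieval for Multi-Version Legal Question Answering [J]. Computer Science and Application, 2026, 16(6): 177-190. https://doi.org/10.12677/csa.2026.166218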
