Research and Application of Archive Domain Question Answering System Based on Knowledge Graph
DOI: 10.12677/sea.2024.132020
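Authors: Wang Jianlin (王建林), Chen Mengmeng (陈萌萌), Ye Cunhua (冶存花), Wei Tiannan (魏天楠): School of Mathematics and Computer Science, Northwest Minzu University, Lanzhou, Gansu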
Keywords: Knowledge Graph, Archives, Neo4j, Flask
Abstract: Large volumes of documentary archive data are generated every day, yet the utilization rate of this data remains low and user retrieval is inefficient. To improve this situation, a knowledge graph-based automatic question answering system is proposed. First, natural language processing technology (Stanford NLP) is used to perform entity recognition and relation extraction on the responsible parties of archive records, enriching the archive knowledge graph. This step identifies specific entities in documents, such as person names, locations, and organizations, together with the relationships between them, so that a rich knowledge graph can be built. Neo4j is chosen as the database to store and manage this information; as a graph database it is well suited to storing and querying data with complex relationships, which matches the structure of a knowledge graph. Second, a question answering system based on the knowledge graph is designed and implemented, with template-based question answering as its core retrieval method. By combining natural language processing technology with the knowledge graph, the system interprets the user's question and runs the associated queries against the graph to find an accurate answer. To make the system easier to operate, a lightweight, easy-to-use web interface was developed with the Flask framework. By integrating the knowledge graph, the question answering system can provide users with accurate, fast, and personalized answers and services, which improves both the utilization rate of documentary archive data and the efficiency of user retrieval.
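The template-based question answering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the graph schema (`Person`, `Document`, `RESPONSIBLE_FOR`), the question patterns, and the function name `question_to_cypher` are all hypothetical, chosen only to show how a matched template might yield a parameterized Cypher query for Neo4j.

```python
import re
from typing import Optional

# Hypothetical schema, assumed for illustration only:
#   (:Person {name})-[:RESPONSIBLE_FOR]->(:Document {title})
# Each template pairs a question pattern with a parameterized Cypher query.
TEMPLATES = [
    (
        re.compile(r"^who is responsible for (?P<title>.+?)\??$", re.IGNORECASE),
        "MATCH (p:Person)-[:RESPONSIBLE_FOR]->(d:Document {title: $title}) "
        "RETURN p.name",
    ),
    (
        re.compile(
            r"^which documents is (?P<name>.+?) responsible for\??$",
            re.IGNORECASE,
        ),
        "MATCH (p:Person {name: $name})-[:RESPONSIBLE_FOR]->(d:Document) "
        "RETURN d.title",
    ),
]


def question_to_cypher(question: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return (cypher, params) for the first matching template, else None."""
    text = question.strip()
    for pattern, cypher in TEMPLATES:
        match = pattern.match(text)
        if match:
            # Named groups become query parameters, so user text is never
            # spliced directly into the Cypher string.
            return cypher, match.groupdict()
    return None


if __name__ == "__main__":
    print(question_to_cypher("Who is responsible for the 2019 annual report?"))
```

In a running system, the returned pair could be handed to a Neo4j driver session (for example `session.run(cypher, params)` in the official Python driver), and a Flask view could wrap the call. Questions that match no template return `None`, which a front end could turn into a "no answer found" message.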
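Citation: Wang Jianlin, Chen Mengmeng, Ye Cunhua, Wei Tiannan. Research and Application of Archive Domain Question Answering System Based on Knowledge Graph [J]. Software Engineering and Applications, 2024, 13(2): 190-198. https://doi.org/10.12677/sea.2024.132020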
