Algorithm Optimization about Textual Case Retrieval Based on Topic Words
DOI: 10.12677/CSA.2013.38062
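Authors: Zhen Sun: Peking University, Beijing; National Administration for Code Allocation to Organizations, Beijing; Hui Yuan, Tai Sun, Zheng Gong, Jie Zhao: National Administration for Code Allocation to Organizations, Beijing; Lei Tang: Chinese Academy of Surveying and Mapping, Beijing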
Keywords: Boolean Retrieval; Topic Words; Semantic Distance; Improved Algorithm; Precision Rate; Recall Rate
Abstract: Analyzing the essence of Boolean retrieval, the traditional text retrieval method, reveals two shortcomings: the retrieval algorithm ignores the semantic relations between words, and it cannot rank the retrieval results by importance. To address these, an improved retrieval algorithm based on topic words is proposed. Topic words are enriched to build a keyword library, which serves as the basis of a semantic information retrieval framework; on this framework, the semantic distance and similarity of keywords are computed. Finally, the improved algorithm is applied to a disaster case retrieval system and the retrieval results are analyzed for performance. The experiments show that the algorithm improves both the precision and the recall of text retrieval.
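The abstract contrasts unranked, exact-match Boolean retrieval with ranking by keyword semantic distance and similarity, evaluated by precision and recall. The Python sketch below illustrates that contrast under stated assumptions; it is not the authors' algorithm. The topic-word tree, the case documents, the relevance judgments, the 0.3 cut-off, and the distance-to-similarity mapping sim = α/(d + α) (a common choice in word-similarity work, with d the number of tree edges between two words) are all hypothetical.

```python
# Hypothetical topic-word tree (child -> parent); illustrative only, not the paper's data.
PARENT = {
    "disaster": None,
    "natural disaster": "disaster",
    "earthquake": "natural disaster",
    "flood": "natural disaster",
    "landslide": "natural disaster",
    "accident": "disaster",
    "fire": "accident",
}

# Hypothetical case documents, each reduced to its set of keywords.
DOCS = {
    "case1": {"earthquake", "landslide"},
    "case2": {"flood"},
    "case3": {"fire"},
}

# Hypothetical relevance judgments for the query ["earthquake"].
RELEVANT = {"case1", "case2"}


def path_to_root(word):
    """List of nodes from `word` up to the root of the topic-word tree."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path


def semantic_distance(a, b):
    """Number of edges on the tree path between topic words a and b."""
    depth_in_a = {w: i for i, w in enumerate(path_to_root(a))}
    for j, w in enumerate(path_to_root(b)):
        if w in depth_in_a:  # first shared ancestor
            return depth_in_a[w] + j
    raise ValueError(f"{a!r} and {b!r} share no ancestor")


def similarity(a, b, alpha=1.0):
    """Assumed mapping sim = alpha / (d + alpha): 1.0 at d = 0, decreasing with d."""
    return alpha / (semantic_distance(a, b) + alpha)


def boolean_and(query, docs):
    """Plain Boolean AND: exact keyword match, unranked result list."""
    return [doc_id for doc_id, words in docs.items() if set(query) <= words]


def score(query, words):
    """Average, over query terms, of the best similarity to any document keyword."""
    return sum(max(similarity(q, w) for w in words) for q in query) / len(query)


def ranked(query, docs, threshold=0.3):
    """Semantic retrieval: documents sorted by score, weak matches cut off."""
    scored = sorted(((score(query, words), doc_id) for doc_id, words in docs.items()),
                    key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in scored if s >= threshold]


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    query = ["earthquake"]
    for name, result in (("Boolean", boolean_and(query, DOCS)),
                         ("ranked", ranked(query, DOCS))):
        print(name, result, precision_recall(result, RELEVANT))
```

With these toy data, Boolean AND returns only case1 (precision 1.0, recall 0.5) because "flood" never matches "earthquake" literally, while the similarity-ranked variant returns case1 ahead of case2 (precision 1.0, recall 1.0) and drops case3 (similarity 0.2). The numbers only show the mechanism; they are not the paper's experimental results.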
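Citation: Zhen Sun, Hui Yuan, Tai Sun, Zheng Gong, Jie Zhao, Lei Tang. Algorithm Optimization about Textual Case Retrieval Based on Topic Words [J]. Computer Science and Application, 2013, 3(8): 354-359. http://dx.doi.org/10.12677/CSA.2013.38062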