Title:
Algorithm Optimization about Textual Case Retrieval Based on Topic Words
Authors:
孙镇 (Sun Zhen), 袁辉 (Yuan Hui), 孙泰 (Sun Tai), 宫政 (Gong Zheng), 赵捷 (Zhao Jie), 汤磊 (Tang Lei)
Keywords:
Boolean Retrieval; Topic Words; Semantic Distance; Improved Algorithm; Precision Rate; Recall Rate
Journal:
《Computer Science and Application》, Vol.3 No.8, 2013-11-28
Abstract:
An analysis of Boolean retrieval, the traditional text retrieval method, reveals two shortcomings: the retrieval algorithm ignores the semantic relations between words, and it cannot rank the retrieval results by importance. To address these, an improved retrieval algorithm based on topic words is proposed. A keyword library is built by enriching the topic words, and the semantic distance and similarity between keywords are computed within a semantic information retrieval framework. Finally, the improved algorithm is applied to a disaster case retrieval system and its retrieval results are analyzed for performance. The experiments show that the algorithm improves both the precision and the recall of text retrieval.
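The abstract does not give the paper's formulas. The Python sketch below therefore only illustrates the general idea under stated assumptions. It compares unranked Boolean AND matching with ranking by a distance-based keyword similarity, then scores both with precision and recall. Several pieces are hypothetical and none are taken from the paper:
- the toy topic-word hierarchy;
- the mapping sim = α/(d + α) with α = 1.6, a common choice in distance-based word similarity;
- the 0.5 threshold;
- the mean-of-best-match scoring rule;
- the relevance judgments.

```python
# Hypothetical topic-word hierarchy (child -> parent); NOT from the paper.
PARENT = {
    "flood": "disaster", "earthquake": "disaster", "landslide": "disaster",
    "rainstorm": "flood", "aftershock": "earthquake",
}

# Hypothetical case base: case id -> set of keywords.
DOCS = {
    "c1": {"flood", "rainstorm"},
    "c2": {"earthquake", "aftershock"},
    "c3": {"landslide"},
    "c4": {"rainstorm"},
}

# Hypothetical relevance judgments for the query ["flood"].
RELEVANT = {"c1", "c4"}


def path_to_root(word):
    """List of words from `word` up to the root of the hierarchy."""
    path = [word]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def semantic_distance(a, b):
    """Edges between a and b via their lowest common ancestor, or None if unrelated."""
    pa, pb = path_to_root(a), path_to_root(b)
    depth_in_b = {w: i for i, w in enumerate(pb)}
    for i, w in enumerate(pa):
        if w in depth_in_b:
            return i + depth_in_b[w]
    return None


def similarity(a, b, alpha=1.6):
    """Map distance d to (0, 1] via alpha / (d + alpha); 0.0 if unrelated (assumed form)."""
    d = semantic_distance(a, b)
    return 0.0 if d is None else alpha / (d + alpha)


def boolean_and(query, docs):
    """Classic Boolean AND: exact term match, returns an unranked set of case ids."""
    return {doc_id for doc_id, terms in docs.items() if set(query) <= terms}


def ranked_retrieval(query, docs, threshold=0.5):
    """Score each case by the mean, over query words, of its best-matching keyword."""
    scores = {}
    for doc_id, terms in docs.items():
        score = sum(max(similarity(q, t) for t in terms) for q in query) / len(query)
        if score >= threshold:
            scores[doc_id] = score
    # Highest score first; ties broken by case id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    query = ["flood"]
    exact = boolean_and(query, DOCS)
    ranked = ranked_retrieval(query, DOCS)
    print(sorted(exact), precision_recall(exact, RELEVANT))  # ['c1'] (1.0, 0.5)
    print(ranked, precision_recall(ranked, RELEVANT))        # ['c1', 'c4'] (1.0, 1.0)
```

In the toy hierarchy, `rainstorm` sits one edge below `flood`. Case `c4` is therefore retrieved and ranked even though it never contains the query word. Exact-match Boolean retrieval misses exactly this kind of case, so its recall is lower.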