Automatic Summarization Algorithm Based on the Combined Features of LDA
DOI: 10.12677/CSA.2013.32025. Supported by national science and technology funding.
Authors: 吴登能 (Wu Dengneng)*, 袁贞明 (Yuan Zhenming), 李星星 (Li Xingxing), Hangzhou Normal University
Keywords: Automatic Summarization; Topic Model; LDA
Abstract: Automatic document summarization helps people extract the main information from massive amounts of text quickly and efficiently. Taking the sentence as the processing unit, this paper proposes an LDA-based sentence topic feature: the similarity between the topic distribution of the document and the topic distribution of each sentence. This feature is combined with basic features, such as the sentence's position in the document and its similarity to the title, to compute a weight for every sentence, and the summary is extracted by selecting the highest-weighted sentences. Experimental results show that adding the combined features to the LDA model improves summarization performance over traditional methods.
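The sentence-weighting step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the LDA topic distributions of the document and of each sentence have already been inferred, compares them with cosine similarity, scores position as 1 - i/n, measures title similarity with a bag-of-words cosine, and combines the three features with hypothetical weights. The function names (`summarize`, `title_similarity`, `position_score`) and the toy data are illustrative only.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def title_similarity(tokens, title_tokens):
    """Assumed measure: bag-of-words cosine similarity between a sentence and the title."""
    ca, cb = Counter(tokens), Counter(title_tokens)
    vocab = sorted(set(ca) | set(cb))
    return cosine([ca[w] for w in vocab], [cb[w] for w in vocab])


def position_score(index, n_sentences):
    """Hypothetical position feature: 1.0 for the first sentence, decreasing linearly."""
    return 1.0 - index / n_sentences


def summarize(sentences, sent_topics, doc_topics, title_tokens, k=2,
              weights=(0.5, 0.25, 0.25)):
    """Score each tokenized sentence by a weighted sum of three features.

    sent_topics[i] and doc_topics are LDA topic distributions (assumed precomputed).
    Returns the indices of the k highest-scoring sentences in document order,
    plus the list of all sentence scores.
    """
    w_topic, w_pos, w_title = weights
    n = len(sentences)
    scores = []
    for i, (tokens, theta) in enumerate(zip(sentences, sent_topics)):
        score = (w_topic * cosine(theta, doc_topics)          # LDA topic feature
                 + w_pos * position_score(i, n)               # position feature
                 + w_title * title_similarity(tokens, title_tokens))  # title feature
        scores.append(score)
    ranked = sorted(range(n), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


if __name__ == "__main__":
    # Toy inputs: tokens and 2-topic distributions are made up for illustration.
    title = ["lda", "summarization"]
    sentences = [
        ["lda", "models", "topics"],
        ["the", "weather", "was", "nice"],
        ["summarization", "with", "lda"],
    ]
    doc_topics = [0.7, 0.3]
    sent_topics = [[0.8, 0.2], [0.1, 0.9], [0.7, 0.3]]
    chosen, scores = summarize(sentences, sent_topics, doc_topics, title, k=2)
    print(chosen)  # → [0, 2]
```

The off-topic second sentence scores lowest because its topic distribution diverges from the document's and it shares no words with the title. The abstract does not give the paper's actual feature weights or similarity measure, so the values here are placeholders; raising the first weight places more emphasis on the LDA topic feature.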
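Citation: Wu Dengneng, Yuan Zhenming, Li Xingxing. Automatic Summarization Algorithm Based on the Combined Features of LDA [J]. Computer Science and Application, 2013, 3(2): 145-148. http://dx.doi.org/10.12677/CSA.2013.32025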

References