Hotspot Topics Detection from WeChat Library Based on Model LDA
Abstract: To spare library staff from sifting through large volumes of redundant information, and to let them follow the needs and concerns of teachers and students in real time, this paper proposes a method for detecting hotspot topics in WeChat library data based on the Latent Dirichlet Allocation (LDA) model. The method first merges feature words using a purpose-built dictionary of library-domain terms. It then represents each WeChat text with the LDA model. Finally, it computes the similarity between texts from their topic similarity and applies the Single-Pass clustering algorithm to identify hotspot topics. Experimental results show that the method detects topics in WeChat library data effectively and performs well on precision, recall, and F1.
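The final stage of the pipeline described in the abstract, Single-Pass clustering over LDA topic representations, can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation. It assumes each text has already been converted to a document-topic distribution by an LDA model. The cosine measure, the running-mean centroid update, the 0.8 threshold, and the names `cosine`, `single_pass`, and `doc_topics` are all illustrative assumptions, since the abstract does not specify them.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two topic vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def single_pass(doc_topics, threshold=0.8):
    """Cluster documents in one pass over their topic distributions.

    Each document joins the most similar existing cluster if the similarity
    reaches the threshold; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    centroids = []  # running mean topic vector per cluster (assumed update rule)
    members = []    # document indices per cluster
    for idx, vec in enumerate(doc_topics):
        best, best_sim = None, -1.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            members[best].append(idx)
            n = len(members[best])
            # Update the centroid as the mean of all member vectors.
            centroids[best] = [
                (m * (n - 1) + x) / n for m, x in zip(centroids[best], vec)
            ]
        else:
            centroids.append(list(vec))
            members.append([idx])
    return members


# Hypothetical document-topic distributions for five texts and three topics.
doc_topics = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.85, 0.05],
    [0.88, 0.07, 0.05],
]

clusters = single_pass(doc_topics, threshold=0.8)
# Rank clusters by size; the largest ones are candidate hotspot topics.
hot = sorted(clusters, key=len, reverse=True)
print(hot)
```

Ranking clusters by size is one simple way to decide which topics count as "hot"; the abstract does not state the criterion the paper uses, so treat that step as a placeholder.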
Cite this article: Xun, J. (2017) Hotspot Topics Detection from WeChat Library Based on Model LDA. Software Engineering and Applications, 6(5), 145-153. https://doi.org/10.12677/SEA.2017.65016


