Hotspot Topics Detection from WeChat Library Based on Model LDA
Abstract: To spare library staff from large volumes of redundant information and to give them a real-time view of the needs and concerns of teachers and students, this paper presents a hotspot topic detection method for the WeChat library based on the Latent Dirichlet Allocation (LDA) model. The method first merges feature words using a purpose-built dictionary of library-domain terms, then represents each WeChat text with the LDA model, and finally computes the similarity between texts from their topic similarity and applies the Single-Pass clustering algorithm to identify hotspot topics. Experimental results show that the method detects topics in WeChat library data effectively and achieves good precision, recall, and F1 scores.
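The final clustering step can be illustrated with a minimal sketch. It assumes each text has already been reduced to a document-topic distribution by an LDA model, and it uses cosine similarity between those distributions as the topic similarity; the abstract does not specify the exact measure, so that choice, the threshold value of 0.8, the running-mean centroid update, and the names `cosine`, `single_pass`, and `doc_topics` are all illustrative assumptions rather than the paper's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two topic-distribution vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def single_pass(doc_topics, threshold=0.8):
    """Single-Pass clustering: each document is seen once, in arrival order.

    A document joins the most similar existing cluster if the similarity
    reaches the threshold (an assumed value); otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"centroid": [...], "members": [doc indices]}
    for i, theta in enumerate(doc_topics):
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = cosine(theta, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            n = len(best["members"])
            # Update the centroid as the running mean of its members.
            best["centroid"] = [
                (m * n + x) / (n + 1) for m, x in zip(best["centroid"], theta)
            ]
            best["members"].append(i)
        else:
            clusters.append({"centroid": list(theta), "members": [i]})
    return clusters


# Hypothetical document-topic distributions over three topics.
doc_topics = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.85, 0.05],
    [0.80, 0.15, 0.05],
]

clusters = single_pass(doc_topics, threshold=0.8)
# Treat the largest clusters as candidate hotspot topics.
hot = sorted(clusters, key=lambda c: len(c["members"]), reverse=True)
```

Because Single-Pass processes texts in arrival order and never revisits earlier assignments, the result depends on both the input order and the threshold; a lower threshold merges more texts into fewer, broader topics.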
Citation: Xun, J. (2017) Hotspot Topics Detection from WeChat Library Based on Model LDA. Software Engineering and Applications, 6(5), 145-153. https://doi.org/10.12677/SEA.2017.65016
