Text Mining Analysis of Online Reviews of Commodities Based on LDA Topic Model
Abstract: The rapid development of the Internet has brought both opportunities and challenges to major e-commerce platforms and manufacturers. Online shopping generates massive volumes of user reviews, and these texts contain a wealth of valuable latent information. Analyzing product reviews therefore lets enterprises learn specific details about their own products and services. It also lets them analyze users' consumption behavior, uncover users' underlying needs and preferences, and make evidence-based business decisions. This paper takes laptop computers as its research object. It uses a web crawler to collect user reviews of the Lenovo Legion Y9000P (联想拯救者Y9000P), preprocesses the data, and performs word segmentation and part-of-speech tagging. The number of topics is optimized with a cosine-similarity method, after which a Latent Dirichlet Allocation (LDA) model is built to mine the product attributes that users mention most frequently. Sentiment words are then identified by dictionary matching to analyze sentiment orientation, revealing users' opinions, attitudes, purchase preferences, purchase habits, and purchase motivations.
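The abstract names LDA but not how it is fitted. The sketch below fits LDA by collapsed Gibbs sampling, which is one standard inference method. It assumes the reviews have already been segmented and mapped to integer word ids. The paper does not state which solver or hyperparameters it used, so `alpha`, `beta`, `n_iter`, and the toy corpus are illustrative assumptions.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iter=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): document-topic and topic-word distributions.
    """
    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and per-topic totals.
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * vocab_size for _ in range(n_topics)]
    nk = [0] * n_topics
    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta)
                    / (nk[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    theta = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(nkw[k][w] + beta) / (nk[k] + vocab_size * beta) for w in range(vocab_size)]
        for k in range(n_topics)
    ]
    return theta, phi


# Hypothetical toy corpus: word ids index into this English stand-in vocabulary.
vocab = ["screen", "cooling", "keyboard", "delivery", "service", "packaging"]
docs = [[0, 1, 2, 0, 1], [1, 0, 2, 2], [3, 4, 5, 3], [4, 5, 3, 5]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=len(vocab))
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda w: -row[w])[:3]
    print(k, [vocab[w] for w in top])
```

Each row of `phi` is one topic's word distribution. Its highest-probability words are what an analyst would read as the product attributes that topic describes.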
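The abstract says the number of topics is chosen with cosine similarity but does not spell out the criterion. One variant used in the LDA model-selection literature is assumed here: fit a model for each candidate K, then prefer the K whose topic-word distributions have the lowest average pairwise cosine similarity, meaning the most distinct topics. The topic-word matrices are taken as given, and the toy values are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def avg_topic_similarity(phi):
    """Mean pairwise cosine similarity between the topic-word rows of phi."""
    pairs = list(combinations(phi, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def best_topic_count(phi_by_k):
    """Pick the K whose topics are least similar to one another.

    phi_by_k maps a candidate topic count K (>= 2) to that model's
    K x V topic-word matrix (assumed to come from fitted LDA models).
    """
    return min(phi_by_k, key=lambda k: avg_topic_similarity(phi_by_k[k]))


# Hypothetical topic-word matrices for two candidate topic counts.
phi_by_k = {
    2: [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]],
    3: [[0.5, 0.5, 0.0, 0.0], [0.4, 0.4, 0.2, 0.0], [0.0, 0.0, 0.5, 0.5]],
}
print(best_topic_count(phi_by_k))  # 2
```

With K = 2 the two topics share no words, so their average similarity is 0. With K = 3 the middle topic overlaps the first heavily, which raises the average, so the smaller model is preferred.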
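The sentiment step is described only as dictionary matching. A minimal sketch follows, and it rests on several assumptions. Tokens are taken to come from an earlier segmentation step. Tiny hypothetical English lexicons stand in for a real Chinese sentiment dictionary. The negation-window rule is also assumed and is not necessarily what the paper does.

```python
# Hypothetical toy lexicons; a real study would load a published sentiment dictionary.
POSITIVE = {"good", "fast", "smooth", "quiet", "great"}
NEGATIVE = {"bad", "hot", "loud", "slow", "broken"}
NEGATORS = {"not", "no", "never"}


def sentiment_score(tokens, window=2):
    """Add +1 / -1 for each matched sentiment word, flipping the sign
    when a negator appears within `window` tokens before it."""
    score = 0
    for i, tok in enumerate(tokens):
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if polarity == 0:
            continue
        # Assumed negation rule: look back a few tokens for a negator.
        if any(t in NEGATORS for t in tokens[max(0, i - window):i]):
            polarity = -polarity
        score += polarity
    return score


def polarity_label(tokens):
    """Map the summed score to a coarse sentiment label."""
    s = sentiment_score(tokens)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


print(polarity_label("keyboard not bad and delivery fast".split()))  # positive
print(polarity_label("runs hot and slow".split()))                   # negative
```

Summing word polarities is the simplest form of lexicon matching. Weighting by degree adverbs or attaching each sentiment word to the nearest product attribute are common refinements, but the abstract does not say whether they were used.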
Citation: Dou, X.Y. (2024) Text Mining Analysis of Online Reviews of Commodities Based on LDA Topic Model. E-Commerce Letters, 13(3), 8710-8718. https://doi.org/10.12677/ecl.2024.1331066
