Anomaly Detection of the Consistency between Online Review Ratings and Votes for Merchants on Web Platforms Based on Big Data
Abstract: On online platforms, user reviews and votes are key indicators of merchants’ service quality and users’ credibility. Fake reviews and anomalous voting, however, undermine the authenticity and fairness of that information. This paper addresses consistency checking over structured data: it proposes a rating-consistency check and a review-vote consistency check for large-scale review data, and evaluates both on Yelp’s official open dataset. For ratings, the average displayed on each merchant’s page is compared with the mean recomputed from the underlying review records; about 0.97% of merchants show a significant deviation (a difference above the 99th percentile), which suggests possible rating manipulation or update lag. For votes, the “useful” count recorded in each user’s profile is compared with the total “useful” votes received across all of that user’s reviews. Only 17.56% of users are perfectly consistent, 7.54% show low discrepancies, and 50.18% show medium discrepancies; for some anomalous accounts the discrepancy exceeds 200,000 votes, pointing to severe data anomalies or possible vote manipulation. The results indicate that rating and vote consistency checks can expose abnormal patterns in platform data and can serve as a low-cost, high-coverage pre-screening step for identifying fake reviews and vote manipulation. The study supports the feasibility and efficiency of structured consistency checks in big-data settings and offers a practical reference for review-ecosystem governance and risk control on online platforms.
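The two checks described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The record fields (`business_id`, `stars`, `user_id`, `useful`) are assumptions modeled on the Yelp Open Dataset JSON files, the nearest-rank percentile is one of several possible percentile definitions, and the `low_max` and `medium_max` cut-offs are hypothetical, since the abstract does not state where the low, medium, and high bands begin.

```python
import math
from collections import defaultdict


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct * n / 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def rating_gaps(businesses, reviews):
    """Map business_id -> |displayed stars - mean of its review stars|."""
    totals = defaultdict(lambda: [0.0, 0])  # business_id -> [sum, count]
    for r in reviews:
        acc = totals[r["business_id"]]
        acc[0] += r["stars"]
        acc[1] += 1
    gaps = {}
    for b in businesses:
        bid = b["business_id"]
        if bid in totals:  # skip businesses with no review records
            total, count = totals[bid]
            gaps[bid] = abs(b["stars"] - total / count)
    return gaps


def flag_rating_outliers(gaps, pct=99):
    """Return business ids whose gap is strictly above the pct-th percentile."""
    cutoff = nearest_rank_percentile(gaps.values(), pct)
    return {bid for bid, gap in gaps.items() if gap > cutoff}


def vote_gaps(users, reviews):
    """Map user_id -> |profile useful count - useful votes on the user's reviews|."""
    received = defaultdict(int)
    for r in reviews:
        received[r["user_id"]] += r["useful"]
    return {u["user_id"]: abs(u["useful"] - received.get(u["user_id"], 0))
            for u in users}


def classify_vote_gap(gap, low_max=10, medium_max=100):
    """Bucket a gap; low_max and medium_max are hypothetical thresholds."""
    if gap == 0:
        return "consistent"
    if gap <= low_max:
        return "low"
    if gap <= medium_max:
        return "medium"
    return "high"


# Toy records, invented purely for illustration.
businesses = [
    {"business_id": "b1", "stars": 4.0},
    {"business_id": "b2", "stars": 2.0},
    {"business_id": "b3", "stars": 5.0},
]
reviews = [
    {"business_id": "b1", "user_id": "u1", "stars": 4, "useful": 3},
    {"business_id": "b1", "user_id": "u2", "stars": 4, "useful": 0},
    {"business_id": "b2", "user_id": "u1", "stars": 5, "useful": 2},
    {"business_id": "b2", "user_id": "u2", "stars": 4, "useful": 1},
    {"business_id": "b3", "user_id": "u3", "stars": 5, "useful": 50},
]
users = [
    {"user_id": "u1", "useful": 5},
    {"user_id": "u2", "useful": 4},
    {"user_id": "u3", "useful": 500},
    {"user_id": "u4", "useful": 0},
]

gaps = rating_gaps(businesses, reviews)
# With only three merchants a 99th-percentile cut-off flags nothing,
# so the toy run uses the median instead.
print(flag_rating_outliers(gaps, pct=50))
print({uid: classify_vote_gap(g) for uid, g in vote_gaps(users, reviews).items()})
```

Both checks need only one grouped aggregation over the review records and a join on an id, which is what makes this kind of screening cheap to run at scale. A flagged merchant or user is a candidate for closer inspection rather than proof of manipulation, since a gap can also come from stale cached values or from reviews missing from the data at hand.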
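Cite this article: 张腾庆 (Zhang Tengqing). Anomaly Detection of the Consistency between Online Review Ratings and Votes for Merchants on Web Platforms Based on Big Data [J]. 电子商务评论 (E-Commerce Letters), 2025, 14(11): 1756-1762. https://doi.org/10.12677/ecl.2025.14113617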
