Predicting Online Purchase Intention via Firth Logistic Regression and Random Forest
Abstract: As the digital economy shifts from scale expansion to quality-driven growth, predicting users' shopping intention is essential for improving e-commerce conversion rates. Using the UCI online session dataset as a benchmark, this study evaluates the predictive performance of Firth logistic regression and random forest models under class imbalance and data sparsity, and offers an algorithmic framework as a reference for real-time intention prediction. The findings are as follows. (1) Real-time behavior is the core indicator of e-commerce conversion. Page value has a very strong positive effect on purchase probability; although it is statistically endogenous, under a dynamic recognition logic it serves as a key leading indicator of a visitor's transition from the "browsing" stage to the "decision" stage. Exit rate, by contrast, has a significant negative effect. (2) External context has a significant moderating effect: major promotion months markedly raise purchase probability by activating consumers' price expectations and their sense of time-limited urgency. (3) New and returning visitors differ in how they decide: returning visitors are more sensitive to page smoothness and to the weekend window, whereas new visitors rely more on the immediate guidance of high-value pages. The study provides algorithmic support to guide e-commerce firms in dynamically identifying high-value sessions, intervening intelligently, and running differentiated operations.
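Cite this article: Zhu Yulu (朱宇露). Predicting Online Purchase Intention via Firth Logistic Regression and Random Forest [J]. E-Commerce Letters (电子商务评论), 2026, 15(4): 206-217. https://doi.org/10.12677/ecl.2026.154388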
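
The abstract names Firth logistic regression as its tool for sparse, imbalanced data but does not show the estimator. As an illustration only, the following is a minimal NumPy sketch of Firth's penalized likelihood, which adds half the log-determinant of the Fisher information to the ordinary log-likelihood. The function name `firth_logistic`, the step-halving safeguard, and the toy data are assumptions made for this sketch; they are not taken from the paper and do not use the UCI dataset.

```python
import numpy as np


def firth_logistic(X, y, max_iter=200, tol=1e-8):
    """Fit logistic regression by Firth's penalized likelihood (sketch).

    X must already include an intercept column; y holds 0/1 labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sign = 2.0 * y - 1.0  # +1 for positive labels, -1 for negative

    def penalized_loglik(beta):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        loglik = -np.sum(np.logaddexp(0.0, -sign * eta))
        # Firth penalty: 0.5 * log det of the Fisher information X' W X
        _, logdet = np.linalg.slogdet(X.T @ (w[:, None] * X))
        return loglik + 0.5 * logdet

    beta = np.zeros(X.shape[1])
    current = penalized_loglik(beta)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info_inv = np.linalg.inv(X.T @ (w[:, None] * X))
        # Diagonal of the hat matrix W^(1/2) X (X' W X)^(-1) X' W^(1/2)
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score: ordinary score plus the penalty gradient
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        if np.max(np.abs(step)) < tol:
            break
        # Step-halving (an added safeguard): shrink the step until the
        # penalized log-likelihood does not decrease
        scale = 1.0
        candidate = beta + step
        value = penalized_loglik(candidate)
        while not (value >= current) and scale > 1e-8:
            scale *= 0.5
            candidate = beta + scale * step
            value = penalized_loglik(candidate)
        if not (value >= current):
            break  # no improving step found; keep the current estimate
        beta, current = candidate, value
    return beta


# Made-up, perfectly separated toy data: ordinary maximum likelihood has
# no finite solution here, while the Firth estimate stays finite.
x_toy = np.array([-2.0, -1.0, 1.0, 2.0])
X_toy = np.column_stack([np.ones_like(x_toy), x_toy])
y_toy = np.array([0, 0, 1, 1])
beta_toy = firth_logistic(X_toy, y_toy)
print(beta_toy)
```

The toy data are deliberately separable, which is the situation where the penalty matters most. For real analyses an established implementation (for example, the R package `logistf`) is likely a safer choice than this sketch.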
