Predictive Analysis of Online Shoppers’ Purchase Intention in the E-Commerce Context
Abstract: With the rapid advancement of artificial intelligence and data science, numerous machine learning methods have emerged, among which the Support Vector Machine (SVM) has been widely adopted for classification because of its solid theoretical foundation. In practice, however, the classification performance of the traditional SVM degrades when the data contain noise or outliers. This study uses the Online Shoppers Purchasing Intention Dataset from the UCI Machine Learning Repository, which comprises 12,330 sessions, each belonging to a different user, collected over one year. The dataset has 17 features covering page visit counts and durations, bounce rate, exit rate, page value, and proximity to special days, along with categorical variables such as traffic source, visitor type, and month of visit. The target is a binary variable indicating purchase intention. To predict purchase intention in the e-commerce setting, we propose a robust classifier built on a bounded framework, the Bounded Pinball Twin Support Vector Machine (BP-TSVM), which is based on a bounded pinball (quantile) loss. To validate the model, we compare BP-TSVM with the classical Twin Support Vector Machine (TSVM), the Pinball Twin Support Vector Machine (PinTSVM), and the Twin Parametric Margin Support Vector Machine (TPMSVM) on the same dataset and under the same parameter optimization strategy, using accuracy and standard deviation (SD) as evaluation metrics. The results show that BP-TSVM outperforms the baselines in both predictive accuracy and robustness, and that it is notably more stable on noisy purchase-intention data. These findings offer a viable technical approach and empirical evidence for precision marketing, customer segmentation, and personalized recommendation in e-commerce.
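The abstract names a bounded pinball loss but does not define it. As a rough illustration only, the sketch below compares the standard pinball loss with a capped variant. The clipping form and the names `bounded_pinball_loss`, `tau`, and `cap` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def pinball_loss(margin, tau=0.5):
    """Standard pinball loss on u = 1 - margin, where margin = y * f(x)."""
    u = 1.0 - np.asarray(margin, dtype=float)
    # Penalize u >= 0 linearly; penalize u < 0 with slope tau.
    return np.where(u >= 0, u, -tau * u)

def bounded_pinball_loss(margin, tau=0.5, cap=2.0):
    """Hypothetical bounded variant (an assumption): pinball loss clipped at `cap`."""
    return np.minimum(pinball_loss(margin, tau), cap)

# The last margin mimics a badly misclassified outlier.
margins = np.array([2.0, 1.0, 0.0, -9.0])
plain = pinball_loss(margins)
bounded = bounded_pinball_loss(margins)
print(plain)
print(bounded)
```

Under this assumed form, the outlier's loss is capped at `cap` instead of growing without bound, which is the general intuition behind using a bounded loss for robustness to noise.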
Cite this article: Zhang, P. (2025) Predictive Analysis of Online Shoppers’ Purchase Intention in the E-Commerce Context. E-Commerce Letters, 14(9), 1786-1796. https://doi.org/10.12677/ecl.2025.1493104
