Commodity Sales Forecast Method Based on BalanceCascade Soft Voting Strategy
DOI: 10.12677/AAM.2022.113099. Supported by the National Natural Science Foundation of China.
Authors: Zhang Chen, Yang Jin (College of Science, University of Shanghai for Science and Technology, Shanghai)
Keywords: Mutual Information; Class Imbalance; Soft Voting; Random Forest; Extreme Gradient Boosting
Abstract: With the development of the Internet, online shopping has become an indispensable part of everyday life. To recommend products to customers more effectively, this paper proposes a three-stage method. First, new features are constructed from the original data, and feature selection is then performed using mutual information. Second, the BalanceCascade algorithm is used to handle class imbalance: its ensemble strategy compensates for the main shortcoming of undersampling (discarding majority-class samples), so that, compared with simple sampling methods, the sample data are used more fully and the effect of the imbalance between positive and negative samples is reduced. Finally, XGBoost and random forest are combined into a single classifier by soft voting, which reduces the bias introduced by relying on any single algorithm and yields better results. Using data provided by the Alibaba Tianchi competition, with precision P, recall R, and the F1 score as evaluation metrics, the method is compared with currently popular machine learning algorithms and with fusion models, and the results verify its effectiveness.
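The abstract says features are selected by mutual information but does not state which estimator is used. The following is a minimal sketch, assuming the features have already been discretized (counts, flags, or binned values): each feature X is scored against the binary label Y by I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))], and the k highest-scoring features are kept. The helper names and the two demo features are hypothetical illustrations, not columns from the Tianchi data.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """I(X;Y) in nats for two equal-length sequences of discrete values."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts.
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_top_k(features, labels, k):
    """Rank named discrete features by MI with the label; keep the top k."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


if __name__ == "__main__":
    y = [1, 1, 0, 0, 1, 0, 0, 0]
    features = {
        "added_to_cart": [1, 1, 0, 0, 1, 0, 0, 1],  # hypothetical, tracks y closely
        "weekday":       [0, 1, 0, 1, 0, 1, 0, 1],  # hypothetical, nearly unrelated to y
    }
    top, scores = select_top_k(features, y, k=1)
    print(top, {name: round(s, 4) for name, s in scores.items()})
```

Here `added_to_cart` is kept because it shares far more information with the label than `weekday`. For continuous features, scikit-learn's `mutual_info_classif` offers a nearest-neighbour estimator instead of the plug-in count estimate used above.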
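BalanceCascade (Liu, Wu and Zhou, 2009) undersamples the majority class over T rounds. Each round trains a learner on all minority samples plus an equally sized random majority subset, sets a decision threshold so that a fraction f = (|P|/|N|)^(1/(T-1)) of the remaining majority samples are false positives, and then discards the majority samples that round already classifies correctly. The surviving "hard" negatives feed later rounds, which is how the ensemble makes up for what plain undersampling throws away. The sketch below is an illustration under stated assumptions: the abstract does not say how XGBoost and random forest are plugged into the cascade, so a hypothetical nearest-centroid scorer (`centroid_learner`) stands in for the base learner, and the final rule sums threshold-shifted scores.

```python
import random


def centroid_learner(X, y):
    """Hypothetical stand-in base learner (nearest centroid).
    Returns a scoring function: larger score = more likely positive."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    mu_p = [sum(c) / len(pos) for c in zip(*pos)]
    mu_n = [sum(c) / len(neg) for c in zip(*neg)]

    def score(x):
        d_p = sum((a - b) ** 2 for a, b in zip(x, mu_p))
        d_n = sum((a - b) ** 2 for a, b in zip(x, mu_n))
        return d_n - d_p
    return score


def balance_cascade(pos, neg, rounds, learner=centroid_learner, seed=0):
    """BalanceCascade-style undersampling. pos/neg are lists of feature vectors
    (minority/majority). Returns one (score_fn, threshold) pair per round."""
    assert rounds >= 2 and len(neg) >= len(pos)
    rng = random.Random(seed)
    # Target false-positive rate: shrinks the majority set towards |pos|.
    f = (len(pos) / len(neg)) ** (1.0 / (rounds - 1))
    remaining = list(neg)
    ensemble = []
    for i in range(rounds):
        subset = rng.sample(remaining, len(pos))  # balanced training subset
        X = pos + subset
        y = [1] * len(pos) + [0] * len(subset)
        score = learner(X, y)
        # Order remaining majority samples from hardest (highest score) to easiest.
        ranked = sorted(remaining, key=score, reverse=True)
        keep = max(len(pos), round(f * len(remaining)))
        # Threshold so that roughly a fraction f of the majority scores above it.
        theta = score(ranked[keep - 1])
        ensemble.append((score, theta))
        if i < rounds - 1:
            # Drop majority samples this round already classifies correctly.
            remaining = ranked[:keep]
    return ensemble


def predict(ensemble, x):
    """Combine all rounds by summing threshold-shifted scores; 1 = positive."""
    return int(sum(score(x) - theta for score, theta in ensemble) > 0)


if __name__ == "__main__":
    rng = random.Random(42)
    pos = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(20)]
    neg = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(400)]
    model = balance_cascade(pos, neg, rounds=4)
    print(len(model), predict(model, [4.0, 4.0]), predict(model, [-3.0, -3.0]))
```

With 20 positives, 400 negatives and four rounds, f is about 0.37, so the majority pool shrinks roughly 400 → 147 → 54 → 20 and every majority sample can influence at least one round.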
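Soft voting combines classifiers by averaging their predicted class probabilities rather than counting hard votes, which matters with only two models: when XGBoost and random forest disagree, a hard vote is a tie, while the averaged probability still decides. A minimal sketch, assuming each model already outputs P(y = 1) per sample (as `predict_proba` does for XGBoost's and scikit-learn's classifiers); the helper name, the optional weights, and the 0.5 threshold are illustrative assumptions, since the abstract does not report the weights or threshold used.

```python
def soft_vote(prob_lists, weights=None, threshold=0.5):
    """Soft voting: weighted average of each model's P(y=1), then threshold.

    prob_lists: one list of positive-class probabilities per model.
    weights: optional per-model weights (defaults to equal weights).
    Returns (averaged probabilities, 0/1 labels).
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    avg = [
        sum(w * p for w, p in zip(weights, column)) / total
        for column in zip(*prob_lists)  # one column per sample
    ]
    return avg, [int(p >= threshold) for p in avg]


if __name__ == "__main__":
    p_xgb = [0.90, 0.40, 0.20, 0.55]  # hypothetical XGBoost P(y=1)
    p_rf = [0.70, 0.70, 0.10, 0.35]   # hypothetical random-forest P(y=1)
    avg, labels = soft_vote([p_xgb, p_rf])
    print([round(p, 2) for p in avg], labels)
```

On the second and fourth samples the two models split one vote each; soft voting resolves them by the averaged confidence (about 0.55 and 0.45). scikit-learn packages the same rule as `VotingClassifier(voting="soft", weights=...)`.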
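The evaluation uses precision P = TP/(TP + FP), recall R = TP/(TP + FN), and F1 = 2PR/(P + R), where TP, FP, and FN count true positives, false positives, and false negatives. A minimal sketch, assuming label 1 marks the positive class (for example, a purchase; the exact label definition is an assumption here):

```python
def precision_recall_f1(y_true, y_pred):
    """Precision P, recall R and F1 for binary 0/1 labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    print(precision_recall_f1(y_true, y_pred))
```

These metrics suit the imbalanced setting better than accuracy: a model that predicts "no purchase" for every sample can reach high accuracy yet has zero recall and zero F1.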
Citation: Zhang, C. and Yang, J. (2022) Commodity Sales Forecast Method Based on BalanceCascade Soft Voting Strategy. Advances in Applied Mathematics, 11(3), 923-934. https://doi.org/10.12677/AAM.2022.113099
