
Research on Data Filling Method of User Online Shopping Behavior Based on Random Forest
DOI: 10.12677/AIRR.2022.111003

Abstract: To support the prediction of users' online shopping behavior, this paper studies how to fill (impute) missing values in online shopping behavior data with the random forest method. First, a data analysis examines how missing values are distributed across the data set, how many values are missing, and how the missingness of different features depends on each other. Simple cases of missing data are then handled by pairwise deletion and object (listwise) deletion. After that, the data set is reconstructed so that the remaining missing values can be filled with a random forest. Finally, several algorithms are used to build prediction models of user online shopping behavior, and their prediction performance on the data set before and after filling is compared. The results demonstrate the effectiveness and general applicability of the random forest method for filling missing values in user online shopping behavior data.
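The abstract handles simple missing data with pairwise deletion and object (listwise) deletion before imputing the rest. The paper's exact criteria for which rows or columns to delete are not given here, so the following is only a minimal sketch of the two deletion rules. It assumes rows are Python lists with `None` marking a missing value; the function names are hypothetical.

```python
def listwise_delete(rows):
    """Object (listwise) deletion: keep only rows with no missing (None) value."""
    return [row for row in rows if all(v is not None for v in row)]


def pairwise_rows(rows, a, b):
    """Pairwise deletion: for an analysis that uses only columns a and b,
    keep every row where both of those values are present, even if other
    columns in that row are missing."""
    return [(row[a], row[b]) for row in rows
            if row[a] is not None and row[b] is not None]


data = [[1, 2, None], [3, None, 5], [6, 7, 8]]
print(listwise_delete(data))       # [[6, 7, 8]]
print(pairwise_rows(data, 0, 1))   # [(1, 2), (6, 7)]
```

Listwise deletion discards two of the three rows in this small example. That loss is one reason to apply deletion only to the simple cases and leave the remaining gaps to imputation.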

1. Introduction

2. Analysis of User Online Shopping Behavior Data

Figure 1. Distribution of missing values in the data set

Figure 2. Number of missing values per feature

Figure 3. Correlation of missingness between features
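Figures 2 and 3 summarize how many values each feature is missing and whether features tend to be missing together. The statistic behind Figure 3 is not stated here. This sketch assumes one common choice: the Pearson correlation between the 0/1 missingness indicators of two features, where a value near 1 means the two features are usually missing in the same rows. Rows are Python lists with `None` for missing values, and the feature names are hypothetical.

```python
from math import sqrt


def missing_counts(rows, names):
    """Number of missing (None) entries per feature."""
    return {name: sum(row[j] is None for row in rows)
            for j, name in enumerate(names)}


def nullity_correlation(rows, a, b):
    """Pearson correlation of the missingness indicators of columns a and b.
    Returns None if either column is never or always missing, since the
    correlation is undefined for a constant indicator."""
    ia = [1.0 if row[a] is None else 0.0 for row in rows]
    ib = [1.0 if row[b] is None else 0.0 for row in rows]
    ma, mb = sum(ia) / len(ia), sum(ib) / len(ib)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ia, ib))
    va = sum((x - ma) ** 2 for x in ia)
    vb = sum((y - mb) ** 2 for y in ib)
    if va == 0 or vb == 0:
        return None
    return cov / sqrt(va * vb)


# Hypothetical features: age, visits, cart_value
rows = [[25.0, None, None],
        [31.0, 5.0, 3.0],
        [None, None, None],
        [40.0, 6.0, 1.0]]
print(missing_counts(rows, ["age", "visits", "cart_value"]))
# {'age': 1, 'visits': 2, 'cart_value': 2}
print(nullity_correlation(rows, 1, 2))  # 1.0: always missing together
```

A strong missingness correlation like the one between the last two columns here suggests that the gaps are not independent of each other. That dependence is the kind of structure the analysis in this section examines before choosing how to fill them.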

3. Filling User Online Shopping Behavior Data with the Random Forest Algorithm

3.1. Random Forest Algorithm

3.2. Filling Missing Values with the Random Forest Method
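According to the abstract, the data set is reconstructed so that the remaining missing values can be filled by a random forest. This sketch assumes the following reading of "reconstructed": the incomplete feature becomes the prediction target, and a forest is trained on the rows where that feature is observed. The code is a minimal, dependency-free illustration for one numeric column, written from scratch. Several details are illustrative assumptions, not the paper's settings: the helper names, the defaults (`n_trees=50`, `max_depth=8`, `min_leaf=2`), and the p/3 features-per-split rule. All other columns are assumed complete, for example after the deletion step.

```python
import random
from statistics import mean


def _sse(values):
    """Sum of squared errors around the mean (regression-tree impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth, max_depth, min_leaf, n_feat, rng):
    """Grow a regression tree. Leaves are numbers; internal nodes are
    (feature, threshold, left_subtree, right_subtree) tuples."""
    if depth >= max_depth or len(y) <= min_leaf or len(set(y)) == 1:
        return mean(y)
    best = None  # (sse, feature, threshold)
    # A random forest considers only a random subset of features per split.
    for f in rng.sample(range(len(X[0])), n_feat):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            sse = _sse(left) + _sse(right)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    if best is None:  # every sampled feature is constant in this node
        return mean(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_leaf, n_feat, rng)

    return (f, t, grow(li), grow(ri))


def predict_tree(node, x):
    """Follow the splits down to a leaf and return its value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def rf_impute(rows, target, n_trees=50, max_depth=8, min_leaf=2, seed=0):
    """Return a copy of rows where None entries of column `target` are
    replaced by the average prediction of a random forest trained on the
    rows where `target` is observed (hypothetical helper)."""
    rng = random.Random(seed)
    observed = [r for r in rows if r[target] is not None]
    X = [[v for j, v in enumerate(r) if j != target] for r in observed]
    y = [r[target] for r in observed]
    n_feat = max(1, len(X[0]) // 3)  # classic p/3 rule for regression
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(len(X)), k=len(X))  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, min_leaf, n_feat, rng))
    filled = []
    for r in rows:
        r = list(r)  # copy so the input rows are not modified
        if r[target] is None:
            x = [v for j, v in enumerate(r) if j != target]
            r[target] = mean(predict_tree(tree, x) for tree in forest)
        filled.append(r)
    return filled


rows = [[float(x), 10.0 if x < 5 else 20.0] for x in range(10)]
rows[2][1] = rows[8][1] = None
filled = rf_impute(rows, target=1)
print(filled[2][1], filled[8][1])
```

In this toy example the target is 10 for x < 5 and 20 otherwise. The two hidden targets should come back close to 10 and 20, because almost every bootstrap tree splits between the two groups. A categorical column would instead need classification trees with majority voting. In practice, scikit-learn's `RandomForestRegressor` or `RandomForestClassifier` would replace the hand-written trees.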

4. Machine-Learning-Based Prediction Models for User Online Shopping Behavior

5. Experimental Results and Analysis

Figure 4. Prediction performance of each algorithm before and after filling

6. Conclusion
