Analysis of the Average Price of Second-Hand Houses in Shanghai Based on Gradient Boosting Regression Tree
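DOI: 10.12677/ORF.2021.113030. Supported by the National Natural Science Foundation of China.
Authors: Chunli Wang (汪春丽), Luping Liu (刘露萍)*: School of Mathematics and Statistics, Guizhou University, Guiyang, Guizhou
Keywords: Second-Hand House Average Price; Machine Learning; Gradient Boosting Regression Tree; Model Comparison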
Abstract: Using a gradient boosting regression tree (GBRT) ensemble model, this paper analyzes the factors that affect the average price of second-hand houses in Shanghai, drawing on data for residential communities collected from the Lianjia (链家) website over the most recent three years. The influencing factors are first examined with a Pearson correlation coefficient matrix and heat map. The collected data are then divided into a training set and a test set, on which a support vector machine model, a linear regression model, and the ensemble model are trained and tested. The experimental results show that the GBRT-based ensemble model predicts the average price of second-hand houses in Shanghai more accurately than the other models: it has the smallest MSE and the largest correlation coefficient, 0.831, indicating a good fit.
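The abstract does not reproduce the authors' code, feature list, or hyperparameters, so the following is only a minimal sketch of the workflow it describes: fit a gradient boosting regressor on a training set, then score it on a held-out test set with MSE and the Pearson correlation coefficient. Everything specific here is an assumption made for illustration: the data are synthetic, the two features (distance to the city centre and building age) are hypothetical, the trees are depth-1 stumps written from scratch in plain Python, and `n_rounds` and `learning_rate` are arbitrary values. The support vector machine and linear regression baselines mentioned in the abstract are not included.

```python
import random

def avg(values):
    return sum(values) / len(values)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = avg(x), avg(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def mse(y_true, y_pred):
    """Mean squared error."""
    return avg([(a - b) ** 2 for a, b in zip(y_true, y_pred)])

def fit_stump(X, residuals):
    """Depth-1 regression tree: the (feature, threshold) split with least squared error.
    Assumes at least one feature takes more than one distinct value."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = avg(left), avg(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:]  # (feature index, threshold, left mean, right mean)

def predict_stump(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm

def fit_gbrt(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fitted to the current residuals."""
    base = avg(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [a - p for a, p in zip(y, preds)]  # negative gradient of squared loss
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, stumps, learning_rate

def predict_gbrt(model, X):
    base, stumps, lr = model
    return [base + lr * sum(predict_stump(s, row) for s in stumps) for row in X]

# Synthetic stand-in data (NOT the Lianjia data): price falls with distance and age.
random.seed(0)
X = [[random.uniform(0, 30), random.uniform(0, 40)] for _ in range(120)]
y = [80 - 2.0 * dist - 0.3 * age + random.gauss(0, 3) for dist, age in X]

# Hold out the last 30 rows as a test set.
X_train, y_train, X_test, y_test = X[:90], y[:90], X[90:], y[90:]

model = fit_gbrt(X_train, y_train)
train_pred = predict_gbrt(model, X_train)
test_pred = predict_gbrt(model, X_test)

baseline_train_mse = mse(y_train, [avg(y_train)] * len(y_train))
gbrt_train_mse = mse(y_train, train_pred)
test_mse = mse(y_test, test_pred)
test_r = pearson(y_test, test_pred)

print("train MSE (mean baseline):", round(baseline_train_mse, 3))
print("train MSE (GBRT sketch):  ", round(gbrt_train_mse, 3))
print("test MSE:", round(test_mse, 3), "test Pearson r:", round(test_r, 3))
```

In practice a library implementation would normally replace the hand-written stumps; the from-scratch version is used here only to make the residual-fitting step visible.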
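Cite this article: Chunli Wang, Luping Liu. Analysis of the Average Price of Second-Hand Houses in Shanghai Based on Gradient Boosting Regression Tree [J]. Operations Research and Fuzziology, 2021, 11(3): 257-267. https://doi.org/10.12677/ORF.2021.113030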
