Research on Combined Forecasting of PM2.5, NO2 and SO2 Based on Data Preprocessing Technology and Support Vector Regression
Abstract: Air pollution can cause a range of non-communicable respiratory diseases and can lead to impaired quality of life or premature death; it has become the second leading cause of the increase in deaths worldwide. Formulating reasonable and effective air pollution control measures is therefore urgent. Forecasting plays an important role in air pollution early warning, since accurate and scientifically sound forecasts help people avoid the hazards of air pollution, and improving forecast accuracy has become a concern of many researchers. To improve the accuracy of air pollution forecasting, this study builds a combined model using complementary ensemble empirical mode decomposition (CEEMD) as the data preprocessing technique, together with support vector regression (SVR), the general regression neural network (GRNN), and the particle swarm optimization (PSO) algorithm. The combined forecasting model is validated on PM2.5, NO2 and SO2 time-series data. Measured by the mean absolute percentage error (MAPE), the combined model improves the forecast accuracy of the PM2.5, NO2 and SO2 indices. For the SO2 index in Xi’an, for example, the best individual model has a MAPE of 6.13%, while the combined model has a MAPE of 5.86%. In short, the combined forecasting model can provide more accurate forecast information for air pollution control and theoretical support for air pollution prevention and control.
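As an illustration of the evaluation metric named in the abstract, MAPE is conventionally defined as (100%/n) Σ |(y_t − ŷ_t)/y_t|, where y_t is the observed value and ŷ_t the forecast. The Python sketch below computes it and compares two individual forecasts with a simple weighted combination. The function names, the fixed weight, and all numbers are hypothetical and chosen only for illustration; the abstract does not state how the study determines its combination weights, so the weighted average here is an assumption, not the authors' method.

```python
from typing import List, Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observed value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def combine(forecast_a: Sequence[float],
            forecast_b: Sequence[float],
            weight_a: float) -> List[float]:
    """Weighted average of two forecasts (assumed combination rule)."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    if len(forecast_a) != len(forecast_b):
        raise ValueError("forecasts must have equal length")
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(forecast_a, forecast_b)]


if __name__ == "__main__":
    # Made-up values for illustration only; not data from the study.
    observed = [20.0, 25.0, 40.0]
    model_a = [22.0, 24.0, 36.0]
    model_b = [19.0, 27.0, 42.0]
    combined = combine(model_a, model_b, weight_a=0.5)

    for name, forecast in [("model A", model_a),
                           ("model B", model_b),
                           ("combined", combined)]:
        print(f"{name}: MAPE = {mape(observed, forecast):.2f}%")
```

A lower MAPE indicates a more accurate forecast, which is the sense in which the abstract compares 6.13% for the best individual model with 5.86% for the combined model.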
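Cite this article: Fang Min, Wei Lin, Ma Jing, Li Yuanlin, Yuan Yan, Cui Xudong, Shi Daiyu, Zhu Suling. Research on Combined Forecasting of PM2.5, NO2 and SO2 Based on Data Preprocessing Technology and Support Vector Regression [J]. Statistics and Application, 2020, 9(5): 792-800. https://doi.org/10.12677/SA.2020.95082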
