Evaluation of Water Quality Prediction Based on Machine Learning Models
DOI: 10.12677/sa.2026.152045
Author: Jie Wang (王杰), College of Geography and Environmental Sciences, Zhejiang Normal University, Jinhua, Zhejiang
Keywords: River Water Quality; Machine Learning; Water Quality Prediction; Performance Evaluation
Abstract: With advancing urbanization and industrialization and intensifying human activity, river water environments face growing pressure. Water pollution is characterized by multiple sources, pronounced spatiotemporal heterogeneity, and complex driving mechanisms, which makes water quality prediction an important requirement for water environment management. To analyze and predict the spatiotemporal distribution of key surface water quality indicators, this study built four machine learning models: Random Forest (RF), Extreme Gradient Boosting (XGBoost), Decision Tree (DT), and Adaptive Boosting (AdaBoost). Model performance was evaluated with the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). Predictive performance ranked RF > XGBoost > DT > AdaBoost. The RF model performed best, with R², RMSE, and MAE of 0.985, 0.042, and 0.024, respectively, indicating high prediction accuracy, stability, and generalization ability, and it effectively captured the complex nonlinear relationships among water quality indicators. The results suggest that ensemble learning models have clear advantages in handling nonlinear relationships and suppressing overfitting, and that the RF model is the preferred model among those tested for spatial prediction and analysis of river water quality, providing support for dynamic water quality simulation and water environment management.
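The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. The function names and the sample values below are hypothetical and chosen only for illustration; they are not the study's code or data, and the paper does not state which library was used to compute its scores.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up observed and predicted indicator values (assumption, not study data).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

scores = {
    "R2": r2_score(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
}
print(scores)
```

A higher R² and lower RMSE and MAE indicate a better fit, which is the basis for a ranking such as RF > XGBoost > DT > AdaBoost. RMSE penalizes large residuals more heavily than MAE, so reporting both gives a rough picture of how errors are distributed.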
Cite this article: Wang, J. (2026) Evaluation of Water Quality Prediction Based on Machine Learning Models. Statistics and Application, 15(2), 179-187. https://doi.org/10.12677/sa.2026.152045
