# Random Forests Model Based Flood Process Simulation in the Qiushui River Basin

Flood forecasting accuracy in the arid and semi-arid areas of the middle Yellow River region is generally low. This is mainly due to the spatial and temporal variability of rainfall and to the intensive disturbance that large-scale soil and water conservation measures impose on runoff generation and routing. With the development of modern statistical theory, machine learning algorithms offer a new approach to flood forecasting in this region. Taking the Qiushui River Basin, on the left bank of the middle reaches of the Yellow River, as an example, the random forest algorithm was used to build a storm-flood forecasting model and to simulate rainfall-runoff during the flood season. With a calculation time step of 1 hour, the mean Nash-Sutcliffe efficiency (NSE) of the random forest model was 0.47, and the qualified rate was 42% when NSE ≥ 0.60 was taken as the qualification criterion. With a time step of 0.5 hours, the mean NSE rose to 0.76 and the qualified rate to 88%. This indicates that the precision of the input data, here controlled by the time step, is a major factor affecting model accuracy in this region. In addition, at both time steps the random forest model clearly outperformed the traditional multiple regression statistical model. The random forest model is therefore suitable for flood process prediction in the Qiushui River Basin and offers a useful reference for flood warning on the Loess Plateau in the middle reaches of the Yellow River.
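The two accuracy measures used above can be written out explicitly. NSE = 1 − Σ(Q_obs − Q_sim)² / Σ(Q_obs − mean(Q_obs))². The following is a minimal Python sketch of NSE and the qualified rate. It assumes, since the text does not say so explicitly, that NSE is computed separately for each flood event and that the qualified rate is the share of events with NSE ≥ 0.60. The function names are hypothetical.

```python
from typing import Sequence


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / (sum of squared deviations of obs from their mean)."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed series is constant, NSE is undefined")
    return 1.0 - sse / sst


def qualified_rate(event_nse: Sequence[float], threshold: float = 0.60) -> float:
    """Share of flood events whose NSE reaches the threshold (assumed criterion NSE >= 0.60)."""
    if len(event_nse) == 0:
        raise ValueError("need at least one flood event")
    return sum(1 for v in event_nse if v >= threshold) / len(event_nse)
```

NSE equals 1 for a perfect simulation and 0 when the simulation is no better than the observed mean. It becomes negative when the simulation is worse than the mean, so a mean NSE of 0.47 versus 0.76 is a substantial difference.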

1. Introduction

2. Introduction to the Random Forest Method

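1) Bagging is an ensemble learning algorithm derived from the Bootstrap idea in statistics. It repeatedly samples the original sample set with replacement to obtain different Bootstrap training samples, then trains a base classifier on each one, which ensures diversity among the training subsets. For unstable classifiers such as decision trees (i.e., classifiers that are sensitive to the training data), Bagging improves classification accuracy. In addition, because the base classifiers can be trained in parallel, Bagging saves considerable computation time, which is another of its advantages.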

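2) The decision tree is one of the main techniques for classification and prediction. It infers classification rules, represented in the form of a tree, from a set of unstructured examples. The tree structure is used to classify data records. Each internal node tests a predictor, and branches are created according to the different values of that field. Each leaf node represents the set of records that satisfy the conditions along its path. Lower-level nodes and branches are built repeatedly within each branch subset until the final classification is obtained. A major advantage of decision-tree algorithms is that the user does not need much background knowledge during learning. As long as the training examples can be expressed in attribute-conclusion form, the algorithm can learn from them.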
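The two components above combine into a random forest. Bagging supplies the Bootstrap training sets, and decision trees serve as the base learners. The following is a minimal pure-Python regression sketch for illustration and is not the authors' implementation. All function names and defaults (`max_depth=4`, `n_trees=50`) are hypothetical. The random subset of predictors considered at each split (`max_features`) follows Breiman's standard random forest rather than anything described in this text.

```python
import random
from statistics import mean
from typing import Dict, List, Optional, Tuple

Sample = Tuple[List[float], float]  # (predictor values, target value)


def sse(ys: List[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(samples: List[Sample], features: List[int]) -> Optional[Tuple[int, float]]:
    """Pick the (predictor, threshold) that most reduces squared error, or None."""
    best, best_cost = None, sse([y for _, y in samples])
    for j in features:
        for thr in sorted({x[j] for x, _ in samples}):
            left = [y for x, y in samples if x[j] <= thr]
            right = [y for x, y in samples if x[j] > thr]
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best, best_cost = (j, thr), cost
    return best


def build_tree(samples: List[Sample], rng: random.Random, max_depth: int = 4,
               max_features: Optional[int] = None) -> Dict:
    """Grow a regression tree: internal nodes test one predictor, leaves store a mean."""
    n_features = len(samples[0][0])
    k = max_features or n_features
    # Random-forest step: each split only considers a random subset of predictors.
    features = rng.sample(range(n_features), k)
    split = best_split(samples, features) if max_depth > 0 and len(samples) > 1 else None
    if split is None:
        return {"leaf": mean(y for _, y in samples)}
    j, thr = split
    left = [s for s in samples if s[0][j] <= thr]
    right = [s for s in samples if s[0][j] > thr]
    return {"feature": j, "threshold": thr,
            "left": build_tree(left, rng, max_depth - 1, max_features),
            "right": build_tree(right, rng, max_depth - 1, max_features)}


def predict_tree(node: Dict, x: List[float]) -> float:
    """Follow the branches from the root down to a leaf."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


def fit_forest(samples: List[Sample], n_trees: int = 50, max_depth: int = 4,
               max_features: Optional[int] = None, seed: int = 0) -> List[Dict]:
    """Bagging: each tree is grown on a Bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):  # independent fits, so this loop could run in parallel
        boot = [rng.choice(samples) for _ in range(len(samples))]
        forest.append(build_tree(boot, rng, max_depth, max_features))
    return forest


def predict_forest(forest: List[Dict], x: List[float]) -> float:
    """Regression output is the average over trees (classification would use a vote)."""
    return mean(predict_tree(tree, x) for tree in forest)
```

For flood simulation, each sample would pair lagged rainfall and discharge values with the target discharge. In practice, a maintained implementation such as scikit-learn's `RandomForestRegressor` would normally be used instead of hand-written trees.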

3. Case Study

3.1. Study Area

3.2. Model Construction

Figure 1. Model structure of Random Forests

Figure 2. River network, rain gauges and hydrological station

$Q_{t} = 18.53672\,P_{t-4} + 2.358017\,P_{t-6} + 0.482646\,Q_{t-1} + 7.106008$ (1)

$Q_{t} = 13.39398\,P_{t-4} + 1.591412\,P_{t-6} + 0.799825\,Q_{t-0.5} - 0.47429$ (2)
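Equations (1) and (2) appear to be the multiple regression models used for comparison. Each predicts discharge from lagged rainfall and the previous discharge. The following sketch applies Eq. (1) to an hourly series. The constant and function names are hypothetical, and so is the treatment of the first six hours. The choice between observed and simulated $Q_{t-1}$ is also an assumption. Eq. (2) has the same form with ΔT = 0.5 h coefficients. The text does not state whether its rainfall lags count hours or half-hour steps, so it is not coded here.

```python
from typing import List, Optional, Sequence

# Coefficients of Eq. (1), time step 1 h, as printed in the text (names are hypothetical).
A_P4, A_P6, B_Q, C = 18.53672, 2.358017, 0.482646, 7.106008


def eq1_step(p_t4: float, p_t6: float, q_prev: float) -> float:
    """Q_t from P_{t-4}, P_{t-6} and Q_{t-1} according to Eq. (1)."""
    return A_P4 * p_t4 + A_P6 * p_t6 + B_Q * q_prev + C


def eq1_series(p: Sequence[float], q_obs: Sequence[float],
               recursive: bool = False) -> List[Optional[float]]:
    """Apply Eq. (1) for every hour t >= 6 (earlier hours lack P_{t-6} and stay None).

    recursive=False: one-step-ahead, Q_{t-1} taken from observations (assumption).
    recursive=True: after t = 6, Q_{t-1} is the model's own previous output.
    """
    if len(p) != len(q_obs):
        raise ValueError("rainfall and discharge series must have equal length")
    out: List[Optional[float]] = [None] * len(p)
    for t in range(6, len(p)):
        q_prev = out[t - 1] if recursive and t > 6 else q_obs[t - 1]
        out[t] = eq1_step(p[t - 4], p[t - 6], q_prev)
    return out
```

The recursive mode propagates the model's own errors through the $Q_{t-1}$ term. It is therefore the stricter test when the model is used for flood process simulation rather than one-step forecasting.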

3.3. Results Analysis

3.3.1. Calculation Time Step ΔT = 1 h

3.3.2. Calculation Time Step ΔT = 0.5 h

3.3.3. Comparison of Model Accuracy

Figure 3. Model performance of Random Forests when time step = 1 hour

Figure 4. Model performance of multiple regression model when time step = 1 hour

Figure 5. Model performance of Random Forests when time step = 0.5 hour

Figure 6. Model performance of multiple regression model when time step = 0.5 hour


4. Summary

