Debris Flow Signal Identification Model Based on Random Forest and Maximum Relevance Minimum Redundancy
Abstract: This paper proposes a mathematically interpretable debris flow signal identification model that combines the Maximum Relevance Minimum Redundancy (mRMR) criterion with the Random Forest algorithm. Using seismic signal data from the Illgraben area in Switzerland, multi-dimensional features are extracted from the time, frequency, and time-frequency domains. An mRMR feature selection model is built on mutual information theory, and optimizing its objective function selects the five features that are most discriminative and least redundant. A Random Forest is then trained on this feature subset: each decision tree is grown by minimizing Gini impurity, and the final class is decided by ensemble voting. The results show that the proposed model outperforms traditional machine learning methods in both accuracy and AUC (Area Under the Curve), matches the recognition performance of a Random Forest trained on the full feature set while using only five features, and correctly classifies all six independent test events. The study thus provides an effective tool for debris flow identification and, by coupling feature reduction with ensemble learning, a mathematical modeling basis for geological hazard signal processing.
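The two criteria named in the abstract, mRMR selection driven by mutual information and tree splitting by Gini impurity, can be illustrated with a minimal sketch. It is not the paper's implementation. The abstract does not state which form of the mRMR objective, which discretization, or which mutual information estimator is used, so the sketch assumes the difference form (relevance minus mean redundancy), already-discretized features, and a plug-in estimate from counts. The feature names and toy values are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def mrmr_select(features, labels, k):
    """Greedy mRMR, difference form (an assumption): repeatedly add the
    feature with the largest relevance I(f; y) minus its mean mutual
    information with the features already chosen."""
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while len(selected) < min(k, len(features)):
        remaining = [name for name in features if name not in selected]
        selected.append(max(remaining, key=score))
    return selected


def gini_impurity(labels):
    """Gini impurity 1 - sum_c p_c^2 of a set of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(feature, labels):
    """Size-weighted Gini impurity of the child nodes after splitting on a
    discrete feature; a tree prefers the split that minimizes this."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(g) / n * gini_impurity(g) for g in groups.values())


# Hypothetical toy data: 8 signal windows, 1 = debris flow, 0 = other.
labels = [0, 0, 0, 1, 1, 1, 1, 1]
features = {
    "peak_amp":     [0, 0, 0, 0, 1, 1, 1, 1],
    "peak_amp_dup": [0, 0, 0, 0, 1, 1, 1, 1],  # exact copy: fully redundant
    "dom_freq":     [0, 0, 1, 1, 0, 0, 1, 1],
}

selected = mrmr_select(features, labels, k=2)
print(selected)
print(split_gini(features["peak_amp"], labels))
```

On this toy data the duplicated column is as relevant as the original, but its redundancy penalty drops it below the weaker, independent feature, so the second pick is `dom_freq`. In practice the selected columns would be passed to a Random Forest implementation from an established library, where each tree evaluates candidate splits with an impurity measure such as `split_gini` and the forest votes across trees.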
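Cite this article: Li Xiaopeng. Debris Flow Signal Identification Model Based on Random Forest and Maximum Relevance Minimum Redundancy [J]. Pure Mathematics, 2025, 15(10): 102-110. https://doi.org/10.12677/pm.2025.1510253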
