Research on an Enterprise Financial Distress Prediction Method Based on Ensemble Undersampling
Abstract: Financial distress prediction (FDP) is an important tool for guarding against systemic financial risk and improving the efficiency of capital allocation. In real-world corporate data, however, distressed samples (Special Treatment, or ST, firms) make up a very small share of observations, so models tend to be dominated by the majority class, which leads to imbalanced recognition and a biased decision boundary. To address this, the study systematically examines financial distress early-warning models based on ensemble undersampling, using sample rebalancing to improve both the identification of minority-class firms and robustness across prediction windows. Using financial data on China's A-share listed companies from 2010 to 2022, the paper builds a composite indicator system covering solvency, profitability, growth, and operating efficiency, and sets four prediction windows (T-1 to T-4) to examine how the forecast horizon affects model performance. With decision trees as the base learner, it compares the classification performance of traditional undersampling methods (ENN, Tomek Links, NearMiss) with that of ensemble undersampling methods (RUSBoost, EasyEnsemble, HUE). The empirical results show that traditional undersampling can partly improve minority-class recognition in short-term windows, but its overall performance declines markedly as the horizon lengthens. Ensemble undersampling methods, by contrast, perform better in recognition ability, classification balance, and cross-period stability. Among them, HUE achieves the best G-mean, AUC, and related metrics in the T-1 window and remains robust in the medium- and long-term windows. Friedman significance tests further support the robustness of HUE and EasyEnsemble.
The findings indicate that ensemble undersampling can significantly improve the identification of financially distressed firms and the stability of predictions under extreme class imbalance. The study provides verifiable empirical evidence for imbalanced learning in complex financial settings and a useful reference for regulators and investors building more forward-looking risk early-warning systems.
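As an illustration of two ideas the abstract relies on, the sketch below shows (a) the G-mean metric, the geometric mean of sensitivity and specificity, and (b) the balanced-subset sampling step that random-undersampling ensembles in the general style of EasyEnsemble build on. It is a minimal, standard-library sketch and not the authors' implementation: the function names `g_mean` and `balanced_subsets`, the label convention (1 = distressed, 0 = healthy), and the toy data are assumptions made for illustration. HUE's hashing-based partitioning and the decision-tree base learners are not shown.

```python
import math
import random


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def balanced_subsets(y, n_subsets, seed=0):
    """Return index lists, each holding all minority (label 1) samples plus an
    equally sized random draw of majority (label 0) samples.
    Assumption: one base learner would be trained per subset and their outputs combined."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    k = min(len(minority), len(majority))
    return [minority + rng.sample(majority, k) for _ in range(n_subsets)]


# Hypothetical toy data: 5 distressed firms among 100.
y = [1] * 5 + [0] * 95

# A model that labels every firm "healthy" looks accurate but never finds distress.
all_healthy = [0] * 100
accuracy = sum(t == p for t, p in zip(y, all_healthy)) / len(y)
print(accuracy, g_mean(y, all_healthy))

# Each subset is balanced: 5 distressed and 5 healthy firms.
subsets = balanced_subsets(y, n_subsets=3, seed=42)
print([len(s) for s in subsets])
```

On this toy data the all-healthy classifier reaches 95% accuracy yet has a G-mean of zero, which is why metrics such as G-mean and AUC are preferred over accuracy under heavy class imbalance.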
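Cite this article: Wu Jiawei, Fan Hong. Research on an Enterprise Financial Distress Prediction Method Based on Ensemble Undersampling [J]. Service Science and Management (服务科学和管理), 2026, 15(1): 302-315. https://doi.org/10.12677/ssem.2026.151034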
