Heart Failure Risk Assessment Based on Stacking Ensemble Learning and SHAP Explainable Analysis
Abstract: Heart failure (HF), the terminal stage of many cardiovascular diseases, poses a major clinical challenge because of its high mortality and the complexity of prognosis assessment. Traditional machine learning models often face a trade-off between predictive accuracy and interpretability, which limits their use in high-stakes medical decision-making. This study proposes an optimized HF risk prediction framework that combines Stacking ensemble learning with SHAP (SHapley Additive exPlanations) interpretability analysis. First, medical interaction features such as “Age × r” are introduced and the Synthetic Minority Over-sampling Technique (SMOTE) is applied to rebalance the classes, so that the model can adequately represent high-risk samples. Second, a heterogeneous Stacking architecture is designed and implemented with Gradient Boosting Decision Tree (GBDT), Random Forest (RF), and support vector classifier (SVC) base learners. To mitigate multicollinearity under small-sample conditions, an L2-regularized Ridge Classifier serves as the meta-learner, combined with a feature passthrough mechanism. Ablation experiments confirm that these core modules work together to correct the predictive bias of individual algorithms and to improve detection of the minority high-risk class. Experimental results show that the ensemble outperforms the single-model baselines on Accuracy and F1-score while maintaining high sensitivity (Recall), which reduces the rate of missed diagnoses. Finally, SHAP provides end-to-end transparency, from global feature contributions to individual-case attribution, and indicates that the model's decisions are consistent with medical reasoning. Although limited by small-sample, single-center data, this work provides a quantitative tool for HF risk assessment and proposes a feasible technical framework for exploring closed-loop intelligent clinical decision support systems.
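The stacking architecture described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn is the toolkit (the abstract does not name one). The data below is synthetic and stands in for the clinical records; the hyperparameters (`alpha=1.0`, `cv=5`, `n_estimators=200`), the scaling step in front of the SVC, and all variable names are illustrative assumptions rather than the authors' settings. The interaction-feature, SMOTE, and SHAP stages are left out so that the example depends on scikit-learn alone.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data (NOT the study's dataset).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Heterogeneous base learners: GBDT, RF, and a scaled SVC.
# Scaling before the SVC is an assumption of this sketch.
base_learners = [
    ("gbdt", GradientBoostingClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("svc", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
]

# L2-regularized ridge classifier as meta-learner. passthrough=True
# appends the original features to the base learners' out-of-fold
# predictions before the meta-learner is fitted.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=RidgeClassifier(alpha=1.0),
    passthrough=True,
    cv=5,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the base learners' predictions tend to be strongly correlated with one another and, under passthrough, with the raw features, an L2 penalty on the meta-learner is a natural way to keep its coefficients stable on a small sample. The scores printed here describe the synthetic data only and say nothing about the results reported in the paper.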
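Citation: Xu Wenhao, Yin Hang. Heart Failure Risk Assessment Based on Stacking Ensemble Learning and SHAP Explainable Analysis [J]. Computer Science and Application, 2026, 16(6): 277-288. https://doi.org/10.12677/csa.2026.166227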
