Improvement of Inverse Probability of Censoring Weighting in Competing Risk Data Analysis by Machine Learning
Abstract: In competing risks data analysis, the Fine-Gray proportional subdistribution hazards model combined with inverse probability of censoring weighting (IPCW) is a common approach, but conventional IPCW weights can produce unstable estimates when handling censoring. To address this limitation, this paper introduces a machine learning-enhanced inverse probability weighting method: the probability of the target event predicted by a machine learning model enters the weight as its numerator, and the resulting weights are embedded in the IPCW estimating equations. Statistical inference is based on the sandwich variance estimator. To assess feasibility and robustness, several mainstream machine learning algorithms are used to generate the predicted probabilities in the weights, and a real-data example is analyzed using public data available in R packages. Compared with conventional IPCW and double/debiased machine learning (DML), the proposed method yields valid and stable estimates, and a sensitivity analysis confirms the robustness of the results. The findings suggest that the method has potential for application in competing risks data analysis.
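The weight construction summarized in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact formula, so the form used here is an assumption: each uncensored subject receives the weight p_i / G(T_i-), where p_i is a predicted target event probability supplied by any classifier and G is the Kaplan-Meier estimate of the censoring survival function. The function names, the status coding (0 = censored, 1 = target event, 2 = competing event), and the truncation floor `g_floor` are hypothetical choices made for illustration only.

```python
import numpy as np

def censoring_survival(time, status):
    """Kaplan-Meier estimate of the censoring survival function G(t).

    Censoring (status == 0) is treated as the event. Returns the distinct
    censoring times and G evaluated at each of them.
    """
    time = np.asarray(time, dtype=float)
    censored = np.asarray(status) == 0
    cens_times = np.unique(time[censored])
    surv = np.empty(len(cens_times))
    g = 1.0
    for k, t in enumerate(cens_times):
        at_risk = np.sum(time >= t)              # subjects still under observation
        n_cens = np.sum((time == t) & censored)  # censored exactly at t
        g *= 1.0 - n_cens / at_risk
        surv[k] = g
    return cens_times, surv

def g_left(t, cens_times, surv):
    """G(t-): censoring survival just before time t."""
    t = np.asarray(t, dtype=float)
    if len(cens_times) == 0:                     # no censoring observed
        return np.ones_like(t)
    idx = np.searchsorted(cens_times, t, side="left")  # censoring times < t
    return np.where(idx == 0, 1.0, surv[np.maximum(idx - 1, 0)])

def ml_ipcw_weights(time, status, p_hat, g_floor=0.05):
    """Assumed weight form: p_hat_i / G(T_i-) if uncensored, 0 if censored.

    g_floor truncates small values of G to avoid extreme weights
    (an illustrative stabilization choice, not taken from the paper).
    """
    time = np.asarray(time, dtype=float)
    status = np.asarray(status)
    cens_times, surv = censoring_survival(time, status)
    g = np.maximum(g_left(time, cens_times, surv), g_floor)
    return np.where(status == 0, 0.0, np.asarray(p_hat, dtype=float) / g)

if __name__ == "__main__":
    # Toy data; p_hat stands in for predictions from any ML classifier.
    time = [1, 2, 3, 4, 5]
    status = [1, 0, 2, 0, 1]
    p_hat = [0.6, 0.5, 0.4, 0.5, 0.7]
    print(ml_ipcw_weights(time, status, p_hat))
```

In a full analysis these weights would be passed to a weighted Fine-Gray estimating equation, and the variance would be obtained from a sandwich estimator; both steps are outside the scope of this sketch.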
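Cite this article: Fu, J. and Hou, W. (2026) Improvement of Inverse Probability of Censoring Weighting in Competing Risk Data Analysis by Machine Learning. Advances in Applied Mathematics, 15(2), 22-33. https://doi.org/10.12677/aam.2026.152046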

References

[1] Fine, J.P. and Gray, R.J. (1999) A Proportional Hazards Model for the Subdistribution of a Competing Risk. Journal of the American Statistical Association, 94, 496-509.
[2] Robins, J.M., Hernán, M.Á. and Brumback, B. (2000) Marginal Structural Models and Causal Inference in Epidemiology. Epidemiology, 11, 550-560.
[3] Lee, B.K., Lessler, J. and Stuart, E.A. (2009) Improving Propensity Score Weighting Using Machine Learning. Statistics in Medicine, 29, 337-346.
[4] Kvamme, H., Borgan, Ø. and Scheel, I. (2019) Time-to-Event Prediction with Neural Networks and Cox Regression. Journal of Machine Learning Research, 20, 1-30.
[5] Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., et al. (2018) Double/Debiased Machine Learning for Treatment and Structural Parameters. The Econometrics Journal, 21, C1-C68.
[6] Stensrud, M.J., Young, J.G., Didelez, V., Robins, J.M. and Hernán, M.A. (2020) Separable Effects for Causal Inference in the Presence of Competing Events. Journal of the American Statistical Association, 117, 175-183.
[7] Cole, S.R. and Hernán, M.A. (2008) Constructing Inverse Probability Weights for Marginal Structural Models. American Journal of Epidemiology, 168, 656-664.
[8] Breiman, L. (2001) Random Forests. Machine Learning, 45, 5-32.
[9] Wager, S. and Athey, S. (2018) Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests. Journal of the American Statistical Association, 113, 1228-1242.
[10] Chen, T. and Guestrin, C. (2016) XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, 13-17 August 2016, 785-794.
[11] Cortes, C. and Vapnik, V. (1995) Support-Vector Networks. Machine Learning, 20, 273-297.
[12] Platt, J. (1999) Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. Advances in Large Margin Classifiers, 10, 61-74.
[13] Hornik, K., Stinchcombe, M. and White, H. (1989) Multilayer Feedforward Networks Are Universal Approximators. Neural Networks, 2, 359-366.