Construction of Risk Prediction Model for Colorectal Polyps Morbidity Based on Hospitalized Medical Records
DOI: 10.12677/acm.2026.1652185
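Author: Ping Zeng, Department of Gastroenterology, Zhuhai Hospital of Integrated Traditional Chinese and Western Medicine, Zhuhai, Guangdong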
Keywords: Colorectal Polyps; Risk Prediction; Hospitalization Medical Records; Logistic Regression
Abstract: Objective: To construct a simple, interpretable risk prediction model for colorectal polyps from routine inpatient medical-record data, providing a noninvasive, low-cost decision tool for endoscopic screening. Methods: Eighty inpatients who underwent their first colonoscopy at Zhuhai Hospital of Integrated Traditional Chinese and Western Medicine between January 2023 and May 2025 were retrospectively enrolled; the outcome was the presence or absence of polyps confirmed by pathology. Demographic, symptom, laboratory, and medication data recorded within 24 h of admission were extracted automatically with an ETL (extract, transform, load) script. Candidate variables were screened with LASSO (Least Absolute Shrinkage and Selection Operator) regression to limit overfitting given the small sample size, and a multivariable logistic regression model was then built. After collinearity diagnostics and variable screening, model performance was evaluated by the area under the ROC curve (AUC), calibration, and decision curve analysis. Missing data were handled by multiple imputation (m = 5), and the distribution of the missing variables and the imputation strategy were documented in detail. Results: Polyps were detected in 51 patients (63.75%). The multivariable analysis retained four variables: age, BMI, hematochezia, and carcinoembryonic antigen (CEA). The model achieved an AUC of 0.92 and a Hosmer-Lemeshow P of 0.469, and the regression equation was logit(P) = −25.42 + 0.11 × age + 0.73 × BMI + 2.39 × hematochezia + 1.12 × CEA. Conclusion: The four-factor logistic model built from routine inpatient data shows good discrimination and calibration and allows “one-click” risk calculation without additional tests. It is suitable for embedding in the hospital information system (HIS) to assist endoscopy scheduling and can serve as a practical tool for the primary prevention of colorectal cancer.
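The reported equation becomes the “one-click” calculator described in the conclusion once the inverse-logit function is applied to it. The sketch below is a minimal illustration, not the author's HIS implementation. It assumes age in years, BMI in kg/m², hematochezia coded 1 = present and 0 = absent, and CEA as the raw laboratory value in ng/mL; the abstract states none of these, and the function names `polyp_logit` and `polyp_risk` are hypothetical.

```python
import math

# Coefficients copied from the reported equation:
# logit(P) = -25.42 + 0.11*age + 0.73*BMI + 2.39*hematochezia + 1.12*CEA
INTERCEPT = -25.42
COEF = {"age": 0.11, "bmi": 0.73, "hematochezia": 2.39, "cea": 1.12}


def polyp_logit(age: float, bmi: float, hematochezia: int, cea: float) -> float:
    """Linear predictor of the published model (hypothetical helper).

    Assumed units/coding, not stated in the abstract: age in years,
    BMI in kg/m^2, hematochezia 1 = present / 0 = absent, CEA in ng/mL.
    """
    if hematochezia not in (0, 1):
        raise ValueError("hematochezia is assumed to be coded 0 or 1")
    return (INTERCEPT
            + COEF["age"] * age
            + COEF["bmi"] * bmi
            + COEF["hematochezia"] * hematochezia
            + COEF["cea"] * cea)


def polyp_risk(age: float, bmi: float, hematochezia: int, cea: float) -> float:
    """Predicted probability of polyps via the inverse logit 1 / (1 + e^-z)."""
    z = polyp_logit(age, bmi, hematochezia, cea)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical patient: 60 years, BMI 24, hematochezia present, CEA 2.0
    z = polyp_logit(60, 24, 1, 2.0)
    risk = polyp_risk(60, 24, 1, 2.0)
    print(f"logit = {z:.2f}, risk = {risk:.3f}")  # prints: logit = 3.33, risk = 0.965
```

Exponentiating a coefficient gives its odds ratio: under the assumed coding, each 1 kg/m² increase in BMI multiplies the odds of polyps by about e^0.73 ≈ 2.08, and hematochezia multiplies them by about e^2.39 ≈ 10.9, which is why the large negative intercept is needed to keep low-risk patients near zero.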
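The AUC used to evaluate the model equals the probability that a randomly chosen patient with polyps receives a higher predicted risk than a randomly chosen patient without polyps. A minimal standard-library sketch of that rank-based (Mann-Whitney) computation follows; the function name and the toy data are illustrative assumptions, not the study's data or software.

```python
from typing import Sequence


def auc_mann_whitney(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = polyp present, 0 = absent; scores: predicted risks.
    A tie between a positive and a negative counts as half a correct pair.
    Runs in O(n_pos * n_neg), which is fine for a cohort of about 80 patients.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example (not study data): 3 patients with polyps, 2 without.
    # 5 of the 6 positive/negative pairs are correctly ordered.
    y = [1, 1, 1, 0, 0]
    risk = [0.90, 0.70, 0.40, 0.50, 0.20]
    print(auc_mann_whitney(y, risk))
```

The pairwise form is chosen for transparency; for large cohorts a sort-based rank-sum version gives the same value in O(n log n).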
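Decision curve analysis compares the model's net benefit with two reference strategies, colonoscopy for every patient and colonoscopy for none (net benefit 0), across threshold probabilities p_t. The sketch below uses the standard net-benefit formula NB = TP/n − FP/n × p_t / (1 − p_t); the function names and toy data are hypothetical, and the abstract does not report which thresholds the study examined.

```python
from typing import Sequence


def net_benefit(labels: Sequence[int], risks: Sequence[float], threshold: float) -> float:
    """Net benefit of referring patients whose predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return tp / n - fp / n * odds


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Reference strategy: colonoscopy for everyone (all patients counted positive)."""
    prevalence = sum(labels) / len(labels)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


if __name__ == "__main__":
    # Toy data (not study data)
    y = [1, 1, 1, 0, 0]
    risk = [0.90, 0.70, 0.40, 0.50, 0.20]
    for pt in (0.2, 0.4, 0.6):
        print(f"p_t={pt:.1f}  model={net_benefit(y, risk, pt):.3f}  "
              f"all={net_benefit_treat_all(y, pt):.3f}  none=0.000")
```

A model is clinically useful at a given threshold only where its curve lies above both the “all” and “none” curves.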
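The abstract states that missing values were handled by multiple imputation with m = 5 but does not say how the five fitted models were combined. Rubin's rules are the usual way to pool one coefficient across imputed datasets, so the sketch below assumes that approach: the pooled estimate is the mean of the m estimates, and its variance adds the mean within-imputation variance W to (1 + 1/m) times the between-imputation variance B. The function name and numbers are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], std_errors: Sequence[float]) -> Tuple[float, float]:
    """Pool one regression coefficient across m imputed datasets (Rubin's rules).

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need >= 2 imputations and one SE per estimate")
    q_bar = statistics.fmean(estimates)                       # pooled point estimate
    within = statistics.fmean([se ** 2 for se in std_errors])  # W: mean within-imputation variance
    between = statistics.variance(estimates)                  # B: between-imputation variance (m - 1 denominator)
    total = within + (1.0 + 1.0 / m) * between                # T = W + (1 + 1/m) * B
    return q_bar, math.sqrt(total)


if __name__ == "__main__":
    # Hypothetical age coefficients and SEs from m = 5 imputed datasets
    est = [0.10, 0.12, 0.11, 0.09, 0.13]
    se = [0.03] * 5
    beta, pooled_se = pool_rubin(est, se)
    print(f"pooled beta = {beta:.3f}, pooled SE = {pooled_se:.4f}")
```

The (1 + 1/m) factor inflates the between-imputation component to account for using a finite number of imputations, so the pooled SE is always at least as large as the average single-dataset SE.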
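Citation: Zeng, P. (2026) Construction of Risk Prediction Model for Colorectal Polyps Morbidity Based on Hospitalized Medical Records. Advances in Clinical Medicine, 16(5), 3611-3616. https://doi.org/10.12677/acm.2026.1652185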
