Comparative Analysis of Multiple Machine Learning Algorithms for Risk Assessment of Thyroid Cancer Recurrence
DOI: 10.12677/aam.2026.152070
Authors: Guangyan Ding, Min Li*: School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong
Keywords: Thyroid Cancer, Recurrence Prediction, Machine Learning, Ensemble Learning
Abstract: Accurate assessment of postoperative recurrence risk in thyroid cancer is important for improving patient prognosis and allocating medical resources efficiently. Because traditional statistical methods are limited in handling complex, nonlinear clinical data, this paper compares several machine learning algorithms for predicting thyroid cancer recurrence. The study uses the clinical dataset provided by Borzooei and Tarokhian, which contains 17 feature variables. After data cleaning and numerical conversion via Label Encoding, the data were split into training and test sets at a 3:1 ratio, and seven models were built: Logistic Regression, K-Nearest Neighbors (KNN), Decision Tree, Support Vector Machine (SVM), Random Forest, XGBoost, and CatBoost. Classification performance was evaluated with Receiver Operating Characteristic (ROC) curves, Area Under the Curve (AUC), accuracy, sensitivity, and specificity. All seven models performed well, with AUC values above 0.93 and accuracy above 0.89. The ensemble methods performed best: Random Forest achieved the highest AUC (0.9904), indicating the best generalization ability; XGBoost and CatBoost tied for the highest overall accuracy (0.9375), and XGBoost achieved the best specificity (0.9000). Feature analysis further showed that risk level (Risk), response to therapy (Response), and TNM stage were the core predictors of recurrence. Machine learning techniques, particularly ensemble algorithms such as Random Forest and XGBoost, can effectively improve the accuracy of thyroid cancer recurrence risk prediction. These models can serve as objective and efficient auxiliary diagnostic tools, providing a scientific basis for clinicians to formulate personalized follow-up strategies.
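The preprocessing and evaluation steps named in the abstract (Label Encoding, a 3:1 train/test split, and accuracy, sensitivity, specificity, and AUC) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names, the category strings, and the toy labels and scores are all hypothetical, and the AUC is computed with the pairwise ranking formulation rather than by tracing the ROC curve.

```python
import random
from typing import Dict, List, Sequence, Tuple


def label_encode(values: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct category to an integer code, assigned in sorted order."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def split_3_to_1(n_rows: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices, then keep 3/4 for training and 1/4 for testing."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = (3 * n_rows) // 4
    return idx[:cut], idx[cut:]


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (recall on class 1) and specificity (recall on class 0).

    Assumes both classes are present in y_true, otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = recurrence, 0 = no recurrence.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.25, 0.3, 0.6, 0.1, 0.2, 0.7]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]  # 0.5 cut-off is an assumption

print(label_encode(["Low", "High", "Intermediate", "Low"]))
print(split_3_to_1(8))
print(confusion_metrics(toy_true, toy_pred))
print(auc(toy_true, toy_scores))
```

Accuracy, sensitivity, and specificity depend on the chosen cut-off, whereas AUC summarizes ranking quality across all cut-offs, which is one reason a model can lead on AUC while another leads on accuracy.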
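Cite this article: Ding Guangyan, Li Min. Comparative Analysis of Multiple Machine Learning Algorithms for Risk Assessment of Thyroid Cancer Recurrence [J]. Advances in Applied Mathematics, 2026, 15(2): 293-301. https://doi.org/10.12677/aam.2026.152070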

References

[1] 贾林梓. 认识甲状腺疾病为健康护航[J]. 健康向导, 2024, 30(2): 1.
[2] Sidey-Gibbons, J.A.M. and Sidey-Gibbons, C.J. (2019) Machine Learning in Medicine: A Practical Introduction. BMC Medical Research Methodology, 19, Article No. 64.
[3] 司锐, 李文秀, 苏俊武. 人工智能在医学领域的应用进展[J]. 中国医药, 2021, 16(6): 957-960.
[4] 于帆, 何海洪, 周义文. 人工智能在检验医学领域的应用进展[J]. 国际检验医学杂志, 2023, 44(18): 2267-2273.
[5] 黄浩然. 基于机器学习的心血管疾病预测研究[D]: [硕士学位论文]. 武汉: 湖北大学, 2024.
[6] 文宏伟, 陆菁菁, 何晖光. 机器学习在神经精神疾病诊断及预测中的应用[J]. 协和医学杂志, 2018, 9(1): 19-24.
[7] 王新光. 机器学习在结石性肾积脓术前诊断及PCNL术后SIRS预测方面的应用研究[D]: [博士学位论文]. 武汉: 华中科技大学, 2021.
[8] 孙悦, 夏宁, 戴玮然. 基于机器学习算法探讨甲状腺相关性眼病的免疫相关基因[J]. 广西医学, 2023, 45(10): 1200-1207.
[9] 卢江昆, 胡纪杨, 龚建鸣, 等. 机器学习模型预测甲状腺结节良恶性分析[J]. 山西医药杂志, 2021, 50(20): 2899-2901.
[10] 易捷伊. 机器学习在甲状腺结节良恶性诊断中的辅助分析[D]: [硕士学位论文]. 昆明: 云南大学, 2018.
[11] 马明瑞, 马晓剑, 冷晓玲. 基于机器学习的微灶甲状腺乳头状癌超声智能诊断方法探析[J]. 中国医疗设备, 2019, 34(S2): 171-173.
[12] 周天晗, 吴凡, 陆凯宁, 等. 基于机器学习算法预测甲状腺乳头状癌右喉返神经后方淋巴结转移907例临床研究[J]. 中国实用外科杂志, 2021, 41(12): 1394-1399.
[13] 王子柯. 基于机器学习的甲状腺乳头状癌临床数据分析与诊断模型研究[D]: [硕士学位论文]. 柳州: 广西科技大学, 2022.
[14] Borzooei, S., Briganti, G., Golparian, M., Lechien, J.R. and Tarokhian, A. (2023) Machine Learning for Risk Stratification of Thyroid Cancer Patients: A 15-Year Cohort Study. European Archives of Oto-Rhino-Laryngology, 281, 2095-2104.
[15] Lee, J., Lee, S.G., Kim, K., Yim, S.H., Ryu, H., Lee, C.R., et al. (2019) Clinical Value of Lymph Node Ratio Integration with the 8th Edition of the UICC TNM Classification and 2015 ATA Risk Stratification Systems for Recurrence Prediction in Papillary Thyroid Cancer. Scientific Reports, 9, Article No. 13361.