Prognosis of Breast Cancer Based on the Cox Proportional Hazards Regression Model, LASSO and Survival Tree
Abstract: Traditional pathological examination is insufficient for predicting the treatment outcome of breast cancer, so studying its pathogenesis at the molecular level is of great importance. Predicting the risk of recurrence allows patients flagged as high risk to benefit from adjuvant therapy, while patients flagged as low risk can be spared unnecessary treatment. This paper analyzes microarray data for ER+ and ER− breast cancer separately. A univariate Cox proportional hazards regression model is used for preliminary gene screening, LASSO is then used to screen the genes further, and the selected genes are passed to a survival tree to predict and classify patients. Kaplan-Meier curves and the log-rank test are used to validate the results. The model shows good predictive performance for breast cancer recurrence risk. Some of the selected genes have already been reported in the literature as closely related to the occurrence and development of breast cancer; the remaining genes require further experiments to verify the role they play in breast cancer.
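The validation step named in the abstract (Kaplan-Meier curves compared with a log-rank test) can be illustrated with a minimal sketch. This is not the authors' code: it is a pure-Python illustration that assumes two risk groups have already been produced by some classifier, and the function names `kaplan_meier` and `logrank_test` and the toy follow-up times are hypothetical, made up purely for demonstration.

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival) at each distinct event time.

    times  -- follow-up times
    events -- 1 if recurrence was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone (events and censored) sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    times_a, times_b = list(times_a), list(times_b)
    events_a, events_b = list(events_a), list(events_b)
    pooled = zip(times_a + times_b, events_a + events_b)
    event_times = sorted({t for t, e in pooled if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n        # observed minus expected in A
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Toy data (made up for illustration only, not from the paper).
high_risk_t, high_risk_e = [1, 2, 3, 4], [1, 1, 1, 1]
low_risk_t, low_risk_e = [5, 6, 7, 8], [1, 1, 1, 1]

print(kaplan_meier(high_risk_t, high_risk_e))
print(logrank_test(high_risk_t, high_risk_e, low_risk_t, low_risk_e))
```

A small p-value from `logrank_test` would suggest that the two groups have different recurrence-free survival, which is the kind of evidence the abstract describes using to validate the classification. In practice an established survival analysis package would normally be preferred over hand-written estimators.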
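Cite this article: Wang, L. and Zhang, J. (2018) Prognosis of Breast Cancer Based on Cox Proportional Hazards Regression Model, LASSO and Survival Tree. Statistics and Application, 7(2), 99-110. https://doi.org/10.12677/SA.2018.72013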
