Staging Study of Hepatocellular Carcinoma Based on Network Analysis and Random Forest Method
Abstract: Hepatocellular carcinoma (HCC) is an invasive malignant tumor. Although diagnostic techniques and treatments for HCC have improved considerably, early diagnosis remains a major challenge. In this paper, we use gene network analysis to identify core genes associated with clinical staging, with the aim of informing the detection of early-stage HCC patients and improving HCC diagnosis and treatment. First, we selected gene expression data for 219 early-stage postoperative HCC patients from the GEO database, performed differential expression analysis, and randomly divided the data into a training set and a test set. On the training set, weighted gene co-expression network analysis (WGCNA) clustered the genes into five modules, and we performed functional enrichment and pathway enrichment analysis on each module. The blue module was associated with biological processes such as cell proliferation, cell division, the cell cycle, and DNA replication initiation, replication, and repair, and with pathways including the cell cycle, the P53 signaling pathway, HTLV-I infection, and hepatitis B. These processes and pathways are all closely related to the occurrence and development of HCC. We therefore performed protein-protein interaction (PPI) network analysis on the enriched genes of this module and selected the 10 core genes with the highest connectivity: BUB1B, CCNA2, CCNB1, CCNB2, CDC20, MAD2L1, MCM4, PCNA, RFC4, and TOP2A. A random forest was then trained on these core genes by supervised learning to build a classification model for BCLC staging, and the model was applied to the test set. The method substantially helps classify BCLC early-stage patients, with an accuracy of 95.52%, but its performance on intermediate- and late-stage patients is not satisfactory. This study improves the understanding of the pathogenesis and staging of HCC and suggests a new direction for HCC targeted therapy.
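The core-gene step described above (keeping the genes with the highest connectivity in the PPI network) can be illustrated with a minimal sketch. It assumes that "connectivity" means node degree, that is, the number of distinct interaction partners; the abstract does not specify the exact measure or the software used. The function name `top_hub_genes`, the placeholder gene labels, and the toy edge list are all hypothetical and are not data from the study.

```python
def top_hub_genes(edges, k):
    """Rank genes by degree (distinct interaction partners) and return the top k.

    Assumption: connectivity is measured as plain node degree in an
    undirected PPI network. The study may have used a different measure.
    """
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    degree = {gene: len(partners) for gene, partners in neighbors.items()}
    # Sort by descending degree; break ties alphabetically for reproducibility.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical toy network with placeholder labels; NOT the paper's PPI data.
toy_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_A", "GENE_E"),
    ("GENE_B", "GENE_D"),
    ("GENE_B", "GENE_E"),
    ("GENE_F", "GENE_G"),
    ("GENE_F", "GENE_H"),
    ("GENE_I", "GENE_A"),
]

hubs = top_hub_genes(toy_edges, k=3)
print(hubs)
```

In this sketch, the study's selection would correspond to calling the function with k=10 on the PPI edges of the blue-module enriched genes. Other centrality measures could be substituted for degree without changing the overall shape of the step.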
Citation: Li, X. (2019) Staging Study of Hepatocellular Carcinoma Based on Network Analysis and Random Forest Method. Statistics and Application, 8(1), 95-107. https://doi.org/10.12677/SA.2019.81011
