SRF-LDA: A Stacking-Based Ensemble Learning Model for LncRNA-Disease Association Prediction
DOI: 10.12677/hjcb.2023.134004
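Authors: Sun Jie (孙捷), School of Science, Dalian Jiaotong University, Dalian, Liaoning; Tan Zhebin (谭者斌), School of Software, Dalian Jiaotong University, Dalian, Liaoning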
Keywords: LncRNA; Disease; LncRNA-Disease Association; Random Forest; Variable Importance; Feature Selection; Support Vector Machine
Abstract: Long non-coding RNAs (lncRNAs) are non-coding RNAs longer than 200 nt and form an important part of the non-coding genome. Numerous experiments have shown that lncRNAs are closely involved in the onset and progression of human diseases. However, apart from a small number of lncRNAs whose disease associations are known, the relationships between most lncRNAs and human diseases remain to be characterized. Accurately identifying disease-related lncRNAs therefore helps to elucidate how lncRNAs act in disease and to explore new treatments. In this study, to improve the prediction of lncRNA-disease associations (LDAs), we implemented an LDA prediction model based on stacking ensemble learning, called SRF-LDA. SRF-LDA works in three parts. First, it fuses three types of features into the model input: lncRNA K-mer features, the Gaussian interaction profile (GIP) kernel similarity of diseases, and known LDAs. Second, it applies a stacking ensemble strategy in which several random forest classifiers with different parameters act as base models that classify the fused features, and a support vector machine (SVM) acts as the meta-model that combines and refines the random forests' outputs, yielding more accurate and robust LDA predictions. Third, the model is trained and evaluated with tenfold cross-validation. The results show that SRF-LDA performs well in predicting LDAs, with a mean AUC of 0.9246 and a mean AUPR of 0.9166, outperforming several existing LDA prediction models.
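The abstract describes the pipeline only at a high level. The Python sketch below shows one way its three parts could fit together, using NumPy and scikit-learn on toy random data. It is not the authors' code: k = 3, the fused-feature layout (lncRNA k-mer vector, the disease's GIP similarity row, and the lncRNA's known-association row with the predicted entry masked), the balanced negative sampling, and the three random forest settings are all assumptions, and the helper names `kmer_vector`, `gip_similarity`, `pair_features` and `build_model` are hypothetical. The GIP kernel follows its usual definition, K(d_i, d_j) = exp(-γ‖IP(d_i) - IP(d_j)‖²), where IP(d) is a disease's column of the association matrix and γ is normalized by the mean squared norm of those profiles.

```python
import itertools

import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.svm import SVC


def kmer_vector(seq, k=3):
    """Normalized k-mer frequency vector (4**k entries) of an RNA sequence."""
    kmers = ["".join(p) for p in itertools.product("ACGU", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    seq = seq.upper().replace("T", "U")
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # windows containing N or other symbols are skipped
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def gip_similarity(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`."""
    sq_norms = (profiles ** 2).sum(axis=1)
    mean_norm = sq_norms.mean()
    if mean_norm == 0:  # no known interactions at all
        return np.ones((len(profiles), len(profiles)))
    gamma = gamma_prime / mean_norm  # bandwidth normalized by mean squared profile norm
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative round-off


def pair_features(i, j, kmers, dis_sim, assoc):
    """Fused feature vector for the (lncRNA i, disease j) pair (assumed layout)."""
    lnc_profile = assoc[i].copy()
    lnc_profile[j] = 0.0  # hide the association being predicted
    return np.concatenate([kmers[i], dis_sim[j], lnc_profile])


def build_model(seed=0):
    """Random forest base models with different hyperparameters under an SVM."""
    base = [
        ("rf_a", RandomForestClassifier(n_estimators=30, random_state=seed)),
        ("rf_b", RandomForestClassifier(n_estimators=30, max_depth=5, random_state=seed)),
        ("rf_c", RandomForestClassifier(n_estimators=30, max_features=0.3, random_state=seed)),
    ]
    # The SVM meta-model is fit on out-of-fold RF probabilities (cv=5).
    return StackingClassifier(estimators=base, final_estimator=SVC(kernel="rbf"), cv=5)


# Toy data: random sequences and a random lncRNA x disease association matrix.
rng = np.random.default_rng(0)
n_lnc, n_dis = 30, 10
seqs = ["".join(rng.choice(list("ACGU"), size=rng.integers(200, 400))) for _ in range(n_lnc)]
assoc = (rng.random((n_lnc, n_dis)) < 0.2).astype(float)

kmers = np.array([kmer_vector(s) for s in seqs])
dis_sim = gip_similarity(assoc.T)  # diseases are compared by their columns

pos = np.argwhere(assoc == 1)
neg = np.argwhere(assoc == 0)
neg = neg[rng.choice(len(neg), size=len(pos), replace=False)]  # balanced negatives
pairs = np.vstack([pos, neg])
y = np.r_[np.ones(len(pos), dtype=int), np.zeros(len(neg), dtype=int)]
X = np.array([pair_features(i, j, kmers, dis_sim, assoc) for i, j in pairs])

# Tenfold stratified cross-validation, scored by AUC and AUPR.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(build_model(), X, y, cv=cv, scoring=("roc_auc", "average_precision"))
mean_auc = scores["test_roc_auc"].mean()
mean_aupr = scores["test_average_precision"].mean()
print(f"mean AUC = {mean_auc:.4f}, mean AUPR = {mean_aupr:.4f}")
```

Because the toy data are random, the printed scores say nothing about the performance reported for SRF-LDA. In a careful evaluation, the disease similarity and association features would also be recomputed from the training-fold associations only, since computing them from the full matrix lets test labels leak into the features.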
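Citation: Sun Jie, Tan Zhebin. SRF-LDA: A Stacking-Based Ensemble Learning Model for LncRNA-Disease Association Prediction [J]. Computational Biology (计算生物学), 2023, 13(4): 35-44. https://doi.org/10.12677/hjcb.2023.134004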
