Prediction of Membrane Protein Types Based on Fusion Feature Information and Voting Ensemble Learning
Abstract: Membrane proteins are the main carriers of cellular function, and a membrane protein's function is closely tied to its type. Identifying membrane protein types is therefore an important problem in bioinformatics. Existing membrane protein classification models extract features mainly from sequence information. This paper proposes a feature extraction method based on protein secondary structure information and fuses the resulting features with two existing sequence features. In comparative experiments, adding the secondary structure features improved prediction accuracy under several different machine learning classification algorithms, which supports the effectiveness of the fusion approach. Finally, a membrane protein classification model is built on the Voting ensemble learning framework by combining three machine learning algorithms. The results show that this model predicts better than several existing machine learning models.
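The two ideas in the abstract, feature fusion and voting, can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name the two sequence features, the secondary structure encoding, the three classifiers, or the voting rule. The sketch assumes plain amino acid composition as a stand-in sequence feature, a three-state (H/E/C) secondary structure composition, fusion by simple concatenation, and hard majority voting. All function names and the example labels are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
SS_STATES = "HEC"  # assumed three-state encoding: helix, strand, coil


def composition(string, alphabet):
    """Return the fraction of each alphabet symbol in `string`.

    Symbols outside the alphabet are ignored.
    """
    counts = Counter(ch for ch in string if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(alphabet)
    return [counts[ch] / total for ch in alphabet]


def fuse_features(sequence, secondary_structure):
    """Concatenate a sequence feature vector with a secondary structure one.

    Assumption: fusion is plain concatenation (20 + 3 = 23 dimensions here).
    """
    return composition(sequence, AMINO_ACIDS) + composition(
        secondary_structure, SS_STATES
    )


def hard_vote(predictions):
    """Return the majority label among the base classifiers' predictions.

    Ties are broken in favour of the classifier listed first.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical usage: one protein, three base-classifier outputs.
features = fuse_features("MKTLLVAAGL", "CCHHHHHHCC")
label = hard_vote(["type I", "GPI-anchor", "type I"])
```

In practice the fused vectors would be fed to three trained classifiers, and their predicted labels (or class probabilities, for soft voting) combined as in `hard_vote`.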
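Cite this article: Su Pengcheng (苏鹏程). Prediction of Membrane Protein Types Based on Fusion Feature Information and Voting Ensemble Learning [J]. Hans Journal of Computational Biology (计算生物学), 2021, 11(4): 49-58. https://doi.org/10.12677/HJCB.2021.114006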
