Predicting Subcellular Localization of Apoptotic Proteins Using Kernel SVM and Segmentation PSSM Method
DOI: 10.12677/CSA.2021.113073. Supported by the National Natural Science Foundation of China.
Author: Xia Xinnan (夏新男), School of Information, Yunnan University, Kunming, Yunnan
Keywords: Apoptosis Proteins; PSSM; Segmentation; Physicochemical Properties; Kernel SVM
Abstract: Apoptosis proteins are closely related to some human diseases. Accurately determining the subcellular location of apoptosis proteins is crucial for understanding disease pathogenesis and for drug development. At present, researchers mainly extract feature information from protein sequences to predict subcellular location, and this approach has achieved good results. In this paper, we first improve the PSSM feature extraction method by segmenting the PSSM by row to capture local information about the apoptosis protein sequence; we call the resulting feature SePSSM. Second, we classify amino acids according to seven physicochemical properties to capture global information about the sequence. Finally, the two feature sets are fused and fed into SVMs with different kernel functions for prediction, and the prediction results are obtained with the Jackknife test. The experimental results show that segmented PSSM outperforms unsegmented PSSM, that the RBF kernel outperforms the other kernel functions, and that the fused features achieve good results on the ZD98 and ZW225 datasets, which indicates that our method is effective.
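The feature construction described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact SePSSM formula or the seven property groupings, so everything here is an assumption for illustration: each row block of the PSSM is summarized by its column means, a single three-class hydrophobicity grouping stands in for the physicochemical properties, and fusion is plain concatenation. The names `segment_pssm_features`, `group_composition`, and `HYDROPHOBICITY_GROUPS` are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumed example grouping (one property only); the paper uses seven
# physicochemical properties whose groupings are not given in the abstract.
HYDROPHOBICITY_GROUPS: Dict[str, str] = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}


def segment_pssm_features(pssm: Sequence[Sequence[float]], n_segments: int) -> List[float]:
    """Split the PSSM rows into contiguous blocks and average each column per block.

    Returns a vector of length n_segments * (number of PSSM columns).
    Summarizing a block by column means is an assumption of this sketch.
    """
    length = len(pssm)
    if n_segments < 1 or length < n_segments:
        raise ValueError("need at least one PSSM row per segment")
    n_cols = len(pssm[0])
    features: List[float] = []
    for k in range(n_segments):
        # Integer arithmetic keeps the blocks contiguous and nearly equal in size.
        start = k * length // n_segments
        end = (k + 1) * length // n_segments
        block = pssm[start:end]
        for j in range(n_cols):
            features.append(sum(row[j] for row in block) / len(block))
    return features


def group_composition(sequence: str, groups: Dict[str, str]) -> List[float]:
    """Fraction of residues falling in each group, in the dict's key order."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = {name: 0 for name in groups}
    for residue in sequence:
        for name, members in groups.items():
            if residue in members:
                counts[name] += 1
                break  # each residue belongs to at most one group
    return [counts[name] / len(sequence) for name in groups]


# Toy input: a 4-row, 2-column matrix stands in for a real L x 20 PSSM.
toy_pssm = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
local_part = segment_pssm_features(toy_pssm, n_segments=2)
global_part = group_composition("RKGC", HYDROPHOBICITY_GROUPS)
fused = local_part + global_part  # fusion by concatenation (assumed)
```

In this sketch the local part has one block of column means per segment and the global part has one fraction per amino acid group. The fused vector is the kind of input that would then be passed to an SVM and evaluated by leave-one-out (Jackknife) testing.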
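Cite this article: Xia, X.N. (2021) Predicting Subcellular Localization of Apoptotic Proteins Using Kernel SVM and Segmentation PSSM Method. Computer Science and Application, 11(3), 710-719. https://doi.org/10.12677/CSA.2021.113073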
