Identification of Protein Phosphorylation Sites by Diversity Increment Feature Selection Technique
Abstract: Phosphorylation is one of the most important protein post-translational modifications and plays important roles in numerous biological processes by significantly affecting proteins' structure and dynamics. Computational methods that accurately identify phosphorylation sites help us understand phosphorylation signal transduction mechanisms. This paper presents a kinase-independent phosphorylation site identification model called FSID_PhSite. The model uses two kinds of features: the composition of k-spaced amino acid pairs and the position-conserved amino acid composition of the residues surrounding the phosphorylation site. The diversity increment feature selection technique is applied to select features, and the selected features are passed to a support vector machine for identification. With a 1:1 ratio of positive to negative samples, the identification accuracy on independent test sets reaches 84.34%, 82.32% and 68.89% for phosphorylated serine, threonine and tyrosine sites, respectively. These results are better than those of existing kinase-independent phosphorylation site identification models.
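The two quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the 13-residue example window, the choice of `k_max`, and the normalization by the number of valid pairs are all assumptions made for illustration. The increment of diversity is written in its commonly used form, ID(X, Y) = D(X + Y) - D(X) - D(Y) with D(X) = N ln N - Σ n_i ln n_i. The abstract does not describe how this measure is used to rank or select features, so that step is left out.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def cksaap(window, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    Returns a dict mapping "XY.k<k>" to the fraction of valid pairs
    (residue X followed by residue Y with k residues in between).
    Normalizing by the number of valid pairs is an assumption.
    """
    features = {}
    for k in range(k_max + 1):
        counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
        total = 0
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip gaps or non-standard residues
                counts[pair] += 1
                total += 1
        for pair, n in counts.items():
            features[f"{pair}.k{k}"] = n / total if total else 0.0
    return features


def diversity(counts):
    """Measure of diversity D(X) = N ln N - sum(n_i ln n_i)."""
    n_total = sum(counts)
    if n_total == 0:
        return 0.0
    return n_total * math.log(n_total) - sum(
        n * math.log(n) for n in counts if n > 0
    )


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two count vectors."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


if __name__ == "__main__":
    # Hypothetical 13-residue window centered on a serine.
    window = "AKRLSGSPEEKLT"
    feats = cksaap(window, k_max=2)
    print(len(feats), "features")
    # Identical count vectors give an increment of zero;
    # vectors with no overlap give a large increment.
    print(increment_of_diversity([3, 1, 0], [3, 1, 0]))
    print(increment_of_diversity([4, 0, 0], [0, 0, 4]))
```

A small increment of diversity means two count distributions are similar, which is why the measure can serve as a similarity score between a candidate window and a class of known sites.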
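Cite this article: Hu Shisai, Liang Zhen, Chen Yuxiang, Zhang Ying, Lü Jun. Identification of Protein Phosphorylation Sites by Diversity Increment Feature Selection Technique [J]. Hans Journal of Computational Biology, 2018, 8(1): 24-32. https://doi.org/10.12677/HJCB.2018.81004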
