Prediction of Four Kinds of Simple Supersecondary Structures in Enzymes Using an Ensemble Classifier Based on SVM
DOI: 10.12677/HJCB.2014.41001. Supported by the National Natural Science Foundation of China.
Authors: Gao Sujuan (高苏娟), Hu Xiuzhen (胡秀珍): College of Sciences, Inner Mongolia University of Technology, Hohhot
Keywords: Enzyme; Supersecondary Structure; Scoring Function; Support Vector Machine; Ensemble Classifier
Abstract: Enzymes are proteins with catalytic function, and studying the supersecondary structures in enzymes is important for understanding enzyme structure and function. Starting from enzyme sequence information, this paper studies four kinds of simple supersecondary structures in enzymes for the first time. Site amino acids and their nearest-neighbor (dipeptide) correlations were used as parameters, and five ways of selecting fixed-length sequence fragments were used. Under 7-fold cross-validation, the predictions of the scoring function (matrix scoring) method alone were not satisfactory. When the matrix scores were instead used as input features for a support vector machine (SVM) and the results were fused with weighted factors by an ensemble classifier, better performance was obtained: the overall prediction accuracy reached 72.64% and the Matthews correlation coefficient was above 0.57. The SVM-based ensemble classifier is therefore an effective method for predicting supersecondary structures in enzymes.
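The abstract names three computational ingredients: weighted fusion of several base classifiers, overall accuracy, and the Matthews correlation coefficient. The sketch below illustrates them in plain Python on toy data. It is not the authors' implementation: the abstract does not say how the fusion weights are chosen, so the `weights` values, the function names, and the one-vs-rest treatment of the four classes are all assumptions made for illustration.

```python
from math import sqrt


def weighted_vote(predictions, weights, n_classes):
    """Fuse the labels predicted by several base classifiers for one sample.

    predictions: one predicted class index per base classifier.
    weights: one non-negative weight per base classifier (hypothetical values;
             the source does not specify the weighting scheme).
    Returns the class index with the largest summed weight.
    """
    scores = [0.0] * n_classes
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # Ties resolve to the lowest class index.
    return max(range(n_classes), key=lambda c: scores[c])


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mcc_one_vs_rest(y_true, y_pred, cls):
    """Matthews correlation coefficient for class `cls` against all others."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == cls and t == cls:
            tp += 1
        elif p == cls:
            fp += 1
        elif t == cls:
            fn += 1
        else:
            tn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when the coefficient is undefined (a zero marginal).
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example: three base classifiers, four samples, four classes.
base_preds = [
    [0, 1, 2, 3],  # labels from base classifier 1
    [0, 1, 1, 3],  # labels from base classifier 2
    [1, 1, 2, 0],  # labels from base classifier 3
]
weights = [0.5, 0.3, 0.2]  # assumed weights, for illustration only
y_true = [0, 1, 2, 3]

fused = [
    weighted_vote([preds[i] for preds in base_preds], weights, n_classes=4)
    for i in range(len(y_true))
]
accuracy = overall_accuracy(y_true, fused)
```

In this toy case the fused labels recover the true labels even though two of the three base classifiers each make at least one error, which is the motivation for fusing classifiers built on different fragment selections.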
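Cite this article: Gao Sujuan, Hu Xiuzhen. Prediction of Four Kinds of Supersecondary Structures in Enzymes by Using Ensemble Classifier Based on SVM [J]. Hans Journal of Computational Biology (计算生物学), 2014, 4(1): 1-11. http://dx.doi.org/10.12677/HJCB.2014.41001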
