Protein Secondary Structure Prediction Based on Wavelet Feature Extraction and Support Vector Machine
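DOI: 10.12677/HJBM.2019.91003. Supported by the National Natural Science Foundation of China.
Authors: Jian Wang, Jinyong Cheng*: School of Information, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong
Keywords: Protein Secondary Structure Prediction; Position-Specific Scoring Matrix (PSSM); Pseudo-Image; Wavelet Transform; Support Vector Machine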
Abstract: Protein structure is central to understanding a protein's biological function, so predicting the structure of a protein helps predict and understand the function of an uncharacterized protein. Secondary structure prediction is in turn a decisive step toward predicting the full structure. In this study of secondary structure prediction, each residue of a protein is encoded with a position-specific scoring matrix (PSSM). Taking a window around a residue represents that residue as a two-dimensional pseudo-image plane. A wavelet transform is then applied to the pseudo-image to extract low-frequency and high-frequency features at different resolution levels. These wavelet features, together with the original PSSM plane data, are taken as the sample information carried by the residue, and a support vector machine is trained on these samples to produce the prediction model.
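The feature construction described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the wavelet, the window size, or the edge handling, so the Haar wavelet, a half-width of 6 (13 residues), zero padding at the sequence ends, and a single decomposition level are all hypothetical choices, as are the function names. The sketch shows windowing a PSSM into a pseudo-image, one level of 2D wavelet decomposition into low-frequency and high-frequency sub-bands, and concatenation with the original window data. The support vector machine step is left out.

```python
import math


def pssm_window(pssm, center, half_width):
    """Return the (2*half_width + 1) x n_cols pseudo-image around `center`.

    Rows outside the sequence are zero-padded (an assumption, not from the paper).
    """
    n_cols = len(pssm[0])
    rows = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < len(pssm):
            rows.append(list(pssm[i]))
        else:
            rows.append([0.0] * n_cols)
    return rows


def haar_1d(values):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    values = list(values)
    if len(values) % 2 == 1:
        # Odd length: repeat the last sample so that pairs can be formed.
        values.append(values[-1])
    s = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / s for i in range(0, len(values), 2)]
    return approx, detail


def _transform_columns(rows):
    """Apply haar_1d down every column; return (approx, detail) as row lists."""
    approx_cols, detail_cols = [], []
    for col in zip(*rows):
        a, d = haar_1d(col)
        approx_cols.append(a)
        detail_cols.append(d)
    # Transpose back from column-major to row-major layout.
    return ([list(r) for r in zip(*approx_cols)],
            [list(r) for r in zip(*detail_cols)])


def haar_2d(image):
    """One-level 2D Haar decomposition.

    Returns four sub-bands: low_low (low-frequency approximation) and three
    high-frequency detail bands (low_high, high_low, high_high).
    """
    low_rows, high_rows = [], []
    for row in image:
        a, d = haar_1d(row)
        low_rows.append(a)
        high_rows.append(d)
    low_low, low_high = _transform_columns(low_rows)
    high_low, high_high = _transform_columns(high_rows)
    return low_low, low_high, high_low, high_high


def residue_features(pssm, center, half_width=6):
    """Flatten the PSSM window plus its wavelet sub-bands into one vector."""
    window = pssm_window(pssm, center, half_width)
    features = [v for row in window for v in row]
    for band in haar_2d(window):
        features.extend(v for row in band for v in row)
    return features
```

With a 13 x 20 window this yields 260 raw PSSM values plus four 7 x 10 sub-bands, so 540 features per residue. Further resolution levels could be obtained by applying `haar_2d` again to the `low_low` band, and the resulting vectors would then be passed, with their secondary structure labels, to a support vector machine trainer.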
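Citation: Wang Jian, Cheng Jinyong. Protein Secondary Structure Prediction Based on Wavelet Feature Extraction and Support Vector Machine [J]. Hans Journal of Biomedicine, 2019, 9(1): 17-22. https://doi.org/10.12677/HJBM.2019.91003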
