An Overview of DNA-Binding Protein for Feature Extraction Algorithms
DOI: 10.12677/HJCB.2020.102003; supported by research project funding
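Authors: Pengcheng Chen, Ya Gao, Jianwei Ni, Yanping Zhang*: School of Mathematics and Physics Science and Engineering, Hebei University of Engineering, Handan, Hebei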
Keywords: DNA-Binding Protein; Feature Extraction; Sequence Information; Structure Information
Abstract: Identifying and predicting DNA-binding proteins plays a very important role in studying the life activities of organisms and in understanding the mechanisms underlying them. As the number of protein sequences grows rapidly, computational methods have greater advantages over traditional experimental methods. Starting from the sequence information and structural information of proteins, this paper summarizes the existing feature extraction methods for DNA-binding proteins. The XGBoost algorithm is used to compare and analyze nine protein-sequence feature extraction methods on the PDB1075 and PDB186 datasets. The results show that the different feature extraction methods each have their own advantages and disadvantages; among them, the Local_DPP method, which is based on the evolutionary information of protein sequences, has the best overall performance.
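Cite this article: Chen, P., Gao, Y., Ni, J. and Zhang, Y. (2020) An Overview of DNA-Binding Protein for Feature Extraction Algorithms. Hans Journal of Computational Biology, 10(2), 21-30. https://doi.org/10.12677/HJCB.2020.102003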
