An Ensemble Model for Protein Allosteric Site Prediction
DOI: 10.12677/biphy.2024.122004. Supported by the National Natural Science Foundation of China.
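Authors: Shijie Qiao (乔仕杰), Fangrui Hu (胡芳睿), Chunhua Li* (李春华): College of Chemistry and Life Science, Beijing University of Technology, Beijing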
Keywords: Protein Allostery; Physicochemical Properties; Secondary Structure; Ensemble Model
Abstract: Allostery is an important mechanism for regulating protein function and is essential to many biological processes. Compared with orthosteric modulators, allosteric modulators offer higher specificity and lower toxicity, which gives allosteric drug design advantages over orthosteric drug design. Discovering allosteric sites is a prerequisite for allosteric drug design, yet most experimentally identified allosteric sites were found by chance, so effective theoretical methods for predicting them are urgently needed. Here we present AllosEC, an ensemble machine learning method for predicting protein allosteric pockets. In addition to the physicochemical properties of each pocket, it uses the pocket's secondary structure information, depth index (DPX), and protrusion index (CX) as features. To address the extreme imbalance between positive and negative samples, the training dataset is balanced by undersampling. On the independent test set, AllosEC outperforms existing methods on several evaluation metrics, with sensitivity (SEN), specificity (SPE), precision (PRE), and Matthews correlation coefficient (MCC) of 0.708, 0.915, 0.405, and 0.486, respectively.
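The two technical steps named in the abstract, balancing the training set by undersampling and scoring predictions with SEN, SPE, PRE, and MCC, can be illustrated with a minimal Python sketch. This is not the AllosEC implementation: the abstract does not say which undersampling scheme was used, so plain random undersampling is assumed here, and the function names `undersample` and `binary_metrics` are hypothetical. The confusion-matrix counts in the usage lines are illustrative values chosen only because they round to the reported metrics; the actual counts are not given in this text.

```python
import math
import random


def undersample(samples, labels, seed=0):
    """Randomly discard negatives until both classes have equal size.

    Assumption: label 1 marks an allosteric pocket (minority class),
    label 0 marks any other pocket (majority class).
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng = random.Random(seed)
    kept_neg = rng.sample(neg, min(len(pos), len(neg)))
    keep = sorted(pos + kept_neg)
    return [samples[i] for i in keep], [labels[i] for i in keep]


def binary_metrics(tp, fn, fp, tn):
    """Compute SEN, SPE, PRE, and MCC from confusion-matrix counts."""
    sen = tp / (tp + fn)  # sensitivity (recall)
    spe = tn / (tn + fp)  # specificity
    pre = tp / (tp + fp)  # precision
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SEN": sen, "SPE": spe, "PRE": pre, "MCC": mcc}


# Toy data: 3 positive pockets among 20 (hypothetical feature vectors).
features = [[float(i)] for i in range(20)]
labels = [1 if i < 3 else 0 for i in range(20)]
balanced_x, balanced_y = undersample(features, labels)

# Illustrative counts only, not taken from the paper.
scores = binary_metrics(tp=17, fn=7, fp=25, tn=269)
print({name: round(value, 3) for name, value in scores.items()})
```

The example also shows why precision can stay low while specificity is high: when negatives greatly outnumber positives in the test set, even a small false-positive rate produces more false positives than true positives.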
Citation: Qiao, S.J., Hu, F.R. and Li, C.H. An Ensemble Model for Protein Allosteric Site Prediction [J]. Biophysics, 2024, 12(2): 31-37. https://doi.org/10.12677/biphy.2024.122004
