Predictive Modeling of Bioactivity of Anti-Breast Cancer Drug Candidates
Abstract: This paper builds a quantitative model for predicting the bioactivity of compounds against ERα. It combines ensemble learning with logistic regression and uses automatic parameter optimization so that each algorithm reaches its best generalization performance. First, a random forest algorithm, grounded in information theory, ranks the molecular descriptors of the compounds by their importance to activity against the estrogen receptor α subtype, yielding the 20 most informative variables. Then, using these 20 molecular descriptors, a ridge regression algorithm quantitatively predicts ERα bioactivity. The results show that the model can accurately predict ERα bioactivity, offering a new approach to the scientific selection of anti-breast cancer drugs.
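The two-stage pipeline described in the abstract (random forest importance ranking, then ridge regression on the 20 top-ranked descriptors with automatic parameter tuning) can be sketched as follows. This is a minimal illustration, not the author's implementation. It assumes scikit-learn and uses synthetic data in place of the real molecular descriptors. The impurity-based `feature_importances_` ranking, the cross-validated grid search over `alpha`, and the helper names `select_top_descriptors` and `fit_ridge` are all assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split


def select_top_descriptors(X, y, k=20, seed=0):
    """Rank features by random forest importance and return the top-k column indices.

    Assumption: impurity-based importances stand in for the paper's ranking criterion.
    """
    forest = RandomForestRegressor(n_estimators=100, random_state=seed)
    forest.fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]  # most important first
    return order[:k]


def fit_ridge(X, y):
    """Tune the ridge penalty by 5-fold cross-validated grid search (assumed tuning method)."""
    grid = {"alpha": np.logspace(-3, 3, 13)}
    search = GridSearchCV(Ridge(), grid, cv=5, scoring="r2")
    search.fit(X, y)
    return search


# Synthetic stand-in for a (compounds x molecular descriptors) matrix and activity values.
X, y = make_regression(
    n_samples=300, n_features=60, n_informative=20, noise=5.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Stage 1: keep the 20 descriptors ranked highest on the training split only.
top_idx = select_top_descriptors(X_train, y_train, k=20)

# Stage 2: ridge regression restricted to the selected descriptors.
search = fit_ridge(X_train[:, top_idx], y_train)
test_r2 = search.score(X_test[:, top_idx], y_test)

print("selected descriptor columns:", sorted(top_idx.tolist()))
print("best alpha:", search.best_params_["alpha"])
print("held-out R^2:", round(float(test_r2), 3))
```

Ranking the descriptors on the training split alone keeps the held-out score free of selection leakage. On real data the synthetic matrix would be replaced by the measured descriptors and activity values.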
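Cite this article: Pan Weimin. Predictive Modeling of Bioactivity of Anti-Breast Cancer Drug Candidates [J]. Modeling and Simulation (建模与仿真), 2023, 12(3): 1820-1828. https://doi.org/10.12677/MOS.2023.123168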
