An Antimicrobial Peptide Generation Model Based on Multimodal Contrastive Diffusion
Abstract: Objective: To develop an antimicrobial peptide (AMP) generation model that integrates peptide sequence, backbone structure and residue-level surface information, thereby improving the novelty, stability and functional relevance of the generated candidates. Methods: Public AMP resources were integrated and processed through sequence normalization, redundancy removal, structure completion and quality control. This yielded 20,104 AMP samples, which were paired with an equal number of non-AMP samples. A multimodal contrastive diffusion model, SSSCD (Sequence-Structure-Surface Contrastive Diffusion), was proposed. Its sequence representation combined 20-dimensional amino acid one-hot encodings, substitution-matrix features and hydrophobicity features. Its structural representation combined backbone coordinates, AAindex physicochemical descriptors and residue-level surface geometric-topological features. A Transformer-based branch performed discrete diffusion denoising of the sequence, and an equivariant graph neural network branch performed continuous denoising of the coordinates. Cross-modal contrastive learning and energy-guided sampling were added to improve sequence-structure consistency and local geometric plausibility. Results: SSSCD achieved Similarity, Instability, Antimicrobial and Docking scores of 24.93, 40.05, 0.87 and 1740, respectively, and outperformed LSTM-RNN, AMPGAN, HydrAMP and DiffAB overall. Conclusion: SSSCD jointly models the sequence semantics, three-dimensional backbone and surface features of AMPs within a unified contrastive diffusion framework. It improves the generation of structurally plausible, functionally relevant AMP candidates and provides a feasible computational strategy for AMP screening and follow-up research.
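The abstract names sequence normalization and redundancy removal as preprocessing steps but gives no thresholds or tools. The sketch below shows one plausible plain-Python version of those two steps. It rests on several assumptions: only the 20 standard residues are kept, the length bounds `min_len`/`max_len` are illustrative rather than the paper's values, and exact-duplicate removal stands in for identity-based clustering. The input sequences are hypothetical.

```python
import re

# Only the 20 standard amino acids are kept (assumption: entries containing
# non-standard or ambiguous residues such as X, B or Z are discarded).
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def normalize(seq: str) -> str:
    """Remove whitespace, upper-case, and strip a trailing stop symbol '*'."""
    return re.sub(r"\s+", "", seq).upper().rstrip("*")


def clean_and_deduplicate(raw_seqs, min_len=5, max_len=50):
    """Normalize, filter by alphabet and length, and drop exact duplicates.

    min_len and max_len are illustrative thresholds, not values from the paper.
    The first occurrence of each sequence is kept, preserving input order.
    """
    seen = set()
    kept = []
    for raw in raw_seqs:
        seq = normalize(raw)
        if not (min_len <= len(seq) <= max_len):
            continue  # outside the (hypothetical) length window
        if not set(seq) <= STANDARD_AA:
            continue  # contains a non-standard residue
        if seq in seen:
            continue  # exact duplicate, e.g. the same peptide from two databases
        seen.add(seq)
        kept.append(seq)
    return kept


# Hypothetical raw entries pooled from several databases.
raw = ["GIGKFLHSAKKFGKAFVGEIMNS", "gigkflhsakkfgkafvgeimns ", "KWKLFKKIXK", "RRWQWR", "AB"]
print(clean_and_deduplicate(raw))  # ['GIGKFLHSAKKFGKAFVGEIMNS', 'RRWQWR']
```

In practice, redundancy is usually removed at a sequence-identity threshold with a clustering tool such as CD-HIT. Exact matching is used here only to keep the example dependency-free.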
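The sequence representation described in the abstract (one-hot identity, substitution-matrix features and hydrophobicity) can be illustrated as a per-residue feature matrix. This is a minimal sketch under two assumptions. First, the hydrophobicity channel uses the Kyte-Doolittle hydropathy scale (Kyte and Doolittle, 1982). Second, the substitution-matrix channel is left to a caller-supplied 20 x 20 array such as BLOSUM62. The function name `encode_sequence` and the column layout are hypothetical and do not come from the authors' code.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# Kyte-Doolittle hydropathy values (assumed choice of hydrophobicity scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_sequence(seq, sub_matrix=None):
    """Return an (L, D) per-residue feature matrix (hypothetical layout).

    Columns 0-19: one-hot residue identity.
    Column 20:    Kyte-Doolittle hydropathy divided by 4.5, so it lies in [-1, 1].
    Optional:     the residue's row of a caller-supplied 20 x 20 substitution
                  matrix (e.g. BLOSUM62) whose rows/columns follow AMINO_ACIDS.
    """
    idx = np.array([AA_INDEX[aa] for aa in seq])
    one_hot = np.eye(20)[idx]
    hydro = np.array([[KYTE_DOOLITTLE[aa] / 4.5] for aa in seq])
    parts = [one_hot, hydro]
    if sub_matrix is not None:
        parts.append(np.asarray(sub_matrix, dtype=float)[idx])
    return np.concatenate(parts, axis=1)


features = encode_sequence("KWKLFKKI")
print(features.shape)  # (8, 21)
```

Passing a 20 x 20 substitution matrix as `sub_matrix` appends that matrix's row for each residue, which adds 20 columns and gives a 41-dimensional per-residue vector.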
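For the sequence branch, the abstract states only that a Transformer performs discrete diffusion denoising. One common way to define such a process is the uniform-transition corruption used in D3PM-style models. Each residue is kept with probability alpha_bar(t) and otherwise resampled uniformly over the 20 amino acids, and the denoiser is trained to recover the original residues x_0. The sketch below assumes that formulation and a cosine schedule. It omits the Transformer and shows only the forward corruption and the x_0-prediction cross-entropy.

```python
import numpy as np

NUM_TOKENS = 20  # one token per standard amino acid


def cosine_alpha_bar(t, T, s=0.008):
    """Cumulative keep-probability alpha_bar(t) from a cosine schedule (assumed)."""
    def f(u):
        return np.cos((u / T + s) / (1 + s) * np.pi / 2) ** 2
    return f(t) / f(0)


def q_sample_tokens(x0, t, T, rng):
    """Sample x_t ~ q(x_t | x_0) for uniform-transition discrete diffusion.

    Each residue is kept with probability alpha_bar(t); otherwise it is
    resampled uniformly from all 20 tokens (it may land on its original value).
    """
    keep_prob = cosine_alpha_bar(t, T)
    resample = rng.random(x0.shape) >= keep_prob
    noise = rng.integers(0, NUM_TOKENS, size=x0.shape)
    return np.where(resample, noise, x0)


def x0_cross_entropy(logits, x0):
    """Mean cross-entropy of predicted x_0 logits (L, 20) against true tokens (L,).

    In the described model a Transformer would produce `logits` from x_t and t;
    here they are simply an input array.
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(x0)), x0].mean()


rng = np.random.default_rng(0)
x0 = rng.integers(0, NUM_TOKENS, size=30)         # hypothetical tokenized peptide
xt = q_sample_tokens(x0, t=500, T=1000, rng=rng)  # partially corrupted sequence
loss = x0_cross_entropy(np.zeros((30, NUM_TOKENS)), x0)  # uninformative model gives log(20)
```

An absorbing-state (mask-token) corruption is an equally common alternative. The abstract does not say which variant SSSCD uses.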
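The structure branch is described as an equivariant graph neural network that denoises continuous coordinates. The sketch below implements one layer in NumPy, following the E(n)-equivariant GNN update of Satorras et al. (2021). It simplifies that update in ways that are assumptions, not the paper's design:

- random untrained weights
- single linear maps in place of MLPs
- a fully connected residue graph
- no timestep conditioning

The final lines check the equivariance property. Applying an orthogonal transform and a shift to the input coordinates should transform the output coordinates in the same way and leave the node features unchanged.

```python
import numpy as np


def silu(z):
    return z / (1.0 + np.exp(-z))


class EGNNLayer:
    """One E(n)-equivariant layer after Satorras et al. (2021), NumPy sketch.

    Assumptions: random untrained weights, single linear maps in place of MLPs,
    a fully connected residue graph, and no diffusion-timestep conditioning.
    h: (N, F) invariant node features; x: (N, 3) coordinates.
    """

    def __init__(self, feat_dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W_edge = rng.normal(0, 0.1, size=(2 * feat_dim + 1, hidden))
        self.W_coord = rng.normal(0, 0.1, size=(hidden, 1))
        self.W_node = rng.normal(0, 0.1, size=(feat_dim + hidden, feat_dim))

    def __call__(self, h, x):
        n = h.shape[0]
        diff = x[:, None, :] - x[None, :, :]          # (N, N, 3): x_i - x_j
        dist2 = (diff ** 2).sum(-1, keepdims=True)   # (N, N, 1): rotation-invariant
        h_i = np.repeat(h[:, None, :], n, axis=1)
        h_j = np.repeat(h[None, :, :], n, axis=0)
        m = silu(np.concatenate([h_i, h_j, dist2], axis=-1) @ self.W_edge)
        m = m * (1.0 - np.eye(n))[:, :, None]         # no self-messages
        # Coordinates move along relative vectors, weighted by invariant scalars.
        x_new = x + (diff * (m @ self.W_coord)).sum(axis=1) / (n - 1)
        # Features aggregate messages; the residual connection keeps h_i.
        h_new = h + silu(np.concatenate([h, m.sum(axis=1)], axis=-1) @ self.W_node)
        return h_new, x_new


# Equivariance check: an orthogonal transform plus shift of the input should
# transform the output coordinates identically and leave h unchanged.
rng = np.random.default_rng(1)
h = rng.normal(size=(10, 8))
x = rng.normal(size=(10, 3))
layer = EGNNLayer(feat_dim=8)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
shift = rng.normal(size=3)
h1, x1 = layer(h, x)
h2, x2 = layer(h, x @ Q.T + shift)
print(np.allclose(h1, h2), np.allclose(x1 @ Q.T + shift, x2))  # True True
```

Because messages depend only on squared distances, rigid motions of the input cannot change them. This is why such layers suit coordinate denoising without data augmentation over orientations.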
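Continuous coordinate diffusion and energy-guided sampling can be illustrated on a C-alpha trace. This sketch assumes standard DDPM Gaussian noising (Ho et al., 2020). It uses a hypothetical toy energy that penalizes consecutive C-alpha distances deviating from about 3.8 Å; the abstract does not specify the paper's actual energy terms. Guidance follows the classifier-guidance pattern: the reverse-step mean is shifted by -scale * sigma^2 * grad E, where `mean` and `sigma` would come from the trained denoiser.

```python
import numpy as np

CA_CA = 3.8  # approximate distance (angstroms) between consecutive C-alpha atoms


def q_sample_coords(x0, alpha_bar_t, rng):
    """DDPM forward noising of coordinates: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps, eps


def bond_energy(x):
    """Hypothetical toy energy: squared deviation of consecutive C-alpha distances from 3.8."""
    d = np.linalg.norm(x[1:] - x[:-1], axis=1)
    return float(((d - CA_CA) ** 2).sum())


def bond_energy_grad(x):
    """Analytic gradient of bond_energy with respect to every coordinate."""
    v = x[1:] - x[:-1]
    d = np.linalg.norm(v, axis=1, keepdims=True)
    g = 2.0 * (d - CA_CA) * v / d  # dE/dx_{i+1} for bond i; dE/dx_i is its negative
    grad = np.zeros_like(x)
    grad[1:] += g
    grad[:-1] -= g
    return grad


def guide_mean(mean, sigma, scale):
    """Shift the denoiser's posterior mean down the energy gradient (classifier-guidance style)."""
    return mean - scale * sigma ** 2 * bond_energy_grad(mean)


def guided_step(mean, sigma, scale, rng):
    """One reverse step x_{t-1} = guided mean + sigma * z; mean and sigma come from the trained model."""
    return guide_mean(mean, sigma, scale) + sigma * rng.normal(size=mean.shape)


rng = np.random.default_rng(0)
x0 = np.cumsum(rng.normal(size=(12, 3)), axis=0)  # hypothetical C-alpha trace
xt, eps = q_sample_coords(x0, alpha_bar_t=0.5, rng=rng)
x_prev = guided_step(mean=xt, sigma=0.1, scale=0.5, rng=rng)  # xt stands in for the model's mean
```

A realistic energy would add further terms, such as steric-clash or bond-angle penalties. With any such energy, the guidance step keeps the same form, since only `bond_energy_grad` changes.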
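Cross-modal contrastive learning pulls together the sequence and structure embeddings of the same peptide and pushes apart those of different peptides. The abstract does not give the loss form, so this is a minimal sketch of one standard choice: a CLIP-style symmetric InfoNCE loss with cosine similarity and a temperature of 0.1 (both assumptions). Within a batch, row i of the sequence embeddings and row i of the structure embeddings form the positive pair, and all other rows act as negatives.

```python
import numpy as np


def l2_normalize(z, eps=1e-8):
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    shifted = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(seq_emb, struct_emb, temperature=0.1):
    """CLIP-style contrastive loss between (B, D) sequence and structure embeddings.

    Diagonal entries of the similarity matrix are the positive pairs; the loss
    averages the sequence-to-structure and structure-to-sequence directions.
    """
    s = l2_normalize(seq_emb)
    g = l2_normalize(struct_emb)
    logits = s @ g.T / temperature  # (B, B) scaled cosine similarities
    idx = np.arange(len(s))
    loss_s2g = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_g2s = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_s2g + loss_g2s)


rng = np.random.default_rng(0)
seq_emb = rng.normal(size=(16, 32))                      # hypothetical sequence-branch embeddings
struct_emb = seq_emb + 0.05 * rng.normal(size=(16, 32))  # well-aligned structure embeddings
mismatched = np.roll(seq_emb, 1, axis=0)                 # every pair deliberately wrong
print(symmetric_info_nce(seq_emb, struct_emb) < symmetric_info_nce(seq_emb, mismatched))  # True
```

When all embeddings are identical, every logit is equal and the loss reduces to log(B). This value is a useful sanity check for an untrained model.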
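Citation: Qi Zhixiao, Liao Jun. An Antimicrobial Peptide Generation Model Based on Multimodal Contrastive Diffusion[J]. Hans Journal of Computational Biology, 2026, 16(2): 79-88. https://doi.org/10.12677/hjcb.2026.162007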
