Deep Neural Network Filtering of High-Confidence Quantified Peptides in Proteomics
A Method for Analyzing DIA-NN Output Peptides Based on Squeeze-and-Excitation Neural Network
DOI: 10.12677/BIPHY.2023.112002. Supported by the National Natural Science Foundation of China.
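Authors: Huan Guo, Yulin Li: Department of Physics, Xiamen University, Xiamen, Fujian; Jianwei Shuai: Department of Physics, Xiamen University, Xiamen, Fujian; Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang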
Keywords: Deep Learning; DIA Proteomics; Data-Independent Acquisition; Mass Spectrometry Data; Manual Filtering; Squeeze-and-Excitation Networks
Abstract: Mass spectrometry is an important analytical method in proteomics. Data-independent acquisition (DIA) is a stable and highly reproducible acquisition mode for mass spectrometers, characterized by a wide mass-to-charge range and high throughput. DIA-NN is one of the mainstream deep-learning-based quantification tools for processing DIA proteomics data. Because the peptides reported by DIA-NN include low-confidence ones, biologists must manually select the high-confidence peptides by judging the similarity of the peptide fragment-ion chromatogram peak groups (XICs). Manual filtering is laborious and time-consuming, and the criteria vary from person to person, which makes the results subjective. In this work we propose MSDeepFilter, a deep learning algorithm that selects high-confidence peptides automatically. MSDeepFilter combines Squeeze-and-Excitation networks and residual networks in a deep learning model that extracts features from the XICs and uses them to distinguish high-confidence from low-confidence peptides. Compared with the traditional machine learning models AdaBoost and support vector machine (SVM), the MSDeepFilter model performs better on several classification metrics on the benchmark dataset, reaching a test-set AUC of 98.7%. This indicates that MSDeepFilter performs well enough to replace the manual filtering step.
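To make the idea of combining a Squeeze-and-Excitation (SE) module with a residual connection more concrete, the following is a minimal NumPy sketch of one SE residual block applied to a toy XIC array. It is not the MSDeepFilter architecture, which the abstract does not specify. The input shape (fragment ions as channels, retention-time points along the second axis), the fixed 3-tap smoothing kernel standing in for learned convolutions, the reduction ratio, and all function names are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def residual_branch(x):
    """Stand-in for learned convolutions: a fixed 3-tap smoothing of each channel.

    A trained model would use learned 1D convolutions here; the fixed kernel
    is an assumption that keeps the sketch self-contained.
    """
    kernel = np.array([0.25, 0.5, 0.25])
    return np.stack([np.convolve(row, kernel, mode="same") for row in x])


def se_gates(x, w1, w2):
    """Compute one gate in (0, 1) per channel.

    x:  (channels, time) feature map
    w1: (channels // r, channels) bottleneck weights
    w2: (channels, channels // r) expansion weights
    """
    z = x.mean(axis=1)                  # squeeze: global average over time
    hidden = np.maximum(w1 @ z, 0.0)    # excitation, step 1: ReLU bottleneck
    return sigmoid(w2 @ hidden)         # excitation, step 2: per-channel gates


def se_residual_block(x, w1, w2):
    """Return x + gates * F(x), where F is the residual branch."""
    f = residual_branch(x)
    gates = se_gates(f, w1, w2)
    return x + f * gates[:, None]       # rescale channels, then add the shortcut


# Toy input: 6 fragment-ion traces with 32 retention-time points each (hypothetical).
rng = np.random.default_rng(0)
xic = rng.random((6, 32))
w1 = rng.normal(size=(2, 6))            # reduction ratio r = 3 (assumed)
w2 = rng.normal(size=(6, 2))
out = se_residual_block(xic, w1, w2)
print(out.shape)
```

The gates let the block weight each fragment-ion trace by its global behaviour, while the shortcut keeps the original signal available to later layers. A classifier head that turns such features into a high- or low-confidence label is omitted here.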
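Cite this article: Guo, H., He, Q., Li, Y. and Shuai, J. (2023) Deep Neural Network Filtering of High-Confidence Quantified Peptides in Proteomics. Biophysics (生物物理学), 11(2), 17-29. https://doi.org/10.12677/BIPHY.2023.112002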
