Research Progress on Single-Cell Database Construction
DOI: 10.12677/BIPHY.2023.112003; Funding: National Natural Science Foundation of China
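Authors: Lingling Chen, Feng Cheng, Xiang Li: College of Physical Science and Technology, Xiamen University, Xiamen, Fujian; Huan Hu, Fei Xu: College of Physical Science and Technology, Xiamen University, Xiamen, Fujian, and Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang; Hai Lin: Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang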
Keywords: Single-Cell RNA Sequencing (scRNA-seq) Database; Single-Cell Analysis; Marker Gene; COVID-19
Abstract: In recent years, the rise of large-scale biological experiments centered on single-cell RNA sequencing (scRNA-seq) has enabled researchers to study biology in greater depth at the cellular level. Because scRNA-seq is especially well suited to resolving cell heterogeneity, a growing number of single-cell databases have emerged. These databases provide a research foundation for understanding how diseases arise and how they can be treated, particularly complex cancers and COVID-19, which has yet to be fully resolved. As scRNA-seq technology continues to develop, single-cell databases are being steadily refined and expanded. They now cover data from a growing range of species and offer a variety of analysis functions that make single-cell research easier. This article reviews the single-cell databases currently in wide use and summarizes their data volumes and data types. We also survey how researchers use these databases for data analysis, and on that basis we summarize recent progress in single-cell database construction. Finally, we propose improvements that address the limitations of current single-cell databases.
Citation: Chen, L., Cheng, F., Hu, H., Xu, F., Li, X. and Lin, H. (2023) Research Progress on Single-Cell Database Construction. Biophysics, 11(2), 30-43. https://doi.org/10.12677/BIPHY.2023.112003
