SCINNO: A Deep Learning-Based Autoencoder Clustering Analysis Method
Abstract: The rapid development of single-cell sequencing technology offers unprecedented resolution for dissecting cellular heterogeneity, but accurately clustering noisy data and complex cell subpopulations remains a major challenge. This paper proposes a deep clustering framework that combines a deep denoising network with a multi-head self-attention mechanism, with the aim of improving both the feature representation and the clustering robustness of single-cell data. First, a deep denoising network (DN) is built on a deep denoising autoencoder; an InfoNCE contrastive loss is introduced to strengthen feature decoupling, which suppresses data noise and yields clean low-dimensional features. Second, a deep clustering network (CN) with a multi-head self-attention mechanism is proposed; it uses attention weights to capture global correlations among features and dynamically optimizes the membership matrix U and the cluster centers through the fuzzy K-means algorithm. Experiments on multiple public single-cell datasets show that the proposed method achieves better clustering performance than the other clustering algorithms compared, providing new theoretical support and a technical tool for the efficient analysis of single-cell data.
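The fuzzy K-means step named in the abstract can be illustrated with a minimal sketch. It assumes the standard fuzzy c-means alternating updates with a fuzzifier m = 2 and Euclidean distance; the abstract does not give the paper's exact update rules, and the attention-based CN network and the DN features are not reproduced here. All function names and parameters below are hypothetical and chosen for illustration only.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def update_memberships(points, centers, m=2.0, eps=1e-12):
    """Return U, where U[i][j] is the membership of point i in cluster j.

    Standard fuzzy c-means rule (an assumption, not taken from the paper):
    u_ij is proportional to (1 / d_ij^2) ** (1 / (m - 1)), normalized per row.
    """
    exponent = 1.0 / (m - 1.0)
    U = []
    for p in points:
        # eps keeps the division finite when a point coincides with a center.
        d2 = [squared_distance(p, c) + eps for c in centers]
        inv = [(1.0 / d) ** exponent for d in d2]
        total = sum(inv)
        U.append([v / total for v in inv])
    return U


def update_centers(points, U, m=2.0):
    """Recompute each center as the mean of all points weighted by u_ij ** m."""
    n_clusters = len(U[0])
    dim = len(points[0])
    centers = []
    for j in range(n_clusters):
        weights = [row[j] ** m for row in U]
        total = sum(weights)
        centers.append([
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ])
    return centers


def fuzzy_kmeans(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Alternate membership and center updates until the centers stop moving."""
    rng = random.Random(seed)
    # Initialize centers from randomly chosen data points.
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    for _ in range(n_iter):
        U = update_memberships(points, centers, m)
        new_centers = update_centers(points, U, m)
        shift = max(
            math.sqrt(squared_distance(a, b))
            for a, b in zip(centers, new_centers)
        )
        centers = new_centers
        if shift < tol:
            break
    # Final memberships are computed against the final centers.
    return update_memberships(points, centers, m), centers
```

In a pipeline like the one the abstract describes, `points` would be the low-dimensional embeddings produced by the denoising network rather than raw expression counts, and each row of the returned U sums to 1, so a hard cluster label can be read off as the column with the largest membership.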
Cite this article: Qi, Z.Y. (2025) SCINNO: A Deep Learning-Based Autoencoder Clustering Analysis Method. Modeling and Simulation, 14(5), 818-828. https://doi.org/10.12677/mos.2025.145436
