Geometry-Aware Structured Sparse Feature Selection for Colorectal Cancer (AW-MSGL): Stable Classification with Fewer Interpretable Genes
Abstract: Early screening and stratified diagnosis of colorectal cancer (CRC) rely on gene expression data that have few samples, very high dimensionality, and strong redundancy. Keeping the core idea of the Manifold Sparse Group LASSO (MSGL) unchanged, this paper proposes an adaptively weighted variant, AW-MSGL, that achieves stable classification with a smaller, interpretable subset of genes. The method introduces data-driven group weights $w_j$ to suppress co-expression redundancy; preprocessing combines the F-score with KMeans to construct gene modules automatically; and optimization retains the Accelerated Proximal Gradient (APG) method. On CRC microarray data (including an independent test set), AW-MSGL reaches comparable or better accuracy with significantly fewer genes, and the biological interpretation of the key genes it selects remains consistent. The framework is intended as a data-driven tool for lightweight deployment and for the discovery of interpretable biomarkers in CRC.
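The abstract does not state the penalty explicitly, so the following is only a minimal sketch of the kind of proximal step an APG solver would apply. It assumes the standard sparse group LASSO penalty with per-group weights, lam_l1 * ||b||_1 + lam_group * sum_j w_j * ||b_Gj||_2, over non-overlapping groups. The smooth terms (loss and manifold regularizer) would enter through the gradient step and are omitted. The function names, parameters, and example weights are hypothetical and are not taken from the paper.

```python
import math


def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def prox_weighted_sgl(beta, groups, weights, lam_l1, lam_group, step):
    """One proximal step for the weighted sparse group LASSO penalty.

    beta    : list of coefficients after the gradient step
    groups  : list of index lists, one per gene module (non-overlapping)
    weights : one weight w_j per group (assumed given; the paper's
              data-driven rule for w_j is not reproduced here)
    """
    out = [0.0] * len(beta)
    for idx, w in zip(groups, weights):
        # Lasso part: shrink each coefficient in the group individually.
        shrunk = [soft_threshold(beta[i], step * lam_l1) for i in idx]
        # Group part: shrink the whole block, or zero it out entirely.
        norm = math.sqrt(sum(v * v for v in shrunk))
        if norm > 0.0:
            scale = max(1.0 - step * lam_group * w / norm, 0.0)
        else:
            scale = 0.0
        for i, v in zip(idx, shrunk):
            out[i] = scale * v
    return out


# Hypothetical example: two modules, the second penalized twice as hard.
result = prox_weighted_sgl(
    beta=[3.0, 4.0, 0.5, -0.2],
    groups=[[0, 1], [2, 3]],
    weights=[1.0, 2.0],
    lam_l1=0.0,
    lam_group=1.0,
    step=1.0,
)
```

In this example the first module has norm 5 and is scaled by 0.8, while the second module, whose norm is below its weighted threshold, is removed as a whole. A larger w_j therefore makes it easier to discard an entire module, which is one way group weights could suppress redundant co-expressed genes.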
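Cite this article: Han, Junya. Geometry-Aware Structured Sparse Feature Selection for Colorectal Cancer (AW-MSGL): Stable Classification with Fewer Interpretable Genes [J]. Advances in Clinical Medicine, 2026, 16(3): 3593-3607. https://doi.org/10.12677/acm.2026.1631167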
