Sparse Logistic Regression Feature Selection Method Based on PiE Regularization and Self-Paced Learning
Abstract: To address feature selection and classification for high-dimensional, small-sample, high-noise gene expression data, this paper proposes a sparse logistic regression model (PiE-SPLR) that combines piecewise exponential (PiE) regularization with a self-paced learning (SPL) mechanism. PiE regularization approximates the $\ell_0$ norm and therefore has strong sparse selection capability, while the SPL mechanism progressively selects low-noise samples, which makes the model more robust. The model is solved efficiently with alternating direction optimization and a proximal gradient method. On four gene datasets, including Colon and Leukemia, it achieves the best classification performance while selecting the fewest features. The method provides an effective tool for biomarker discovery and high-dimensional data classification.
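The two ingredients named in the abstract can be illustrated with a small sketch. The abstract does not state the formulas, so the code below rests on two assumptions: that the PiE penalty has the commonly used form 1 - exp(-|w|/sigma) per coefficient, and that SPL uses the hard-threshold weighting in which a sample is kept only when its loss is below an age parameter. The function names (`pie_penalty`, `l0_norm`, `logistic_loss`, `spl_weights`) and the parameters `sigma` and `lam` are hypothetical and are not taken from the paper.

```python
import math


def pie_penalty(w, sigma):
    """Assumed PiE penalty: sum over coefficients of 1 - exp(-|w_j| / sigma)."""
    return sum(1.0 - math.exp(-abs(wj) / sigma) for wj in w)


def l0_norm(w):
    """Number of nonzero coefficients."""
    return sum(1 for wj in w if wj != 0.0)


def logistic_loss(w, x, y):
    """Logistic loss of one sample with label y in {0, 1}."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    # log(1 + exp(z)) - y * z, written in a numerically stable form
    return max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z


def spl_weights(losses, lam):
    """Assumed hard SPL weights: 1 if the sample loss is below lam, else 0."""
    return [1 if loss < lam else 0 for loss in losses]


if __name__ == "__main__":
    w = [0.0, 2.0, -0.5, 0.0]
    # Under the assumed form, the penalty moves toward the nonzero count
    # as sigma shrinks.
    for sigma in (1.0, 0.1, 0.01):
        print(sigma, pie_penalty(w, sigma), l0_norm(w))

    # Samples with large loss (treated as likely noisy) receive weight 0;
    # raising lam over the iterations admits more samples.
    losses = [0.1, 2.5, 0.4]
    print(spl_weights(losses, lam=0.5))
    print(spl_weights(losses, lam=3.0))
```

In this sketch a smaller `sigma` gives a tighter approximation of the nonzero count, and a larger `lam` lets harder samples into training. How the paper schedules either quantity is not specified in the abstract.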
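Cite this article: Ma, Juanling. Sparse Logistic Regression Feature Selection Method Based on PiE Regularization and Self-Paced Learning [J]. Advances in Applied Mathematics, 2026, 15(2): 247-256. https://doi.org/10.12677/aam.2026.152066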
