Classification of High-Dimensional Colorectal Precancerous Lesions Based on RFE-Guided, PSO-Inspired Feature Selection
Abstract: To address the high dimensionality, small sample size, and severe feature redundancy of colonoscopy data, we propose an RFE-guided, PSO-inspired framework for feature selection and classification. First, the raw features are pre-screened with F-tests and correlation analysis to reduce dimensionality and remove redundant information. Recursive feature elimination (RFE) then generates feature importance scores that guide the search, and an improved particle swarm optimisation (PSO) heuristic performs a global search over feature subsets, combined with a local search mechanism that further refines candidate solutions. The fitness function also incorporates a feature subset size constraint and a stability constraint, which strengthens the search and improves the robustness of the optimal solution. Finally, a support vector machine with a radial basis function (RBF) kernel serves as the classifier, and its performance is evaluated under five-fold cross-validation. Experimental results show that the proposed method outperforms the comparison methods on multiple evaluation metrics, including Recall, Accuracy, F1-score, and G-mean.
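The abstract does not give the exact form of the fitness function, so the following is only a minimal sketch of how a classification score, a subset size constraint, and a stability constraint might be combined. The additive form, the weights `alpha` and `beta`, the use of mean pairwise Jaccard similarity as the stability measure, and all function names are assumptions made for illustration, not the authors' definitions. G-mean is computed in its standard binary form, as the geometric mean of sensitivity and specificity.

```python
from itertools import combinations
from math import sqrt


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (binary labels, 1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sqrt(sensitivity * specificity)


def jaccard(a, b):
    """Jaccard similarity of two feature-index collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stability(fold_subsets):
    """Assumed stability measure: mean pairwise Jaccard similarity of the
    feature subsets selected on different cross-validation folds."""
    pairs = list(combinations(fold_subsets, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def fitness(score, n_selected, n_total, fold_subsets, alpha=0.1, beta=0.1):
    """Hypothetical fitness (higher is better): reward the classification
    score and subset stability, penalise the fraction of features kept.
    alpha and beta are illustrative weights, not values from the paper."""
    size_penalty = n_selected / n_total
    return score - alpha * size_penalty + beta * stability(fold_subsets)


# Example: score a candidate that keeps 10 of 100 features and selects the
# same two features on both folds.
candidate_fitness = fitness(
    score=g_mean([1, 1, 0, 0], [1, 0, 0, 0]),
    n_selected=10,
    n_total=100,
    fold_subsets=[{1, 2}, {1, 2}],
)
```

In a PSO loop, each particle's binary feature mask would be scored this way, with `score` obtained from the cross-validated classifier; the size term pushes the swarm toward compact subsets, while the stability term favours subsets that recur across folds.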
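Cite this article: Zhang Yajie, Wang Ying. Classification of High-Dimensional Colorectal Precancerous Lesions Based on RFE-Guided, PSO-Inspired Feature Selection [J]. Advances in Applied Mathematics, 2026, 15(5): 659-672. https://doi.org/10.12677/aam.2026.155258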
