Research on Feature Selection under Semi-Supervised Learning
DOI: 10.12677/AAM.2023.121040
Author: 叶雨薇 (Ye Yuwei), Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China
Keywords: Feature Screening; Semi-Supervised Learning; Quantile
Abstract: Compared with traditional supervised learning algorithms, feature screening algorithms under semi-supervised learning can exploit more of the available information to improve the computational performance of the model. In this paper, sample quantiles of each feature are used to infer the population distribution of that feature, and relatively robust semi-supervised feature screening results are obtained under a model-free assumption. Simulations show that the algorithm is suitable when labeled samples are relatively scarce and the class sizes are unbalanced. Its effectiveness is further verified on the lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) datasets from TCGA.
Citation: Ye, Y. (2023) Research on Feature Selection under Semi-Supervised Learning. Advances in Applied Mathematics, 12(1), 367-372. https://doi.org/10.12677/AAM.2023.121040
