Feature Screening for Ultra-High Dimensional Data Based on Stable Correlation Coefficient
Abstract: Feature screening is a key step in ultra-high-dimensional data analysis, because the accuracy of the screening and dimension-reduction step affects all subsequent modeling. To address the shortcomings of the stable feature screening method (SC-SIS), this paper proposes a robust, model-free feature screening method (RSCS) for ultra-high-dimensional data, based on the stable correlation coefficient. Compared with SC-SIS, RSCS is more robust when the data contain outliers or the covariates follow heavy-tailed distributions. The paper proves theoretically that RSCS has the sure screening property, and evaluates its finite-sample performance through Monte Carlo simulations and an application to mouse genome data.
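Citation: Yan Xi (闫习). Feature Screening for Ultra-High Dimensional Data Based on Stable Correlation Coefficient [J]. Advances in Applied Mathematics, 2021, 10(11): 3777-3782. https://doi.org/10.12677/AAM.2021.1011400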
