Unsupervised Feature Selection Method Based on Kernel Density Mutual Information
Abstract: Feature selection underpins the analysis of high-dimensional data and is a key step in many pattern recognition applications. A model trained on a large number of features tends to overfit and its performance degrades, so screening the features of high-dimensional data is particularly important. This paper proposes an unsupervised feature selection method based on kernel density mutual information. The algorithm uses mutual information as the measure of dependence between features, which captures both linear and nonlinear relationships, and estimates each feature's density function by kernel density estimation so that the mutual information can be computed more accurately. The method is applied to two real high-dimensional data sets, and the selected features are fed to three commonly used classification models, which are evaluated by classification accuracy and F1 score. Compared with an existing method based on mutual correlation, the two selection methods achieve similar classification accuracy on both data sets across the three classifiers, while the kernel density mutual information method achieves higher F1 scores with all three classifiers, indicating that the proposed method has better screening performance.
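As a rough illustration of the idea summarized in the abstract, the following Python sketch estimates the mutual information between two features from Gaussian kernel density estimates and then uses the pairwise values to discard redundant features. Everything specific here is an assumption rather than the paper's procedure: the abstract does not state the kernel, the bandwidth rule, or the selection rule, so the sketch uses SciPy's `gaussian_kde` with its default bandwidth, the resubstitution estimate (the sample mean of log f(x, y) - log f(x) - log f(y)), and a hypothetical greedy rule that repeatedly drops the feature with the largest average mutual information with the remaining ones. The function names `kde_mutual_information`, `pairwise_mi`, and `select_features` are illustrative only.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_mutual_information(x, y):
    """Resubstitution estimate of I(X; Y) in nats from Gaussian KDEs.

    Assumption: default (Scott) bandwidth; the paper may use another choice.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xy = np.vstack([x, y])  # gaussian_kde expects shape (n_dims, n_samples)
    log_joint = gaussian_kde(xy).logpdf(xy)
    log_px = gaussian_kde(x).logpdf(x)
    log_py = gaussian_kde(y).logpdf(y)
    # Average log density ratio over the observed sample points.
    return float(np.mean(log_joint - log_px - log_py))


def pairwise_mi(X):
    """Symmetric matrix of estimated mutual information between columns of X."""
    n_features = X.shape[1]
    mi = np.zeros((n_features, n_features))
    for i in range(n_features):
        for j in range(i + 1, n_features):
            mi[i, j] = mi[j, i] = kde_mutual_information(X[:, i], X[:, j])
    return mi


def select_features(X, k):
    """Hypothetical greedy rule: drop the most redundant feature until k remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    mi = pairwise_mi(X)
    keep = list(range(X.shape[1]))
    while len(keep) > k:
        sub = mi[np.ix_(keep, keep)]
        # Diagonal is zero, so the row sum covers only the other kept features.
        mean_mi = sub.sum(axis=1) / (len(keep) - 1)
        keep.pop(int(np.argmax(mean_mi)))
    return keep


# Toy data: b depends on a nonlinearly (near-zero linear correlation),
# while c is independent of both.
rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = a ** 2 + 0.1 * rng.normal(size=300)
c = rng.normal(size=300)
X = np.column_stack([a, b, c])

print("correlation(a, b):", np.corrcoef(a, b)[0, 1])
print("estimated MI(a, b):", kde_mutual_information(a, b))
print("estimated MI(a, c):", kde_mutual_information(a, c))
print("kept feature indices:", select_features(X, 2))
```

The toy data is chosen so that a correlation-based measure sees little relationship between `a` and `b`, while the mutual information estimate does, which is the motivation the abstract gives for preferring mutual information. The pairwise loop is quadratic in the number of features, so on genuinely high-dimensional data some pre-screening or subsampling would likely be needed.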
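Cite this article: 袁松. Unsupervised Feature Selection Method Based on Kernel Density Mutual Information [J]. 运筹与模糊学 (Operations Research and Fuzziology), 2023, 13(6): 7377-7385. https://doi.org/10.12677/ORF.2023.136725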
