Pairwise Multi-Label Feature Selection Method Based on Interaction Mutual Information
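DOI: 10.12677/csa.2024.1410198. Supported by the National Natural Science Foundation of China.
Authors: Ping Zhang: State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin; School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin; Hebei Province Key Laboratory of Big Data Calculation, Tianjin. Guanglei Wang, Yajuan Zhang*, Yu Cao: School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin; Hebei Province Key Laboratory of Big Data Calculation, Tianjin.
Keywords: Machine Learning, Feature Selection, Interaction Mutual Information, Classification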
Abstract: When measuring the classification information provided by a candidate feature, feature selection methods based on information theory usually consider only a single label at a time. They ignore the varied correlations between the candidate feature and pairs of labels, which can lead them to underestimate the importance of the candidate feature. To address this problem, a novel pairwise multi-label feature selection method based on interaction mutual information (IPFS) is proposed. Specifically, IPFS assigns each pair of labels a weight derived from interaction mutual information and uses these weights to measure the total classification information that a candidate feature provides for the two labels, so that the importance of the candidate feature is evaluated more accurately. It then selects the feature subset according to the maximum-relevance minimum-redundancy principle. Finally, IPFS is compared with eight other advanced feature selection methods on 12 diverse datasets. The experimental results show that IPFS significantly outperforms the other methods on three evaluation metrics.
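As a rough illustration of the quantity the abstract builds on, the following sketch estimates interaction (three-way) mutual information between one discrete feature and a pair of labels from empirical counts. It is not the IPFS weighting scheme, which the abstract does not spell out. The function names are hypothetical, and the sign convention I(f; y1; y2) = I(f; y1) + I(f; y2) - I(f; y1, y2) is an assumption, since some texts use the opposite sign.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more equal-length discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """I(x; y) = H(x) + H(y) - H(x, y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def joint_mutual_information(f, y1, y2):
    """I(f; y1, y2): information the feature carries about the label pair as a whole."""
    return entropy(f) + entropy(y1, y2) - entropy(f, y1, y2)


def interaction_information(f, y1, y2):
    """Assumed convention: I(f; y1; y2) = I(f; y1) + I(f; y2) - I(f; y1, y2).

    Under this convention a negative value means the feature tells more about the
    label pair jointly than the two single-label scores add up to.
    """
    return (mutual_information(f, y1)
            + mutual_information(f, y2)
            - joint_mutual_information(f, y1, y2))


# Toy data (made up for illustration): the feature is the XOR of the two labels.
y1 = [0, 0, 1, 1]
y2 = [0, 1, 0, 1]
f_xor = [a ^ b for a, b in zip(y1, y2)]

single_label_score = mutual_information(f_xor, y1) + mutual_information(f_xor, y2)
pair_score = joint_mutual_information(f_xor, y1, y2)
interaction = interaction_information(f_xor, y1, y2)
```

In this toy case the feature shares no information with either label on its own, yet it carries one full bit about the pair taken together. That is the kind of underestimation by single-label scoring that the abstract describes.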
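Citation: Zhang, P., Wang, G., Zhang, Y. and Cao, Y. (2024) Pairwise Multi-Label Feature Selection Method Based on Interaction Mutual Information. Computer Science and Application (计算机科学与应用), 14(10), 10-21. https://doi.org/10.12677/csa.2024.1410198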
