Incomplete Multi-Label Feature Selection Based on Fuzzy Combination Entropy
Abstract: Multi-label data usually have high-dimensional feature spaces and complex label structures. This high dimensionality and complexity readily lead to varying degrees of data incompleteness, which in turn degrades the performance of multi-label learning. To address this issue, this paper proposes an incomplete multi-label feature selection method based on fuzzy combination entropy. First, in an incomplete multi-label fuzzy information system, a fuzzy relation is defined by incorporating the missing rate of feature values together with a regulating parameter. On the basis of this fuzzy relation, fuzzy information granules, fuzzy label granules, and multi-label fuzzy lower and upper approximations are defined, yielding an incomplete multi-label fuzzy rough set. Next, the information-theoretic idea of combination entropy is introduced into the incomplete multi-label fuzzy rough set, and information measures including fuzzy combination entropy, fuzzy joint combination entropy, and fuzzy conditional combination entropy are defined; their properties and interrelations are studied. Finally, the inner and outer significance of features is analyzed on the basis of fuzzy combination entropy, and a feature selection algorithm for incomplete multi-label data is presented. Experiments on five multi-label datasets show that the proposed algorithm achieves better classification performance than the compared methods: Average Precision (AP) increases by 3.48% on average, while Hamming Loss (HL), Ranking Loss (RL), Coverage (CV), and One-Error (OE) decrease by 3.02%, 4.33%, 2.83%, and 4.64% on average, respectively. These results verify the effectiveness of the proposed algorithm.
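The abstract names the components of the method without giving their formulas, so the following Python/NumPy sketch only illustrates how such a pipeline can fit together under explicit assumptions; it is not the paper's algorithm. Assumed here: (i) feature values are rescaled to [0, 1], observed pairs get similarity 1 − |x − y|, and a pair with a missing value gets the hypothetical value θ·(1 − ρ_a), where ρ_a is the missing rate of feature a and θ stands in for the regulating parameter; (ii) features are combined with the min T-norm; (iii) the label relation is a hypothetical "fraction of labels two instances agree on"; (iv) combination entropy uses the object-wise form CE(B) = (1/n) Σ_i (C²_n − C²_{|[x_i]_B|}) / C²_n with C²_t = t(t − 1)/2 applied to fuzzy cardinalities, and CE(L|B) = CE(B ∪ L) − CE(B); (v) a greedy forward search adds the feature with the largest drop in CE(L|B), treated here as an outer significance, and stops when that drop is at most `eps`. All function names are hypothetical.

```python
import numpy as np


def c2(t):
    """'n choose 2' extended to real-valued (fuzzy) cardinalities: t(t-1)/2."""
    return t * (t - 1.0) / 2.0


def feature_relation(col, theta):
    """Fuzzy similarity matrix for one feature column (NaN marks a missing value).

    Assumed form: 1 - |x - y| on values scaled to [0, 1] for observed pairs,
    and theta * (1 - missing_rate) when either value is missing (hypothetical).
    """
    missing = np.isnan(col)
    rate = missing.mean()
    lo, hi = np.nanmin(col), np.nanmax(col)
    scaled = (col - lo) / (hi - lo) if hi > lo else np.zeros_like(col)
    diff = np.abs(scaled[:, None] - scaled[None, :])
    either_missing = missing[:, None] | missing[None, :]
    R = np.where(either_missing, theta * (1.0 - rate), 1.0 - diff)
    np.fill_diagonal(R, 1.0)  # keep the relation reflexive
    return R


def subset_relation(rels, subset, n):
    """Combine per-feature relations with the min T-norm (all ones for the empty set)."""
    R = np.ones((n, n))
    for a in subset:
        R = np.minimum(R, rels[a])
    return R


def label_relation(Y):
    """Hypothetical label similarity: fraction of labels on which two instances agree."""
    Y = np.asarray(Y, dtype=float)
    return 1.0 - np.abs(Y[:, None, :] - Y[None, :, :]).mean(axis=2)


def combination_entropy(R):
    """Object-wise combination entropy with fuzzy granule sizes |[x_i]| = sum_j R[i, j]."""
    n = R.shape[0]
    sizes = R.sum(axis=1)
    return float(np.mean((c2(n) - c2(sizes)) / c2(n)))


def conditional_combination_entropy(RB, RL):
    """CE(L | B) = CE(B u L) - CE(B); joint granules are intersected with min."""
    n = RB.shape[0]
    size_b = RB.sum(axis=1)
    size_bl = np.minimum(RB, RL).sum(axis=1)
    return float(np.mean((c2(size_b) - c2(size_bl)) / c2(n)))


def greedy_select(X, Y, theta=0.5, eps=1e-4):
    """Greedy forward selection driven by the drop in CE(L | B)."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    rels = [feature_relation(X[:, a], theta) for a in range(m)]
    RL = label_relation(Y)
    selected, remaining = [], list(range(m))
    current = conditional_combination_entropy(subset_relation(rels, selected, n), RL)
    while remaining:
        scores = {
            a: conditional_combination_entropy(subset_relation(rels, selected + [a], n), RL)
            for a in remaining
        }
        best = min(scores, key=scores.get)
        if current - scores[best] <= eps:  # outer significance too small: stop
            break
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected


if __name__ == "__main__":
    X = [[0.0, 0.5], [0.1, 0.5], [0.9, 0.5], [1.0, 0.5]]
    Y = [[1, 0], [1, 0], [0, 1], [0, 1]]
    print(greedy_select(X, Y))  # → [0]
```

On the toy data, feature 0 separates the two label patterns while feature 1 is constant, so the search keeps only feature 0. Replacing NaN entries in `X` exercises the assumed missing-value branch of `feature_relation`.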
Citation: Yang Xinyi. Incomplete Multi-Label Feature Selection Based on Fuzzy Combination Entropy[J]. Advances in Applied Mathematics, 2026, 15(1): 278-292. https://doi.org/10.12677/aam.2026.151028
