Parameter Learning of Adaptive BNs Based on Entropy under Imbalanced Dataset Conditions
DOI: 10.12677/sa.2025.1411310
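Authors: Rong Liu (刘蓉), Cheng Liu (刘赪): School of Mathematics / Department of Statistics, Southwest Jiaotong University, Chengdu, Sichuan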
Keywords: Imbalanced Dataset, Entropy, Bayesian Network, Parameter Learning
Abstract: Bayesian networks learned from imbalanced datasets often contain zero-probability parameter estimates. To address this, the paper proposes an entropy-based adaptive parameter learning method for Bayesian networks that requires no expert knowledge. First, entropy is used to quantify the degree of imbalance in the data, and this imbalance is encoded as prior information in the form of a normal distribution. The variance of the prior is constructed adaptively from the standard conditional entropy and the three-sigma (3σ) rule, its mean is obtained by maximum likelihood estimation, and the optimal permissible error is selected by grid search with cross-validation. Next, the normal distribution is used as an approximation to the Dirichlet distribution, and the network parameters are computed by maximum a posteriori (MAP) estimation. Finally, the method is tested on datasets of different sample sizes drawn from different networks and compared with three other major methods. The results show that, under imbalanced dataset conditions, the proposed method requires no expert input and achieves higher parameter learning accuracy than all three of the other methods.
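The zero-probability problem and the role of entropy mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it uses a plain Dirichlet prior with hand-picked hyperparameters instead of the adaptive normal-approximation prior described above, and the function names (`normalized_entropy`, `mle`, `map_dirichlet`) and the example counts are assumptions made only for illustration.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy of the empirical distribution, scaled to [0, 1].

    Values near 1 indicate balanced counts; values near 0 indicate
    strongly imbalanced counts.
    """
    total = sum(counts)
    k = len(counts)
    if total == 0 or k < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(k)


def mle(counts):
    """Maximum likelihood estimate: relative frequencies (may contain zeros)."""
    total = sum(counts)
    return [c / total for c in counts]


def map_dirichlet(counts, alpha):
    """MAP estimate under a Dirichlet(alpha) prior.

    Uses the standard mode formula (N_k + alpha_k - 1) / (N + sum(alpha) - K),
    which requires every alpha_k > 1.
    """
    if any(a <= 1 for a in alpha):
        raise ValueError("every alpha_k must be greater than 1")
    k = len(counts)
    denom = sum(counts) + sum(alpha) - k
    return [(c + a - 1) / denom for c, a in zip(counts, alpha)]


# Hypothetical counts for one node state given one parent configuration.
counts = [48, 2, 0]

theta_mle = mle(counts)                             # third entry is exactly 0
theta_map = map_dirichlet(counts, [2.0, 2.0, 2.0])  # all entries positive
balance = normalized_entropy(counts)                # well below 1: imbalanced
```

With these counts the maximum likelihood estimate assigns probability zero to the unobserved state, while the MAP estimate keeps every state strictly positive. The method summarized in the abstract differs in how the prior is chosen: it derives the prior from the measured imbalance rather than from fixed hyperparameters.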
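Cite this article: Liu, R. and Liu, C. Parameter Learning of Adaptive BNs Based on Entropy under Imbalanced Dataset Conditions [J]. Statistics and Application (统计学与应用), 2025, 14(11): 54-66. https://doi.org/10.12677/sa.2025.1411310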
