An Improved LightGBM Algorithm Based on Sampling Enhancement and Dynamic Histogram
DOI: 10.12677/csa.2025.155140
Authors: Zhang Lin, Yan Tao: School of Mathematics and Statistics, Nanjing University of Science and Technology, Nanjing, Jiangsu
Keywords: LightGBM Algorithm, Sampling Method, Histogram Algorithm
Abstract: The main challenge for gradient boosting algorithms is computational speed on large-scale data. This paper addresses two limitations of LightGBM: gradient-based one-side sampling relies solely on first-order derivatives, which compromises accuracy, and histogram binning ignores the characteristics of the data distribution, which leads to redundant computation. We propose a Newton-based gradient one-side sampling method that incorporates second-order derivatives to improve sampling precision, together with a dynamic histogram algorithm that performs distribution-aware and label-aware adaptive binning. Experiments on the Epsilon and MNIST8M datasets show that the new method improves model performance while reducing training time by 20.7% and 9.8%, respectively.
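The sampling idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact scoring rule, so the Newton-style score used here, |g| / (h + lambda), is an assumption made for illustration and may differ from the authors' formula. The function names (`goss_sample`, `first_order_score`, `newton_score`) and the `reg_lambda` parameter are hypothetical; `top_rate` and `other_rate` follow the naming of the corresponding LightGBM parameters, and the weight (1 - top_rate) / other_rate is the amplification factor used by standard gradient-based one-side sampling.

```python
import numpy as np


def first_order_score(grad, hess):
    """Baseline score: absolute first-order gradient (standard one-side sampling)."""
    return np.abs(grad)


def newton_score(grad, hess, reg_lambda=1.0):
    """Assumed second-order score: size of the per-sample Newton step.

    This is an illustrative choice, not the paper's confirmed formula.
    """
    return np.abs(grad) / (hess + reg_lambda)


def goss_sample(scores, top_rate=0.2, other_rate=0.1, rng=None):
    """One-side sampling on arbitrary per-sample scores.

    Keeps the top_rate fraction with the largest scores, draws an
    other_rate fraction uniformly from the remainder, and up-weights
    the randomly drawn samples by (1 - top_rate) / other_rate.
    Returns (indices, weights), with the top samples listed first.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = scores.shape[0]
    n_top = int(n * top_rate)
    n_other = int(n * other_rate)

    order = np.argsort(-scores)          # indices sorted by descending score
    top_idx = order[:n_top]
    other_idx = rng.choice(order[n_top:], size=n_other, replace=False)

    idx = np.concatenate([top_idx, other_idx])
    weights = np.ones(idx.shape[0])
    weights[n_top:] = (1.0 - top_rate) / other_rate
    return idx, weights


# Usage with logistic-loss derivatives on synthetic data.
rng = np.random.default_rng(0)
raw = rng.normal(size=1000)                       # raw model scores
y = rng.integers(0, 2, size=1000)                 # binary labels
p = 1.0 / (1.0 + np.exp(-raw))
grad, hess = p - y, p * (1.0 - p)                 # first and second derivatives

idx_first, w_first = goss_sample(first_order_score(grad, hess), rng=rng)
idx_newton, w_newton = goss_sample(newton_score(grad, hess), rng=rng)
```

The two calls differ only in the score that ranks the samples, so the second-order variant can be swapped in without changing the rest of the sampling procedure.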
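Cite this article: Zhang, L. and Yan, T. (2025) An Improved LightGBM Algorithm Based on Sampling Enhancement and Dynamic Histogram. Computer Science and Application, 15(5), 680-689. https://doi.org/10.12677/csa.2025.155140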
