[1] Xie, Z., Li, H., Song, Y.P., et al. (2024) From AIGC to AIGA, a New Track in Intelligence: Large Decision Models. Science Focus, 1-24. http://kns.cnki.net/kcms/detail/11.5469.N.20240412.1839.002.html, 2024-04-17.
[2] Robbins, H. and Monro, S. (1951) A Stochastic Approximation Method. The Annals of Mathematical Statistics, 22, 400-407.
[3] Le Roux, N., Schmidt, M. and Bach, F. (2012) A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets. Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, Nevada, 3-6 December 2012, 2663-2671.
[4] Johnson, R. and Zhang, T. (2013) Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction. Advances in Neural Information Processing Systems, 26, 315-323.
[5] Shalev-Shwartz, S. and Zhang, T. (2013) Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization. Journal of Machine Learning Research, 14, 567-599.
[6] Nguyen, L.M., Liu, J., Scheinberg, K., et al. (2017) SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient. Proceedings of the 34th International Conference on Machine Learning, Sydney, 6-11 August 2017, 2613-2621.
[7] Konečný, J., Liu, J., Richtárik, P., et al. (2016) Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting. IEEE Journal of Selected Topics in Signal Processing, 10, 242-255.
[8] Beznosikov, A., Gorbunov, E., Berard, H. and Loizou, N. (2023) Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods. arXiv: 2202.07262.
[9] Wang, B.L. (2021) Machine Learning. Southeast University Press, Nanjing.
[10] Mignacco, F. and Urbani, P. (2022) The Effective Noise of Stochastic Gradient Descent. Journal of Statistical Mechanics: Theory and Experiment, No. 8, Article ID: 083405.
[11] Smith, S., Elsen, E. and De, S. (2020) On the Generalization Benefit of Noise in Stochastic Gradient Descent. arXiv: 2006.15081.
[12] Wojtowytsch, S. (2023) Stochastic Gradient Descent with Noise of Machine Learning Type Part I: Discrete Time Analysis. Journal of Nonlinear Science, 33, Article No. 45.
[13] Izmailov, P., Podoprikhin, D., Garipov, T., et al. (2018) Averaging Weights Leads to Wider Optima and Better Generalization. arXiv: 1803.05407.
[14] Jain, P., Kakade, S.M., Kidambi, R., et al. (2018) Parallelizing Stochastic Gradient Descent for Least Squares Regression: Mini-Batching, Averaging, and Model Misspecification. Journal of Machine Learning Research, 18, 1-42.
[15] Crowder, S.V. and Hamilton, M.D. (1992) An EWMA for Monitoring a Process Standard Deviation. Journal of Quality Technology, 24, 12-21.