带有近似最优步长的随机递归梯度算法
Stochastic Recursive Gradient Algorithm with Approximately Optimal Stepsize
摘要: 在机器学习中,我们经常考虑一个目标函数是凸函数和的最小化问题。随机递归梯度算法(SARAH)是求解上面问题的一个常用方法。它允许一个简单的递归框架来更新随机梯度估计。基于SARAH方法,本文提出利用近似最优步长(AOS)去自适应地计算SARAH的步长,并将其命名为SARAH-AOS算法。针对提出的算法,我们进行了数值试验,结果表明SARAH-AOS算法对初始步长的选择并不像SARAH那样敏感。我们的算法对SARAH算法有着显著性能的改进。
Abstract: In machine learning, we often consider the problem of minimizing an objective function that is a sum of convex functions. The stochastic recursive gradient algorithm (SARAH) is a common method to solve the above problems. It allows a simple recursive framework to update stochastic gradient estimates. Based on the SARAH method, this paper, we propose to use the approximately optimal stepsize (AOS) to automatically compute stepsizes for SARAH, which leads to SARAH-AOS algorithm. For the proposed algorithm, we conduct numerical experiments, and the results show that the SARAH-AOS algorithm is not as sensitive to the selection of initial step size as SARAH. Our algorithm has a significant performance improvement over the SARAH algorithm.
文章引用:陈炫睿. 带有近似最优步长的随机递归梯度算法[J]. 运筹与模糊学, 2023, 13(5): 4318-4326. https://doi.org/10.12677/ORF.2023.135430

参考文献

[1] Robbins, H. and Monro, S. (1951) A Stochastic Approximation Method. The Annals of Mathematical Statistics, 22, 400-407. [Google Scholar] [CrossRef
[2] Bottou, L., Curtis, F.E. and Nocedal, J. (2018) Op-timization Methods for Large-Scale Machine Learning. SIAM Review, 60, 223-311. [Google Scholar] [CrossRef
[3] Roux, N.L., Schmidt, M. and Bach, F.R. (2013) A Stochastic Gra-dient Method with an Exponential Convergence Rate for Finite Training Sets. Neural Information Processing Systems, Lake Tahoe, December 2013, 2663-2671.
[4] Schmidt, M.W., Roux, N.L. and Bach, F. (2017) Minimizing Finite Sums with the Stochastic Average Gradient. Mathematical Programming, 162, 83-112. [Google Scholar] [CrossRef
[5] Defazio, A., Bach, F. and Lacoste-Julien, S. (2014) SAGA: A Fast Incremental Gradient Method with Support for Non-Strongly Convex Composite Objectives. Neural Infor-mation Processing Systems, Montreal, December 2014, 1646-1654.
[6] Johnson, R. and Zhang, T. (2013) Ac-celerating Stochastic Gradient Descent Using Predictive Variance Reduction. Neural Information Processing Systems, Lake Tahoe, December 2013, 315-323.
[7] Duchi, J.C., Hazan, E. and Singer, Y.J. (2011) Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, 12, 2121-2159.
[8] Kingma, D.P. and Ba, J. (2015) Adam: A Method for Stochastic Optimization. International Conference on Learning Representations, San Diego, May 2015, 1-13.
[9] Tan, C., Ma, S., Dai, Y.H. and Qian, Y. (2016) Barzilai-Borwein Step Size for Stochastic Gradient Descent. Neural Information Processing Systems, Bar-celona, December 2016, 685-693.
[10] Nguyen, L.M., Liu, J., Scheinberg, K. and Takac, M. (2017) SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient. Neural Information Processing Systems, Long Beach, December 2017, 2613-2621.
[11] Liu, Y., Wang, X. and Guo, T. (2020) A Linearly Con-vergent Stochastic Recursive Gradient Method for Convex Optimization. Optimization Letters, 14, 2265-2283. [Google Scholar] [CrossRef
[12] Liu, Z., Liu, H. and Dong, X. (2018) An Efficient Gradient Method with Approximate Optimal Stepsize for the Strictly Convex Quadratic Minimization Problem. Optimization, 67, 427-440. [Google Scholar] [CrossRef
[13] Liu, Z. and Liu, H. (2018) Several Effi-cient Gradient Methods with Approximate Optimal Stepsizes for Large Scale Unconstrained Optimization. Journal of Computational and Applied Mathematics, 328, 400-413. [Google Scholar] [CrossRef
[14] Liu, Z. and Hongwei, L. (2018) An Efficient Gradient Method with Approximate Optimal Stepsize for Large-Scale Unconstrained Optimization. Numerical Algorithms, 78, 21-39. [Google Scholar] [CrossRef