基于不定系数常微分方程的加速梯度算法研究
Research on Accelerated Gradient Algorithm Based on Ordinary Differential Equations with Uncertain Coefficients
摘要: 随着人工智能功能的日益强大,对机器学习、深度学习的研究也日趋完善。深度学习的核心思想是求解优化问题,而随着当前实际问题规模的日益扩大,梯度类算法由于运算成本相对较低,日益受到人们的关注和青睐,设计新的加速梯度算法是加快问题求解的关键。本文基于常微分方程,得到了一系列新的加速梯度算法。本文考虑了一类系数不定的二阶常微分方程,证明了系数在满足什么条件下,该微分方程能够快速收敛。对于该微分方程,本文采用两种不同的离散格式:显式欧拉格式、隐式欧拉格式进行离散,分别得到了两类不同的优化算法,分别证明了算法的收敛速率。最后,本文通过数值实验证明了算法的有效性。
Abstract: With the increasingly powerful functions of artificial intelligence, research on machine learning and deep learning is also becoming increasingly sophisticated. The core idea of deep learning is to solve optimization problems, and with the increasing scale of practical problems, gradient algorithms are increasingly receiving attention and favor due to their relatively low computational costs. Designing new accelerated gradient algorithms is the key to accelerating problem solving. This article presents a series of new accelerated gradient algorithms based on ordinary differential equations. This article considers a class of second-order ordinary differential equations with uncertain coefficients and proves under what conditions the coefficients satisfy that the differential equation can converge quickly. For this differential equation, this paper adopts two different discretization schemes: explicit Euler scheme and implicit Euler scheme for discretization, and obtains two different optimization algorithms, respectively proving the convergence rate of the algorithms. Finally, this article demonstrates the effectiveness of the algorithm through numerical experiments.
文章引用:谢旭东. 基于不定系数常微分方程的加速梯度算法研究[J]. 运筹与模糊学, 2024, 14(3): 91-101. https://doi.org/10.12677/orf.2024.143248

参考文献

[1] Boyd, S.P. and Vandenberghe, L. (2004) Convex Optimization. Cambridge University Press. [Google Scholar] [CrossRef
[2] Polyak, B.T. (1964) Some Methods of Speeding Up the Convergence of Iteration Methods. USSR Computational Mathematics and Mathematical Physics, 4, 1-17. [Google Scholar] [CrossRef
[3] Nesterov, Y. (1983) A Method of Solving a Convex Programming Problem with Convergence Rate O (1/k*2). Doklady Akademii Nauk SSSR, 269, 543.
[4] Alvarez, F., Attouch, H., Bolte, J., et al. (2002) A Second-Order Gradient-Like Dissipative Dynamical System with Hessian-Driven Damping: Application to Optimization and Mechanics. Journal de Mathématiques Pures et Appliquées, 81, 747-779. [Google Scholar] [CrossRef
[5] Su, W., Boyd, S. and Candes, E.J. (2016) A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method: Theory and Insights. Journal of Machine Learning Research, 17, 1-43.
[6] Shi, B., Du, S.S., Jordan, M.I., et al. (2022) Understanding the Acceleration Phenomenon via High-Resolution Differential Equations. Mathematical Programming, 195, 79-148. [Google Scholar] [CrossRef
[7] Chen, L. and Luo, H. (2018) A Unified Convergence Analysis of First Order Convex Optimization Methods via Strong Lyapunov Functions. arXiv preprint arXiv:2108.00132
[8] Chen, L. and Luo, H. (2019) First Order Optimization Methods Based on Hessian-Driven Nesterov Accelerated Gradient Flow. arXiv preprint arXiv:1912.09276