关于相关数据流的随机梯度哈密尔顿蒙特卡罗算法的非渐近估计
Non-Asymptotic Estimates for Stochastic Gradient Hamiltonian Monte Carlo Algorithm with Dependent Data Streams
摘要: 近年来,基于朗之万蒙特卡罗方法的随机梯度下降算法得到了广泛应用。这些算法通过在梯度的估计中注入适当的高斯噪声以实现在非凸优化问题中的全局收敛。随机梯度哈密尔顿蒙特卡罗(SGHMC)是随机梯度下降带有动量的一种变体,通常的研究以样本数据相互独立的假设为前提来分析SGHMC算法的收敛性,然而实际中的样本数据往往存在相关性。本文在数据流具有相关性(满足一定的条件混合特性)的条件下,给出了SGHMC算法的非渐进估计,建立了全局Lipschtiz条件下SGHMC算法的收敛性定理,得到了迭代分布与目标分布之间Wasserstein距离的上界。
Abstract: In recent years, stochastic gradient descent algorithms based on the Langevin Monte Carlo method have been widely applied, which achieve global convergence in non-convex optimization problems by injecting appropriate Gaussian noise into the gradient estimates. Stochastic gradient Hamiltoni-an Monte Carlo (SGHMC) is a variant of stochastic gradient descent with momentum. Usually, studies analyze the convergence of algorithm based on the assumption that the sample data are i.i.d. How-ever, in practice, sample data may be dependent. This paper provides non-asymptotic estimates for SGHMC algorithm under the condition that the data streams are dependent (satisfying a certain conditional mixing property), establishes a convergence theorem for SGHMC algorithm under the global Lipschitz condition, and obtains an upper bound of the Wasserstein distance between the law of algorithm’s iterates and the target distribution.
文章引用:杨金圆, 胡良剑. 关于相关数据流的随机梯度哈密尔顿蒙特卡罗算法的非渐近估计[J]. 应用数学进展, 2024, 13(2): 569-583. https://doi.org/10.12677/AAM.2024.132055

参考文献

[1] Raginsky, M., Rakhlin, A. and Telgarsky, M. (2017) Non-Convex Learning via Stochastic Gradient Langevin Dynamics: A Nonasymptotic Analysis. Conference on Learning Theory, 65, 1674-1703.
[2] Hwang, C.R. (1980) Laplace’s Method Revisited: Weak Convergence of Probability Measures. The Annals of Probability, 8, 1177-1182. [Google Scholar] [CrossRef
[3] Chiang, T.S., Hwang, C.R. and Sheu, S.J. (1987) Diffusion for Global Optimization in . SIAM Journal on Control and Optimization, 25, 737-753. [Google Scholar] [CrossRef
[4] Welling, M. and Teh, Y.W. (2011) Bayesian Learning via Stochastic Gradient Langevin Dynamics. Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, Washington, 28 June 2011, 681-688.
[5] Chen, T., Fox, E. and Guestrin, C. (2014) Stochastic Gradient Hamiltonian Monte Carlo. International Conference on Machine Learning, Bejing, 22-24 June 2014, 1683-1691.
[6] Chau, H.N. and Rasonyi, M. (2022) Stochastic Gradient Hamiltonian Monte Carlo for Non-Convex Learning. Stochastic Processes and Their Applications, 149, 341-368. [Google Scholar] [CrossRef
[7] Eberle, A., Guillin, A. and Zimmer, R. (2019) Couplings and Quantitative Contraction Rates for Langevin Dynamics. The Annals of Probability, 47, 1982-2010. [Google Scholar] [CrossRef
[8] Chau, N.H., Moulines, É., Rásonyi, M., Sabanis, S. and Zhang, Y. (2021) On Stochastic Gradient Langevin Dynamics with Dependent Data Streams: The Fully Nonconvex Case. SIAM Journal on Mathematics of Data Science, 3, 959-986. [Google Scholar] [CrossRef
[9] Majka, M.B., Mijatović, A. and Szpruch, Ł. (2020) Nonasymptotic Bounds for Sampling Algorithms without Log-Concavity. The Annals of Applied Probability, 30, 1534-1581. [Google Scholar] [CrossRef
[10] Cont, R. (2001) Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues. Quantitative Finance, 1, 223-226. [Google Scholar] [CrossRef
[11] Gao, X., Gürbüzbalaban, M. and Zhu, L. (2022) Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Non-convex Stochastic Optimization: Nonasymptotic Performance Bounds and Momentum-Based Acceleration. Operations Research, 70, 2931-2947. [Google Scholar] [CrossRef
[12] Gerencsér, L. (1989). On a Class of Mixing Processes. Stochastics: An International Journal of Probability and Stochastic Processes, 26, 165-191.[CrossRef
[13] Chau, H.N., Kumar, C., Rásonyi, M. and Sabanis, S. (2019) On Fixed Gain Recursive Estimators with Discontinuity in the Parameters. ESAIM: Probability and Statistics, 23, 217-244. [Google Scholar] [CrossRef
[14] Barkhagen, M., Chau, N.H., Moulines, É., Rásonyi, M., Sabanis, S. and Zhang, Y. (2021) On Stochastic Gradient Langevin Dynamics with Dependent Data Streams in the Logconcave Case. Bernoulli, 27, 1-33. [Google Scholar] [CrossRef
[15] Akyildiz, Ö.D. and Sabanis, S. (2020) Nonasymptotic Analysis of Sto-chastic Gradient Hamiltonian Monte Carlo under Local Conditions for Nonconvex Optimization.