求解非凸复合优化问题的随机Douglas-Rachford分裂算法
Stochastic Douglas-Rachford Splitting Algorithm for Solving Non-Convex Composite Optimization Problems
摘要: 本文研究了大规模非凸复合优化问题,该问题的目标函数为非凸的光滑函数与凸(可能是非光滑的)函数之和,其中非凸的光滑函数具有有限和形式。随着大规模非凸复合优化问题在机器学习、图像处理中广泛应用,设计简洁、高效、计算成本低的优化算法来解决大规模非凸复合优化问题已成为当今研究热点之一。求解非凸复合优化问题的一个经典算法是Douglas-Rachford (DR)分裂算法,然而当实际问题规模较大时经典的DR分裂算法计算成本较大。据此本文提出了一种方差缩减的随机DR分裂算法(SDR),将方差缩减的随机梯度引入到DR分裂算法中,以减小算法的计算成本,并基于Kurdyka-Łojasiewicz (KL)框架对算法的收敛性进行了分析。具体的,我们首先建立了Lyapunov函数的下降性,然后建立了DR价值函数的相对误差界条件,最后利用KL性质证明了算法的全局收敛性。此外,通过将SDR算法用于求解 1 2 稀疏正则化Logistic回归问题,并与PG、Prox-SARAH进行对比,展示了本文所提出算法的优越性。
Abstract: This paper investigates large-scale non-convex composite optimization problems, where the objective function is the sum of a nonconvex smooth function and a convex (possibly nonsmooth) function, with the nonconvex smooth function having a finite sum form. Given the widespread application of large-scale nonconvex composite optimization problems in machine learning and image processing, developing concise, efficient, and low-computational-cost optimization algorithms for solving such problems has become one of the current research hotspots. A classic algorithm for solving non-convex composite optimization problems is the Douglas-Rachford (DR) splitting algorithm. However, the classical DR splitting algorithm incurs significant computational costs when dealing with large-scale practical problems. Accordingly, this paper proposes a variance-reduced stochastic DR splitting algorithm (SDR), incorporating variance reduction stochastic gradients into the DR splitting algorithm to reduce computational costs, and analyzes the convergence of the algorithm based on the Kurdyka-Łojasiewicz (KL) framework. Moreover, we use the SDR algorithm to solve the 1 2 sparse logistic regression problem and compare its performance with PG and Prox-SARAH to show the efficiency of SDR.
文章引用:李登辉, 孟士博, 陈梦婷. 求解非凸复合优化问题的随机Douglas-Rachford分裂算法[J]. 运筹与模糊学, 2024, 14(6): 116-133. https://doi.org/10.12677/orf.2024.146516

参考文献

[1] Lions, P.L. and Mercier, B. (1979) Splitting Algorithms for the Sum of Two Nonlinear Operators. SIAM Journal on Numerical Analysis, 16, 964-979. [Google Scholar] [CrossRef
[2] Beck, A. and Teboulle, M. (2009) Gradient-Based Algorithms with Applications to Signal-Recovery Problems. In: Convex Optimization in Signal Processing and Communications, Cambridge University Press, 42-88. [Google Scholar] [CrossRef
[3] Douglas, J. and Rachford, H.H. (1956) On the Numerical Solution of Heat Conduction Problems in Two and Three Space Variables. Transactions of the American Mathematical Society, 82, 421-439. [Google Scholar] [CrossRef
[4] Aragón Artacho, F.J., Borwein, J.M. and Tam, M.K. (2013) Recent Results on Douglas-Rachford Methods for Combinatorial Optimization Problems. Journal of Optimization Theory and Applications, 163, 1-30. [Google Scholar] [CrossRef
[5] Boţ, R.I., Csetnek, E.R. and Hendrich, C. (2015) Inertial Douglas-Rachford Splitting for Monotone Inclusion Problems. Applied Mathematics and Computation, 256, 472-487. [Google Scholar] [CrossRef
[6] Li, G. and Pong, T.K. (2015) Douglas-Rachford Splitting for Nonconvex Optimization with Application to Nonconvex Feasibility Problems. Mathematical Programming, 159, 371-401. [Google Scholar] [CrossRef
[7] Robbins, H. and Monro, S. (1951) A Stochastic Approximation Method. The Annals of Mathematical Statistics, 22, 400-407. [Google Scholar] [CrossRef
[8] Li, M., Zhang, T., Chen, Y. and Smola, A.J. (2014). Efficient Mini-Batch Training for Stochastic Optimization. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, 24-27 August 2014, 661-670.[CrossRef
[9] Schmidt, M., Le Roux, N. and Bach, F. (2016) Minimizing Finite Sums with the Stochastic Average Gradient. Mathematical Programming, 162, 83-112. [Google Scholar] [CrossRef
[10] Johnson, R. and Zhang, T. (2013) Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction. Advances in Neural Information Processing Systems, 2013, Article 26.
[11] Doucet, A., Godsill, S. and Andrieu, C. (2000) On Sequential Monte Carlo Sampling Methods for Bayesian Filtering. Statistics and Computing, 10, 197-208. [Google Scholar] [CrossRef
[12] Nesterov, Y. (2012) Gradient Methods for Minimizing Composite Functions. Mathematical Programming, 140, 125-161. [Google Scholar] [CrossRef
[13] Nguyen, L.M., van Dijk, M., Phan, D.T., Nguyen, P.H., Weng, T. and Kalagnanam, J.R. (2022) Finite-Sum Smooth Optimization with Sarah. Computational Optimization and Applications, 82, 561-593. [Google Scholar] [CrossRef
[14] Fang, C., Li, C.J., Lin, Z. and Zhang, T. (2018) Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator.
[15] Ghadimi, S. and Lan, G. (2013) Stochastic First and Zeroth-Order Methods for Nonconvex Stochastic Programming. SIAM Journal on Optimization, 23, 2341-2368. [Google Scholar] [CrossRef
[16] Ghadimi, S. and Lan, G. (2015) Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming. Mathematical Programming, 156, 59-99. [Google Scholar] [CrossRef
[17] Pham, N.H., Nguyen, L.M., Phan, D.T., et al. (2020) Prox SARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization. Journal of Machine Learning Research, 21, 1-48.
[18] Rockafellar, R.T. and Wets, R.J.B. (2009) Variational Analysis. Springer Science & Business Media.
[19] Nesterov, Y. (2018) Lectures on Convex Optimization. Springer.
[20] Bolte, J., Sabach, S. and Teboulle, M. (2013) Proximal Alternating Linearized Minimization for Nonconvex and Nonsmooth Problems. Mathematical Programming, 146, 459-494. [Google Scholar] [CrossRef
[21] Driggs, D., Tang, J., Liang, J., Davies, M. and Schönlieb, C. (2021) A Stochastic Proximal Alternating Minimization for Nonsmooth and Nonconvex Optimization. SIAM Journal on Imaging Sciences, 14, 1932-1970. [Google Scholar] [CrossRef