随机邻近牛顿型交替极小化算法
Stochastic Proximity Newtonian Alternating Miniaturization Algorithm
摘要: 本文研究一类大规模有限和形式的非凸非光滑复合优化问题,其目标函数为适当下半连续凸函数与连续可微函数(有限和形式)之和。邻近交替线性极小化算法在求解这类问题上有显著优势,但考虑到针对大规模优化问题,该算法需在每一迭代计算全梯度,成本较高,故本文将目标函数光滑部分的随机梯度引入到已有算法,以降低算法的计算成本。此外,考虑到一阶算法在求解病态问题时速度较慢,故本文在算法中引入目标函数的二阶信息,提出了一种随机邻近牛顿型交替极小化算法。在适当的步长边界的基础上,我们建立了算法在期望意义下的全局收敛性。本文提出的随机邻近牛顿型交替极小化算法,不仅提升了算法在机器学习、统计学和图像处理等领域实际应用中的效率和实用性,也为非凸非光滑优化领域的理论发展提供了新的理论基础和算法框架。
Abstract: This article investigates a class of large-scale finite and structured nonconvex nonsmooth composite optimization problems, where the objective function is the sum of an appropriately lower semicontinuous convex function and a continuous differentiable function (finite and structured). The alternating linear minimization algorithm has significant advantages in solving such problems. However, considering the large-scale optimization problems, this algorithm requires computing the full gradient at each iteration, which is costly. Therefore, this article introduces the stochastic gradient of the smooth part of the objective function into the existing algorithm to reduce the computational cost. Additionally, considering that first-order algorithms are slow in solving ill-conditioned problems, this article incorporates the second-order information of the objective function into the algorithm and proposes a stochastic proximal Newton-type alternating minimization algorithm. Based on appropriate step size bounds, we establish the global convergence of the algorithm in the expected sense. The stochastic neighborhood Newton-type alternating minimization algorithm proposed in this paper not only improves the efficiency and practicability of the algorithm in practical applications in the fields of machine learning, statistics and image processing, but also provides a new theoretical foundation and algorithmic framework for the theoretical development in the field of nonconvex nonsmooth optimization.
文章引用:黄静仪, 高任杰, 李磊. 随机邻近牛顿型交替极小化算法[J]. 运筹与模糊学, 2024, 14(6): 676-688. https://doi.org/10.12677/orf.2024.146568

参考文献

[1] Kawasumi, R. and Takeda, K. (2023) Automatic Hyperparameter Tuning in Sparse Matrix Factorization. Neural Computation, 35, 1086-1099. [Google Scholar] [CrossRef] [PubMed]
[2] d’Aspremont, A., El Ghaoui, L., Jordan, M.I. and Lanckriet, G.R.G. (2007) A Direct Formulation for Sparse PCA Using Semidefinite Programming. SIAM Review, 49, 434-448. [Google Scholar] [CrossRef
[3] Zou, H., Hastie, T. and Tibshirani, R. (2006) Sparse Principal Component Analysis. Journal of Computational and Graphical Statistics, 15, 265-286. [Google Scholar] [CrossRef
[4] Candès, E.J., Li, X., Ma, Y. and Wright, J. (2011) Robust Principal Component Analysis? Journal of the ACM, 58, 1-37. [Google Scholar] [CrossRef
[5] Aravkin, A. and Davis, D. (2020) Trimmed Statistical Estimation via Variance Reduction. Mathematics of Operations Research, 45, 292-322. [Google Scholar] [CrossRef
[6] Patrizio, C. and Karen, E. (2017) Blind Image Deconvolution: Theory and Applications. CRC Press, 12-21.
[7] Wang, Q. and Han, D. (2024) Stochastic Gauss-Seidel Type Inertial Proximal Alternating Linearized Minimization and Its Application to Proximal Neural Networks. Mathematical Methods of Operations Research, 99, 39-74. [Google Scholar] [CrossRef
[8] Attouch, H., Bolte, J., Redont, P. and Soubeyran, A. (2010) Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Łojasiewicz Inequality. Mathematics of Operations Research, 35, 438-457. [Google Scholar] [CrossRef
[9] Bolte, J., Sabach, S. and Teboulle, M. (2013) Proximal Alternating Linearized Minimization for Nonconvex and Nonsmooth Problems. Mathematical Programming, 146, 459-494. [Google Scholar] [CrossRef
[10] Nguyen, L.M., Scheinberg, K. and Takáč, M. (2020) Inexact SARAH Algorithm for Stochastic Optimization. Optimization Methods and Software, 36, 237-258. [Google Scholar] [CrossRef
[11] Pan, H. and Zheng, L. (2022) N-SVRG: Stochastic Variance Reduction Gradient with Noise Reduction Ability for Small Batch Samples. Computer Modeling in Engineering & Sciences, 131, 493-512. [Google Scholar] [CrossRef
[12] Defazio, A., Bach, R.F. and Lacoste-Julien, S. (2014) SAGA: A Fast Incremental Gradient Method with Support for Non-Strongly Convex Composite Objectives.
[13] Wang, Z. and Wen, B. (2022) Proximal Stochastic Recursive Momentum Algorithm for Nonsmooth Nonconvex Optimization Problems. Optimization, 73, 481-495. [Google Scholar] [CrossRef
[14] Huang, F., Gu, B., Huo, Z., Chen, S. and Huang, H. (2019) Faster Gradient-Free Proximal Stochastic Methods for Nonconvex Nonsmooth Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 1503-1510. [Google Scholar] [CrossRef
[15] Pham, N.H., et al. (2020) ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization. Journal of Machine Learning Research, 21, 1-48.
[16] Xu, Y. and Yin, W. (2015) Block Stochastic Gradient Iteration for Convex and Nonconvex Optimization. SIAM Journal on Optimization, 25, 1686-1716. [Google Scholar] [CrossRef
[17] Driggs, D., Tang, J., Liang, J., et al. (2020) SPRING: A Fast Stochastic Proximal Alternating Method for Non-Smooth Non-Convex Optimization.
[18] Yang, M., Milzarek, A., Wen, Z. and Zhang, T. (2021) A Stochastic Extra-Step Quasi-Newton Method for Nonsmooth Nonconvex Optimization. Mathematical Programming, 194, 257-303. [Google Scholar] [CrossRef
[19] Bollapragada, R., Byrd, R.H. and Nocedal, J. (2018) Exact and Inexact Subsampled Newton Methods for Optimization. IMA Journal of Numerical Analysis, 39, 545-578. [Google Scholar] [CrossRef
[20] Nesterov, Y. (2012) Gradient Methods for Minimizing Composite Functions. Mathematical Programming, 140, 125-161. [Google Scholar] [CrossRef
[21] Durrett, R. (2019) Probability: Theory and Examples. Cambridge University Press. [Google Scholar] [CrossRef