|
[1]
|
Jain, P. and Kar, P. (2017) Non-Convex Optimization for Machine Learning. Foundations and Trends® in Machine Learning, 10, 142-336. [Google Scholar] [CrossRef]
|
|
[2]
|
Du, S., Lee, J., Li, H., et al. (2019) Gradient Descent Finds Global Minima of Deep Neural Networks. Proceedings of the 36th International Conference on Machine Learning, Long Beach, 28 May 2019, 1675-1685.
|
|
[3]
|
Mignacco, F. and Urbani, P. (2022) The Effective Noise of Stochastic Gradient Descent. Journal of Statistical Mechanics: Theory and Experiment, 2022, Article 083405. [Google Scholar] [CrossRef]
|
|
[4]
|
Huang, F., Gao, S., Pei, J., et al. (2022) Accelerated Zeroth-Order and First-Order Momentum Methods from Mini to Minimax Optimization. Journal of Machine Learning Research, 23, 1616-1685.
|
|
[5]
|
Shani, L., Efroni, Y. and Mannor, S. (2020) Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs. Proceedings of the AAAI Conference on Artificial Intelligence, 34, 5668-5675. [Google Scholar] [CrossRef]
|
|
[6]
|
Krutikov, V., Tovbis, E., Stanimirović, P. and Kazakovtsev, L. (2023) On the Convergence Rate of Quasi-Newton Methods on Strongly Convex Functions with Lipschitz Gradient. Mathematics, 11, Article 4715. [Google Scholar] [CrossRef]
|
|
[7]
|
Glowinski, R. and Marroco, A. (1975) Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-Dualité d’une classe de problèmes de Dirichlet non linéaires. Revue française d’automatique, informatique, recherche opérationnelle. Analyse Numérique, 9, 41-76. [Google Scholar] [CrossRef]
|
|
[8]
|
Gabay, D. and Mercier, B. (1976) A Dual Algorithm for the Solution of Nonlinear Variational Problems via Finite Element Approximation. Computers & Mathematics with Applications, 2, 17-40. [Google Scholar] [CrossRef]
|
|
[9]
|
Bertsekas, D.P. (2014) Constrained Optimization and Lagrange Multiplier Methods. Academic Press.
|
|
[10]
|
Jakovetic, D., Bajovic, D., Xavier, J. and Moura, J.M.F. (2020) Primal-Dual Methods for Large-Scale and Distributed Convex Optimization and Data Analytics. Proceedings of the IEEE, 108, 1923-1938. [Google Scholar] [CrossRef]
|
|
[11]
|
Ma, S. (2015) Alternating Proximal Gradient Method for Convex Minimization. Journal of Scientific Computing, 68, 546-572. [Google Scholar] [CrossRef]
|
|
[12]
|
Chen, C., He, B., Ye, Y. and Yuan, X. (2014) The Direct Extension of ADMM for Multi-Block Convex Minimization Problems Is Not Necessarily Convergent. Mathematical Programming, 155, 57-79. [Google Scholar] [CrossRef]
|
|
[13]
|
Lin, T., Ma, S. and Zhang, S. (2017) Global Convergence of Unmodified 3-Block ADMM for a Class of Convex Minimization Problems. Journal of Scientific Computing, 76, 69-88. [Google Scholar] [CrossRef]
|
|
[14]
|
Wang, Y., Yin, W. and Zeng, J. (2018) Global Convergence of ADMM in Nonconvex Nonsmooth Optimization. Journal of Scientific Computing, 78, 29-63. [Google Scholar] [CrossRef]
|
|
[15]
|
Chao, M.T., Zhang, Y. and Jian, J.B. (2020) An Inertial Proximal Alternating Direction Method of Multipliers for Nonconvex Optimization. International Journal of Computer Mathematics, 98, 1199-1217. [Google Scholar] [CrossRef]
|
|
[16]
|
Liavas, A.P. and Sidiropoulos, N.D. (2015) Parallel Algorithms for Constrained Tensor Factorization via Alternating Direction Method of Multipliers. IEEE Transactions on Signal Processing, 63, 5450-5463. [Google Scholar] [CrossRef]
|
|
[17]
|
Shen, Y., Wen, Z. and Zhang, Y. (2012) Augmented Lagrangian Alternating Direction Method for Matrix Separation Based on Low-Rank Factorization. Optimization Methods and Software, 29, 239-263. [Google Scholar] [CrossRef]
|
|
[18]
|
Mai, V. and Johansson, M. (2020) Convergence of a Stochastic Gradient Method with Momentum for Non-Smooth Non-Convex Optimization. Proceedings of the 37th International Conference on Machine Learning, Online, 13-18 July 2020, 6630-6639.
|
|
[19]
|
Emmert-Streib, F. and Dehmer, M. (2019) High-Dimensional Lasso-Based Computational Regression Models: Regularization, Shrinkage, and Selection. Machine Learning and Knowledge Extraction, 1, 359-383. [Google Scholar] [CrossRef]
|
|
[20]
|
Avron, H., Clarkson, K.L. and Woodruff, D.P. (2017) Faster Kernel Ridge Regression Using Sketching and Preconditioning. SIAM Journal on Matrix Analysis and Applications, 38, 1116-1138. [Google Scholar] [CrossRef]
|
|
[21]
|
Zhong, W. and Kwok, J. (2014) Gradient Descent with Proximal Average for Nonconvex and Composite Regularization. Proceedings of the AAAI Conference on Artificial Intelligence, 28, 2206-2212. [Google Scholar] [CrossRef]
|
|
[22]
|
Li, X., Ding, S. and Li, Y. (2017) Outlier Suppression via Non-Convex Robust PCA for Efficient Localization in Wireless Sensor Networks. IEEE Sensors Journal, 17, 7053-7063. [Google Scholar] [CrossRef]
|