A Stochastic Three-Term Conjugate Gradient Method for Unconstrained Optimization Problems
DOI: 10.12677/AAM.2022.117452. Supported by the National Natural Science Foundation of China.
Authors: Lei Liu, Dan Xue (School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong)
Keywords: Stochastic Approximation, Empirical Risk Minimization, Three-Term Conjugate Gradient, Machine Learning, Variance Reduction
Abstract: To solve unconstrained stochastic optimization problems, we propose a stochastic three-term conjugate gradient method with variance reduction (STCGVR), which can be applied to nonconvex stochastic problems. At the beginning of each inner loop, the three-term conjugate gradient direction is restarted with the steepest descent direction, which effectively improves the convergence speed. The properties and convergence of the algorithm are discussed under appropriate conditions. Numerical results show that our method has great potential for solving machine learning problems.
Article citation: Liu, L. and Xue, D. (2022) A Stochastic Three-Term Conjugate Gradient Method for Unconstrained Optimization Problems. Advances in Applied Mathematics, 11(7), 4248-4267. https://doi.org/10.12677/AAM.2022.117452
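The abstract outlines the key ingredients of STCGVR: an SVRG-style variance-reduced gradient estimator computed over inner loops, a three-term conjugate gradient search direction, and a restart of that direction to steepest descent at the start of every inner loop. The sketch below is only a minimal illustration of how these pieces can fit together, assuming a fixed step size, a Hestenes-Stiefel-type choice of the coefficients beta and theta, and a user-supplied mini-batch gradient oracle grad_i; it does not reproduce the paper's actual STCGVR coefficient formulas, step-size rule, or convergence conditions.

```python
import numpy as np


def stcgvr_sketch(grad_i, x0, n, epochs=10, m=None, lr=0.01, batch=1, rng=None):
    """Hypothetical sketch of a variance-reduced three-term CG loop.

    grad_i(x, idx) is an assumed user-supplied oracle returning the average
    gradient of the component functions indexed by ``idx`` at ``x``.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = n if m is None else m                      # inner-loop length (assumption)
    x_tilde = np.asarray(x0, dtype=float).copy()   # snapshot point

    for _ in range(epochs):
        full_grad = grad_i(x_tilde, np.arange(n))  # full gradient at the snapshot
        x = x_tilde.copy()
        g_prev = full_grad.copy()
        d = -g_prev                                # restart: steepest descent direction

        for _ in range(m):
            x_new = x + lr * d                     # fixed step size (illustrative only)
            idx = rng.integers(n, size=batch)
            # SVRG-style variance-reduced gradient estimate at the new iterate
            g = grad_i(x_new, idx) - grad_i(x_tilde, idx) + full_grad
            y = g - g_prev
            # Illustrative Hestenes-Stiefel-type three-term update:
            #   d_{k+1} = -g_{k+1} + beta_k * d_k - theta_k * y_k
            denom = d @ y + 1e-12                  # guard against division by zero
            beta = (g @ y) / denom
            theta = (g @ d) / denom
            d = -g + beta * d - theta * y
            x, g_prev = x_new, g

        x_tilde = x.copy()                         # new snapshot for the next epoch

    return x_tilde
```

As a usage example, for a least-squares loss f(x) = (1/n) Σ_i (a_i^T x - b_i)^2 one could pass grad_i = lambda x, idx: 2 * A[idx].T @ (A[idx] @ x - b[idx]) / len(idx). Resetting d to the negative full gradient at every snapshot mirrors the restart strategy described in the abstract and guarantees a descent direction at the start of each inner loop.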
