A Stochastic Sub-Gradient Mirror Descent Algorithm for Non-Smooth and Strongly Convex Functions
Abstract: Mirror descent (MD) has been widely applied to machine learning problems. For large-scale data processing and non-smooth convex loss optimization, we combine iterate averaging with the stochastic sub-gradient mirror descent method to obtain an improved algorithm: by treating the problem domain specially and exploiting its structure, we propose a weighted-average stochastic sub-gradient mirror descent algorithm. In this weighted-averaging scheme, the averaged iterate is not used to construct the algorithm but arises as a byproduct of it, and the averaging weights are determined by the step sizes the algorithm uses. The algorithm enjoys good convergence: for strongly convex functions, we show that it attains the optimal convergence rate.
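The scheme described in the abstract can be sketched in code. The following is a minimal illustration, not the paper's exact method: it assumes the Euclidean mirror map (Bregman distance = squared distance), under which the mirror step reduces to a plain subgradient step, and it assumes the step-size rule η_t = 2/(μ(t+1)) for a μ-strongly convex objective with averaging weights w_t ∝ t, one common choice tied to that step size. The function names and the toy objective are hypothetical.

```python
import numpy as np

def stochastic_md_weighted(subgrad, x0, mu, T, noise=0.1, seed=0):
    """Stochastic sub-gradient mirror descent with a running weighted average.

    Euclidean mirror map is assumed, so the mirror step is an ordinary
    subgradient step.  The averaged iterate is never fed back into the
    updates -- it is accumulated on the side, as a byproduct.
    """
    rng = np.random.default_rng(seed)
    x = x0.astype(float)
    avg = np.zeros_like(x)
    wsum = 0.0
    for t in range(1, T + 1):
        # noisy subgradient oracle
        g = subgrad(x) + noise * rng.standard_normal(x.shape)
        eta = 2.0 / (mu * (t + 1))      # step size for mu-strong convexity
        x = x - eta * g                 # (Euclidean) mirror-descent step
        w = t                           # averaging weight tied to the step size
        wsum += w
        avg += (w / wsum) * (x - avg)   # running weighted average (byproduct)
    return avg

# Toy problem: f(x) = 0.5*mu*||x||^2 + ||x||_1, strongly convex,
# non-smooth, minimized at x = 0.
mu = 1.0
subgrad = lambda x: mu * x + np.sign(x)
x_bar = stochastic_md_weighted(subgrad, np.ones(5), mu, T=2000)
```

Because the late iterates receive the largest weights, the weighted average concentrates near the minimizer even though individual iterates oscillate under the noisy subgradients.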
Citation: Zhou, Q., Luo, X.B. and Wang, X. A Stochastic Sub-Gradient Mirror Descent Algorithm for Non-Smooth and Strongly Convex Functions. Pure Mathematics, 2018, 8(3): 221-229. https://doi.org/10.12677/PM.2018.83028

References

[1] Juditsky, A. and Nemirovski, A.S. (2010) First Order Methods for Nonsmooth Convex Large-Scale Optimization, II: Utilizing Problem’s Structure. MIT Press, Cambridge.
[2] Ben-Tal, A., Margalit, T. and Nemirovski, A. (2001) The Ordered Subsets Mirror Descent Optimization Method with Applications to Tomography. SIAM Journal on Optimization, 12, 79-108.
[3] Rakhlin, A., Shamir, O. and Sridharan, K. (2012) Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization. arXiv:1109.5647 [cs.LG]
[4] Beck, A. and Teboulle, M. (2003) Mirror Descent and Nonlinear Projected Subgradient Methods for Convex Optimization. Operations Research Letters, 31, 167-175.
[5] Duchi, J.C., Shalev-Shwartz, S. and Singer, Y. (2010) Composite Objective Mirror Descent. 23rd Conference on Learning Theory, Haifa, 27-29 June 2010, 14-26.
[6] Sra, S., Nowozin, S. and Wright, S.J. (2011) Optimization for Machine Learning. MIT Press, Cambridge.
[7] Brègman, L.M. (1967) The Relaxation Method of Finding the Common Point of Convex Sets and Its Application to the Solution of Problems in Convex Programming. USSR Computational Mathematics and Mathematical Physics, 7, 200-217.
[8] Tseng, P. (2008) On Accelerated Proximal Gradient Methods for Convex-Concave Optimization. Manuscript, submitted to SIAM Journal on Optimization.
[9] Tao, W., Pan, Z.S. and Tao, Q. (2018) Individual Convergence of Projected Subgradient Methods with the Nesterov Step-Size Strategy. Chinese Journal of Computers, No. 1. (In Chinese)
[10] Beck, A. and Teboulle, M. (2009) A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2, 183-202.
[11] Luong, D.V.N., Parpas, P. and Rueckert, D. (2016) A Weighted Mirror Descent Algorithm for Non-Smooth Convex Optimization Problem. Journal of Optimization Theory and Applications, 170, 900-915.