[1] Kasiviswanathan, S.P. and Jin, H. (2016) Efficient Private Empirical Risk Minimization for High-Dimensional Learning. International Conference on Machine Learning, 48, 488-497.
[2] Krizhevsky, A., Sutskever, I. and Hinton, G.E. (2017) ImageNet Classification with Deep Convolutional Neural Networks. Communications of the ACM, 60, 84-90.
[3] Sutskever, I., Martens, J., Dahl, G., et al. (2013) On the Importance of Initialization and Momentum in Deep Learning. International Conference on Machine Learning, 28, 1139-1147.
[4] Robbins, H. and Monro, S. (1951) A Stochastic Approximation Method. The Annals of Mathematical Statistics, 22, 400-407.
[5] Bottou, L., Curtis, F.E. and Nocedal, J. (2018) Optimization Methods for Large-Scale Machine Learning. SIAM Review, 60, 223-311.
[6] Johnson, R. and Zhang, T. (2013) Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction. Advances in Neural Information Processing Systems, 1, 315-323.
[7] Lei, L. and Jordan, M. (2017) Less than a Single Pass: Stochastically Controlled Stochastic Gradient. Artificial Intelligence and Statistics, 54, 148-156.
[8] Lei, L., Ju, C., Chen, J., et al. (2017) Non-Convex Finite-Sum Optimization via SCSG Methods. Advances in Neural Information Processing Systems, 11, 2345-2355.
[9] Lei, L. and Jordan, M.I. (2020) On the Adaptivity of Stochastic Gradient-Based Optimization. SIAM Journal on Optimization, 30, 1473-1500.
[10] Gower, R.M., Loizou, N., Qian, X., et al. (2019) SGD: General Analysis and Improved Rates. International Conference on Machine Learning, 97, 5200-5209.
[11] Ghadimi, S. and Lan, G. (2013) Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming. SIAM Journal on Optimization, 23, 2341-2368.
[12] Rakhlin, A., Shamir, O. and Sridharan, K. (2011) Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization. arXiv:1109.5647.
[13] Polyak, B.T. (1987) Introduction to Optimization. Optimization Software, Publications Division, New York.
[14] Loizou, N., Vaswani, S., Laradji, I.H., et al. (2021) Stochastic Polyak Step-Size for SGD: An Adaptive Learning Rate for Fast Convergence. International Conference on Artificial Intelligence and Statistics, 130, 1306-1314.
[15] Orvieto, A., Lacoste-Julien, S. and Loizou, N. (2022) Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution. Advances in Neural Information Processing Systems, 35, 26943-26954.
[16] Wang, X. and Yuan, Y. (2023) On the Convergence of Stochastic Gradient Descent with Bandwidth-Based Step Size. Journal of Machine Learning Research, 24, 1-49.