[1] Kawaguchi, K. and Lu, H.H. (2020) Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization. Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 108.
[2] Shalev-Shwartz, S. and Ben-David, S. (2014) Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9781107298019
[3] Taheri, H., Pedarsani, R. and Thrampoulidis, C. (2021) Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions. Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS), 130.
[4] Shalev-Shwartz, S. and Srebro, N. (2008) SVM Optimization: Inverse Dependence on Training Set Size. Proceedings of the 25th International Conference on Machine Learning, Helsinki, 5-9 July 2008. https://doi.org/10.1145/1390156.1390273
[5] Bottou, L. (2010) Large-Scale Machine Learning with Stochastic Gradient Descent. In: Lechevallier, Y. and Saporta, G., Eds., Proceedings of COMPSTAT'2010, Physica-Verlag HD, 177-186. https://doi.org/10.1007/978-3-7908-2604-3_16
[6] Bottou, L., Curtis, F.E. and Nocedal, J. (2018) Optimization Methods for Large-Scale Machine Learning. SIAM Review, 60, 223-311. https://doi.org/10.1137/16M1080173
[7] Mokhtari, A. and Ribeiro, A. (2013) A Dual Stochastic DFP Algorithm for Optimal Resource Allocation in Wireless Systems. 2013 IEEE 14th Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Darmstadt, 16-19 June 2013, 21-25. https://doi.org/10.1109/SPAWC.2013.6612004
[8] Couillard, O. (2020) Fast and Flexible Optimization of Power Allocation in Wireless Communication Systems Using Neural Networks. McGill University, Montreal, Canada.
[9] Robbins, H. and Monro, S. (1951) A Stochastic Approximation Method. The Annals of Mathematical Statistics, 22, 400-407. https://doi.org/10.1214/aoms/1177729586
[10] Le Roux, N., Schmidt, M. and Bach, F. (2012) A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets. arXiv preprint arXiv:1202.6258
[11] Schmidt, M., Le Roux, N. and Bach, F. (2017) Minimizing Finite Sums with the Stochastic Average Gradient. Mathematical Programming, 162, 83-112. https://doi.org/10.1007/s10107-016-1030-6
[12] Defazio, A., Bach, F. and Lacoste-Julien, S. (2014) SAGA: A Fast Incremental Gradient Method with Support for Non-Strongly Convex Composite Objectives. In: Advances in Neural Information Processing Systems 27 (NIPS 2014).
[13] Kulunchakov, A. (2020) Stochastic Optimization for Large-Scale Machine Learning: Variance Reduction and Acceleration. Grenoble Alpes University, France.
[14] Wang, C., et al. (2013) Variance Reduction for Stochastic Gradient Optimization. In: Advances in Neural Information Processing Systems 26 (NIPS 2013).
[15] Shen, Z., et al. (2016) Adaptive Variance Reducing for Stochastic Gradient Descent. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, July 2016, 1990-1996.
[16] Konečný, J. and Richtárik, P. (2017) Semi-Stochastic Gradient Descent Methods. Frontiers in Applied Mathematics and Statistics, 3, Article 9. https://doi.org/10.3389/fams.2017.00009
[17] Shang, F., et al. (2021) Efficient Asynchronous Semi-Stochastic Block Coordinate Descent Methods for Large-Scale SVD. IEEE Access, 9, 126159-126171. https://doi.org/10.1109/ACCESS.2021.3094282
[18] Duchi, J., Hazan, E. and Singer, Y. (2011) Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, 12, 2121-2159.
[19] Tieleman, T. and Hinton, G. (2012) Lecture 6.5-rmsprop: Divide the Gradient by a Running Average of Its Recent Magnitude. COURSERA: Neural Networks for Machine Learning, 4, 26-31.
[20] Kingma, D.P. and Ba, J. (2014) Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980
[21] Mokhtari, A. and Ribeiro, A. (2013) Regularized Stochastic BFGS Algorithm. 2013 IEEE Global Conference on Signal and Information Processing, Austin, TX, 3-5 December 2013, 1109-1112. https://doi.org/10.1109/GlobalSIP.2013.6737088
[22] Byrd, R.H., et al. (2016) A Stochastic Quasi-Newton Method for Large-Scale Optimization. SIAM Journal on Optimization, 26, 1008-1031. https://doi.org/10.1137/140954362
[23] Liu, D.C. and Nocedal, J. (1989) On the Limited Memory BFGS Method for Large Scale Optimization. Mathematical Programming, 45, 503-528. https://doi.org/10.1007/BF01589116
[24] Moritz, P., Nishihara, R. and Jordan, M. (2016) A Linearly-Convergent Stochastic L-BFGS Algorithm. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS), 51.
[25] Gower, R., Goldfarb, D. and Richtárik, P. (2016) Stochastic Block BFGS: Squeezing More Curvature Out of Data. International Conference on Machine Learning, New York, June 2016.
[26] Fletcher, R. and Reeves, C.M. (1964) Function Minimization by Conjugate Gradients. The Computer Journal, 7, 149-154. https://doi.org/10.1093/comjnl/7.2.149
[27] Andrei, N. (2013) On Three-Term Conjugate Gradient Algorithms for Unconstrained Optimization. Applied Mathematics and Computation, 219, 6316-6327. https://doi.org/10.1016/j.amc.2012.11.097
[28] Yao, S.W., et al. (2020) A Class of Globally Convergent Three-Term Dai-Liao Conjugate Gradient Methods. Applied Numerical Mathematics, 151, 354-366. https://doi.org/10.1016/j.apnum.2019.12.026
[29] Dai, Y.-H. and Liao, L.-Z. (2001) New Conjugacy Conditions and Related Nonlinear Conjugate Gradient Methods. Applied Mathematics and Optimization, 43, 87-101. https://doi.org/10.1007/s002450010019
[30] Babaie-Kafaki, S. and Ghanbari, R. (2014) A Descent Family of Dai-Liao Conjugate Gradient Methods. Optimization Methods and Software, 29, 583-591. https://doi.org/10.1080/10556788.2013.833199
[31] Andrei, N. (2015) A New Three-Term Conjugate Gradient Algorithm for Unconstrained Optimization. Numerical Algorithms, 68, 305-321. https://doi.org/10.1007/s11075-014-9845-9
[32] Yao, S.W., et al. (2020) A Class of Globally Convergent Three-Term Dai-Liao Conjugate Gradient Methods. Applied Numerical Mathematics, 151, 354-366. https://doi.org/10.1016/j.apnum.2019.12.026
[33] Powell, M.J.D. (1977) Restart Procedures for the Conjugate Gradient Method. Mathematical Programming, 12, 241-254. https://doi.org/10.1007/BF01593790
[34] Jiang, X.Z., et al. (2021) An Improved Polak-Ribière-Polyak Conjugate Gradient Method with an Efficient Restart Direction. Computational and Applied Mathematics, 40, Article No. 174. https://doi.org/10.1007/s40314-021-01557-9
[35] Zoutendijk, G. (1966) Nonlinear Programming: A Numerical Survey. SIAM Journal on Control, 4, 194-210. https://doi.org/10.1137/0304019