[1] Xie, Z., Li, H., Song, Y.P., et al. (2024) From AIGC to AIGA, a New Track in Intelligence: Large Decision Models. Science Focus, 1-24. http://kns.cnki.net/kcms/detail/11.5469.N.20240412.1839.002.html, 2024-04-17.
[2] Robbins, H. and Monro, S. (1951) A Stochastic Approximation Method. The Annals of Mathematical Statistics, 22, 400-407.
[3] Le Roux, N., Schmidt, M. and Bach, F. (2012) A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets. Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, Nevada, 3-6 December 2012, 2663-2671.
[4] Johnson, R. and Zhang, T. (2013) Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction. Advances in Neural Information Processing Systems, 26, 315-323.
[5] Shalev-Shwartz, S. and Zhang, T. (2013) Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization. Journal of Machine Learning Research, 14, 567-599.
[6] Nguyen, L.M., Liu, J., Scheinberg, K., et al. (2017) SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient. Proceedings of the 34th International Conference on Machine Learning, Sydney, 6-11 August 2017, 2613-2621.
[7] Konečný, J., Liu, J., Richtárik, P., et al. (2016) Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting. IEEE Journal of Selected Topics in Signal Processing, 10, 242-255.
[8] Beznosikov, A., Gorbunov, E., Berard, H. and Loizou, N. (2023) Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods. arXiv: 2202.07262.
[9] Wang, B.L. (2021) Machine Learning. Southeast University Press, Nanjing.
[10] Mignacco, F. and Urbani, P. (2022) The Effective Noise of Stochastic Gradient Descent. Journal of Statistical Mechanics: Theory and Experiment, No. 8, Article ID: 083405.
[11] Smith, S., Elsen, E. and De, S. (2020) On the Generalization Benefit of Noise in Stochastic Gradient Descent. arXiv: 2006.15081.
[12] Wojtowytsch, S. (2023) Stochastic Gradient Descent with Noise of Machine Learning Type Part I: Discrete Time Analysis. Journal of Nonlinear Science, 33, Article No. 45.
[13] Izmailov, P., Podoprikhin, D., Garipov, T., et al. (2018) Averaging Weights Leads to Wider Optima and Better Generalization. arXiv: 1803.05407.
[14] Jain, P., Kakade, S.M., Kidambi, R., et al. (2018) Parallelizing Stochastic Gradient Descent for Least Squares Regression: Mini-Batching, Averaging, and Model Misspecification. Journal of Machine Learning Research, 18, 1-42.
[15] Crowder, S.V. and Hamilton, M.D. (1992) An EWMA for Monitoring a Process Standard Deviation. Journal of Quality Technology, 24, 12-21.