[1] Qiu, X.P. (2020) Neural Networks and Deep Learning (神经网络与深度学习). China Machine Press, Beijing.

[2] Duchi, J., Hazan, E. and Singer, Y. (2011) Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, 12, 2121-2159.

[3] Hinton, G., Srivastava, N. and Swersky, K. (2012) Neural Networks for Machine Learning Lecture 6a: Overview of Mini-Batch Gradient Descent. University of Toronto.

[4] Kingma, D.P. and Ba, J. (2015) Adam: A Method for Stochastic Optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, 7-9 May 2015, 1-15.

[5] Anil, R., Gupta, V., Koren, T., et al. (2021) Scalable Second Order Optimization for Deep Learning. arXiv preprint arXiv:2002.09018.

[6] Chen, X., Liang, C., Huang, D., et al. (2023) Symbolic Discovery of Optimization Algorithms. Advances in Neural Information Processing Systems, 36, 49205-49233.

[7] Shazeer, N. and Stern, M. (2018) Adafactor: Adaptive Learning Rates with Sublinear Memory Cost. Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, 10-15 July 2018, 4596-4604.

[8] Liu, L.Y., Jiang, H.M., He, P.C., et al. (2020) On the Variance of the Adaptive Learning Rate and Beyond. Proceedings of the 8th International Conference on Learning Representations (ICLR), Addis Ababa, 26-30 April 2020, 1-13.

[9] Loshchilov, I. and Hutter, F. (2019) Decoupled Weight Decay Regularization. Proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, 6-9 May 2019, 2-15.

[10] Schmidt, R., Schneider, F. and Hennig, P. (2021) Descending through a Crowded Valley: Benchmarking Deep Learning Optimizers. Proceedings of the 38th International Conference on Machine Learning (ICML), Online, 18-24 July 2021, 10-12.

[11] Zarghani, A. and Abedi, S. (2025) Designing Adaptive Algorithms Based on Reinforcement Learning for Dynamic Optimization of Sliding Window Size in Multi-Dimensional Data Streams. arXiv preprint arXiv:2507.06901, 7-9.

[12] Simsekli, U., Sagun, L. and Gurbuzbalaban, M. (2019) A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks. Proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, 9-15 June 2019, 5827-5837.

[13] Reddi, S.J., Kale, S. and Kumar, S. (2018) On the Convergence of Adam and Beyond. Proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, 30 April-3 May 2018.

[14] Welford, B.P. (1962) Note on a Method for Calculating Corrected Sums of Squares and Products. Technometrics, 4, 419-420.

[15] Rastrigin, L.A. (1974) Systems of Extremal Control. Nauka, Moscow.