[1] Hinton, G.E., Osindero, S. and Teh, Y.W. (2006) A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18, 1527-1554.
[2] Kiran, B.R., Sobh, I., Talpaert, V., et al. (2022) Deep Reinforcement Learning for Autonomous Driving: A Survey. IEEE Transactions on Intelligent Transportation Systems, 23, 4909-4926.
[3] Zou, Z., Shi, Z., Guo, Y., et al. (2019) Object Detection in 20 Years: A Survey. ArXiv: 1905.05055.
[4] Su, J., Xu, B. and Yin, H. (2022) A Survey of Deep Learning Approaches to Image Restoration. Neurocomputing, 487, 46-65.
[5] Yin, Q.L. and Wang, J.W. (2018) A Review of the Applications of Deep Learning in Image Processing. Journal of Higher Education, No. 9, 72-74. (In Chinese)
[6] Ruder, S. (2016) An Overview of Gradient Descent Optimization Algorithms. ArXiv: 1609.04747.
[7] Yuan, Q.Y. (2020) Research on Training Optimization Methods for Deep Neural Networks. Ph.D. Thesis, South China University of Technology, Guangzhou. (In Chinese)
[8] Zhang, H. (2018) Research and Improvement of Optimization Algorithms in Deep Learning. Master’s Thesis, Beijing University of Posts and Telecommunications, Beijing. (In Chinese)
[9] Cauchy, A. (1847) Méthode générale pour la résolution des systèmes d’équations simultanées. Comptes Rendus de l’Académie des Sciences, 25, 536-538.
[10] Levenberg, K. (1944) A Method for the Solution of Certain Non-Linear Problems in Least Squares. Quarterly of Applied Mathematics, 2, 164-168.
[11] Marquardt, D.W. (1963) An Algorithm for Least-Squares Estimation of Nonlinear Parameters. Journal of the Society for Industrial and Applied Mathematics, 11, 431-441.
[12] Møller, M.F. (1993) Efficient Training of Feed-Forward Neural Networks. Aarhus University, Aarhus.
[13] Le Roux, N., Bengio, Y. and Fitzgibbon, A. (2011) Improving First and Second-Order Methods by Modeling Uncertainty. In: Sra, S., Nowozin, S. and Wright, S.J., Eds., Optimization for Machine Learning, The MIT Press, Cambridge, 403-429.
[14] Wang, S., Xiang, J.J., Peng, F. and Tang, S.J. (2022) Target Tracking Algorithm Based on a New Steepest Descent Method. Systems Engineering and Electronics, 44, 1512-1519. (In Chinese)
[15] Liu, X., Wu, M.E. and Zhang, H.Z. (2018) Surface Adjustment Method for Deployable Cable-Net Antennas Based on the Steepest Descent Method. Chinese Space Science and Technology, 38, 1-7. (In Chinese)
[16] Yu, Z.X. (2015) Research on the Optimization of the Changzhou Logistics Industry Based on the Steepest Descent Method. Journal of Changzhou Institute of Technology, 28, 45-48. (In Chinese)
[17] Mi, Y., Peng, J.W., Chen, B.Y., Wang, X.M., Liu, Z.X. and Wang, Y.F. (2022) Fully Distributed Optimal Dispatch of Microgrids Based on the Consensus Principle and the Gradient Descent Method. Power System Protection and Control, 50, 1-10. (In Chinese)
[18] Robbins, H. and Monro, S. (1985) A Stochastic Approximation Method. Springer, New York.
[19] Bottou, L. (1998) Online Learning and Stochastic Approximations. In: Saad, D., Ed., On-Line Learning in Neural Networks, Cambridge University Press, Cambridge, 9-42.
[20] Sutton, R. (1986) Two Problems with Back Propagation and Other Steepest Descent Learning Procedures for Networks. In: Proceedings of the 8th Annual Conference of the Cognitive Science Society, Erlbaum, Hillsdale, 823-832.
[21] Dauphin, Y., Pascanu, R., Gulcehre, C., et al. (2014) Identifying and Attacking the Saddle Point Problem in High-Dimensional Non-Convex Optimization. ArXiv: 1406.2572.
[22] LeCun, Y., Boser, B., Denker, J.S., et al. (1989) Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation, 1, 541-551.
[23] Qian, N. (1999) On the Momentum Term in Gradient Descent Learning Algorithms. Neural Networks, 12, 145-151.
[24] LeCun, Y.A., Bottou, L., Orr, G.B. and Müller, K.R. (2012) Efficient BackProp. In: Montavon, G., Orr, G.B. and Müller, K.R., Eds., Neural Networks: Tricks of the Trade, Springer, Berlin, 9-48.
[25] Sutskever, I., Martens, J., Dahl, G. and Hinton, G. (2013) On the Importance of Initialization and Momentum in Deep Learning. Proceedings of Machine Learning Research, 28, 1139-1147.
[26] Nesterov, Y. (1983) A Method of Solving a Convex Programming Problem with Convergence Rate O(1/k^2). Proceedings of the USSR Academy of Sciences, 269, 543-547.
[27] Nesterov, Y. (2003) Introductory Lectures on Convex Optimization: A Basic Course. Springer Science & Business Media, Berlin.
[28] Duchi, J., Hazan, E. and Singer, Y. (2011) Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, 12, 2121-2159.
[29] Jacobs, R.A. (1988) Increased Rates of Convergence through Learning Rate Adaptation. Neural Networks, 1, 295-307.
[30] Zeiler, M.D. (2012) Adadelta: An Adaptive Learning Rate Method. ArXiv: 1212.5701.
[31] Tieleman, T. and Hinton, G. (2012) RmsProp: Divide the Gradient by a Running Average of Its Recent Magnitude. COURSERA: Neural Networks for Machine Learning, 4, 26-31.
[32] Kingma, D.P. and Ba, J. (2014) Adam: A Method for Stochastic Optimization. ArXiv: 1412.6980.
[33] Loshchilov, I. and Hutter, F. (2017) Decoupled Weight Decay Regularization. ArXiv: 1711.05101.
[34] Zaheer, M., Reddi, S., Sachan, D., Kale, S. and Kumar, S. (2018) Adaptive Methods for Nonconvex Optimization. Advances in Neural Information Processing Systems, 31, 9793-9803.
[35] Reddi, S.J., Kale, S. and Kumar, S. (2019) On the Convergence of Adam and Beyond. ArXiv: 1904.09237.
[36] Zhuang, J., Tang, T., Ding, Y., et al. (2020) AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients. Advances in Neural Information Processing Systems, 33, 18795-18806.
[37] Dubey, S.R., Chakraborty, S., Roy, S.K., et al. (2019) DiffGrad: An Optimization Method for Convolutional Neural Networks. IEEE Transactions on Neural Networks and Learning Systems, 31, 4500-4511.
[38] Khan, M.U.S., Jawad, M. and Khan, S.U. (2021) Adadb: Adaptive Diff-Batch Optimization Technique for Gradient Descent. IEEE Access, 9, 99581-99588.
[39] Keskar, N.S., Mudigere, D., Nocedal, J., Smelyanskiy, M. and Tang, P.T.P. (2016) On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. ArXiv: 1609.04836.
[40] Keskar, N.S. and Socher, R. (2017) Improving Generalization Performance by Switching from Adam to SGD. ArXiv: 1712.07628.
[41] Xing, C., Arpit, D., Tsirigotis, C. and Bengio, Y. (2018) A Walk with SGD. ArXiv: 1802.08770.
[42] Tong, Q., Liang, G. and Bi, J. (2022) Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM. Neurocomputing, 481, 333-356.
[43] Luo, L., Xiong, Y., Liu, Y. and Sun, X. (2019) Adaptive Gradient Methods with Dynamic Bound of Learning Rate. ArXiv: 1902.09843.
[44] Ding, J., Ren, X., Luo, R. and Sun, X. (2019) An Adaptive and Momental Bound Method for Stochastic Learning. ArXiv: 1910.12249.
[45] Liu, L., Jiang, H., He, P., et al. (2019) On the Variance of the Adaptive Learning Rate and Beyond. ArXiv: 1908.03265.
[46] Gotmare, A., Keskar, N.S., Xiong, C. and Socher, R. (2018) A Closer Look at Deep Learning Heuristics: Learning Rate Restarts, Warmup and Distillation. ArXiv: 1810.13243.
[47] Zhang, M., Lucas, J., Ba, J. and Hinton, G.E. (2019) Lookahead Optimizer: K Steps Forward, 1 Step Back. Advances in Neural Information Processing Systems, 32, 9597-9608.