An Adaptive Algorithm on Effective Step Size Constraints
DOI: 10.12677/AAM.2023.1210418. Supported by the National Natural Science Foundation of China.
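Authors: Jiang Wenhan (姜文翰), Jiang Zhixia (姜志侠)*: School of Mathematics and Statistics, Changchun University of Science and Technology, Changchun, Jilin; Liu Yaoqi (刘曜齐): College of Automotive Engineering, Jilin University, Changchun, Jilin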
Keywords: MAXGrad Algorithm, Adaptive Algorithm, Convergence Performance, Machine Learning
Abstract: Adam's convergence performance degrades in the later stage of iteration because its effective step size becomes excessively large. To address this, this study proposes an optimization algorithm named MAXGrad, which limits the growth of the effective step size by modifying the iterative formula for the second-order moment. To evaluate MAXGrad's practical applicability and performance, the experiments use three larger-scale datasets and compare MAXGrad in detail with SGDM, Adam, and AMSGrad. The experimental results show that MAXGrad achieves significant performance improvements over adaptive algorithms such as Adam and AMSGrad on multiple datasets. These results support the feasibility and performance of MAXGrad as a new iterative algorithm based on the effective step size.
Cite this article: Jiang, W., Liu, Y. and Jiang, Z. (2023) An Adaptive Algorithm on Effective Step Size Constraints. Advances in Applied Mathematics, 12(10), 4248-4254. https://doi.org/10.12677/AAM.2023.1210418
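
The abstract attributes Adam's late-stage degradation to growth of the effective step size, i.e. the learning rate divided by the square root of the second-moment estimate. The sketch below illustrates only that background effect for a single scalar parameter, comparing plain Adam with the AMSGrad running-maximum variant. It is not the MAXGrad update, whose formula is not given in this abstract. The function name `effective_step_sizes`, the gradient sequence, and the hyperparameter values are assumptions chosen for illustration, and bias correction is omitted for brevity.

```python
import math


def effective_step_sizes(grads, lr=1e-3, beta2=0.9, eps=1e-8, amsgrad=False):
    """Return lr / (sqrt(v) + eps) at each iteration for one scalar parameter.

    Hypothetical helper for illustration only: it tracks the second-moment
    estimate of Adam (or AMSGrad when amsgrad=True), without bias correction.
    """
    v = 0.0      # exponential moving average of squared gradients
    v_max = 0.0  # running maximum of v, used only by AMSGrad
    steps = []
    for g in grads:
        v = beta2 * v + (1.0 - beta2) * g * g
        if amsgrad:
            v_max = max(v_max, v)
            denom_v = v_max
        else:
            denom_v = v
        steps.append(lr / (math.sqrt(denom_v) + eps))
    return steps


# Assumed toy gradient sequence: large gradients early, small gradients later.
grads = [1.0] * 5 + [0.01] * 50

adam_steps = effective_step_sizes(grads, amsgrad=False)
ams_steps = effective_step_sizes(grads, amsgrad=True)

print("Adam    effective step, iteration 5 vs last:", adam_steps[4], adam_steps[-1])
print("AMSGrad effective step, iteration 5 vs last:", ams_steps[4], ams_steps[-1])
```

In this toy run the Adam effective step size grows once the gradients shrink, because the moving average forgets the early large gradients, while the AMSGrad variant never increases it. According to the abstract, MAXGrad instead limits this growth through a different modification of the second-moment iteration.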
