|
[1]
|
Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986) Learning Representations by Back-Propagating Errors. Nature, 323, 533-536.
|
|
[2]
|
Robbins, H. and Monro, S. (1951) A Stochastic Approximation Method. The Annals of Mathematical Statistics, 22, 400-407.
|
|
[3]
|
Amari, S. (1967) A Theory of Adaptive Pattern Classifiers. IEEE Transactions on Electronic Computers, EC-16, 299-307.
|
|
[4]
|
Bottou, L. (1998) Online Algorithms and Stochastic Approximations. In: Saad, D., Ed., Online Learning and Neural Networks, Cambridge University Press, Cambridge.
|
|
[5]
|
Rosenblatt, F. (1958) The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, 65, 386-408.
|
|
[6]
|
Nemirovski, A.S. and Yudin, D.B. (1978) Cesari Convergence of the Gradient Method of Approximating Saddle Points of Convex-Concave Functions. Doklady Akademii Nauk SSSR, 239, 1056-1059.
|
|
[7]
|
Shalev-Shwartz, S., Singer, Y. and Srebro, N. (2007) Pegasos: Primal Estimated Sub-Gradient Solver for SVM. Proceedings of the 24th International Conference on Machine Learning, New York, 20-24 June 2007, 807-814.
|
|
[8]
|
Nemirovski, A.S., Juditsky, A., Lan, G., et al. (2009) Robust Stochastic Approximation Approach to Stochastic Programming. SIAM Journal on Optimization, 19, 1574-1609.
|
|
[9]
|
Langford, J., Li, L. and Zhang, T. (2009) Sparse Online Learning via Truncated Gradient. Journal of Machine Learning Research, 10, 777-801.
|
|
[10]
|
Hardt, M., Recht, B.H. and Singer, Y. (2016) Train Faster, Generalize Better: Stability of Stochastic Gradient Descent. Proceedings of the 33rd International Conference on Machine Learning, New York, 19-24 June 2016, 1868-1877.
|
|
[11]
|
Kasiviswanathan, S.P. and Jin, H. (2016) Efficient Private Empirical Risk Minimization for High-dimensional Learning. Proceedings of the 33rd International Conference on Machine Learning, New York, 19-24 June 2016, 488-497.
|
|
[12]
|
Park, J. (2022) Representation Learnt by SGD and Adaptive Learning Rules—Conditions That Vary Sparsity and Selectivity in Neural Network. arXiv:2201.11653.
|
|
[13]
|
Krizhevsky, A., Sutskever, I. and Hinton, G. (2012) ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 25, 1097-1105.
|
|
[14]
|
Sutskever, I., Martens, J., Dahl, G., et al. (2013) On the Importance of Initialization and Momentum in Deep Learning. Proceedings of the 30th International Conference on Machine Learning, 28, 1139-1147.
|
|
[15]
|
Shamir, O. (2016) Fast Stochastic Algorithms for SVD and PCA: Convergence Properties and Convexity. Proceedings of the 33rd International Conference on Machine Learning, New York, 19-24 June 2016, 248-256.
|
|
[16]
|
Shamir, O. (2016) Convergence of Stochastic Gradient Descent for PCA. Proceedings of the 33rd International Conference on Machine Learning, New York, 19-24 June 2016, 257-265.
|
|
[17]
|
Garber, D., Hazan, E., Jin, C., Kakade, S.M., Musco, C., Netrapalli, P., et al. (2016) Faster Eigenvector Computation via Shift-and-Invert Preconditioning. Proceedings of the 33rd International Conference on Machine Learning, New York, 19-24 June 2016, 2626-2634.
|
|
[18]
|
Allen-Zhu, Z. and Li, Y. (2017) Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition. Proceedings of the 34th International Conference on Machine Learning, Sydney, 6-11 August 2017, 98-106.
|
|
[19]
|
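Li, L.Y. (2021) Handwritten Digit Recognition Method Based on Neural Network Stochastic Gradient Descent. Xinxi yu Diannao (Information and Computer), 33, 74-76. (In Chinese)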
|
|
[20]
|
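Shi, J.R., Wang, D., Shang, F.H. and Zhang, H.Y. (2021) Research Advances on Stochastic Gradient Descent Algorithms. Acta Automatica Sinica, 47, 2103-2119. (In Chinese)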
|
|
[21]
|
Chen, J., Jin, S. and Lyu, L. (2020) A Consensus-Based Global Optimization Method with Adaptive Momentum Estimation. arXiv:2012.04827.
|