[1] Hochreiter, S. and Schmidhuber, J. (1997) Long Short-Term Memory. Neural Computation, 9, 1735-1780.
[2] Socher, R., Lin, C.C., Ng, A.Y. and Manning, C. (2011) Parsing Natural Scenes and Natural Language with Recursive Neural Networks. Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA, 28 June-2 July 2011, 129-136.
[3] Glorot, X., Bordes, A. and Bengio, Y. (2011) Deep Sparse Rectifier Neural Networks. International Conference on Artificial Intelligence and Statistics, 15, 315-323.
[4] Le, Q.V., Jaitly, N. and Hinton, G.E. (2015) A Simple Way to Initialize Recurrent Networks of Rectified Linear Units. arXiv preprint arXiv:1504.00941.
[5] Talathi, S.S. and Vartak, A. (2015) Improving Performance of Recurrent Neural Network with ReLU Nonlinearity. arXiv preprint.
[6] Polyak, B.T. (1964) Some Methods of Speeding up the Convergence of Iterative Methods. USSR Computational Mathematics and Mathematical Physics, 4, 1-17.
[7] Duchi, J., Hazan, E. and Singer, Y. (2011) Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, 12, 2121-2159.
[8] Martens, J. (2010) Deep Learning via Hessian-Free Optimization. Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, 21-24 June 2010, 735-742.
[9] Nesterov, Y. (2004) Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, Vol. 87, Kluwer Academic Publishers, Boston.
[10] Meng, X., Bradley, J., Yavuz, B., et al. (2015) MLlib: Machine Learning in Apache Spark. Journal of Machine Learning Research, 17, 1235-1241.
[11] Robbins, H. and Monro, S. (1951) A Stochastic Approximation Method. Annals of Mathematical Statistics, 22, 400-407.
[12] Hinton, G.E. and Salakhutdinov, R.R. (2006) Reducing the Dimensionality of Data with Neural Networks. Science, 313, 504-507.
[13] Hinton, G., Deng, L., Yu, D., et al. (2012) Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29, 82-97.
[14] Krizhevsky, A., Sutskever, I. and Hinton, G.E. (2012) ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 25, 1097-1105.
[15] Deng, L., Li, J., Huang, J.T., et al. (2013) Recent Advances in Deep Learning for Speech Research at Microsoft. IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, 26-31 May 2013, 8604-8608.
[16] Graves, A. (2014) Generating Sequences with Recurrent Neural Networks. arXiv preprint arXiv:1308.0850.
[17] Nesterov, Y. (1983) A Method of Solving a Convex Programming Problem with Convergence Rate O(1/k²). Soviet Mathematics Doklady, 27, 372-376.
[18] Tieleman, T. and Hinton, G. (2012) Lecture 6.5, RMSProp. COURSERA: Neural Networks for Machine Learning. Technical Report.
[19] Graves, A., Mohamed, A.R. and Hinton, G. (2013) Speech Recognition with Deep Recurrent Neural Networks. IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, 26-31 May 2013, 6645-6649.
[20] Zeiler, M.D. (2012) ADADELTA: An Adaptive Learning Rate Method. arXiv preprint arXiv:1212.5701.
[21] Kingma, D. and Ba, J. (2014) Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.
[22] Dozat, T. (2016) Incorporating Nesterov Momentum into Adam. ICLR 2016 Workshop Track.
[23] Wei, W.G.H., Liu, T., Song, A., et al. (2018) An Adaptive Natural Gradient Method with Adaptive Step Size in Multilayer Perceptrons. Chinese Automation Congress, 1593-1597.
[24] Loshchilov, I. and Hutter, F. (2016) SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv preprint arXiv:1608.03983.