|
[1]
|
Chen, T., Chen, X., Chen, W., et al. (2022) Learning to Optimize: A Primer and a Benchmark. Journal of Machine Learning Research, 23, 1-59.
|
|
[2]
|
Andrychowicz, M., Denil, M., Gomez, S., et al. (2016) Learning to Learn by Gradient Descent by Gradient Descent. Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, 5-10 December 2016, 3988-3996.
|
|
[3]
|
Gregor, K. and LeCun, Y. (2010) Learning Fast Approximations of Sparse Coding. Proceedings of the 27th International Conference on Machine Learning, Haifa, 21-24 June 2010, 399-406.
|
|
[4]
|
Liu, J., Chen, X., Wang, Z., et al. (2023) Towards Constituting Mathematical Structures for Learning to Optimize. International Conference on Machine Learning. PMLR, Honolulu, 23-29 July 2023, 21426-21449.
|
|
[5]
|
Bengio, Y., Simard, P. and Frasconi, P. (1994) Learning Long-Term Dependencies with Gradient Descent Is Difficult. IEEE Transactions on Neural Networks, 5, 157-166.
|
|
[6]
|
Hochreiter, S. and Schmidhuber, J. (1997) Long Short-Term Memory. Neural Computation, 9, 1735-1780.
|
|
[7]
|
Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., et al. (2014) Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, October 2014, 1724-1734.
|
|
[8]
|
Chung, J., Gulcehre, C., Cho, K.H., et al. (2014) Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling.
|
|
[9]
|
Greff, K., Srivastava, R.K., Koutnik, J., Steunebrink, B.R. and Schmidhuber, J. (2017) LSTM: A Search Space Odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28, 2222-2232.
|
|
[10]
|
Zhang, S., Yao, L., Sun, A., et al. (2019) Deep Learning Based Recommender System: A Survey and New Perspectives. ACM Computing Surveys (CSUR), 52, 1-38.
|
|
[11]
|
Vaswani, A., et al. (2017) Attention Is All You Need. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, 4-9 December 2017, 6000-6010.
|
|
[12]
|
Boyd, S., Parikh, N., Chu, E., et al. (2010) Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends® in Machine Learning, 3, 1-122.
|
|
[13]
|
Nesterov, Y. (2013) Introductory Lectures on Convex Optimization: A Basic Course. Springer Science & Business Media.
|
|
[14]
|
Devlin, J., Chang, M.W., Lee, K., et al. (2019) BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, 4171-4186.
|
|
[15]
|
Martin, D., Fowlkes, C., Tal, D. and Malik, J. (2001) A Database of Human Segmented Natural Images and Its Application to Evaluating Segmentation Algorithms and Measuring Ecological Statistics. Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, Vancouver, 7-14 July 2001, 416-423.
|
|
[16]
|
Asuncion, A. and Newman, D. (2007) UCI Machine Learning Repository.
|
|
[17]
|
Kingma, D.P. and Ba, J. (2014) Adam: A Method for Stochastic Optimization.
|
|
[18]
|
Beck, A. and Teboulle, M. (2009) A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2, 183-202.
|
|
[19]
|
Lv, K., Jiang, S. and Li, J. (2017) Learning Gradient Descent: Better Generalization and Longer Horizons. International Conference on Machine Learning. PMLR, Sydney, 6-11 August 2017, 2247-2255.
|