[1] Vaswani, A., Shazeer, N., Parmar, N., et al. (2017) Attention Is All You Need. Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, 4-9 December 2017, 5999-6009.
[2] Yenduri, G., Ramalingam, M., Selvi, G.C., Supriya, Y., Srivastava, G., Maddikunta, P.K.R., et al. (2024) GPT (Generative Pre-Trained Transformer)—A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions. IEEE Access, 12, 54608-54649.
[3] Devlin, J., Chang, M.W., Lee, K. and Toutanova, K. (2019) BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1, 4171-4186.
[4] Zaremba, W., Sutskever, I. and Vinyals, O. (2014) Recurrent Neural Network Regularization. arXiv preprint arXiv:1409.2329.
[5] Vennerød, C.B., Kjærran, A. and Bugge, E.S. (2021) Long Short-Term Memory RNN. arXiv preprint.
[6] Fardin, S., Di Sipio, R. and Sinervo, P.K. (2019) Bidirectional Long Short-Term Memory (BLSTM) Neural Networks for Reconstruction of Top-Quark Pair Decay Kinematics. arXiv preprint.
[7] Rajpurkar, P., Zhang, J., Lopyrev, K. and Liang, P. (2016) SQuAD: 100,000+ Questions for Machine Comprehension of Text. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, 1-5 November 2016, 2383-2392.
[8] Yu, A.W., Dohan, D., Luong, M.T., et al. (2018) QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. International Conference on Learning Representations (ICLR 2018). arXiv preprint arXiv:1804.09541.
[9] LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P. (1998) Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86, 2278-2324.