[1] Lauriola, I., Lavelli, A. and Aiolli, F. (2022) An Introduction to Deep Learning in Natural Language Processing: Models, Techniques, and Tools. Neurocomputing, 470, 443-456.
[2] Vaswani, A., Shazeer, N., Parmar, N., et al. (2017) Attention Is All You Need. Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, 4-9 December 2017, 6000-6010.
[3] Devlin, J., Chang, M.W., Lee, K., et al. (2018) BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.
[4] Radford, A., Narasimhan, K., Salimans, T., et al. (2018) Improving Language Understanding by Generative Pre-Training.
[5] Liu, Y., Gu, J., Goyal, N., et al. (2020) Multilingual Denoising Pre-Training for Neural Machine Translation. Transactions of the Association for Computational Linguistics, 8, 726-742.
[6] Liu, Y., Ott, M., Goyal, N., et al. (2019) RoBERTa: A Robustly Optimized BERT Pretraining Approach.
[7] Clark, K., Luong, M.T., Le, Q.V., et al. (2020) ELECTRA: Pre-Training Text Encoders as Discriminators Rather than Generators.
[8] Cui, Y., Che, W., Liu, T., et al. (2021) Pre-Training with Whole Word Masking for Chinese BERT. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 3504-3514.
[9] Zhang, Z., Han, X., Liu, Z., et al. (2019) ERNIE: Enhanced Language Representation with Informative Entities. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, 28 July-2 August 2019, 1441-1451.
[10] Joshi, M., Chen, D., Liu, Y., et al. (2020) SpanBERT: Improving Pre-Training by Representing and Predicting Spans. Transactions of the Association for Computational Linguistics, 8, 64-77.
[11] Lan, Z., Chen, M., Goodman, S., et al. (2019) ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations.
[12] Zafrir, O., Boudoukh, G., Izsak, P., et al. (2019) Q8BERT: Quantized 8Bit BERT. 2019 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing-NeurIPS Edition (EMC2-NIPS), Vancouver, 13 December 2019, 36-39.
[13] Sanh, V., Debut, L., Chaumond, J., et al. (2019) DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter.
[14] Jiao, X., Yin, Y., Shang, L., et al. (2019) TinyBERT: Distilling BERT for Natural Language Understanding. Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020, 4163-4174.
[15] Radford, A., Wu, J., Child, R., et al. (2019) Language Models Are Unsupervised Multitask Learners. OpenAI Blog, 1, 9.
[16] Brown, T., Mann, B., Ryder, N., et al. (2020) Language Models Are Few-Shot Learners. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, 6-12 December 2020, 1877-1901.
[17] Ouyang, L., Wu, J., Jiang, X., et al. (2022) Training Language Models to Follow Instructions with Human Feedback. Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems NeurIPS 2022, New Orleans, 28 November-9 December 2022, 27730-27744.
[18] Nakano, R., Hilton, J., Balaji, S., et al. (2021) WebGPT: Browser-Assisted Question-Answering with Human Feedback.
[19] Bubeck, S., Chandrasekaran, V., Eldan, R., et al. (2023) Sparks of Artificial General Intelligence: Early Experiments with GPT-4.
[20] Schick, T., Dwivedi-Yu, J., Dessì, R., et al. (2023) Toolformer: Language Models Can Teach Themselves to Use Tools.
[21] Raffel, C., Shazeer, N., Roberts, A., et al. (2020) Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. The Journal of Machine Learning Research, 21, 5485-5551.
[22] Xue, L., Constant, N., Roberts, A., et al. (2020) mT5: A Massively Multilingual Pre-Trained Text-to-Text Transformer. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, June 2021, 483-498.
[23] Ni, J., Ábrego, G.H., Constant, N., et al. (2021) Sentence-T5: Scalable Sentence Encoders from Pre-Trained Text-to-Text Models. Findings of the Association for Computational Linguistics: ACL 2022, Dublin, May 2022, 1864-1874.
[24] Dai, Z., Yang, Z., Yang, Y., et al. (2019) Transformer-XL: Attentive Language Models beyond a Fixed-Length Context. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, July 2019, 2978-2988.
[25] Yang, Z., Dai, Z., Yang, Y., et al. (2019) XLNet: Generalized Autoregressive Pretraining for Language Understanding. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, 8-14 December 2019. https://papers.nips.cc/paper_files/paper/2019/hash/dc6a7e655d7e5840e66733e9ee67cc69-Abstract.html