A Survey of Transformer-Based Natural Language Processing Models

doi:10.12677/AIRR.2023.123025

Author: Lai Mingshu (赖鸣姝), School of Information Engineering, Beijing Institute of Graphic Communication, Beijing
Keywords: Artificial Intelligence; Deep Learning; Natural Language Processing

Abstract: Natural language processing (NLP) is a branch of computer science and artificial intelligence, now driven largely by deep learning, that aims to enable computers to understand, parse, or generate human language, including text, audio, and other forms. In recent years, the rapid development of deep learning has greatly improved the performance of NLP models and allowed many NLP tasks to be solved more effectively; these advances are owed mainly to the continuing development of neural network models. This paper surveys the most popular families of NLP models derived from the Transformer architecture, including the BERT (Bidirectional Encoder Representations from Transformers) family, the GPT (Generative Pre-trained Transformer) family, and the T5 family. For each family it traces how the models have evolved, and it compares the families in terms of model structure and design ideas, highlighting both their differences and their connections. Finally, it discusses future directions for research in natural language processing.
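
One way to make the structural contrast between these families concrete is the attention pattern each one uses. BERT is built from the Transformer encoder, whose self-attention lets every token attend to every other token; GPT is built from the Transformer decoder, whose causal self-attention lets each token attend only to itself and earlier tokens; and T5 keeps the full encoder-decoder design, in which the decoder also attends over the encoder's output. The sketch below illustrates this in pure Python, assuming single-head scaled dot-product attention with no learned projection matrices, multiple heads, residual connections or layer normalization. The function names `full_mask`, `causal_mask` and `attention` are hypothetical illustrations, not code from this paper or from any model library.

```python
import math
from typing import List

Matrix = List[List[float]]
Mask = List[List[bool]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability; math.exp(-inf) == 0.0,
    # so masked positions receive zero weight.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def full_mask(n_q: int, n_k: int) -> Mask:
    # Unrestricted attention: encoder self-attention (BERT-style)
    # and decoder-to-encoder cross-attention (T5-style).
    return [[True] * n_k for _ in range(n_q)]


def causal_mask(n: int) -> Mask:
    # Decoder self-attention (GPT-style): position i sees only positions j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]


def attention(q: Matrix, k: Matrix, v: Matrix, mask: Mask) -> Matrix:
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V,
    # with disallowed (query, key) pairs set to -inf before the softmax.
    d_k = len(k[0])
    out = []
    for i, qi in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k)
            if mask[i][j] else float("-inf")
            for j, kj in enumerate(k)
        ]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    # Toy token vectors; Q = K = V = x (hypothetical, no learned projections).
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    enc = attention(x, x, x, full_mask(3, 3))   # BERT-style encoder
    dec = attention(x, x, x, causal_mask(3))    # GPT-style decoder
    # T5-style: decoder states attend over the encoder output.
    cross = attention(dec, enc, enc, full_mask(len(dec), len(enc)))
    print(dec[0])  # the first token can only see itself -> [1.0, 0.0]
```

Only the mask differs between the BERT-style and GPT-style calls. The causal mask is what lets a GPT-style model be trained on next-token prediction and generate text left to right, while the full mask gives a BERT-style encoder context from both directions.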

Cite this article: Lai Mingshu. A Survey of Transformer-Based Natural Language Processing Models[J]. Artificial Intelligence and Robotics Research, 2023, 12(3): 219-225. https://doi.org/10.12677/AIRR.2023.123025
