A few years ago, when I started applying machine learning to text, models could not come close to human performance. Matching humans seemed a remote possibility at the time. Today, that remote possibility is much closer to reality with the arrival of new language models such as GPT-3.
GPT-3 is the latest in a line of deep learning language models created by OpenAI. With just a brief prompt from a user, GPT-3 can generate text that’s impressively human-like. It can perform intelligent tasks such as writing code from a short description and answering questions about a passage.
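As a concrete illustration of that prompt-driven workflow, the sketch below sends a passage and a question to GPT-3 and prints the model’s answer. It is only a minimal sketch: it assumes the GPT-3-era (pre-1.0) openai Python package, an API key in the OPENAI_API_KEY environment variable, and an illustrative model name and settings, none of which come from this article.

```python
# Minimal sketch of question answering with GPT-3 via the OpenAI API.
# Assumes the pre-1.0 `openai` package and an API key in OPENAI_API_KEY;
# the model name and parameters below are illustrative choices.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

passage = (
    "The Transformer architecture, introduced in 2017, replaced recurrence "
    "with attention and quickly became the standard for language models."
)
prompt = f"Passage: {passage}\n\nQuestion: What did the Transformer replace?\nAnswer:"

response = openai.Completion.create(
    engine="davinci",   # GPT-3 base model name at the time
    prompt=prompt,
    max_tokens=32,
    temperature=0.0,    # keep the answer as deterministic as possible
)
print(response.choices[0].text.strip())
```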
The Transformative Underpinnings
What are the underpinnings of this major milestone in artificial intelligence? GPT-3, its predecessors GPT-2 and GPT, and the BERT family of models are all powered by two innovations: the Transformer architecture and unsupervised pre-training.
Transformer-based neural networks vastly improve the efficiency with which models capture complex relationships between words. That boost in efficiency enables deeper neural networks that do a much better job of “learning” how language works. First used to improve machine translation (a task at which AI now comes close to human accuracy), the Transformer architecture is the foundation of the revolutionary, powerful text models we are seeing today.
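To make the idea concrete, here is a minimal, single-head sketch of the scaled dot-product self-attention at the heart of the Transformer, written in plain PyTorch. The shapes, dimensions, and single-head formulation are simplified for illustration and are not taken from any particular production model.

```python
# A minimal, single-head sketch of scaled dot-product self-attention,
# the mechanism Transformers use to relate every word to every other word.
# Dimensions here are illustrative, not from any specific model.
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model) token embeddings; w_*: (d_model, d_head) projections."""
    q = x @ w_q                                  # queries: what each token is looking for
    k = x @ w_k                                  # keys: what each token offers
    v = x @ w_v                                  # values: the information to be mixed
    scores = q @ k.T / math.sqrt(k.shape[-1])    # pairwise relevance, scaled
    weights = torch.softmax(scores, dim=-1)      # each row sums to 1
    return weights @ v                           # each token becomes a weighted blend of all tokens

seq_len, d_model, d_head = 5, 16, 8
x = torch.randn(seq_len, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_head) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([5, 8])
```

Because every token attends to every other token in a single matrix operation, these relationships are computed in parallel rather than sequentially, which is where the efficiency gain over earlier recurrent models comes from.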
Unsupervised pre-training, the second innovation underpinning cutting-edge AI, provides models with an out-of-the-box “understanding” of language. In the pre-training process, BERT and GPT family models “learn” how language works by picking up statistical patterns from massive amounts of training text without any human supervision. This enables them to learn custom language tasks with considerably less training data. Because of unsupervised pre-training, GPT-3 can learn how to write code for website interfaces with only a handful of examples.
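A rough sketch of the pre-training signal itself: the model is simply asked to predict the next token at every position in raw text, and the error of those predictions is the only “supervision.” In the toy example below, the tiny embedding-plus-linear model is a stand-in for a real Transformer; the vocabulary size, dimensions, and random “corpus” are invented purely to show the objective.

```python
# Toy illustration of the unsupervised pre-training objective:
# predict the next token at every position in raw text (no labels needed).
# The tiny model is a stand-in for a real Transformer; the next-token
# loss is the point of the example.
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),   # in practice: Transformer layers go here
    nn.Linear(d_model, vocab_size),      # predict a distribution over the vocabulary
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# A "corpus" of token ids; real pre-training uses billions of such tokens.
tokens = torch.randint(0, vocab_size, (1, 64))

inputs, targets = tokens[:, :-1], tokens[:, 1:]   # shift by one: next-token targets
logits = model(inputs)                            # (batch, seq_len - 1, vocab_size)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()
optimizer.step()
print(f"next-token loss: {loss.item():.3f}")
```

Repeated over enormous amounts of text, this single objective is what gives GPT-style models their out-of-the-box grasp of language before they ever see a task-specific example.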
Thanks to these two innovations, Transformer-based language models are now the central focus of AI research and a major lever for AI applications across industries.