Research shows how large language models like GPT-3 can learn a new task from just a few examples
Large language models like OpenAI’s GPT-3 are massive neural networks that can generate human-like text, from poetry to programming code. Trained on vast troves of internet data, these machine-learning models take a small bit of input text and then predict the text that is likely to come next.
But that’s not all these models can do. Researchers are exploring a curious phenomenon known as in-context learning, in which a large language model learns to accomplish a task after seeing only a few examples, despite the fact that it wasn’t trained for that task. For instance, someone could feed the model several example sentences and their sentiments (positive or negative), then prompt it with a new sentence, and the model can return the correct sentiment.
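The few-shot setup described above can be sketched in code. This is only an illustration of how such a prompt is assembled; the example sentences, labels, and formatting are our own, not drawn from the research.

```python
# A minimal sketch of an in-context (few-shot) sentiment prompt.
# The labeled examples below are hypothetical illustrations.
examples = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I regret buying this phone.", "negative"),
    ("What a wonderful surprise!", "positive"),
]
query = "The service was painfully slow."

# Concatenate the labeled examples, then pose the new sentence.
prompt = ""
for sentence, label in examples:
    prompt += f"Sentence: {sentence}\nSentiment: {label}\n\n"
prompt += f"Sentence: {query}\nSentiment:"

print(prompt)
```

A large language model given this prompt will typically continue it with the correct label, even though none of its weights are updated in the process.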
Typically, a machine-learning model like GPT-3 would need to be retrained with new data to perform a new task. During training, the model updates its parameters as it processes new information to learn the task. But with in-context learning, the model’s parameters are not updated, so it appears the model learns a new task without learning anything at all.
Scientists from MIT, Google Research, and Stanford University are striving to unravel this mystery. They studied models that are very similar to large language models to see how they can learn without updating their parameters.
The researchers’ theoretical results suggest that these massive neural network models likely contain smaller, simpler linear models buried inside them. The large model could then implement a simple learning algorithm to train this smaller linear model to complete a new task, using only information already contained within the larger model. Its parameters remain fixed.
An important step toward understanding the mechanisms behind in-context learning, this research opens the door to more exploration of the learning algorithms these large models can implement, says Ekin Akyürek, a computer science PhD student and lead author of a paper exploring this phenomenon. With a better understanding of in-context learning, researchers could enable models to complete new tasks without the need for costly retraining.
“Usually, if you want to fine-tune these models, you need to collect domain-specific data and do some complex engineering. But now we can just feed it an input, five examples, and it accomplishes what we want. So in-context learning is a pretty exciting phenomenon,” Akyürek says.
The paper is available on the arXiv preprint server.
Joining Akyürek on the paper are Dale Schuurmans, a research scientist at Google Brain and professor of computing science at the University of Alberta; as well as senior authors Jacob Andreas, the X Consortium Assistant Professor in the MIT Department of Electrical Engineering and Computer Science and a member of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL); Tengyu Ma, an assistant professor of computer science and statistics at Stanford; and Danny Zhou, principal scientist and research director at Google Brain. The research will be presented at the International Conference on Learning Representations.
A model within a model
In the machine-learning research community, many scientists have come to believe that large language models can perform in-context learning because of how they are trained, Akyürek says.
For instance, GPT-3 has hundreds of billions of parameters and was trained by reading huge swaths of text on the internet, from Wikipedia articles to Reddit posts. So when someone shows the model examples of a new task, it has likely already seen something very similar, because its training dataset included text from billions of websites. It repeats patterns it has seen during training, rather than learning to perform new tasks.
Akyürek hypothesized that in-context learners aren’t just matching previously seen patterns; instead, they are actually learning to perform new tasks. He and others had experimented by prompting these models with synthetic data they could not have seen anywhere before, and found that the models could still learn the task from just a few examples. Akyürek and his colleagues thought that perhaps these neural network models have smaller machine-learning models inside them that the models can train to complete a new task.
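A sketch of why synthetic data rules out memorization: each prompt below encodes a freshly sampled linear task that no training corpus could contain, so answering correctly requires actually solving the regression posed by the examples. The setup is our simplified illustration, not the paper’s exact data-generation procedure.

```python
import numpy as np

# Each synthetic "prompt" is a brand-new linear task: a few (x, y)
# example pairs plus a query input whose answer is held out.
rng = np.random.default_rng(0)

def make_prompt(n_examples=5, dim=2):
    w = rng.normal(size=dim)                  # freshly sampled task weights
    X = rng.normal(size=(n_examples, dim))    # in-context inputs
    y = X @ w                                 # in-context labels
    x_query = rng.normal(size=dim)            # the question after the examples
    return X, y, x_query, x_query @ w         # last value is the held-out answer

X, y, x_query, answer = make_prompt()

# Succeeding on such prompts requires genuinely solving the regression:
w_fit, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = x_query @ w_fit
```

Because the task is noiseless and over-determined, least squares recovers the task weights exactly; a model that merely pattern-matched against its training data could not do this for weights it has never seen.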
“That could explain nearly all of the learning phenomena we’ve seen with these large models,” he said.
To test this hypothesis, the researchers used a neural network model called a transformer, which has the same architecture as GPT-3 but had been specifically trained for in-context learning.
By exploring this transformer’s architecture, they theoretically proved that it can write a linear model within its hidden states. A neural network is composed of many layers of interconnected nodes that process data. The hidden states are the layers between the input and output layers.
Their mathematical evaluations show that this linear model is written somewhere in the earliest layers of the transformer. The transformer can then update the linear model by implementing simple learning algorithms.
In essence, the model simulates and trains a smaller version of itself.
Exploring hidden layers
The researchers explored this hypothesis using probing experiments, in which they looked in the transformer’s hidden layers to try to recover a certain quantity.
“In this case, we tried to recover the actual solution to the linear model, and we could show that the parameter is written in the hidden states. That means the linear model is in there somewhere,” he says.
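A probing experiment of this kind can be mimicked on toy data. In the sketch below (the setup is ours, standing in for real transformer activations), each prompt’s linear-model solution is assumed to be linearly encoded in a “hidden state,” and a linear probe is fit to read it back out.

```python
import numpy as np

# Toy probing experiment: can a per-prompt linear-model solution be
# linearly decoded from "hidden states"? The encoding below is a
# stand-in for real transformer activations.
rng = np.random.default_rng(42)
n_prompts, d_task, d_hidden = 500, 2, 32

solutions = rng.normal(size=(n_prompts, d_task))   # each prompt's solution vector
encode = rng.normal(size=(d_task, d_hidden))       # unknown embedding into activations
hidden = solutions @ encode + 0.01 * rng.normal(size=(n_prompts, d_hidden))

# Fit a linear probe mapping hidden states back to the solutions.
probe, *_ = np.linalg.lstsq(hidden, solutions, rcond=None)
max_err = np.abs(hidden @ probe - solutions).max()
print(max_err)  # small error => the solution is linearly decodable
```

If the probe’s error is small, the quantity is “written in the hidden states” in the sense the researchers describe; if no linear read-out works, the hypothesis would fail for that layer.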
Building on this theoretical work, the researchers may be able to enable a transformer to perform in-context learning by adding just two layers to the neural network. There are still many technical details to work out before that becomes possible, Akyürek cautions, but it could help engineers create models that can complete new tasks without the need for retraining with new data.
“The paper sheds light on one of the most remarkable properties of modern large language models: their ability to learn from data given in their inputs, without explicit training. Using the simplified case of linear regression, the authors show theoretically how models can implement standard learning algorithms while reading their input,” says Mike Lewis, a research scientist at Facebook AI Research who was not involved with this work. “These results are a stepping stone to understanding how models can learn more complex tasks, and will help researchers design better training methods for language models to further improve their performance.”
Moving forward, Akyürek plans to continue exploring in-context learning with functions that are more complex than the linear models studied in this work. They could also apply these experiments to large language models to see whether their behaviors are also described by simple learning algorithms. In addition, he wants to dig deeper into the types of pre-training data that can enable in-context learning.
“With this work, people can now visualize how these models can learn from examples. So, my hope is that it changes some people’s views about in-context learning,” Akyürek says. “These models are not as dumb as people think. They don’t just memorize these tasks. They can learn new tasks, and we have shown how that can be done.”
Ekin Akyürek et al., What learning algorithm is in-context learning? Investigations with linear models, arXiv (2022). DOI: 10.48550/arxiv.2211.15661
Massachusetts Institute of Technology
This story is republished with permission from MIT News (web.mit.edu/newsoffice/), a popular website that covers MIT research, innovation, and teaching.
Citation: Research shows how large language models like GPT-3 can learn a new task from just a few examples (2023, February 7) retrieved 7 February 2023 from https://techxplore.com/news/2023-02-large-language-gpt-task-examples.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.