OpenAI is working on a new reasoning technology codenamed ‘Strawberry’
ChatGPT maker OpenAI is working on a new approach to its artificial intelligence models in a project codenamed “Strawberry,” according to a person familiar with the matter and internal documents reviewed by Reuters.
The project, which has not been previously reported in detail, comes as the Microsoft-backed startup races to prove that the types of models it offers are capable of delivering advanced reasoning capabilities.
Teams within OpenAI are working on Strawberry, according to a copy of a recent internal OpenAI document seen by Reuters in May. Reuters could not determine the exact date of the document, which details how OpenAI intends to use Strawberry to conduct research. The source described the plan to Reuters as a work in progress. The news agency could not determine how close Strawberry was to being made public.
How Strawberry works remains a closely guarded secret even within OpenAI, the person said.
The document describes a project using Strawberry models with the aim of enabling the company’s AI not only to generate answers to queries, but also to plan ahead well enough to navigate the internet autonomously and reliably in order to perform what OpenAI calls “deep research,” according to the source.
This is something that AI models have so far failed to do, according to interviews with more than a dozen AI researchers.
When asked about Strawberry and the details reported in this story, an OpenAI spokesperson said in a statement: “We want our AI models to be able to see and understand the world as we do. Continuous research into new AI capabilities is a common practice in the industry, with the general belief that these systems will improve their reasoning abilities over time.”
The spokesperson did not directly answer questions about Strawberry.
Project Strawberry was previously known as Q*, and Reuters reported last year that the project was considered a breakthrough internally at the company.
Two sources described seeing what OpenAI employees told them earlier this year was a demo of Q*, which is capable of answering difficult science and math questions that current commercial models cannot answer.
Another source briefed on the matter said OpenAI has internally tested AI that scored more than 90 percent on the MATH dataset, a benchmark of championship-level math problems. Reuters could not determine whether this was the “Strawberry” project.
At an internal all-hands meeting on Tuesday, OpenAI showed a demo of a research project that it claimed has new human-like reasoning skills, according to Bloomberg. An OpenAI spokesperson confirmed the meeting but declined to provide details about its content. Reuters was unable to determine whether the project being demonstrated was Strawberry.
OpenAI hopes this innovation will significantly improve the reasoning capabilities of AI models, the person said, adding that Strawberry involves a specialized method for processing an AI model after it has been pre-trained on very large datasets.
Reasoning is key to AI achieving human-level or superhuman intelligence, researchers interviewed by Reuters said.
While large language models can summarize dense texts and compose elegant prose much faster than any human, the technology often fails to solve common-sense problems whose solutions seem intuitive to humans, like recognizing logical fallacies and playing checkers. When models encounter these kinds of problems, they often “hallucinate” false information.
AI researchers interviewed by Reuters generally agreed that reasoning in the context of AI involves forming a model that allows AI to plan ahead, reflect how the physical world works, and reliably solve challenging multi-step problems.
Improving reasoning in AI models is seen as key to unlocking the ability of models to do everything from making major scientific discoveries to planning and building new software applications. OpenAI CEO Sam Altman said earlier this year that in AI, “the most important areas of progress will be around reasoning.”
Other companies like Google, Meta, and Microsoft are also experimenting with different techniques to improve reasoning in AI models, as are most academic labs doing AI research. However, researchers are divided on whether large language models (LLMs) are capable of incorporating concepts and long-term planning into how they make predictions. For example, one of the pioneers of modern AI, Yann LeCun, who works at Meta, has often said that LLMs are not capable of human-like reasoning.
OVERCOMING THE CHALLENGES
Strawberry is a key part of OpenAI’s plan to overcome those challenges, people familiar with the matter said. Documents reviewed by Reuters describe what Strawberry aims to achieve, but not how.
In recent months, the company has been privately signaling to developers and other outside parties that it is about to launch technology capable of significantly more advanced reasoning, according to four people who have heard the company’s presentations. They declined to be named because they were not authorized to discuss private matters.
Strawberry includes a specialized method known as “post-training” of OpenAI’s generative AI models, or adapting the base models to improve their performance in specific ways after they have already been “trained” on reams of generalized data, one of the sources said.
This post-training phase includes methods such as “fine-tuning,” a process used on the vast majority of language models today that comes in many flavors, such as having humans give the model feedback based on its responses, or feeding it examples of good and bad answers.
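To make the “examples of good answers” flavor concrete, here is a minimal, hypothetical sketch of supervised fine-tuning on preferred answers. The TinyLM model, the byte-level tokenizer, and the toy dataset are illustrative stand-ins, not OpenAI’s pipeline; a production setup would use a pre-trained model and would typically mask the prompt tokens out of the loss.

```python
# Hypothetical sketch of supervised fine-tuning on "good answer" examples.
# Nothing here is OpenAI code; the model and data are toy stand-ins.
import torch
import torch.nn as nn

VOCAB = 256  # toy byte-level vocabulary


class TinyLM(nn.Module):
    """Illustrative stand-in for a pre-trained base model."""
    def __init__(self, dim=64):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, VOCAB)

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)


def encode(text):
    # Byte-level "tokenizer": each UTF-8 byte is one token.
    return torch.tensor([list(text.encode("utf-8"))])


# (prompt, preferred answer) pairs -- the "examples of good answers"
data = [("2+2=", "4"), ("capital of France? ", "Paris")]

model = TinyLM()  # in practice this would be the pre-trained model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for prompt, answer in data:
    tokens = encode(prompt + answer)
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    logits = model(inputs)
    # Next-token prediction loss nudges the model toward the good answer.
    # (A real pipeline would mask the prompt tokens out of the loss.)
    loss = loss_fn(logits.reshape(-1, VOCAB), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```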
Strawberry bears similarities to a method developed at Stanford in 2022 called “Self-Taught Reasoner,” or “STaR,” one of the sources familiar with the matter said. STaR allows AI models to “bootstrap” themselves to higher levels of intelligence by generating their own training data with each iteration, and could theoretically be used to push language models beyond human-level intelligence, one of its creators, Stanford professor Noah Goodman, told Reuters.
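As published, the STaR loop works roughly as follows. The sketch below follows the 2022 Stanford paper, not Strawberry itself, about which nothing is confirmed; `generate` and `finetune` are hypothetical stubs standing in for real language-model sampling and training calls.

```python
# Sketch of the STaR bootstrapping loop (Zelikman et al., 2022).
# `generate`, `finetune`, and the dataset are stand-in stubs.
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    answer: str  # known gold answer, used only for filtering


def generate(model, question, hint=None):
    """Stub: sample a (rationale, answer) pair from the model.

    With `hint` set, the gold answer is included in the prompt so the
    model can "rationalize" a correct solution it failed to find alone.
    """
    raise NotImplementedError  # stand-in for an actual LM sampling call


def finetune(base_model, triples):
    """Stub: fine-tune the ORIGINAL base model on (question, rationale,
    answer) triples. STaR restarts from the base model each round."""
    raise NotImplementedError


def star(base_model, dataset, rounds=5):
    model = base_model
    for _ in range(rounds):
        kept = []
        for ex in dataset:
            rationale, answer = generate(model, ex.question)
            if answer != ex.answer:
                # Rationalization: retry with the gold answer as a hint.
                rationale, answer = generate(model, ex.question, hint=ex.answer)
            if answer == ex.answer:
                # Only rationales that end in the right answer become
                # training data -- correctness acts as the filter.
                kept.append((ex.question, rationale, ex.answer))
        model = finetune(base_model, kept)
    return model
```

The key design choice is the correctness filter: each round, the model generates its own rationales, keeps only those that reach the right answer, and is retrained on them, which is the “bootstrapping” Goodman describes.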
“I think it’s both exciting and scary… if things continue in that direction, we have some serious things to think about as humans,” said Goodman, who is not affiliated with OpenAI and is not familiar with Strawberry.
The document says one of the capabilities OpenAI is aiming for with Strawberry is performing long-horizon tasks (LHT), meaning complex tasks that require the model to plan ahead and execute a series of actions over an extended period of time, the first source explained.
To do so, OpenAI is creating, training and evaluating models on what the company calls a “deep research” dataset, according to internal OpenAI documents. Reuters was unable to determine what is in that dataset or what a longer time horizon would mean.
OpenAI specifically wants its models to use these capabilities to conduct research by autonomously browsing the web with the help of a “CUA,” or computer-using agent, that can take actions based on its findings, according to the document and one of the sources. OpenAI also plans to test whether its models can do the work of software and machine learning engineers. (Reporting by Anna Tong in San Francisco and Katie Paul in New York; Editing by Ken Li and Claudia Parsons)