OpenAI and its competitors seek new paths to smarter AI as current methods encounter limitations
Artificial intelligence companies like OpenAI are seeking to overcome unexpected delays and challenges in the pursuit of ever-larger language models by developing training techniques that use more human-like ways for algorithms to “think”.
Dozens of AI scientists, researchers and investors told Reuters they believe these techniques, which are behind OpenAI’s recently released o1 model, could reshape the AI arms race and have implications for the types of resources AI companies have an insatiable demand for, from energy to chips.
OpenAI declined to comment for this story. After the release of the viral ChatGPT chatbot two years ago, tech companies whose valuations were lifted by the AI boom have publicly maintained that “scaling up” existing models by adding more data and computing power will consistently lead to improved AI models.
But now, some of the most prominent AI scientists are speaking out about the limitations of this “bigger is better” philosophy.
Ilya Sutskever, co-founder of OpenAI and of the AI lab Safe Superintelligence (SSI), told Reuters recently that results from scaling up pre-training – the phase in which AI models are trained on large amounts of unlabeled data to understand language patterns and structures – have plateaued.
Sutskever is widely recognized as an early advocate of achieving major leaps in general AI advancement through the use of more data and computing power in pre-training, an approach that ultimately produced ChatGPT. Sutskever left OpenAI earlier this year to found SSI.
“The 2010s were the age of scaling; now we are back in the age of wonder and discovery. Everyone is looking for the next thing,” Sutskever said. “Scaling the right thing matters more now than ever.”
Sutskever declined to share further details about how his team addressed the problem, other than to say SSI is working on an alternative method to scale up pre-training.
Behind the scenes, researchers at major AI labs have run into delays and disappointing results in the race to release a large language model that outperforms OpenAI’s nearly two-year-old GPT-4 model, according to three sources familiar with the matter.
So-called ‘training runs’ for large models can cost tens of millions of dollars by running hundreds of chips simultaneously. Given how complex these systems are, runs are prone to hardware failures, and researchers may not know the final performance of a model until the end of the run, which can take months.
Another problem is that large language models gobble up huge amounts of data, and AI models have exhausted all the easily accessible data in the world. Power shortages also hinder training because the process requires large amounts of energy.
To overcome these challenges, researchers are exploring “test-time compute,” a technique that enhances existing AI models during the phase known as “inference,” when the model is being used. For example, instead of immediately choosing a single answer, a model can generate and evaluate multiple possibilities in real time, ultimately choosing the best path forward.
This approach allows models to devote more processing power to challenging tasks such as math or coding problems, or complex operations that require human-like reasoning and decision-making abilities.
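The article does not describe how any particular lab implements this, but the general “sample several candidates, score them, keep the best” pattern can be illustrated with a minimal best-of-N sketch. The functions generate_candidate and score_candidate below are hypothetical stubs standing in for real model calls, not OpenAI’s API:

```python
import random

# Hypothetical stand-ins for real model calls; a production system would
# replace these with actual sampling and a learned verifier or reward model.
def generate_candidate(prompt: str) -> str:
    """Sample one candidate answer (stubbed with canned outputs)."""
    return random.choice(["answer A", "answer B", "answer C"])

def score_candidate(prompt: str, candidate: str) -> float:
    """Estimate answer quality (stubbed with a random score)."""
    return random.random()

def best_of_n(prompt: str, n: int = 16) -> str:
    """Spend extra inference-time compute: sample n candidates, keep the best.

    Raising n trades more compute at inference time for a better answer,
    which is the test-time scaling idea the article describes.
    """
    candidates = [generate_candidate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score_candidate(prompt, c))

print(best_of_n("What is 17 * 24?", n=8))
```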
“It turned out that having a bot think for just 20 seconds in a hand of poker got the same boost in performance as scaling up the model by 100,000 times and training it 100,000 times longer,” said Noam Brown, a researcher at OpenAI who worked on o1, at the TED AI conference in San Francisco last month.
OpenAI has applied this technique in its newly released model “o1”, formerly known as Q* and Strawberry, which Reuters first reported on in July. The o1 model can “think” through problems in a multi-step manner, similar to human reasoning. It also involves the use of data and feedback curated from PhDs and industry experts. The o1 line’s secret sauce is an additional set of training carried out on top of “base” models like GPT-4, and the company says it plans to apply this technique to ever-larger base models.
At the same time, researchers at other leading AI labs, including Anthropic, xAI and Google DeepMind, are also working to develop their own versions of the technique, according to five people familiar with the efforts.
“We see a lot of low-hanging fruit that we can pick to make these models better very quickly,” Kevin Weil, OpenAI’s chief product officer, said at a tech conference in October. “By the time everyone catches up, we will try to be three steps ahead.”
Google and xAI did not respond to requests for comment, and Anthropic was not immediately available for comment.
These developments could change the competitive landscape for AI hardware, which so far has been dominated by insatiable demand for Nvidia’s AI chips. Prominent venture capitalists, from Sequoia to Andreessen Horowitz, who have poured billions into funding the development of expensive AI models at multiple labs including OpenAI and xAI, are taking note of the transition and weighing the impact on their expensive bets.
“This shift will move us from a world of massive pre-training clusters toward inference clouds, which are distributed, cloud-based servers for inference,” Sonya Huang, a partner at Sequoia Capital, told Reuters.
Demand for Nvidia’s most advanced AI chips has fueled its rise to become the world’s most valuable company, surpassing Apple in October. Unlike training chips, where Nvidia dominates, the chip giant may face more competition in the inference market.
When asked about the possible impact on demand for its products, Nvidia pointed to recent company presentations on the importance of the technique behind the o1 model. CEO Jensen Huang has spoken about growing demand for the use of its chips for inference.
“We have now discovered a second scaling law, and this is the scaling law at inference time… All of these factors have led to demand for Blackwell being incredibly high,” Huang said last month at a conference in India, referring to the company’s latest AI chip.