
OpenAI threatens to ban users who probe its ‘Strawberry’ AI model


OpenAI really doesn’t want you to know what its latest AI model is “thinking.” Since the company released its “Strawberry” AI model family last week, touting the so-called reasoning capabilities of o1-preview and o1-mini, OpenAI has been sending out warning emails and threatening to ban any users who try to probe how the model works.

Unlike OpenAI’s previous AI models, such as GPT-4o, the company specifically trained o1 to work through a step-by-step problem-solving process before generating an answer. When users ask an o1 model a question in ChatGPT, they have the option of seeing this chain of thought written out in the ChatGPT interface. However, by design, OpenAI hides the raw chain of thought from users, instead presenting a filtered interpretation generated by a second AI model.
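That split is also visible in OpenAI’s API: a response contains only the final answer, while the tokens spent on hidden reasoning are counted and billed but never returned. Below is a minimal sketch of that behavior, assuming the standard openai Python SDK and API access to o1-preview; the usage fields reflect OpenAI’s published token accounting and may change.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "How many R's are in 'strawberry'?"}],
)

# Only the final answer comes back; the raw chain of thought does not.
print(response.choices[0].message.content)

# The hidden reasoning surfaces only indirectly, as billed tokens.
details = response.usage.completion_tokens_details
print("Hidden reasoning tokens:", details.reasoning_tokens)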

There is nothing more enticing to enthusiasts than hidden information, so the race has been on among hackers and red-teamers to try to uncover o1’s raw chain of thought using jailbreaking or prompt injection techniques that attempt to trick the model into revealing its secrets. There have been early reports of some success, but nothing has been definitively confirmed yet.

Along the way, OpenAI is watching through the ChatGPT interface, and the company is reportedly cracking down on any attempts to probe o1’s reasoning, even from users who are merely curious.

One X user reported (confirmed by others, including Scale AI prompt engineer Riley Goodside) that they received a warning email if they used the term “reasoning trace” in conversation with o1. Others say the warning is triggered simply by asking ChatGPT about the model’s “reasoning” at all.

The warning email from OpenAI stated that specific user requests had been flagged for violating policies against circumventing safeguards or safety measures. “Please halt this activity and ensure you are using ChatGPT in accordance with our Terms of Use and our Usage Policies,” the email read. “Additional violations of this policy may result in loss of access to GPT-4o with Reasoning,” referring to an internal name for the o1 model.

Marco Figueroa, who manages Mozilla’s GenAI bug bounty program, was among the first to post about OpenAI’s warning email on X last Friday, complaining that it hinders his ability to do positive red-teaming safety research on the model. “I was so focused on #AIRedTeaming that I didn’t realize I received this email from @OpenAI yesterday after all my jailbreaks,” he wrote. “I’m now on the banned list!!!”

Hidden Chains of Thought

In a post titled “Learning to Reason with LLMs” on OpenAI’s blog, the company says that hidden chains of thought in AI models offer a unique monitoring opportunity, allowing it to “read the mind” of the model and understand its so-called thought process. Those processes are most useful to the company if they are left raw and uncensored, but that might not align with the company’s best commercial interests for several reasons.

“For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user,” the company writes. “However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.”
