ChatGPT maker OpenAI has announced its next major product release: a generative AI model codenamed Strawberry, officially called OpenAI o1.
More precisely, o1 is actually a family of models, two of which launched Thursday in ChatGPT and through OpenAI's API: o1-preview and o1-mini, a smaller, more efficient model aimed at code generation.
You must be subscribed to ChatGPT Plus or Team to see o1 in the ChatGPT client. Corporate and education users will get access early next week.
Note that o1's chatbot experience is currently fairly bare-bones. Unlike its predecessor GPT-4o, o1 can't yet browse the web or analyze files. The model does have image-analysis capabilities, but these have been disabled pending further testing. And o1 is rate-limited; weekly limits are currently 30 messages for o1-preview and 50 for o1-mini.
Another drawback is that o1 can be very expensive. In the API, o1-preview costs $15 per 1 million input tokens and $60 per 1 million output tokens. That's 3 times the price of GPT-4o for input and 4 times the price for output. (Tokens are bits of raw data; 1 million tokens is roughly 750,000 words.)
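To put those prices in concrete terms, here is a minimal sketch of the cost arithmetic, using only the per-token rates quoted above. The function name and the example token counts are illustrative, and prices may change, so check OpenAI's pricing page before relying on these numbers.

```python
# Per-million-token API prices for o1-preview, as quoted in the article.
O1_PREVIEW_INPUT_PER_M = 15.00   # USD per 1,000,000 input tokens
O1_PREVIEW_OUTPUT_PER_M = 60.00  # USD per 1,000,000 output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single o1-preview request."""
    return (input_tokens / 1_000_000) * O1_PREVIEW_INPUT_PER_M + \
           (output_tokens / 1_000_000) * O1_PREVIEW_OUTPUT_PER_M

# A request with 2,000 input tokens and 8,000 output tokens:
print(f"${estimate_cost(2_000, 8_000):.2f}")  # → $0.51
```

Because o1 spends many "thinking" tokens before answering, output-token counts (billed at the higher rate) can run well beyond what the visible answer suggests.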
OpenAI plans to offer access to o1-mini to all free ChatGPT users, but has not yet set a release date. We'll hold the company to that.
Chain of reasoning
OpenAI o1 avoids some of the reasoning traps that typically trip up generative AI models because it can effectively fact-check itself by spending more time considering all parts of a question. According to OpenAI, o1's ability to “think” before responding to queries is what makes it “qualitatively” different from other generative AI models.
When given extra time to “think,” o1 can reason through a task holistically, planning ahead and executing a series of actions over an extended period to arrive at a solution. This makes o1 well suited to tasks that require aggregating the results of multiple subtasks, such as identifying confidential emails in a lawyer's inbox or brainstorming a product marketing strategy.
In a series of posts on X on Thursday, Noam Brown, a researcher at OpenAI, said that “o1 is trained using reinforcement learning.” This teaches the system to “think through a private chain of thought before answering,” by rewarding o1 when it gives the right answers and penalizing it when it doesn't, he said.
Brown added that OpenAI is using a new optimization algorithm and a training dataset that includes “reasoning data” and scientific literature specifically tailored to reasoning tasks. “The longer (o1) thinks, the better it does,” he said.
TechCrunch didn't get a chance to test o1 ahead of its debut; we'll get our hands on it as soon as we can. But according to one person with access, Pablo Arredondo, VP at Thomson Reuters, o1 is better than OpenAI's previous models (e.g., GPT-4o) at things like analyzing legal opinions and identifying solutions to problems in LSAT logic games.
“We saw it tackle larger, more layered analyses,” Arredondo told TechCrunch. “Our automated tests also showed improvements on a wide range of simple tasks.”
In a qualifying exam for the International Mathematical Olympiad (IMO), a high school math competition, o1 correctly solved 83% of the problems, while GPT-4o solved only 13%, according to OpenAI. (That's less impressive when you consider that Google DeepMind's recent AI achieved silver-medal performance in an equivalent of the actual IMO competition.) OpenAI also says that o1 scored in the 89th percentile of participants in the online programming contest rounds known as Codeforces, better than DeepMind's flagship system AlphaCode 2, for whatever that's worth.
In general, o1 should perform better on data analysis, science, and coding problems, says OpenAI. (GitHub, which tested o1 with its AI coding assistant GitHub Copilot, reports that the model is good at optimizing algorithms and app code.) And at least according to OpenAI's benchmarking, o1 outperforms GPT-4o in its multilingual capabilities, especially for languages like Arabic and Korean.
Ethan Mollick, a professor of management at Wharton, wrote up his impressions of o1 after a month of use in a post on his personal blog. He said o1 performed well on a difficult crossword puzzle, getting all the answers right (though it did hallucinate a new clue).
OpenAI o1 isn't perfect
So, there are drawbacks.
OpenAI o1 can be slower than other models, depending on the query. Arredondo says o1 takes over 10 seconds to answer some questions. It indicates its progress by displaying a label for the subtask it's currently performing.
Given the unpredictable nature of generative AI models, o1 is likely to have other flaws and limitations. Brown admitted that o1 stumbles on occasion in games of tic-tac-toe, for instance. And in a technical paper, OpenAI said it had received anecdotal feedback from testers that o1 was more likely to hallucinate (i.e., confidently make things up) than GPT-4o, and was less likely to admit when it didn't know the answer to a question.
“O1 still has errors and hallucinations,” Mollick writes in his post. “It's still not bug-free.”
We will no doubt learn more about the various issues over time, once we have the chance to put o1 through its paces ourselves.
Tough competition
We would be remiss if we didn't point out that OpenAI is far from the only AI vendor exploring these kinds of reasoning methods to improve the factuality of models.
Google DeepMind researchers recently published a study showing that models' performance can be significantly improved without additional optimization, essentially by giving models more computation time and guidance to satisfy incoming queries.
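One simple flavor of this "more computation at inference time" idea is repeated sampling with a majority vote, sometimes called self-consistency. The toy sketch below is purely illustrative: `noisy_model` is a stand-in that answers correctly only 60% of the time, not a real LLM, and the function names are our own.

```python
import random
from collections import Counter

def noisy_model(question: str) -> str:
    """Stand-in for a stochastic model: right answer only 60% of the time."""
    return "42" if random.random() < 0.6 else random.choice(["41", "43"])

def majority_vote(question: str, n_samples: int) -> str:
    """Sample the model n_samples times and return the most common answer."""
    votes = Counter(noisy_model(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

random.seed(0)
# Spending more inference-time compute (more samples) makes the voted
# answer far more reliable than any single sample.
print(majority_vote("What is 6 x 7?", 25))
```

The point of the sketch is the scaling behavior: with one sample the answer is wrong 40% of the time, while voting over 25 samples is almost always right, all without changing the model itself.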
Illustrating the toughness of the competition, OpenAI said that, partly because of its “competitive advantage,” it decided not to display the raw data of o1's “chains of thought” in ChatGPT. (Instead, the company chose to display “model-generated summaries” of the chains.)
OpenAI may be the first to launch o1, but assuming competitors soon follow suit with similar models, the company's real challenge will be making o1 widely available, and at a lower cost.
From there, we'll see how quickly OpenAI can deliver improved versions of o1. The company says it wants to experiment with o1 models that reason for hours, days, or even weeks to further improve their reasoning capabilities.