OpenAI has ushered in a new reasoning paradigm in large language models (LLMs) with its o1 model, which recently received a major upgrade. But while OpenAI holds a strong lead in reasoning models, it may be losing ground to rapidly emerging open source competitors.
Models like o1, sometimes called large reasoning models (LRMs), spend extra compute cycles at inference time to "think" longer, check their answers, and correct their mistakes. This allows them to solve complex reasoning problems that classic LLMs struggle with, and makes them especially useful for tasks such as coding, mathematics, and data analysis.
However, developers have shown mixed reactions to o1 in recent days, especially after the updated release. Some have posted examples of o1 accomplishing impressive tasks, while others have expressed frustration with the model's confusing answers. The problems they report range from illogical code changes to outright ignored instructions.
Secrecy around o1 details
Part of the confusion stems from OpenAI's secrecy and its refusal to reveal the details of how o1 works. The secret behind the success of LRMs is the extra tokens the model generates before producing the final answer, often called the model's "thoughts" or "reasoning chain." For example, if you ask a classic LLM to generate code for a task, it will produce the code immediately. An LRM, in contrast, generates reasoning tokens that examine the problem, plan the structure of the code, and weigh multiple solutions before emitting the final answer.
o1 hides this thought process and shows only the final answer, along with a message indicating how long the model thought and, at most, a high-level overview of the reasoning. This is partly to keep the response uncluttered and the user experience smooth. More importantly, OpenAI considers the reasoning chain a trade secret and wants to make it difficult for competitors to replicate o1's capabilities.
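You can see this opacity directly in the API. Below is a minimal sketch using OpenAI's Python SDK: the response carries only the final answer plus a count of the hidden reasoning tokens. The model name "o1" and the usage field names follow OpenAI's published schema at the time of writing, but treat the exact names as assumptions that may change.

```python
# Minimal sketch: querying an o1-class model through the OpenAI Python SDK.
# Assumption: the model name "o1" and the usage fields below match the current
# API. The reasoning tokens themselves are never returned, only their count.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1",
    messages=[
        {"role": "user", "content": "Write a function that merges two sorted lists."}
    ],
)

# Only the final answer is visible...
print(response.choices[0].message.content)

# ...the chain of reasoning is billed but hidden; the API exposes just its size.
details = response.usage.completion_tokens_details
print(f"Hidden reasoning tokens: {details.reasoning_tokens}")
```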
The cost of training new models keeps rising while profit margins fail to keep pace, pushing some AI labs toward greater secrecy to protect their lead. Even Apollo Research, which red-teamed the model, was not given access to its reasoning chain.
This lack of transparency has led users to all kinds of speculation, including accusations that OpenAI has degraded the model to cut inference costs.
Open source models are fully transparent
Open source alternatives such as Alibaba's Qwen with Questions (QwQ) and Marco-o1, on the other hand, show the full reasoning chain of their models. Another alternative is DeepSeek R1, which is not open source but still exposes its reasoning tokens. By reading the reasoning chain, developers can troubleshoot their prompts and find ways to improve the model's answers by adding extra instructions or in-context examples.
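As a rough sketch of what that workflow looks like with an open model, the snippet below runs a checkpoint locally via Hugging Face transformers and prints the entire output, reasoning included. The choice of the QwQ-32B-Preview checkpoint, the prompt, and the generation settings are illustrative assumptions, not a prescribed setup.

```python
# Sketch: running an open reasoning model locally so the full reasoning chain
# is visible. Assumes the Hugging Face `transformers` library and the
# Qwen/QwQ-32B-Preview checkpoint; adjust model and settings to your hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [
    {"role": "user", "content": "How many times does 'r' appear in 'strawberry'?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The generated text includes the model's step-by-step reasoning, not just the
# final answer, so a developer can read it to debug a failing prompt.
output = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because nothing is hidden, a confusing answer can be traced back to the exact step in the chain where the model went wrong, and the prompt adjusted accordingly.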
Visibility into the reasoning process is especially important when you want to integrate the model's answers into applications and tools that expect consistent results. Additionally, in enterprise applications it is important to have control over the underlying model. Private models, and the scaffolding that supports them, such as the safety measures and filters that screen their inputs and outputs, are constantly changing. While this can improve overall performance, it can also break the many prompts and applications built on top of them. Open source models, in contrast, give the developer full control over the model, which can be the more robust option for enterprise applications where performance on very specific tasks matters more than general capabilities.
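One concrete form that control takes: an open checkpoint can be pinned to an exact repository revision so its behavior never shifts underneath a deployed application. A brief sketch, again assuming Hugging Face transformers; the commit hash is a placeholder, not a real revision.

```python
# Sketch: pinning an open model to a specific repository revision so upstream
# updates cannot silently change behavior in production.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-Preview"
pinned_revision = "abc123def456"  # placeholder: pin the exact commit you validated

tokenizer = AutoTokenizer.from_pretrained(model_id, revision=pinned_revision)
model = AutoModelForCausalLM.from_pretrained(model_id, revision=pinned_revision)
```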
QwQ and R1 are still in preview, and o1 remains ahead in accuracy and ease of use. For many uses, such as general ad hoc prompts and one-off requests, o1 can still be a better option than the open source alternatives.
But the open source community is quickly catching up with private models, and we can expect more of them to hit the market in the coming months. Where visibility and control are crucial, they can be a suitable alternative.