A comprehensive new study has found that open-source artificial intelligence models consume far more computing resources than their closed-source counterparts when performing identical tasks, potentially undermining their cost advantages and reshaping how enterprises evaluate AI deployment strategies.
The research, conducted by AI firm Nous Research, found that open-weight models use between 1.5 and 4 times more tokens (the basic units of AI computation) than closed models from companies like OpenAI and Anthropic on identical tasks. For simple knowledge questions, the gap widened dramatically, with some open models using as much as 10 times more tokens.
Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark https://t.co/b1e1rjx6vz
We measured token usage across reasoning models: open models output 1.5-4x more tokens than closed models on identical tasks, but with huge variance depending on the task (up … pic.twitter.com/ly1083won8
– Nous Research (@NousResearch) August 14, 2025
"Open weight models use 1.5–4× more tokens than closed ones (up to 10× for simple knowledge questions), making them sometimes more expensive per query despite lower per-token costs," the researchers wrote in their report.
The findings challenge a prevailing assumption in the AI industry that open-source models offer clear economic advantages over proprietary alternatives. While open-source models typically cost less per token, the study suggests this advantage "could be easily offset if they require more tokens for a given problem."
The real cost of AI: why "cheaper" models can break your budget
The research examined 19 different AI models across three categories of tasks: basic knowledge questions, mathematical problems, and logic puzzles. The team measured "token efficiency" (how many computational units models use relative to the complexity of their solutions), a metric that has received little systematic study despite its significant cost implications.
"Token efficiency is a critical metric for several practical reasons," the researchers noted. "While hosting open weight models may be cheaper, this cost advantage could be easily offset if they require more tokens to reason about a given problem."
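As a rough illustration of the metric, token efficiency can be expressed as a ratio of tokens consumed on the same workload. The model labels and token counts below are invented for illustration; they are not data from the study.

```python
# Illustrative sketch of a token-efficiency ratio. The model labels and
# token counts are made-up examples, not figures from the Nous study.

# completion tokens recorded for the same three queries, per model
completion_tokens = {
    "closed-model": [120, 340, 95],
    "open-model": [480, 910, 760],
}

def mean(xs: list[int]) -> float:
    return sum(xs) / len(xs)

# how many times more tokens the open model spends on the same workload
ratio = mean(completion_tokens["open-model"]) / mean(completion_tokens["closed-model"])
print(f"open model used {ratio:.1f}x the tokens of the closed model")
```

On these made-up numbers the ratio lands near the upper end of the 1.5-4x band the study reports for real models.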
The inefficiency is especially pronounced in large reasoning models (LRMs), which use extended "chains of thought" to solve complex problems. These models, which reason through problems step by step, can consume thousands of tokens pondering simple questions that should require minimal computation.
For basic knowledge questions like "What is the capital of Australia?" the study found that reasoning models spend "hundreds of tokens pondering simple knowledge questions" that could be answered in a single word.
Which AI models actually deliver bang for your buck
The research revealed stark differences between model providers. OpenAI's models, particularly its o4-mini and newly released open-source gpt-oss variants, demonstrated exceptional token efficiency, especially for mathematical problems. The study found that OpenAI models "stand out for extreme token efficiency in math problems," using up to three times fewer tokens than other commercial models.
Among open-source options, Nvidia's llama-3.3-nemotron-super-49b-v1 emerged as "the most token-efficient open weight model across all domains," while newer models such as Magistral showed "exceptionally high token usage" as outliers.
The efficiency gap varied considerably by task type. While open models used roughly twice as many tokens for mathematical and logical problems, the difference ballooned for simple knowledge questions, where extended reasoning should be unnecessary.

What enterprise leaders need to know about AI computing costs
The findings have immediate implications for enterprise AI adoption, where computing costs can scale rapidly with usage. Companies evaluating AI models often focus on accuracy benchmarks and per-token pricing, but may overlook the total computational requirements of real-world tasks.
"The better token efficiency of closed weight models often compensates for the higher API pricing of those models," the researchers noted in their analysis of total inference costs.
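The arithmetic behind that observation is simple to sketch. In the toy comparison below, the per-million-token prices and token counts are assumptions chosen for illustration; only the structure of the calculation reflects the study's point.

```python
# Toy cost comparison: a lower per-token price does not guarantee a
# lower cost per query. Prices and token counts are illustrative
# assumptions, not figures from the study.

def cost_per_query(price_per_million_tokens: float, tokens_used: int) -> float:
    return price_per_million_tokens * tokens_used / 1_000_000

# closed model: pricier per token, but frugal with tokens
closed = cost_per_query(price_per_million_tokens=10.0, tokens_used=300)

# open model: cheaper per token, but needs 4x the tokens for the same task
open_weight = cost_per_query(price_per_million_tokens=4.0, tokens_used=1200)

print(f"closed: ${closed:.4f} per query, open: ${open_weight:.4f} per query")
# the nominally cheaper model ends up 60% more expensive per query
```

The same structure applies at any price point: whenever the token multiplier exceeds the price ratio, the "cheaper" model costs more per query.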
The study also found that closed-source providers appear to be actively optimizing for efficiency. "Closed weight models have been iteratively optimized to use fewer tokens to reduce inference cost," while open-source models have "increased their token usage for newer versions, possibly reflecting a priority toward better reasoning performance."

How researchers cracked the code of AI efficiency measurement
The research team faced unique challenges in measuring efficiency across different model architectures. Many closed-source models do not reveal their raw reasoning processes, instead providing compressed summaries of their internal deliberations to prevent competitors from copying their techniques.
To address this, the researchers used completion tokens, the total computing units billed for each query, as a proxy for reasoning effort. They found that "most recent models will not share their raw reasoning traces" and instead use smaller language models to transcribe the chain of thought into summaries or compressed representations.
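In practice, that proxy amounts to reading the billed usage counters from each API response. The sketch below uses a mock response dict whose field names mimic a typical chat-completions payload; the exact shape varies by vendor and is an assumption here, not any specific provider's API.

```python
# Sketch of the completion-token proxy: when a provider hides the raw
# reasoning trace, the billed completion tokens still reveal how much
# the model "thought". The response below is a mock whose field names
# mimic a typical chat-completions payload; the shape is an assumption.

def completion_tokens(response: dict) -> int:
    """Billed completion tokens for one query, used as a proxy for
    reasoning effort when raw traces are not exposed."""
    return response["usage"]["completion_tokens"]

mock_response = {
    "choices": [{"message": {"content": "Canberra."}}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 384, "total_tokens": 396},
}

# a one-word answer that nonetheless billed hundreds of completion tokens
print(completion_tokens(mock_response))
```

Aggregating this counter over a fixed set of queries gives a per-model efficiency figure without ever seeing the hidden reasoning text.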
The study's methodology included testing with modified versions of well-known problems to minimize the influence of memorized solutions, such as altering variables in math competition problems from the American Invitational Mathematics Examination (AIME).
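The variable-swapping idea can be sketched in a few lines. The problem template below is invented for illustration (it is not an actual AIME item); the point is that changing the numbers defeats memorized answers while the ground truth stays computable.

```python
import random

# Invented template in the style of a competition problem, not an
# actual AIME item. Swapping the numbers defeats memorized solutions
# while the ground-truth answer remains computable.
TEMPLATE = "Find the remainder when {a}^{b} is divided by {m}."

def perturbed_variant(rng: random.Random) -> tuple[str, int]:
    """Generate one randomized variant of the problem and its answer."""
    a = rng.randint(2, 9)
    b = rng.randint(10, 99)
    m = rng.randint(7, 13)
    answer = pow(a, b, m)  # modular exponentiation gives the ground truth
    return TEMPLATE.format(a=a, b=b, m=m), answer

rng = random.Random(0)  # seeded so variants are reproducible across runs
problem, answer = perturbed_variant(rng)
print(problem, "->", answer)
```

Because each variant carries its own computed answer, a model's output can still be graded automatically even though the exact problem never appeared in training data.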

The future of AI efficiency: what's next
The researchers suggest that token efficiency should become a primary optimization target alongside accuracy for future model development. "A densified CoT also allows for more efficient context usage and may counter context degradation during challenging reasoning tasks," they wrote.
The release of OpenAI's open-source gpt-oss models, which demonstrate state-of-the-art efficiency with a "freely accessible CoT," could serve as a reference point for optimizing other open-source models.
The complete research dataset and evaluation code are available on GitHub, allowing other researchers to validate and extend the results. As the AI industry races toward ever more powerful reasoning capabilities, this study suggests the real competition may not be about who can build the smartest AI, but who can build the most efficient one.
In a world where every token counts, the most wasteful models may find themselves priced out of the market, regardless of how well they can think.

