
This “low-cost” open source AI model actually eats up your computing budget

A comprehensive recent study has found that open-source artificial intelligence models use significantly more computing resources to perform identical tasks than their closed-source competitors, potentially eroding their cost benefits and changing the way firms evaluate AI deployment strategies.

The study, conducted by AI company Nous Research, found that open-weight models use between 1.5 and 4 times more tokens – the fundamental units of AI computation – than closed models like those from OpenAI and Anthropic. For simple knowledge questions, the gap widened dramatically, with some open models using as much as ten times more tokens.

“Open weight models use 1.5-4x more tokens than closed ones (up to 10x for simple knowledge questions), sometimes making them more expensive per query despite lower per-token costs,” the researchers wrote in their report published Wednesday.

The results challenge the AI industry's prevailing assumption that open source models offer clear economic benefits over proprietary alternatives. While open source models typically cost less per token to run, the study suggests that this advantage “can easily be offset if more tokens are required to solve a given problem.”

The true cost of AI: Why “cheaper” models can blow your budget

The research investigates 19 different AI models across three task categories: basic knowledge questions, mathematical problems, and logic puzzles. The team measured “token efficiency” – how many computing units models use relative to the complexity of their solutions – a metric that has rarely been studied systematically despite its significant cost implications.

“Token efficiency is a critical metric for several practical reasons,” the researchers noted. “Although hosting open-weight models may be cheaper, this cost advantage could easily be offset if more tokens are required to solve a given problem.”
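The arithmetic behind that caveat is simple. A minimal sketch, using illustrative prices and token counts (not figures from the Nous Research report), shows how a model that is cheaper per token can still cost more per query once token usage is factored in:

```python
# Hypothetical sketch: per-query cost = tokens used x price per token.
# All prices and token counts below are invented for illustration.

def cost_per_query(tokens_used: int, price_per_million_tokens: float) -> float:
    """Return the dollar cost of answering one query."""
    return tokens_used * price_per_million_tokens / 1_000_000

# Assume the open-weight model is cheaper per token ($6 vs $15 per
# million) but, per the study's finding, uses ~4x the tokens per query.
closed_cost = cost_per_query(tokens_used=500, price_per_million_tokens=15.0)
open_cost = cost_per_query(tokens_used=2000, price_per_million_tokens=6.0)

print(f"closed model: ${closed_cost:.4f} per query")  # $0.0075
print(f"open model:   ${open_cost:.4f} per query")    # $0.0120
```

Under these assumed numbers, the “cheaper” open model ends up roughly 60% more expensive per query, which is exactly the offset the researchers describe.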

Open source AI models use up to 12 times more computing resources than the most efficient closed models for basic knowledge questions. (Source: Nous Research)

The inefficiency is especially pronounced in Large Reasoning Models (LRMs), which rely on lengthy “chains of thought” to solve complex problems. These models, designed to think through problems step by step, can consume thousands of tokens pondering simple questions that should require minimal computation.

For basic knowledge questions like “What is the capital of Australia?”, the study found that reasoning models output “hundreds of tokens pondering simple knowledge questions” that could be answered with a single word.

Which AI models actually offer good value for money?

The investigation revealed clear differences between model providers. OpenAI's models, particularly its o4-mini and newly released open source gpt-oss variants, showed exceptional token efficiency, especially on math problems. The study found that OpenAI models “stand out for extreme token efficiency on math problems,” consuming up to three times fewer tokens than other commercial models.

Among open source options, Nvidia's llama-3.3-nemotron-super-49b-v1 proved to be “the most token-efficient open-weight model across all domains,” while newer models from companies like Mistral were outliers with “exceptionally high token usage.”

The efficiency gap varied significantly by task type. While open models used about twice as many tokens for math and logic problems, the difference widened dramatically for simple knowledge questions, where extensive reasoning should be unnecessary.

OpenAI's latest models achieve the lowest costs for simple questions, while some open source alternatives can cost significantly more despite lower per-token prices. (Source: Nous Research)

What business leaders need to know about the cost of AI computing

The findings have immediate implications for enterprise AI adoption, where computing costs can rise rapidly with usage. Companies evaluating AI models often focus on accuracy benchmarks and per-token pricing, but may overlook the total computational demands of real-world tasks.

“The higher token efficiency of closed weight models often compensates for the higher API prices of those models,” the researchers found when analyzing overall inference costs.

The study also found that providers of closed source models appear to be actively working on efficiency optimizations. “Closed weight models have been iteratively optimized to use fewer tokens to reduce inference costs,” while open-source models “have increased their token usage for newer versions, perhaps reflecting a priority on better reasoning performance.”

Computational effort varies significantly between AI vendors, with some models using over 1,000 tokens for internal reasoning on simple tasks. (Source: Nous Research)

How researchers cracked the code for measuring AI efficiency

The research team faced particular challenges in measuring the efficiency of different model architectures. Many closed-source models do not disclose their raw reasoning processes, instead providing condensed summaries of their internal calculations to prevent competitors from copying their techniques.

To address this problem, the researchers used completion tokens – the total units of computation billed for each query – as a proxy for reasoning effort. They discovered that “most newer closed-source models don't propagate their raw reasoning traces” and instead “use smaller language models to transcribe the chain of thought into summaries or condensed representations.”
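The proxy can be sketched as a simple ratio. Since the raw reasoning trace is hidden, the billed completion tokens stand in for total reasoning effort, and comparing them to the length of the visible answer exposes the hidden overhead. The function below is an illustrative assumption of how such a metric could be computed, not the study's actual methodology:

```python
# Hypothetical sketch: billed completion tokens as a proxy for hidden
# reasoning effort. Field names and numbers are invented for illustration.

def reasoning_overhead(completion_tokens: int, answer_tokens: int) -> float:
    """Completion tokens billed per token of visible answer.

    A ratio near 1 means almost all billed tokens appear in the answer;
    a large ratio implies heavy hidden chain-of-thought reasoning.
    """
    if answer_tokens <= 0:
        raise ValueError("answer must contain at least one token")
    return completion_tokens / answer_tokens

# A one-word answer ("Canberra", ~1 token) billed at 900 completion
# tokens implies ~899 tokens of internal reasoning for a trivial question.
print(reasoning_overhead(completion_tokens=900, answer_tokens=1))  # 900.0
```

This is why completion-token billing matters for budgeting: the user pays for the entire ratio, not just the visible numerator.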

The study's methodology involved testing with modified versions of known problems to minimize the influence of memorized solutions – for example, changing the variables in math competition problems from the American Invitational Mathematics Examination (AIME).
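The anti-memorization idea can be illustrated with a toy generator: keep a known problem's structure but swap in fresh numbers, so a model cannot simply recall a stored answer. The template and values below are invented for illustration and are not problems from the actual AIME set:

```python
# Hypothetical sketch: perturb a competition-style problem template with
# fresh values so memorized solutions don't apply. Template is invented.
import random

TEMPLATE = "Find the remainder when {a}^{b} is divided by {m}."

def perturbed_problem(seed: int) -> str:
    """Return the template instantiated with seeded random values,
    so each seed yields a reproducible variant of the same problem."""
    rng = random.Random(seed)
    return TEMPLATE.format(
        a=rng.randint(2, 9),       # base
        b=rng.randint(10, 99),     # exponent
        m=rng.choice([7, 11, 13]), # modulus
    )

print(perturbed_problem(0))
print(perturbed_problem(1))  # same structure, different variables
```

Because the solution method is unchanged while the specific numbers are new, any drop in accuracy or spike in token usage reflects actual reasoning rather than recall.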

AI models show markedly different relationships between computation and output, with some providers compressing reasoning traces while others provide complete details. (Source: Nous Research)

The Future of AI Efficiency: What's Next

The researchers suggest that token efficiency, alongside accuracy, should become a primary optimization goal for future model development. “A denser CoT also enables more efficient use of context and can counteract context degradation in demanding reasoning tasks,” they wrote.

The release of OpenAI's open source gpt-oss models, which demonstrate state-of-the-art efficiency with a “freely accessible CoT,” could serve as a reference point for optimizing other open source models.

The complete research dataset and evaluation code are available on GitHub, allowing other researchers to validate and extend the results. As the AI industry races toward ever more powerful reasoning capabilities, this study suggests that the real competition may not be about who can build the smartest AI, but rather who can build the most efficient one.

Because in a world where every token counts, the most wasteful models may be forced out of the market, no matter how well they think.
