
Samsung AI researcher's new, open reasoning model TRM outperforms models 10,000 times larger – on certain problems

The trend of AI researchers developing new, small open-source generative models that outperform far larger, proprietary competitors continued this week with another striking advance.

Alexia Jolicoeur-Martineau, Senior AI Researcher at Samsung's Advanced Institute of Technology (SAIT) in Montreal, Canada, has introduced the Tiny Recursion Model (TRM) – a neural network so small that it contains only 7 million parameters (internal model settings), yet it competes with or outperforms state-of-the-art language models that are 10,000 times larger in parameter count, including OpenAI's o3-mini and Google's Gemini 2.5 Pro, on some of the hardest reasoning benchmarks in AI research.

The goal is to show that very powerful new AI models can be created cost-effectively, without the massive investment in graphics processing units (GPUs) and power required to train the larger, multi-trillion-parameter flagship models that power many LLM chatbots today. The results were described in a research paper published on the open-access website arxiv.org, entitled "Less is More: Recursive Reasoning with Tiny Networks."

“The idea that you must rely on massive foundational models trained by a big company for millions of dollars to solve difficult tasks is a trap,” Jolicoeur-Martineau wrote on the social network X. “Currently, there is too much focus on exploiting LLMs rather than devising and expanding new directions.”

Jolicoeur-Martineau added: “With recursive reasoning, it turns out that ‘less is more.’ A tiny model pretrained from scratch, recursing on itself and updating its answers over time, can achieve a lot without breaking the bank.”

The TRM code is now available on GitHub under an enterprise-friendly MIT license – meaning anyone from researchers to companies can adopt it, modify it, and use it for their own purposes, including commercial applications.

A major limitation

However, readers should be aware that TRM is designed specifically to perform well on structured, visual, grid-based problems such as Sudoku, mazes, and puzzles from the ARC (Abstraction and Reasoning Corpus) AGI benchmark. The latter offers tasks that should be easy for humans but difficult for AI models, such as sorting colors in a grid based on a prior, but not identical, solution.

From hierarchy to simplicity

The TRM architecture represents a radical simplification.

It builds on a technique called the Hierarchical Reasoning Model (HRM), introduced earlier this year, which showed that small networks can solve logic puzzles such as Sudoku and mazes.

HRM relied on two cooperating networks – one operating at high frequency, the other at low frequency – supported by biologically inspired arguments and a fixed-point mathematical justification. Jolicoeur-Martineau found this unnecessarily complicated.

TRM removes these elements. Instead of two networks, it uses a single two-layer model that recursively refines its own predictions.

The model begins with an embedded question and an initial answer, represented by the variables x, y, and z. Through a series of reasoning steps, it updates its internal latent representation z and refines the answer y until a stable output emerges. Each iteration corrects potential errors from the previous step, resulting in a self-improving reasoning process without additional hierarchy or mathematical overhead.
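To make the idea concrete, here is a minimal PyTorch-style sketch of what such a recursive refinement loop could look like. The names (TinyRecursiveBlock, refine, n_latent_steps) and the exact update rule are illustrative assumptions, not the official TRM code.

```python
import torch
import torch.nn as nn

class TinyRecursiveBlock(nn.Module):
    """Illustrative two-layer network reused at every refinement step (not the official TRM code)."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * dim, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x, y, z):
        # x: embedded question, y: current answer, z: latent reasoning state
        return self.net(torch.cat([x, y, z], dim=-1))

def refine(block: TinyRecursiveBlock, x, y, z, n_latent_steps: int = 6):
    """One reasoning step: update the latent state z several times, then refine the answer y."""
    for _ in range(n_latent_steps):
        z = z + block(x, y, z)   # update the latent state from question, answer, and latent
    y = y + block(x, y, z)       # refine the answer using the updated latent
    return y, z
```

The key point the sketch tries to capture is that the same tiny network is applied over and over, so depth comes from repetition rather than from stacking more parameters.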

How recursion replaces scaling

This is the core idea behind TRM: recursion can replace depth and size.

By iteratively reasoning about its own output, the network effectively simulates a much deeper architecture without the associated memory or computational overhead. This recursive cycle, which extends over up to sixteen supervision steps, allows the model to make progressively better predictions – similar to how large language models use multi-step “chain-of-thought” inference, but achieved here with a compact feed-forward design.

The simplicity pays off in both efficiency and generalization. The model uses fewer layers, no fixed-point approximations, and no dual-network hierarchy. A lightweight halting mechanism decides when to stop refining, avoiding unnecessary computation while maintaining accuracy.
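A rough sketch of how such an outer loop with deep supervision and a learned stop signal might be wired up, reusing the refine helper from the sketch above. The halt_head, the loss weighting, and the 0.5 stopping threshold are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def deep_supervision(block, halt_head, x, y, z, target, max_steps: int = 16, training: bool = True):
    """Run up to `max_steps` supervision steps; a small head predicts when refinement can stop.
    Illustrative only -- reuses the `refine` helper from the previous sketch."""
    total_loss = torch.tensor(0.0)
    for _ in range(max_steps):
        y, z = refine(block, x, y, z)                 # one recursive refinement pass
        answer_loss = F.mse_loss(y, target)           # supervise the answer after every step
        p_halt = torch.sigmoid(halt_head(z)).mean()   # learned probability that refinement can stop
        total_loss = total_loss + answer_loss + 0.01 * p_halt   # halting weight is an assumption
        if not training and p_halt.item() > 0.5:      # at inference, stop early once confident
            break
        # a full implementation would likely detach y and z between steps to bound memory
    return y, total_loss
```

Here halt_head would be something as small as a single linear layer mapping the latent state to one logit; the point is that the stopping decision costs almost nothing compared with another refinement pass.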

Performance that punches above its weight

Despite its small footprint, TRM delivers benchmark results that rival or even exceed those of models up to 10,000 times larger. During testing, the model achieved:

  • 87.4% accuracy on Sudoku-Extreme (up from 55% for HRM)

  • 85% accuracy on Maze-Hard puzzles

  • 45% accuracy on ARC-AGI-1

  • 8% accuracy on ARC-AGI-2

These results match or exceed the performance of several large, high-end language models, including DeepSeek-R1, Gemini 2.5 Pro, and o3-mini, even though TRM uses less than 0.01% of their parameters.

Such results suggest that recursive reasoning, rather than scale, could be the key to tackling abstract and combinatorial reasoning problems – areas where even best-in-class generative models often stumble.

Design philosophy: Less is more

TRM's success stems from deliberate minimalism. Jolicoeur-Martineau found that reducing complexity led to better generalization.

When the researcher increased the number of layers or the model size, performance dropped due to overfitting on the small datasets.

In contrast, the two-layer structure, combined with recursive depth and deep supervision, achieved the best results.

The model also performed better when self-attention was replaced with a simpler multilayer perceptron for tasks with small, fixed contexts such as Sudoku.

For larger grids such as ARC puzzles, self-attention remained worthwhile. These findings underline that model architecture should match the data's structure and scale rather than default to maximum capacity.
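As a hedged sketch of what that architectural switch might look like in practice: the module and helper names below (TokenMixMLP, make_mixing_layer) are hypothetical, chosen only to illustrate matching the mixing layer to the grid size.

```python
import torch
import torch.nn as nn

class TokenMixMLP(nn.Module):
    """Hypothetical MLP token mixer for small, fixed grids (e.g. a flattened 9x9 Sudoku board)."""
    def __init__(self, seq_len: int):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Linear(seq_len, seq_len),
            nn.GELU(),
            nn.Linear(seq_len, seq_len),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); mix information across grid positions instead of attending
        return self.mix(x.transpose(1, 2)).transpose(1, 2)

def make_mixing_layer(dim: int, seq_len: int, use_attention: bool) -> nn.Module:
    """Pick the mixing layer per task: self-attention for large ARC-style grids, a plain MLP otherwise."""
    if use_attention:
        return nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
    return TokenMixMLP(seq_len)
```

The design point is simply that when the context is tiny and fixed, a position-wise MLP has enough capacity to mix the whole grid, while attention only earns its extra cost on larger, more variable inputs.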

Train small, think big

TRM is now officially available as open source under an MIT license on GitHub.

The repository includes complete training and evaluation scripts, dataset builders for Sudoku, Maze, and ARC-AGI, and reference configurations to reproduce the published results.

It also documents compute requirements ranging from a single NVIDIA L40S GPU for Sudoku training to multi-GPU H100 setups for ARC-AGI experiments.

The open release confirms that TRM was developed specifically for structured, grid-based puzzles rather than general language modeling.

Each benchmark – Sudoku-Extreme, Maze-Hard, and ARC-AGI – uses small, well-defined input-output grids suited to the model's recursive supervision process.

Training involves significant data augmentation (e.g., color permutations and geometric transformations), highlighting that TRM's efficiency lies in its parameter count rather than its overall compute requirement.
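As an illustration of that kind of augmentation (the helper below is hypothetical, not taken from the TRM repo), a random color permutation combined with the eight dihedral symmetries can multiply a small set of grid puzzles many times over:

```python
import numpy as np

def augment_grid(grid: np.ndarray, rng: np.random.Generator, n_colors: int = 10):
    """Yield augmented copies of one puzzle grid: a random color relabeling combined with
    the eight dihedral symmetries (rotations and horizontal flips). Hypothetical helper."""
    palette = rng.permutation(n_colors)   # random permutation of the color ids (e.g. 0..9 for ARC-style grids)
    recolored = palette[grid]             # relabel every cell with its permuted color
    for k in range(4):                    # the four 90-degree rotations
        rotated = np.rot90(recolored, k)
        yield rotated
        yield np.fliplr(rotated)          # plus a horizontal flip of each rotation
```

For input-output puzzle pairs, the same color permutation and geometric transform would of course have to be applied to both grids so that the mapping the model learns stays consistent.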

The model's simplicity and transparency make it more accessible to researchers outside of large corporate labs. Its codebase builds directly on the earlier Hierarchical Reasoning Model framework, but removes HRM's biological analogies, dual-network hierarchy, and fixed-point dependencies.

In this way, TRM provides a reproducible basis for exploring recursive reasoning in small models – a counterpoint to the prevailing “scale is all you need” philosophy.

Community response

The release of TRM and its open-source codebase sparked immediate debate among AI researchers and practitioners.

Proponents hailed TRM as proof that small models can outperform the giants, calling it “10,000 times smaller and yet smarter” and a possible step toward architectures that reason rather than simply scale.

Critics countered that TRM's domain is narrow – limited to grid-based puzzles – and that the computational savings come primarily from model size rather than overall runtime.

Researcher Yunmin Cha noted that TRM's training relies on heavy augmentation and recursive passes: “more computing power, same model.”

Cancer geneticist and data scientist Chey Loveday emphasized that TRM is a solver, not a chat model or text generator: it excels at structured reasoning, not open-ended language.

Machine learning researcher Sebastian Raschka positioned TRM as an important simplification of HRM rather than a new form of general intelligence.

He described the process as “a two-step loop that updates an internal reasoning state and then refines the answer.”

Several researchers, including Augustin Nabele, agreed that the model's strength lies in its clear reasoning structure, but noted that future work would need to show transfer to less constrained problem types.

The consensus online is that while TRM is narrow, its message is broad: careful recursion, not constant scaling, could drive the next wave of reasoning research.

Looking ahead

While TRM currently applies to supervised reasoning tasks, its recursive framework opens up several future directions. Jolicoeur-Martineau has suggested exploring generative or multi-answer variants, in which the model could produce multiple possible solutions instead of a single deterministic one.

Another open question concerns scaling laws for recursion – determining how far the “less is more” principle can stretch as model complexity or data size grows.

Ultimately, the study offers both a practical tool and a conceptual reminder: advances in AI do not have to depend on ever larger models. Sometimes, teaching a small network to think carefully – and recursively – can be more effective than making a large network think once.
