The trend of small, open-source generative models outperforming their much larger peers continued this week with another striking advance.
Alexia Jolicoeur-Martineau, a senior AI researcher at the Samsung Advanced Institute of Technology (SAIT) in Montreal, Canada, has introduced the Tiny Recursion Model (TRM) – a neural network so small that it has only 7 million parameters (internal model settings), yet it rivals or outperforms state-of-the-art language models 10,000 times larger in parameter count, including OpenAI’s o3-mini and Google’s Gemini 2.5 Pro, on some of the hardest reasoning benchmarks in AI research.
The goal is to show that highly capable new AI models can be built affordably, without the huge investments in graphics processing units (GPUs) and power needed to train the larger, multi-trillion-parameter foundation models that power many LLM chatbots today. The findings are described in a paper published on the open-access site arxiv.org titled "Less is More: Recursive Reasoning with Tiny Networks."
"The idea that one must rely on huge foundational models trained for millions of dollars by some major corporation in order to solve difficult tasks is a trap." Jolicoeur Martineau wrote on Social Network X. "Right now, there is too much focus on exploiting MBAs rather than inventing and expanding new trend lines."
Jolicoeur-Martineau added: "With recursive reasoning, it turns out that 'less is more.' A tiny model pretrained from scratch, recursing on itself and updating its answers over time, can achieve a lot without spending a lot."
The TRM code is available now on GitHub under an MIT license that is enterprise-friendly and commercially viable, meaning anyone from researchers to companies can take it, modify it, and deploy it for their own purposes, including commercial applications.
One big caveat
However, readers should be aware that TRM was designed specifically to perform well on structured, visual, grid-based problems such as Sudoku, mazes, and puzzles from the ARC (Abstraction and Reasoning Corpus)-AGI benchmark; the latter offers tasks that should be easy for humans but difficult for AI models, such as sorting colors on a grid based on a prior but not identical solution.
From hierarchy to simplicity
The TRM architecture represents a radical simplification.
It builds on a technique called the Hierarchical Reasoning Model (HRM), introduced earlier this year, which showed that small networks could solve logic puzzles such as Sudoku and mazes.
HRM relied on two cooperating networks, one operating at high frequency and the other at low frequency, supported by biologically inspired arguments and mathematical justifications involving fixed-point theorems. Jolicoeur-Martineau found this unnecessarily complicated.
TRM strips these elements away. Instead of two networks, it uses a single two-layer model that recursively improves its own predictions.
The model begins with an embedded question and an initial answer, represented by the variables x, y, and z. Through a series of reasoning steps, it updates its latent internal representation z and refines the answer y until it converges on a stable output. Each iteration corrects potential errors from the previous step, yielding a self-improving reasoning process without extra hierarchy or mathematical overhead.
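The exact architecture is best read from the code on GitHub, but the loop described above can be sketched roughly as follows. This is a minimal PyTorch illustration with assumed dimensions, and the split into update heads `f` and `g` is for readability only, not a claim about the official implementation:

```python
import torch
import torch.nn as nn

class TinyRecursiveBlock(nn.Module):
    """Rough sketch of TRM-style recursive refinement (illustrative only).

    Assumes the embedded question x, current answer y, and latent state z are
    all vectors of size `dim`.
    """

    def __init__(self, dim: int = 128, n_latent_steps: int = 6):
        super().__init__()
        # Tiny two-layer networks, reused at every refinement step.
        self.f = nn.Sequential(nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.g = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.n_latent_steps = n_latent_steps

    def forward(self, x, y, z):
        # Refine the latent reasoning state z given the question and current answer...
        for _ in range(self.n_latent_steps):
            z = z + self.f(torch.cat([x, y, z], dim=-1))
        # ...then use the updated latent state to improve the answer y.
        y = y + self.g(torch.cat([y, z], dim=-1))
        return y, z
```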
How recursion replaces scaling
The basic idea behind TRM is that recursion can substitute for depth and size.
By iteratively reasoning over its own output, the network effectively simulates a much deeper architecture without the associated memory or compute cost. This recursive cycle, spanning up to sixteen supervision steps, lets the model make progressively better predictions – similar in spirit to how large language models use multi-step "chain of thought" reasoning, but achieved here with a compact, feed-forward design.
The simplicity pays off in both efficiency and generalization. The model uses fewer layers, no fixed-point approximations, and no dual-network hierarchy. A lightweight halting mechanism decides when to stop refining, avoiding wasted computation while preserving accuracy.
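As a rough picture of how deep supervision and halting interact, here is a hedged sketch; the function names, loss, and halting threshold are assumptions for illustration, and the real training loop lives in the repository:

```python
import torch
import torch.nn.functional as F

def recursive_deep_supervision(model, out_head, halt_head, x, y, z, target, max_steps=16):
    """Illustrative sketch: run up to `max_steps` supervision steps, scoring the
    answer after each refinement, and let a learned halting signal cut the loop
    short at inference time. All names here are hypothetical."""
    total_loss = 0.0
    for step in range(max_steps):
        y, z = model(x, y, z)                        # one TRM refinement pass
        logits = out_head(y)                         # decode the current answer
        total_loss = total_loss + F.cross_entropy(logits, target)  # supervise every step
        p_halt = torch.sigmoid(halt_head(z)).mean()  # "confident enough to stop?"
        if not model.training and p_halt > 0.5:      # skip remaining steps at inference
            break
        y, z = y.detach(), z.detach()                # truncate backprop between steps
    return y, total_loss
```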
Performance punches above its weight
Despite its small size, TRM delivers benchmark results that rival or exceed those of models millions of times larger. In testing, the model achieved:
- 87.4% accuracy on Sudoku-Extreme (up from 55% for HRM)
- 85% accuracy on Maze-Hard puzzles
- 45% accuracy on ARC-AGI-1
- 8% accuracy on ARC-AGI-2
These results exceed or closely match the performance of many state-of-the-art large language models, including DeepSeek R1, Gemini 2.5 Pro, and o3-mini, despite TRM using less than 0.01% of their parameters.
Such results suggest that recursive reasoning, not scale, may be the key to handling abstract and combinatorial reasoning problems, areas where even top-tier generative models often falter.
Design philosophy: less is more
TRM’s success stems from deliberate simplicity. Jolicoeur-Martineau found that reducing complexity led to better generalization.
When the researcher increased the number of layers or the model size, performance dropped due to overfitting on the small datasets.
In contrast, the two-layer structure, combined with recursive depth and deep supervision, achieved the best results.
The model also performed better when self-attention was replaced with a simpler multilayer perceptron on tasks with small, fixed contexts such as Sudoku.
For larger grids, such as ARC puzzles, self-attention remained valuable. These findings confirm that model architecture should match the structure and size of the data rather than defaulting to maximum capacity.
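That trade-off can be pictured with a toy helper (hypothetical names; a simplification of the design choice, not the repository's code): for a small fixed-length context such as a flattened 9x9 Sudoku board, a plain MLP mixing information across positions can stand in for self-attention, while larger grids like ARC keep attention.

```python
import torch.nn as nn

def token_mixer(seq_len: int, dim: int, use_attention: bool) -> nn.Module:
    """Hypothetical helper contrasting the two options discussed above."""
    if use_attention:
        # General-purpose mixing: suits larger or more varied grids (e.g. ARC).
        return nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
    # MLP over positions: only valid when the context length is small and fixed,
    # e.g. a Sudoku board flattened to 81 cells (applied to a tensor transposed
    # to shape [batch, dim, seq_len]).
    return nn.Sequential(nn.Linear(seq_len, seq_len), nn.GELU(), nn.Linear(seq_len, seq_len))
```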
Small model, big thinking
TRM is now officially available as open source under the MIT License on GitHub.
The repository includes full training and evaluation scripts, dataset generation tools for Sudoku, Maze, and ARC-AGI, and reference configurations for reproducing published results.
It also documents computing requirements ranging from a single NVIDIA L40S GPU for Sudoku training to multi-GPU H100 setups for ARC-AGI experiments.
The open release confirms that TRM is designed specifically for structured, grid-based reasoning tasks rather than general-purpose language modeling.
Each benchmark—Sudoku-Extreme, Maze-Hard, and ARC-AGI—uses small, well-defined input and output grids, consistent with the model’s iterative supervision process.
Training involves substantial data augmentation (such as color permutations and geometric transformations), which underscores that TRM’s efficiency lies in its parameter count rather than in its total compute demand.
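For a sense of what such augmentation looks like on a grid puzzle, here is a minimal NumPy sketch (assumed function, illustrative only; the actual dataset-generation scripts ship with the repository):

```python
import numpy as np

def augment_grid(grid: np.ndarray, n_colors: int = 10, rng=np.random) -> np.ndarray:
    """Illustrative augmentation for a grid puzzle: a consistent color relabeling
    plus a random rotation/flip. Not the repository's exact pipeline."""
    grid = rng.permutation(n_colors)[grid]   # permute color labels consistently
    grid = np.rot90(grid, rng.randint(4))    # random multiple-of-90-degree rotation
    if rng.rand() < 0.5:
        grid = np.fliplr(grid)               # random horizontal flip
    return grid.copy()
```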
The model’s simplicity and transparency make it accessible to researchers outside large corporate labs. Its codebase builds directly on the earlier Hierarchical Reasoning Model framework but removes HRM’s biological arguments, multiple-network hierarchy, and fixed-point dependencies.
In doing so, TRM provides a reproducible baseline for exploring iterative inference in small models—a counterpoint to the prevailing “scale is all you need” philosophy.
Community reaction
The launch of TRM and its open-source repository sparked immediate discussion among AI researchers and practitioners on X. While many praised the achievement, others questioned how widely its methods can generalize.
Supporters hailed TRM as proof that small models can outperform giants, calling it "10,000 times smaller but smarter" and a potential step toward architectures that reason rather than merely scale.
Critics countered that TRM’s scope is narrow, focused on constrained, grid-based puzzles, and that its compute savings come mainly from its small size rather than from shorter total runtime.
Researcher Yunmin Cha noted that TRM’s training relies on heavy augmentation and repeated passes: "more compute, same model."
Cancer geneticist and data scientist Chi Loveday stressed that TRM is a solver, not a chat model or text generator: it excels at structured reasoning but not at open-ended language.
Machine learning researcher Sebastian Raschka positioned TRM as an important simplification of HRM rather than a new form of general intelligence.
He described its process as "a two-step loop that updates an internal latent reasoning state and then refines the answer."
Several researchers, including Augustine Nabel, agreed that the model’s strength lies in its clear logical structure, but noted that future work would need to show transfer to less constrained problem types.
The emerging consensus online is that TRM may be narrow, but its message is broad: careful recursion, not constant expansion, may drive the next wave of reasoning research.
Looking forward
While TRM currently applies to supervised reasoning tasks, its recursive framework opens several future directions. Jolicoeur-Martineau has suggested exploring generative or multi-answer variants, in which the model could produce several possible solutions rather than a single deterministic one.
Another open question involves scaling laws for recursion: determining how far the "less is more" principle can stretch as model complexity or data size grows.
Ultimately, the study provides a practical tool and a conceptual reminder: progress in AI does not need to rely on ever-larger models. Sometimes, teaching a small network how to think carefully — and repeatedly — can be more powerful than making a large network think once.