While the world’s leading artificial intelligence corporations race to construct ever-larger models, betting billions that scale alone will unlock artificial general intelligence, a researcher at one in all the industry’s most secretive and precious startups delivered a pointed challenge to that orthodoxy this week: The path forward is not about training greater — it’s about learning higher.
“I imagine that the primary superintelligence shall be a superhuman learner,” Rafael Rafailov, a reinforcement learning researcher at Thinking Machines Lab, told an audience at TED AI San Francisco on Tuesday. “It will have the opportunity to very efficiently determine and adapt, propose its own theories, propose experiments, use the environment to confirm that, get information, and iterate that process.”
This breaks sharply with the approach pursued by OpenAI, Anthropic, Google DeepMind, and other leading laboratories, which have bet billions on scaling up model size, data, and compute to realize increasingly sophisticated reasoning capabilities. Rafailov argues these corporations have the strategy backwards: what’s missing from today’s most advanced AI systems is not more scale — it’s the flexibility to really learn from experience.
“Learning is something an intelligent being does,” Rafailov said, citing a quote he described as recently compelling. “Training is something that is being done to it.”
The distinction cuts to the core of how AI systems improve — and whether the industry’s current trajectory can deliver on its most ambitious guarantees. Rafailov’s comments offer a rare window into the considering at Thinking Machines Lab, the startup co-founded in February by former OpenAI chief technology officer Mira Murati that raised a record-breaking $2 billion in seed funding at a $12 billion valuation.
Why today’s AI coding assistants forget every little thing they learned yesterday
To illustrate the issue with current AI systems, Rafailov offered a scenario familiar to anyone who has worked with today’s most advanced coding assistants.
“If you utilize a coding agent, ask it to do something really difficult — to implement a feature, go read your code, try to grasp your code, reason about your code, implement something, iterate — it could be successful,” he explained. “And then come back the following day and ask it to implement the following feature, and it would do the identical thing.”
The issue, he argued, is that these systems don’t internalize what they learn. “In a way, for the models we’ve got today, each day is their first day of the job,” Rafailov said. “But an intelligent being should have the opportunity to internalize information. It should have the opportunity to adapt. It should have the opportunity to switch its behavior so each day it becomes higher, each day it knows more, each day it really works faster — the way in which a human you hire gets higher on the job.”
The duct tape problem: How current training methods teach AI to take shortcuts as a substitute of solving problems
Rafailov pointed to a selected behavior in coding agents that reveals the deeper problem: their tendency to wrap uncertain code in try/except blocks — a programming construct that catches errors and allows a program to proceed running.
“If you utilize coding agents, you would possibly have observed a really annoying tendency of them to make use of try/except pass,” he said. “And generally, that is largely similar to duct tape to avoid wasting your complete program from a single error.”
Why do agents do that? “They do that because they understand that a part of the code may not be right,” Rafailov explained. “They understand there could be something fallacious, that it could be dangerous. But under the limited constraint—they’ve a limited period of time solving the issue, limited amount of interaction—they have to only give attention to their objective, which is implement this feature and solve this bug.”
The result: “They’re kicking the can down the road.”
This behavior stems from training systems that optimize for immediate task completion. “The only thing that matters to our current generation is solving the duty,” he said. “And anything that is general, anything that is not related to only that one objective, is a waste of computation.”
Why throwing more compute at AI won’t create superintelligence, based on Thinking Machines researcher
Rafailov’s most direct challenge to the industry got here in his assertion that continued scaling won’t be sufficient to achieve AGI.
“I do not believe we’re hitting any form of saturation points,” he clarified. “I feel we’re just originally of the following paradigm—the dimensions of reinforcement learning, wherein we move from teaching our models the way to think, the way to explore considering space, into endowing them with the potential of general agents.”
In other words, current approaches will produce increasingly capable systems that may interact with the world, browse the online, write code. “I imagine a yr or two from now, we’ll take a look at our coding agents today, research agents or browsing agents, the way in which we take a look at summarization models or translation models from several years ago,” he said.
But general agency, he argued, isn’t the identical as general intelligence. “The rather more interesting query is: Is that going to be AGI? And are we done — will we just need another round of scaling, another round of environments, another round of RL, another round of compute, and we’re form of done?”
His answer was unequivocal: “I do not believe that is the case. I imagine that under our current paradigms, under any scale, we aren’t enough to cope with artificial general intelligence and artificial superintelligence. And I imagine that under our current paradigms, our current models will lack one core capability, and that’s learning.”
Teaching AI like students, not calculators: The textbook approach to machine learning
To explain the choice approach, Rafailov turned to an analogy from mathematics education.
“Think about how we train our current generation of reasoning models,” he said. “We take a specific math problem, make it very hard, and take a look at to unravel it, rewarding the model for solving it. And that is it. Once that have is finished, the model submits an answer. Anything it discovers—any abstractions it learned, any theorems—we discard, after which we ask it to unravel a brand new problem, and it has to provide you with the identical abstractions all once more.”
That approach misunderstands how knowledge accumulates. “This isn’t how science or mathematics works,” he said. “We construct abstractions not necessarily because they solve our current problems, but because they’re necessary. For example, we developed the sphere of topology to increase Euclidean geometry — not to unravel a specific problem that Euclidean geometry couldn’t handle, but because mathematicians and physicists understood these concepts were fundamentally necessary.”
The solution: “Instead of giving our models a single problem, we would give them a textbook. Imagine a really advanced graduate-level textbook, and we ask our models to work through the primary chapter, then the primary exercise, the second exercise, the third, the fourth, then move to the second chapter, and so forth—the way in which an actual student might teach themselves a subject.”
The objective would fundamentally change: “Instead of rewarding their success — what number of problems they solved — we’d like to reward their progress, their ability to learn, and their ability to enhance.”
This approach, generally known as “meta-learning” or “learning to learn,” has precedents in earlier AI systems. “Just just like the ideas of scaling test-time compute and search and test-time exploration played out within the domain of games first” — in systems like DeepMind’s AlphaGo — “the identical is true for meta learning. We know that these ideas do work at a small scale, but we’d like to adapt them to the dimensions and the potential of foundation models.”
The missing ingredients for AI that actually learns aren’t recent architectures—they’re higher data and smarter objectives
When Rafailov addressed why current models lack this learning capability, he offered a surprisingly straightforward answer.
“Unfortunately, I feel the reply is kind of prosaic,” he said. “I feel we just haven’t got the fitting data, and we haven’t got the fitting objectives. I fundamentally imagine loads of the core architectural engineering design is in place.”
Rather than arguing for entirely recent model architectures, Rafailov suggested the trail forward lies in redesigning the data distributions and reward structures used to coach models.
“Learning, in of itself, is an algorithm,” he explained. “It has inputs — the present state of the model. It has data and compute. You process it through some form of structure, select your favorite optimization algorithm, and also you produce, hopefully, a stronger model.”
The query: “If reasoning models are in a position to learn general reasoning algorithms, general search algorithms, and agent models are in a position to learn general agency, can the following generation of AI learn a learning algorithm itself?”
His answer: “I strongly imagine that the reply to this query is yes.”
The technical approach would involve creating training environments where “learning, adaptation, exploration, and self-improvement, in addition to generalization, are needed for fulfillment.”
“I imagine that under enough computational resources and with broad enough coverage, general purpose learning algorithms can emerge from large scale training,” Rafailov said. “The way we train our models to reason generally over just math and code, and potentially act generally domains, we would have the opportunity to show them the way to learn efficiently across many various applications.”
Forget god-like reasoners: The first superintelligence shall be a master student
This vision results in a fundamentally different conception of what artificial superintelligence might appear like.
“I imagine that if this is feasible, that is the ultimate missing piece to realize truly efficient general intelligence,” Rafailov said. “Now imagine such an intelligence with the core objective of exploring, learning, acquiring information, self-improving, equipped with general agency capability—the flexibility to grasp and explore the external world, the flexibility to make use of computers, ability to do research, ability to administer and control robots.”
Such a system would constitute artificial superintelligence. But not the sort often imagined in science fiction.
“I imagine that intelligence isn’t going to be a single god model that is a god-level reasoner or a god-level mathematical problem solver,” Rafailov said. “I imagine that the primary superintelligence shall be a superhuman learner, and it would have the opportunity to very efficiently determine and adapt, propose its own theories, propose experiments, use the environment to confirm that, get information, and iterate that process.”
This vision stands in contrast to OpenAI’s emphasis on constructing increasingly powerful reasoning systems, or Anthropic’s give attention to “constitutional AI.” Instead, Thinking Machines Lab appears to be betting that the trail to superintelligence runs through systems that may constantly improve themselves through interaction with their environment.
The $12 billion bet on learning over scaling faces formidable challenges
Rafailov’s appearance comes at a fancy moment for Thinking Machines Lab. The company has assembled a formidable team of roughly 30 researchers from OpenAI, Google, Meta, and other leading labs. But it suffered a setback in early October when Andrew Tulloch, a co-founder and machine learning expert, departed to return to Meta after the corporate launched what The Wall Street Journal called a “full-scale raid” on the startup, approaching greater than a dozen employees with compensation packages starting from $200 million to $1.5 billion over multiple years.
Despite these pressures, Rafailov’s comments suggest the corporate stays committed to its differentiated technical approach. The company launched its first product, Tinker, an API for fine-tuning open-source language models, in October. But Rafailov’s talk suggests Tinker is just the muse for a rather more ambitious research agenda focused on meta-learning and self-improving systems.
“This isn’t easy. This goes to be very difficult,” Rafailov acknowledged. “We’ll need loads of breakthroughs in memory and engineering and data and optimization, but I feel it’s fundamentally possible.”
He concluded with a play on words: “The world isn’t enough, but we’d like the fitting experiences, and we’d like the fitting sort of rewards for learning.”
The query for Thinking Machines Lab — and the broader AI industry — is whether or not this vision may be realized, and on what timeline. Rafailov notably didn’t offer specific predictions about when such systems might emerge.
In an industry where executives routinely make daring predictions about AGI arriving inside years and even months, that restraint is notable. It suggests either unusual scientific humility — or an acknowledgment that Thinking Machines Lab is pursuing a for much longer, harder path than its competitors.
For now, essentially the most revealing detail could also be what Rafailov didn’t say during his TED AI presentation. No timeline for when superhuman learners might emerge. No prediction about when the technical breakthroughs would arrive. Just a conviction that the potential was “fundamentally possible” — and that without it, all of the scaling on this planet won’t be enough.

