The lack of internal deliberation – in other words, reasoning – has long been considered one of the key weaknesses of artificial intelligence. The extent of a recent advance in this area by ChatGPT maker OpenAI is a matter of debate within the scientific community. However, many of my fellow experts and I believe there is a chance that we are on the verge of closing the gap with human-level reasoning.
Researchers have long argued that traditional neural networks – the dominant approach to AI – correspond more closely to “System 1” cognition. This means answering questions directly and intuitively (for example, in automatic facial recognition). Human intelligence, on the other hand, also relies on “System 2” cognition. This involves internal deliberation and enables powerful forms of reasoning (for example, when solving a maths problem or planning something in detail). It allows us to combine pieces of knowledge in a coherent but novel way.
OpenAI's advance, which has not yet been fully released to the public, is based on a form of internal deliberation built into its o1 large language model (LLM).
Better reasoning would address two major weaknesses of current AI: the lack of coherence in its answers, and its inability to plan and achieve long-term goals. The former matters for scientific applications and the latter is essential for creating autonomous agents. Both could enable important applications.
Principles of reasoning were at the heart of AI research in the 20th century. An early example of success was DeepMind's AlphaGo, the first computer system to beat human champions at the ancient Asian game of Go in 2015, and more recently AlphaProof, which tackles mathematical problems. Neural networks learn to predict the value of an action; such “intuitions” are then used for planning, by efficiently searching through possible sequences of actions.
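To make that idea concrete, here is a minimal sketch in Python of how a learned value function can guide a search over sequences of actions. It is not DeepMind's actual system; value_estimate, legal_actions and state.apply are hypothetical stand-ins for a trained value network, the rules of the game and a state-transition function.

```python
# Minimal sketch of value-guided planning (not DeepMind's actual system).
# `value_estimate`, `legal_actions` and `state.apply` are hypothetical
# stand-ins for a trained value network, the game rules and a transition.

def plan(state, value_estimate, legal_actions, depth=3, beam_width=5):
    """Return the action sequence a value-guided beam search finds most promising."""
    beams = [([], state)]                      # (actions taken so far, resulting state)
    for _ in range(depth):
        candidates = []
        for actions, s in beams:
            for a in legal_actions(s):
                candidates.append((actions + [a], s.apply(a)))
        if not candidates:                     # no legal moves left to explore
            break
        # Keep only the sequences that the learned "intuition" rates highest.
        candidates.sort(key=lambda c: value_estimate(c[1]), reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]
```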
However, AlphaGo and AlphaProof require highly specialised knowledge (about the game of Go, or about particular areas of mathematics). It remains unclear how the broad knowledge of modern LLMs can be combined with strong reasoning and planning abilities.
There has been some progress. LLMs already give better answers to complex questions when they are asked to produce a chain of thought that leads to their answer.
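A minimal illustration of chain-of-thought prompting is sketched below; ask_llm is a hypothetical stand-in for whatever call is made to a language model.

```python
# Illustrative sketch of chain-of-thought prompting.
# `ask_llm` is a hypothetical stand-in for a call to a large language model.

def ask_with_chain_of_thought(ask_llm, question: str) -> str:
    # Asking the model to spell out intermediate steps before committing to
    # an answer tends to improve accuracy on reasoning-heavy questions.
    prompt = (
        f"{question}\n\n"
        "Work through the problem step by step, showing your reasoning, "
        "then give your final answer on a new line starting with 'Answer:'."
    )
    return ask_llm(prompt)
```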
OpenAI's new “o” series pushes this idea further, at the cost of much greater computing resources and therefore energy. By being trained to produce very long chains of thought, it learns to “think” better.
We are thus seeing a new form of computational scaling: not just more training data and larger models, but also more time spent “thinking” about answers. This leads to significantly improved capabilities on reasoning-heavy tasks such as mathematics, computer science and science more broadly.
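OpenAI has not disclosed how o1 spends that extra “thinking” time, but one simple, well-known way to trade more inference-time compute for accuracy is to sample several chains of thought and take a vote over their final answers, as in the sketch below (sample_chain_of_thought and extract_final_answer are hypothetical helpers).

```python
# Illustrative sketch only: self-consistency voting over sampled chains of
# thought, one simple way to convert extra inference-time compute into
# better answers. OpenAI has not published how o1 actually "thinks";
# `sample_chain_of_thought` and `extract_final_answer` are hypothetical.

from collections import Counter

def answer_with_more_compute(sample_chain_of_thought, extract_final_answer,
                             question: str, n_samples: int = 16) -> str:
    """Sample several reasoning chains and return the most common final answer."""
    answers = [
        extract_final_answer(sample_chain_of_thought(question))
        for _ in range(n_samples)   # more samples = more compute spent "thinking"
    ]
    return Counter(answers).most_common(1)[0][0]
```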
For example, while OpenAI's previous model GPT-4o scored only about 13 percent on the 2024 United States Mathematical Olympiad qualifier (the AIME test), o1 reached 83 percent, placing it among the top 500 students in the country.
If this effort succeeds, there are major risks to consider. We do not yet know how to reliably align and control AI. For example, evaluations of o1 showed an increased ability to deceive people – a natural consequence of improving its ability to achieve goals. Also worrying is that o1's assessed capacity to help create biological weapons rose from low to medium on OpenAI's risk scale. That is the highest level the company considers acceptable (and it may have an interest in playing down such concerns).
Unlocking reasoning and autonomy are considered the key milestones on the path to human-level AI, also known as artificial general intelligence. There are therefore strong economic incentives for the large companies pursuing this goal to cut corners on safety.
o1 is likely to be only a first step. Although it performs well on many reasoning and mathematics tasks, long-term planning appears to remain out of reach. o1 struggles with more complex planning tasks, suggesting that much work remains to be done before AI companies achieve the kind of autonomous action they are seeking.
But with improved programming and scientific capabilities, these new models are expected to accelerate research on AI itself. That could bring about human-level intelligence sooner than expected. Advances in reasoning capabilities make it all the more urgent to regulate AI models in order to protect the public.