If you've ever tried to use ChatGPT as a calculator, you've almost certainly noticed it: the chatbot is bad at math. And it's not unique among AI in this regard.
Anthropic's Claude can't solve basic word problems. Gemini doesn't understand quadratic equations. And Meta's Llama struggles with straightforward addition.
So how is it that these bots can write soliloquies yet still stumble over elementary-school arithmetic?
Tokenization has something to do with it. By dividing data into chunks (e.g., splitting the word “fantastic” into the syllables “fan,” “tas,” and “tic”), tokenization helps AI densely encode information. But because tokenizers – the AI models that do the tokenizing – don't really know what numbers are, they often end up destroying the relationships between digits. For example, a tokenizer might treat the number “380” as a single token but represent “381” as a pair of chunks (“38” and “1”).
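To make that concrete, here is a minimal sketch using OpenAI's open-source tiktoken library to inspect how a tokenizer chunks numbers. The exact splits you get depend on which encoding you load; the behavior described in the comments is the general pattern, not a guarantee for any particular model.

```python
# Minimal sketch: how a tokenizer can split numbers inconsistently.
# Requires: pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by several OpenAI chat models.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["380", "381", "fantastic"]:
    token_ids = enc.encode(text)
    # Decode each token id separately to see how the string was chunked.
    pieces = [enc.decode([tid]) for tid in token_ids]
    print(f"{text!r} -> {pieces}")

# Depending on the tokenizer's learned vocabulary, "380" may come back as a
# single piece while "381" is split into something like ["38", "1"], so the
# model never sees the two strings as neighboring integers.
```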
But tokenization isn't the only reason why math is a weak spot for AI.
AI systems are statistical machines. Trained on many examples, they learn the patterns in those examples to make predictions (e.g., that the phrase “to whom” in an email often precedes the phrase “it may concern”). For instance, given the multiplication problem 57,897 x 12,832, ChatGPT – having seen many multiplication problems – will likely infer that the product of a number ending in “7” and a number ending in “2” will end in “4.” But it struggles with the middle part. ChatGPT gave me the answer 742,021,104; the correct one is 742,934,304.
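A quick way to see both halves of that claim, the easy last digit and the hard middle, is to check the arithmetic directly in Python:

```python
# Verify the example: the last digit is easy to pattern-match, the middle is not.
a, b = 57_897, 12_832

print(a * b)                  # 742934304, the correct product
print((7 * 2) % 10)           # 4, which is why the final digit is easy to predict
print(742_021_104 == a * b)   # False: ChatGPT's answer gets the middle digits wrong
```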
Yuntian Deng, an assistant professor at the University of Waterloo who focuses on AI, thoroughly benchmarked ChatGPT's multiplication abilities in a study earlier this year. He and his co-authors found that the standard GPT-4o model struggled to multiply anything beyond two four-digit numbers (e.g., 3,459 x 5,284).
“GPT-4o struggles with multi-digit multiplication, achieving less than 30% accuracy on four-digit by four-digit problems,” Deng told TechCrunch. “Multi-digit multiplication is challenging for language models because an error in an intermediate step can compound and lead to an incorrect final result.”
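That compounding effect is easy to illustrate with ordinary long multiplication, where the answer is a sum of partial products: a single mistake in one intermediate step flows straight through to the final result. A hypothetical sketch (not Deng's evaluation code):

```python
# Long multiplication as a sum of partial products. One wrong intermediate
# step carries straight through to the final answer.
def long_multiply(a, b, error_in_step=None):
    total = 0
    for i, digit in enumerate(reversed(str(b))):
        partial = a * int(digit) * 10**i      # one partial product per digit of b
        if i == error_in_step:
            partial += 10**i                  # inject a small mistake in this step
        total += partial
    return total

print(long_multiply(3459, 5284))                    # 18277356, the correct product
print(long_multiply(3459, 5284, error_in_step=2))   # 18277456, off by 100 in the final result
```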
So will math forever elude ChatGPT? Or is there reason to believe the bot could someday handle numbers as well as humans (or a TI-84)?
Deng is hopeful. In the study, he and his colleagues also tested o1, OpenAI's “reasoning” model that was recently brought to ChatGPT. o1, which “thinks through” problems step by step before answering them, performed significantly better than GPT-4o, getting nine-digit by nine-digit multiplication problems right about half the time.
“The model might solve the problem in a way that's different from how we solve it manually,” Deng said. “It makes us curious about the model's internal approach and how it differs from human reasoning.”
Deng believes the advances suggest that at least some types of math problems – including multiplication – will eventually be “completely solved” by ChatGPT-like systems. “This is a well-defined task with known algorithms,” Deng said. “We're already seeing significant improvements from GPT-4o to o1, so it's clear that improvements in reasoning capabilities are happening.”
Just don't throw out your calculator quite yet.