If you've ever tried to use ChatGPT as a calculator, you've almost certainly noticed: the chatbot is bad at math. And it's not unique among AI in this regard.
Anthropic's Claude can't solve basic word problems, Gemini doesn't understand quadratic equations, and Meta's Llama struggles with straightforward addition.
So how is it that these bots can write soliloquies yet still stumble over elementary-school-level arithmetic?
Tokenization has something to do with it. By dividing data into chunks (e.g., splitting the word "fantastic" into the syllables "fan," "tas," and "tic"), tokenization helps AI densely encode information. But because tokenizers, the AI models that do the tokenizing, don't really know what numbers are, they often end up destroying the relationships between digits. For example, a tokenizer might treat the number "380" as a single token but represent "381" as a pair of tokens ("38" and "1").
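For the curious, here's a small Python sketch using OpenAI's open source tiktoken library (my illustration, not something from the studies discussed here); the exact splits depend entirely on which vocabulary you load:

```python
# Illustrative sketch: how a byte-pair-encoding tokenizer splits numbers.
# Assumes OpenAI's open source `tiktoken` package (pip install tiktoken);
# the specific splits shown in comments are examples, not guarantees.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # vocabulary used by GPT-4-era models

for text in ["380", "381", "57897", "12832"]:
    pieces = [enc.decode([token_id]) for token_id in enc.encode(text)]
    print(f"{text!r} -> {pieces}")  # e.g. "381" may come back as ['38', '1']
```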
But tokenization isn't the only reason math is a weak spot for AI.
AI systems are statistical machines. Trained on lots of examples, they learn the patterns in those examples to make predictions, for instance that the phrase "to whom" in an email often precedes the phrase "it may concern." Given the multiplication problem 57,897 x 12,832, ChatGPT, having seen plenty of multiplication problems, will likely infer that the product of a number ending in "7" and a number ending in "2" will end in "4." But it struggles with the middle digits. ChatGPT gave me the answer 742,021,104; the correct answer is 742,934,304.
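For comparison, exact integer arithmetic is trivial for conventional code; a quick Python check (mine, not from the article's reporting) confirms both the last-digit pattern and the correct product:

```python
# Quick sanity check of the example above, using exact integer math.
a, b = 57_897, 12_832
print(a * b)         # 742934304, the correct product
print((7 * 2) % 10)  # 4, which is why the final digit is easy to predict from surface patterns
```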
Yuntian Deng, an assistant professor at the University of Waterloo who focuses on AI, benchmarked ChatGPT's multiplication capabilities in a study earlier this year. He and his co-authors found that the default model, GPT-4o, struggled to multiply two numbers that each contained more than four digits (e.g., 3,459 x 5,284).
"GPT-4o struggles with multi-digit multiplication, achieving less than 30% accuracy on four-digit by four-digit problems," Deng told TechCrunch. "Multi-digit multiplication is challenging for language models because an error in an intermediate step can compound and lead to an incorrect final result."
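As a rough sketch of that compounding effect (my illustration, not Deng's methodology), imagine schoolbook multiplication where each digit-by-digit step has a small, independent chance of going wrong; with 16 such steps in a four-digit by four-digit problem, even 99% reliability per step drops overall accuracy to roughly 85%:

```python
# Illustrative simulation of error compounding in multi-digit multiplication.
# Assumption (not from the study): each digit-by-digit step independently
# goes wrong with probability `p_err`, producing a corrupted partial product.
import random

def noisy_multiply(a: int, b: int, p_err: float) -> int:
    """Schoolbook multiplication where each digit-by-digit product can be corrupted."""
    total = 0
    for i, da in enumerate(reversed(str(a))):       # digits of a, least significant first
        for j, db in enumerate(reversed(str(b))):   # digits of b, least significant first
            prod = int(da) * int(db)
            if random.random() < p_err:             # simulate a slip in an intermediate step
                prod = (prod + random.randint(1, 9)) % 100
            total += prod * 10 ** (i + j)
    return total

random.seed(0)
trials = 1_000
target = 3_459 * 5_284
correct = sum(noisy_multiply(3_459, 5_284, p_err=0.01) == target for _ in range(trials))
print(f"accuracy with a 1% per-step error rate: {correct / trials:.0%}")  # ~ 0.99**16 ≈ 85%
```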
So will math forever elude ChatGPT? Or is there reason to believe the bot could someday handle numbers as well as humans (or a TI-84, for that matter)?
Deng is hopeful. In the study, he and his colleagues also tested o1, OpenAI's "reasoning" model that recently came to ChatGPT. o1, which "thinks" through problems step by step before answering them, performed much better than GPT-4o, getting nine-digit by nine-digit multiplication problems right about half the time.
"The model may be solving the problem in a way that differs from how we solve it by hand," Deng said. "It makes us curious about the model's internal approach, and how it differs from human reasoning."
Deng believes the progress suggests that at least some types of math problems – multiplication problems among them – will eventually be "completely solved" by ChatGPT-like systems. "This is a well-defined task with well-known algorithms," Deng said. "We're already seeing significant improvements from GPT-4o to o1, so it's clear that gains in reasoning capabilities are happening."
Just don't throw away your calculator quite yet.