LLMs are really bad at solving easy river crossing puzzles

Large language models like GPT-4o can perform incredibly complex tasks, but even the most capable models struggle with some basic reasoning tasks that children can solve.

In an interview with CBS, the “Godfather of AI,” Geoffrey Hinton, said that AI systems may be more intelligent than we realize and that there is a chance the machines could take over.

When asked about the current state of AI technology, Hinton said, “I think we’re moving toward a time where there may be things that are more intelligent than us for the first time ever.”

Yann LeCun, Meta’s chief AI scientist, would like us to believe that AI is still a long way from reaching even the intelligence of a “dog.”

So which is it?

This week, users on X posted examples showcasing the impressive coding capabilities of Anthropic’s latest Claude model. Others ran experiments showing that AI models still struggle with very basic reasoning.

River Crossing Puzzle

The classic river crossing puzzle has several variations, but the Wikipedia version sums it up like this:

A farmer with a wolf, a goat, and a cabbage must cross a river by boat. The boat can carry only the farmer and a single item. If left unattended together, the wolf would eat the goat, or the goat would eat the cabbage. How can they cross the river without anything being eaten?

Finding the solution requires some basic planning and thinking through different scenarios. It isn’t a particularly difficult problem. If you are human, that is.
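To put the difficulty in perspective, the whole puzzle is a tiny search problem. Below is a minimal breadth-first-search sketch in Python (the function and variable names are my own, purely illustrative) that enumerates boat trips and finds the classic seven-crossing solution:

```python
from collections import deque

ITEMS = ("wolf", "goat", "cabbage")
FORBIDDEN = [{"wolf", "goat"}, {"goat", "cabbage"}]  # pairs that can't be left alone

def is_safe(left_bank, farmer_on_left):
    # A bank is unsafe only when the farmer is absent and a forbidden pair remains on it.
    right_bank = set(ITEMS) - left_bank
    unattended = right_bank if farmer_on_left else left_bank
    return not any(pair <= unattended for pair in FORBIDDEN)

def solve():
    # State: (items on the left bank, is the farmer on the left?)
    start = (frozenset(ITEMS), True)
    goal = (frozenset(), False)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, farmer_left), path = queue.popleft()
        if (left, farmer_left) == goal:
            return path
        # Each trip: the farmer crosses alone or with one item from his current bank.
        bank = left if farmer_left else set(ITEMS) - left
        for cargo in (None, *bank):
            new_left = set(left)
            if cargo is not None:
                (new_left.discard if farmer_left else new_left.add)(cargo)
            state = (frozenset(new_left), not farmer_left)
            if state not in seen and is_safe(state[0], state[1]):
                seen.add(state)
                queue.append((state, path + [f"cross with {cargo or 'nothing'}"]))

print(solve())  # seven crossings: goat over, return, wolf over, goat back, cabbage over, return, goat over
```

There are only 16 possible configurations (farmer side times item subsets), ten of which are safe, so exhaustive planning handles it instantly.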

Can GPT-4o solve it? If you copy and paste the puzzle into ChatGPT, you get the right answer, but that Wikipedia page was almost certainly in its training data.

What if we made the puzzle much simpler and modified it slightly so that the LLM couldn’t rely on its training data?

British mathematics professor Sir William Timothy Gowers showed how easily LLMs’ inability to apply logic can be exposed.

ChatGPT's failed attempt to solve a simplified river crossing puzzle. Source: X @wtgowers

The correct answer to the puzzle is that only one trip is required, but it seems as if ChatGPT is trying to recall an answer rather than simply solving the puzzle.

Is Claude 3.5 Sonnet any better?

Meta data scientist Colin Fraser's experiment confirms that even the current leading AI model can't solve this simple puzzle.

It was perhaps a little disingenuous of a Meta data scientist not to show his results using Llama 3.

I asked Meta AI the same question, and it also got it weirdly wrong.

Meta AI, powered by Llama 3, also solves the river crossing puzzle incorrectly. Source: Meta

Yann LeCun explained the reason for these results as follows: “The problem is that LLMs have no common sense, no understanding of the world, and no ability to plan (and think).”

Is that true, or is there something else behind it?

Rather than revealing a lack of reasoning ability, these interactions may show how strongly an LLM's output is influenced by its training data. Meta AI's response, which calls this a “classic puzzle,” suggests that this may be the case.

Variations of the river crossing puzzle often refer to the number of “trips” required. If you pose the puzzle without using that word, the LLM solves it.

These experiments were interesting, but they don’t provide a definitive answer to the question of whether AI models are truly intelligent or simply next-token prediction machines.

However, the results do show how sensitive LLMs are to their training data. When GPT-4o aces the LSAT, did it need to “think” to find the answers, or had it simply memorized them?

As long as engineers don’t understand what is happening inside the AI black boxes they’ve created, the debates on X will remain unresolved.
