Large language models (LLMs) like ChatGPT and Claude have become household names around the globe. Many people have begun to worry that this AI is coming for their jobs. It is therefore ironic that nearly all LLM-based systems fail at a simple task: counting the "r"s in the word "strawberry." The failure is not specific to the letter "r"; other examples include counting the "m"s in "mammal" and the "p"s in "hippopotamus." In this article, I'll break down the reasons for these errors and offer a straightforward workaround.
LLMs are powerful AI systems trained on massive amounts of text to understand and produce human-like language. They excel at tasks such as answering questions, translating languages, summarizing content, and even writing creative texts by predicting and constructing coherent answers based on the input they receive. LLMs are designed to recognize patterns in text, which allows them to tackle a wide range of language-related tasks with impressive accuracy.
Despite these abilities, the failure to count the number of "r"s in the word "strawberry" is a reminder that LLMs are incapable of "thinking" like humans. They don't process the information we give them the way a human would.
Almost all current high-performance LLMs are based on Transformers. This deep learning architecture doesn't take text directly as input. Instead, it relies on a process called tokenization, which converts the text into numerical representations, or tokens. Some tokens can be complete words (like "monkey"), while others can be parts of a word (like "mon" and "key"). Each token is like a code the model understands. By tokenizing everything, the model can better predict the next token in a sequence.
LLMs don't memorize words; they try to learn how these tokens fit together in different ways so that they can make a good guess at what comes next. In the case of the word "hippopotamus," the model may see the letter combinations "hip," "pop," "o," and "tamus" and never know that the word is made up of the letters "h", "i", "p", "p", "o", "p", "o", "t", "a", "m", "u", "s". You can see this splitting for yourself, as shown in the sketch below.
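Here is a minimal sketch of how to inspect a tokenizer's output, using OpenAI's open-source tiktoken library as one example (the choice of library and of the "cl100k_base" encoding are assumptions; any subword tokenizer illustrates the same point, and the exact split will differ between tokenizers):

```python
import tiktoken

# Load one of tiktoken's built-in BPE encodings (assumption: cl100k_base is available).
enc = tiktoken.get_encoding("cl100k_base")

word = "hippopotamus"
token_ids = enc.encode(word)

# Decode each token ID back to its text piece to see how the word was split.
pieces = [enc.decode([tid]) for tid in token_ids]
print(pieces)  # typically a few subword chunks, not twelve individual letters
```

The key takeaway is that the model never "sees" the individual letters at all; it only sees the IDs of those subword chunks.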
A model architecture that could look at individual letters directly, without tokenizing them, might not have this problem, but for today's transformer architectures it is not computationally feasible.
Let's also look at how LLMs generate output text: they predict what the next token will be based on the previous input and output tokens. While this works well for generating context-aware, human-like text, it is not suited to simple tasks like counting letters. When LLMs are asked how many "r"s appear in the word "strawberry," they simply predict an answer based on the structure of the input sentence.
Here is a workaround
While LLMs may not be able to "think" or reason logically, they are good at producing structured text. A great example of structured text is computer code in any of many programming languages. If we ask ChatGPT to use Python to count the number of "r"s in "strawberry", it will almost certainly get the correct answer. When LLMs need to perform counting or other tasks that require logical reasoning or arithmetic, the surrounding software can be designed so that the prompts ask the LLM to use a programming language to process the input query.
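For illustration, here is the kind of short Python snippet an LLM might produce when prompted this way (the helper name count_letter is hypothetical, just for this sketch):

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a letter in a word."""
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))    # 3
print(count_letter("hippopotamus", "p"))  # 3
print(count_letter("mammal", "m"))        # 3
```

Because the counting is done by deterministic code rather than by token-by-token prediction, the answer no longer depends on how the word happens to be tokenized.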
Conclusion
A simple letter-counting experiment reveals a fundamental limitation of LLMs like ChatGPT and Claude. Despite their impressive ability to generate human-like text, write code, and answer almost any question asked of them, these AI models cannot yet "think" like a human. The experiment shows the models for what they are: pattern-matching prediction algorithms, not an "intelligence" capable of understanding or reasoning. However, the problem can be alleviated to some extent by knowing in advance what kinds of prompts work well. As AI becomes more integrated into our lives, recognizing its limitations is critical to using these models responsibly and with realistic expectations.