
Why RAG won't solve the hallucination problem of generative AI

Hallucinations – essentially the falsehoods that generative AI models produce – are a serious problem for companies trying to integrate the technology into their operations.

Because models have no real intelligence and merely predict words, images, speech, music, and other data according to statistical patterns, they are sometimes wrong. Very wrong. A recent article in The Wall Street Journal describes a case in which Microsoft's generative AI invented meeting attendees and implied that conference calls were about topics that weren't actually discussed on the call.

As I wrote a while ago, hallucinations may be an intractable problem with today's transformer-based model architectures. Nevertheless, various generative AI providers claim to more or less eliminate them through a technical approach called Retrieval Augmented Generation (RAG).

Here's how one vendor, Squirro, pitches it:

At the heart of the offering is the concept of Retrieval Augmented LLMs or Retrieval Augmented Generation (RAG) embedded in the solution… (our generative AI) is unique in its promise of zero hallucinations. Every piece of information generated can be traced back to a source, ensuring credibility.

Here is a similar pitch from SiftHub:

Using RAG technology and fine-tuned large language models with industry-specific knowledge training, SiftHub enables companies to generate personalized answers with zero hallucinations. This promises increased transparency and reduced risk, instilling absolute confidence in using AI for all their needs.

RAG was pioneered by data scientist Patrick Lewis, a researcher at Meta and University College London and lead author of the 2020 paper that coined the term. Applied to a model, RAG retrieves documents that are potentially relevant to a question – for example, a Wikipedia page about the Super Bowl – essentially using a keyword search, and then asks the model to generate an answer based on that additional context.
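The retrieve-then-generate loop can be sketched in a few lines. This is a minimal illustration, not a production retriever: it scores documents by crude keyword overlap and assembles the augmented prompt a model would see. The corpus, the `generate` step (omitted), and all names here are hypothetical.

```python
# Minimal sketch of the RAG loop: retrieve documents by keyword overlap,
# then prepend them to the question as extra context for the model.

def keyword_score(query: str, document: str) -> int:
    """Count how many query words appear in the document (a crude retriever)."""
    doc_words = set(document.lower().split())
    return sum(1 for word in query.lower().split() if word in doc_words)

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Return the k documents with the highest keyword overlap with the query."""
    return sorted(corpus, key=lambda d: keyword_score(query, d), reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Assemble the augmented prompt that would be sent to the model."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

# Toy corpus for illustration.
corpus = [
    "The Kansas City Chiefs won Super Bowl LVIII in 2024.",
    "Python is a popular programming language.",
]
prompt = build_prompt("Who won the Super Bowl in 2024?", corpus)
```

In a real system, the final prompt would be passed to the language model, which is asked to ground its answer in the retrieved context rather than its parametric memory.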

“When you interact with a generative AI model like ChatGPT or Llama and ask a question, the model answers by default from its ‘parametric memory,’ that is, from the knowledge stored in its parameters as a result of training on huge amounts of data from the internet,” said David Wadden, a research scientist at AI2, the AI-focused research arm of the nonprofit Allen Institute. “But just as you’re likely to give more accurate answers if you have a reference (like a book or a file) in front of you, the same is true in some cases for models.”

RAG is undeniably useful – it allows the things a model generates to be mapped back to retrieved documents to verify their factuality (and, as an added benefit, to avoid potentially copyright-infringing regurgitation). RAG also lets companies that don't want their documents used to train a model – for instance, companies in highly regulated industries like healthcare and law – allow models to access those documents in a more secure and temporary way.

But RAG certainly can't stop a model from hallucinating. And it has limitations that many providers gloss over.

Wadden says RAG is most effective in “knowledge-intensive” scenarios where a user wants to use a model to satisfy an “information need” – for example, finding out who won the Super Bowl last year. In these scenarios, the document answering the question likely contains many of the same keywords as the question (e.g., “Super Bowl,” “last year”), making it relatively easy to find via keyword search.

Things get trickier with “reasoning-intensive” tasks like coding and math, where it's harder to express in a keyword-based search query the concepts needed to answer a question – let alone determine which documents might be relevant.
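The gap between the two kinds of query can be made concrete with a toy example. Under the (simplifying) assumption that the retriever scores documents by word overlap, a fact-lookup query shares surface keywords with its answer document, while a reasoning-style query can share none with the document that would actually help. All documents and queries below are invented for illustration.

```python
# Toy illustration: lexical overlap finds the document for a fact-lookup
# query, but scores zero for a reasoning query whose relevant document
# shares no surface keywords with it.

def overlap(query: str, document: str) -> int:
    """Number of distinct words the query and document have in common."""
    return len(set(query.lower().split()) & set(document.lower().split()))

docs = {
    "sports": "super bowl winner last year kansas city chiefs",
    "math": "a proof by induction establishes a base case and an inductive step",
}

factual = "who won the super bowl last year"
reasoning = "show that the sum of the first n odd numbers is n squared"

# The factual query matches its answer document on several keywords...
factual_hit = overlap(factual, docs["sports"])
# ...but the reasoning query shares no words with the proof-technique
# document it actually needs.
reasoning_miss = overlap(reasoning, docs["math"])
```

A keyword retriever would therefore surface the sports document for the first query and nothing useful for the second, which is the failure mode Wadden describes below.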

Even for simple questions, models can be “distracted” by irrelevant content in documents, especially long documents where the answer isn't obvious. Or, for reasons still unknown, they may simply ignore the contents of the retrieved documents and rely instead on their parametric memory.

RAG is also expensive in terms of the hardware required to apply it at scale.

Retrieved documents, whether from the internet, an internal database, or elsewhere, must be stored – at least temporarily – in memory so that the model can access them. Another cost is the compute needed to process the expanded context before the model generates its response. For a technology already notorious for the amount of compute and electricity it requires for even basic operations, this is a serious consideration.

That doesn't mean RAG can't be improved. Wadden pointed to many ongoing efforts to train models to make better use of the documents RAG retrieves.

Some of these efforts involve models that can “decide” when to use the documents, or models that can choose not to perform retrieval at all if they deem it unnecessary. Others focus on ways to index large document datasets more efficiently, and on improving search through better representations of documents – representations that go beyond keywords.

“We're pretty good at retrieving documents based on keywords, but not so good at retrieving documents based on more abstract concepts, like a proof technique needed to solve a math problem,” Wadden said. “Research is needed to develop document representations and search techniques that can identify relevant documents for more abstract generation tasks. I think that's largely an open question at this point.”

So RAG can help reduce a model's hallucinations – but it's not the answer to all of AI's hallucinatory problems. Beware of any vendor who claims otherwise.
