
Microsoft claims its latest tool can correct AI hallucinations, but experts advise caution

AI is a notorious liar, and Microsoft now says it has an answer. Understandably, that will raise some eyebrows, but there’s reason to be skeptical.

Microsoft today introduced Correction, a service that attempts to automatically revise AI-generated text that’s factually inaccurate. Correction first flags potentially erroneous text (for instance, a summary of a company’s quarterly earnings call in which quotes may have been misattributed) and then checks it for accuracy by comparing it against a source of truth, such as transcripts.

Correction, available as part of Microsoft's Azure AI Content Safety API, can be used with any text-generating AI model, including Meta's Llama and OpenAI's GPT-4o.

“The correction is made possible by a new process that uses small and large language models to align outputs with grounding documents,” a Microsoft spokesperson told TechCrunch. “We hope this new feature will support developers and users of generative AI in fields such as medicine, where application developers place significant importance on the accuracy of answers.”

Google this summer introduced a similar feature in Vertex AI, its AI development platform, that lets customers “ground” models using third-party data, their own datasets, or Google Search.

However, experts caution that these grounding approaches don’t address the root cause of hallucinations.

“Trying to eliminate hallucinations from generative AI is like trying to eliminate hydrogen from water,” said Os Keyes, a doctoral student at the University of Washington who studies the ethical implications of emerging technologies. “It’s an essential part of how the technology works.”

Text-generating models hallucinate because they don't actually “know” anything. They are statistical systems that recognize patterns in sequences of words and predict which words come next based on the countless examples they’ve been trained on.

This means that a model’s answers aren’t really answers at all, but merely predictions of how a question would be answered if it were present in the training set. As a result, models tend to play fast and loose with the truth. One study found that OpenAI’s ChatGPT answers medical questions incorrectly half the time.

Microsoft's solution is a pair of cross-referencing meta models, a sort of copy editor and fact checker, that highlight and rewrite hallucinations.

A classifier model looks for potentially false, fabricated, or irrelevant snippets of AI-generated text (hallucinations). If the classifier detects hallucinations, it hands them off to a second model, a language model, which attempts to correct them against supplied “grounding documents.”

Photo credits: Microsoft
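To make that two-stage flow concrete, here is a minimal hypothetical sketch of a detect-then-correct pipeline. The function names, the word-overlap heuristic standing in for the classifier, and the stubbed rewrite step standing in for the correcting language model are illustrative assumptions, not Microsoft's Correction models or the Azure AI Content Safety API.

```python
# Hypothetical detect-then-correct sketch: a crude "classifier" flags sentences
# that are poorly supported by the grounding documents, and a stubbed "rewrite"
# step marks where a real language model would revise them.

def is_grounded(sentence: str, ground_docs: list[str], threshold: float = 0.8) -> bool:
    """Classifier stand-in: treat a sentence as ungrounded when too few of its
    words appear in any single grounding document. (Toy heuristic only.)"""
    words = {w.lower().strip(".,") for w in sentence.split() if len(w) > 1}
    if not words:
        return True
    doc_vocabs = [{w.lower().strip(".,") for w in doc.split()} for doc in ground_docs]
    best_overlap = max(len(words & vocab) / len(words) for vocab in doc_vocabs)
    return best_overlap >= threshold


def rewrite_with_llm(sentence: str, ground_docs: list[str]) -> str:
    """Stand-in for the second stage: a language model would be prompted to
    rewrite the flagged sentence using only facts from the grounding documents.
    Here we just mark the sentence so the control flow is visible."""
    return f"[REWRITTEN AGAINST {len(ground_docs)} SOURCE DOC(S)] {sentence}"


def correct(summary: str, ground_docs: list[str]) -> str:
    """Keep grounded sentences as-is; send flagged ones to the rewrite step."""
    corrected = []
    for sentence in summary.split(". "):
        if not sentence:
            continue
        if is_grounded(sentence, ground_docs):
            corrected.append(sentence)
        else:
            corrected.append(rewrite_with_llm(sentence, ground_docs))
    return ". ".join(corrected)


if __name__ == "__main__":
    transcript = ["The CEO said quarterly revenue grew 12 percent year over year."]
    summary = "The CFO said quarterly revenue grew 40 percent. Revenue grew year over year."
    print(correct(summary, transcript))
```

In the real service, both stages are learned models rather than heuristics, which is precisely why critics such as Keyes note that the detector itself can get things wrong.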

“Correction can significantly improve the reliability and trustworthiness of AI-generated content by helping application developers reduce user dissatisfaction and potential reputational risks,” the Microsoft spokesperson said. “It’s important to note that groundedness detection does not solve for ‘accuracy,’ but rather helps align generative AI outputs with grounding documents.”

Keyes has doubts about this.

“It might alleviate some problems,” they said, “but it will also create new ones. After all, Correction’s hallucination detection library is presumably also capable of hallucinating.”

When asked for background on the Correction models, the spokesperson pointed to a recent paper from a Microsoft research team describing the models’ pre-production architectures. However, the paper omits important details, such as which datasets were used to train the models.

Mike Cook, an AI researcher at Queen Mary University, argues that even if Correction works as advertised, it could exacerbate the trust and explainability issues surrounding AI. While the service might catch some errors, it could also lull users into a false sense of security, making them believe the models are more accurate than they really are.

“Microsoft, like OpenAI and Google, created this problem where you rely on models in scenarios where they are frequently wrong,” he said. “What Microsoft is doing now is repeating that mistake at a higher level. Let’s say this takes us from 90% accuracy to 99%; the issue was never really that 9%. It’s always going to be the 1% of mistakes we don’t yet catch.”

Cook added that there’s also a cynical business angle to how Microsoft is bundling Correction. The feature itself is free, but the “groundedness detection” required to spot hallucinations so Correction can fix them is only free up to 5,000 “text records” per month. After that, it costs 38 cents per 1,000 text records.
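At that rate, a hypothetical workload of, say, 105,000 text records in a month would cost about $38 once the first 5,000 free records are used up (100,000 ÷ 1,000 × $0.38).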

Microsoft is undoubtedly under pressure to prove to its customers, and shareholders, that its AI is worth the investment.

In the second quarter alone, the company ploughed nearly $19 billion into capital expenditures and equipment, much of it related to AI. Yet it has yet to generate significant revenue from AI, and a Wall Street analyst this week downgraded the company's shares, citing doubts about its long-term AI strategy.

According to a piece in The Information, many early adopters have paused deployments of Microsoft's flagship generative AI platform, Microsoft 365 Copilot, over performance and cost concerns. For one customer using Copilot in Microsoft Teams meetings, the AI allegedly invented meeting participants and implied that calls were about topics that were never actually discussed.

Accuracy and the potential for hallucinations are among the biggest concerns companies have when piloting AI tools today, according to a KPMG survey.

“If this were a normal product life cycle, generative AI would still be in academic research and development, being improved and studied to understand its strengths and weaknesses,” Cook said. “Instead, we have deployed it in a dozen industries. Microsoft and others have loaded everyone onto their exciting new spaceship and are deciding to build the landing gear and parachutes on the way to their destination.”
