
University of Oxford study identifies when AI hallucinations are most likely to occur

A University of Oxford study has developed a way to test when language models are “unsure” of their output and at risk of hallucinating.

AI “hallucinations” refer to a phenomenon where large language models (LLMs) generate fluent and plausible responses that are not truthful or consistent.

Hallucinations are difficult – if not impossible – to eliminate from AI models. AI developers like OpenAI, Google, and Anthropic have all admitted that hallucinations will likely remain a byproduct of interacting with AI.

As Dr. Sebastian Farquhar, one of the study’s authors, explains in a blog post, “LLMs are highly capable of saying the same thing in many different ways, which can make it difficult to tell when they are certain about an answer and when they are just making something up.”

The Cambridge Dictionary even added an AI-related definition to the word “hallucinate” in 2023 and named it “Word of the Year.”

The question this University of Oxford study sought to answer is: what is really happening under the hood when an LLM hallucinates? And how can we detect when it is likely to occur?

The study, published in Nature, introduces a concept called “semantic entropy,” which measures the uncertainty of an LLM’s output at the level of meaning rather than just the specific words or phrases used.

By computing the semantic entropy of an LLM’s responses, the researchers can estimate the model’s confidence in its outputs and identify instances when it is likely to hallucinate.

Semantic entropy in LLMs

Semantic entropy, as defined by the study, measures the uncertainty or inconsistency in the meaning of an LLM’s responses. It helps detect when an LLM might be hallucinating or generating unreliable information.

In simpler terms, semantic entropy measures how “confused” an LLM’s output is. If the meanings of its responses are closely related and consistent, the LLM is likely providing reliable information. But if the meanings are scattered and inconsistent, it’s a red flag that the LLM might be hallucinating or generating inaccurate information.

Here’s how it works:

  1. The researchers prompted the LLM to generate several possible responses to the same question. This is achieved by feeding the question to the LLM multiple times, each time with a different random seed or a slight variation in the input.
  2. Semantic entropy examines the responses and groups together those with the same underlying meaning, even if they use different words or phrasing.
  3. If the LLM is confident about the answer, its responses should have similar meanings, resulting in a low semantic entropy score. This suggests that the LLM understands the information clearly and consistently.
  4. However, if the LLM is uncertain or confused, its responses will have a greater diversity of meanings, some of which might be inconsistent or unrelated to the question. This results in a high semantic entropy score, indicating that the LLM may be hallucinating or generating unreliable information.

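The four steps above can be sketched in Python. This is a minimal illustration rather than the paper’s implementation: the actual method clusters answers by checking bidirectional entailment with a language model, whereas `naive_same_meaning` below is a stand-in equivalence check, and the answer lists are hypothetical samples.

```python
import math

def semantic_entropy(answers, same_meaning):
    """Estimate semantic entropy over answers sampled for one question.

    answers: list of strings sampled from the model.
    same_meaning: function(a, b) -> bool deciding whether two answers
        share a meaning (the paper uses bidirectional entailment; any
        equivalence predicate can be plugged in here).
    """
    clusters = []  # each cluster holds answers with one shared meaning
    for ans in answers:
        for cluster in clusters:
            if same_meaning(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    n = len(answers)
    # Shannon entropy over the probability mass of each meaning cluster
    return sum(-(len(c) / n) * math.log(len(c) / n) for c in clusters)

# Toy equivalence check: compare answers after normalizing case/punctuation
def naive_same_meaning(a, b):
    norm = lambda s: s.lower().strip(" .")
    return norm(a) == norm(b)

confident = ["Paris", "paris.", "Paris"]    # one meaning  -> entropy 0.0
confused = ["Paris", "Lyon", "Marseille"]   # three meanings -> high entropy
print(semantic_entropy(confident, naive_same_meaning))  # 0.0
print(semantic_entropy(confused, naive_same_meaning))   # ~1.099
```

Note that the “confident” answers differ in surface form but collapse into a single meaning cluster, which is exactly what distinguishes semantic entropy from naive lexical entropy over raw strings.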
The researchers applied semantic entropy to a diverse set of question-answering tasks to evaluate its effectiveness. This involved benchmarks such as trivia questions, reading comprehension, word problems, and biographies.

Across the board, semantic entropy outperformed existing methods for detecting when an LLM was likely to generate an incorrect or inconsistent answer.

Semantic entropy clusters answers with shared meanings before calculating entropy, making it suitable for language tasks where different answers can mean the same thing. Low semantic entropy indicates the LLM’s confidence in the meaning. For longer passages, the text is decomposed into factoids, questions are generated that could yield each factoid, and the LLM generates multiple answers. Semantic entropy, including the original factoid, is computed for each question’s answers. High average semantic entropy suggests confabulation (essentially hallucinated facts stated as real), while low entropy, despite varied wording, indicates a likely true factoid. Source: Nature (open access)

You can see in the above diagram how some prompts push the LLM to generate a confabulated (inaccurate) response. For example, it produces a day and month of birth when these weren’t provided in the initial information.
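For the long-passage case described above, the final per-factoid decision can be as simple as thresholding the average entropy across that factoid’s probe questions. A hypothetical sketch (the 0.7 cutoff and the entropy values are illustrative, not taken from the paper):

```python
def likely_confabulation(entropies, threshold=0.7):
    """Flag a factoid as probably confabulated when the mean semantic
    entropy of the answers to its probe questions exceeds a threshold.
    """
    return sum(entropies) / len(entropies) > threshold

print(likely_confabulation([0.05, 0.1, 0.0]))  # consistent answers -> False
print(likely_confabulation([1.1, 0.9, 1.3]))   # scattered meanings -> True
```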

Implications of detecting hallucinations

This work helps explain hallucinations and can make LLMs more reliable and trustworthy.

By providing a way to detect when an LLM is uncertain or susceptible to hallucination, semantic entropy paves the way for deploying these AI tools in high-stakes domains where factual accuracy is critical, like healthcare, law, and finance.

Erroneous results can have potentially catastrophic impacts when they influence high-stakes situations, as shown by some failed predictive policing and healthcare systems.

However, it’s also important to remember that hallucinations are only one kind of error that LLMs can make.

As Dr. Farquhar explains, “If an LLM makes consistent mistakes, this new method won’t catch that. The most dangerous failures of AI come when a system does something bad but is confident and systematic. There is still a lot of work to do.”

Nevertheless, the Oxford team’s semantic entropy method represents a major step forward in our ability to understand and mitigate the limitations of AI language models.

Providing an objective means to detect hallucinations brings us closer to a future where we can harness AI’s potential while ensuring it remains a reliable and trustworthy tool in the service of humanity.

