Generative AI systems, hallucinations, and mounting technical debt

February 27, 2024

As AI systems like large language models (LLMs) grow in size and complexity, researchers are uncovering intriguing fundamental limitations.

Studies from Google and the National University of Singapore have examined the mechanics behind AI “hallucinations” – where models generate convincing but fabricated information – and the buildup of “technical debt,” which could create messy, unreliable systems over time.

Beyond the technical challenges, aligning AI’s capabilities and incentives with human values remains an open question.

As companies like OpenAI push towards artificial general intelligence (AGI), securing the path ahead means acknowledging the limits of current systems.

However, carefully acknowledging risks is antithetical to Silicon Valley’s motto to “move fast and break things,” which characterizes AI R&D as it did the tech innovations before it.

Study 1: AI models are accruing ‘technical debt’

Machine learning is often touted as continuously scalable, with systems offering a modular, integrated framework for development.

However, in the background, developers may be accruing a high level of ‘technical debt’ that they’ll need to resolve down the line.

In a Google research paper, “Machine Learning: The High-Interest Credit Card of Technical Debt,” researchers discuss the concept of technical debt in the context of ML systems.

Kaggle CEO and long-time Google researcher D. Sculley and colleagues argue that while ML offers powerful tools for rapidly constructing complex systems, these “quick wins” are sometimes misleading.

The simplicity and speed of deploying ML models can mask the long-term burdens they impose on system maintainability and evolution.

As the authors describe, this hidden debt arises from several ML-specific risk factors that developers should avoid or refactor.

Here are the key insights:

ML systems, by their nature, introduce a level of complexity beyond coding alone. This can result in what the authors call “boundary erosion,” where the clear lines between different system components become blurred because of the interdependencies created by ML models. This makes it difficult to isolate and implement improvements without affecting other parts of the system.
The paper also highlights the issue of “entanglement,” where changes to any part of an ML system, such as input features or model parameters, can have unpredictable effects on the rest of the system. Altering one small parameter might set off a cascade of effects that impacts an entire model’s function and integrity.
Another issue is the creation of “hidden feedback loops,” where ML models influence their own training data in unexpected ways. This can result in systems that evolve in unintended directions, compounding the difficulty of managing and understanding the system’s behavior.
The authors also address “data dependencies,” such as input signals that change over time, which are particularly problematic because they are harder to detect.
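
The entanglement point can be illustrated with a toy model. The sketch below is not taken from the paper: it assumes two correlated input features and hypothetical data, fits a least-squares model on both, then drops one feature. The weight on the remaining feature shifts even though nothing about that feature changed.

```python
# Toy illustration of entanglement: dropping one correlated feature
# changes the weight learned for another. Data and names are hypothetical.

def dot(u, v):
    """Sum of element-wise products of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def fit_two_features(x1, x2, y):
    """Least-squares weights (no intercept) for y ~ w1*x1 + w2*x2,
    solved from the 2x2 normal equations."""
    a, b, c = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    d1, d2 = dot(x1, y), dot(x2, y)
    det = a * c - b * b  # zero only if x1 and x2 are perfectly collinear
    return (c * d1 - b * d2) / det, (a * d2 - b * d1) / det

def fit_one_feature(x, y):
    """Least-squares weight (no intercept) for y ~ w*x."""
    return dot(x, y) / dot(x, x)

x1 = [1, 2, 3, 4]
x2 = [1, 1, 2, 3]                         # correlated with x1
y = [u + 2 * v for u, v in zip(x1, x2)]   # built as 1*x1 + 2*x2

w1_full, w2_full = fit_two_features(x1, x2, y)
w1_alone = fit_one_feature(x1, y)

print(f"weight on x1 with x2 present: {w1_full:.2f}")    # 1.00
print(f"weight on x1 after removing x2: {w1_alone:.2f}")  # 2.40
```

In a production system with many interacting features, the same effect is much harder to spot, which is why a seemingly local change can ripple through the whole model.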

Why technical debt matters

Technical debt affects the long-term health and efficiency of ML systems.

When developers rush to get ML systems up and running, they may ignore the messy intricacies of data handling or the pitfalls of ‘gluing’ together different parts.

This might work in the short term but can result in a tangled mess that’s hard to dissect, update, or even understand later.

⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️

GenAI is an avalanche of technical debt* waiting to occur

Just this week
👉ChatGPT went “berserk” with almost no real explanation
👉Sora can’t consistently infer what number of legs a cat has
👉Gemini’s diversity intervention went utterly off the rails.… pic.twitter.com/qzrVlpX9yz

For example, using ML models as-is from a library seems efficient until you’re stuck with a “glue code” nightmare, where much of the system is just duct tape holding together bits and pieces that weren’t meant to fit together.

Or consider “pipeline jungles,” described by D. Sculley and colleagues, where data preparation becomes a labyrinth of intertwined processes, so making a change feels like defusing a bomb.

The implications of technical debt

For starters, the more tangled a system becomes, the harder it is to improve or maintain. This not only stifles innovation but can also lead to more sinister issues.

For instance, if an ML system starts making decisions based on outdated or biased data because it’s too cumbersome to update, it can reinforce or amplify societal biases.

Moreover, in critical applications like healthcare or autonomous vehicles, such technical debt could have dire consequences, not only in terms of time and money but also in human well-being.

As the study describes, “Not all debt is necessarily bad, but technical debt does tend to compound. Deferring the work to pay it off results in increasing costs, system brittleness, and reduced rates of innovation.”

It’s also a reminder for businesses and consumers to demand transparency and accountability in the AI technologies they adopt.

After all, the goal is to harness the power of AI to make life better, not to get bogged down in an endless cycle of technical debt repayment.

Study 2: You can’t separate hallucinations from LLMs

In a different but related study from the National University of Singapore, researchers Ziwei Xu, Sanjay Jain, and Mohan Kankanhalli investigated the inherent limitations of LLMs.

“Hallucination is Inevitable: An Innate Limitation of Large Language Models” explores the nature of AI hallucinations, which describe instances when AI systems generate plausible but inaccurate or entirely fabricated information.

The hallucination phenomenon poses a serious technical challenge, as it highlights a fundamental gap between the output of an AI model and what is considered the “ground truth”: an ideal model that always produces correct and logical information.

Understanding how and why generative AI hallucinates is paramount as the technology integrates into critical sectors such as policing and justice, healthcare, and law.

What if one could *prove* that hallucinations are inevitable inside LLMs?

Would that change
• How you view LLMs?
• How much investment you’ll make in them?
• How much you’ll prioritize research in alternatives?

New paper makes the case: https://t.co/r0eP3mFxQg
h/t… pic.twitter.com/Id2kdaCSGk

Theoretical foundations of hallucinations

The study begins by laying out a theoretical framework for understanding hallucinations in LLMs.

Researchers created a theoretical model known as the “formal world.” This simplified, controlled environment enabled them to observe the conditions under which AI models fail to align with the ground truth.

They then tested two major families of LLMs:

Llama 2: Specifically, the 70-billion-parameter version (llama2-70b-chat-hf) available on HuggingFace was used. This model represents one of the newer entries into the large language model arena, designed for a wide range of text generation and comprehension tasks.
Generative Pretrained Transformers (GPT): The study included tests on GPT-3.5, specifically the 175-billion-parameter gpt-3.5-turbo-16k model, and GPT-4 (gpt-4-0613), for which the exact number of parameters remains undisclosed.

LLMs were asked to list strings of a given length using a specified alphabet, a seemingly simple computational task.

More specifically, the models were tasked with generating all possible strings of lengths varying from 1 to 7, using alphabets of two characters (e.g., {a, b}) and three characters (e.g., {a, b, c}).

The outputs were evaluated based on whether or not they contained all and only the strings of the desired length from the given alphabet.
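
The task and its “all and only” check can be sketched in a few lines. This is a minimal illustration rather than the study’s actual evaluation code: the function names are hypothetical, the model output is assumed to be already split into a list of strings, and duplicates are ignored.

```python
from itertools import product

def all_strings(alphabet, length):
    """Every string of exactly `length` characters drawn from `alphabet`."""
    return {"".join(chars) for chars in product(sorted(set(alphabet)), repeat=length)}

def evaluate_listing(produced, alphabet, length):
    """Check whether `produced` holds all and only the expected strings.
    Duplicates in `produced` are ignored (an assumption of this sketch)."""
    expected = all_strings(alphabet, length)
    produced_set = set(produced)
    missing = sorted(expected - produced_set)
    extra = sorted(produced_set - expected)
    return {"correct": not missing and not extra, "missing": missing, "extra": extra}

# Hypothetical model output that omits one string and invents another.
result = evaluate_listing(["aa", "ab", "ba", "abc"], "ab", 2)
print(result)  # {'correct': False, 'missing': ['bb'], 'extra': ['abc']}

# The target set grows exponentially: alphabet_size ** length.
print(len(all_strings("ab", 7)), len(all_strings("abc", 7)))  # 128 2187
```

A conventional program enumerates the full set trivially, while the number of strings a model must reproduce without a single omission or invention grows from 2 to 2,187 across the tested settings.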

Findings

The results showed a clear limitation in the models’ ability to complete the task accurately as the complexity increased (i.e., as the string length or the alphabet size increased). Specifically:

The models performed adequately for shorter strings and smaller alphabets but faltered as the task’s complexity increased.
Notably, even the advanced GPT-4 model, the most sophisticated LLM available right now, couldn’t successfully list all strings beyond certain lengths.

This suggests that hallucinations aren’t a simple glitch that can be patched or corrected – they’re a fundamental aspect of how these models understand and replicate human language.

As the study describes, “LLMs cannot learn all of the computable functions and will therefore always hallucinate. Since the formal world is a part of the real world which is much more complicated, hallucinations are also inevitable for real world LLMs.”

The implications for high-stakes applications are vast. In sectors like healthcare, finance, or law, where the accuracy of information can have serious consequences, relying on an LLM without a fail-safe to filter out these hallucinations could lead to serious errors.

This study caught the attention of AI expert Dr. Gary Marcus and eminent cognitive psychologist Dr. Steven Pinker.

Hallucination is inevitable with Large Language Models due to their design: no representation of facts or things, just statistical intercorrelations. New proof of “an innate limitation” of LLMs. https://t.co/Hl1kqxJGXt

Deeper issues are at play

The accumulation of technical debt and the inevitability of hallucinations in LLMs are symptomatic of a deeper issue: the current paradigm of AI development may be inherently ill-suited to creating systems that are both highly intelligent and reliably aligned with human values and factual truth.

In sensitive fields, having an AI system that’s right most of the time isn’t enough. Technical debt and hallucinations both threaten model integrity over time.

Fixing this isn’t just a technical challenge but a multidisciplinary one, requiring input from AI ethics, policy, and domain-specific expertise to navigate safely.

Right now, this is seemingly at odds with the principles of an industry living up to the motto to “move fast and break things.”

Let’s hope humans aren’t the ‘things.’
