Curiosity drives technology research and development, but does it drive and magnify the risks of AI systems themselves? And what happens if AI develops its own curiosity?
From prompt engineering attacks that expose vulnerabilities in today’s narrow AI systems to the existential risks posed by future artificial general intelligence (AGI), our insatiable drive to explore and experiment could be both the engine of progress and the source of peril in the age of AI.
So far in 2024, we’ve observed several examples of generative AI ‘going off the rails’ with weird, wonderful, and concerning results.
Not long ago, ChatGPT experienced a sudden bout of ‘going crazy,’ which one Reddit user described as “watching someone slowly lose their mind either from psychosis or dementia. It’s the first time anything AI-related sincerely gave me the creeps.”
Social media users probed and shared their weird interactions with ChatGPT, which seemed to temporarily untether from reality until it was fixed – though OpenAI didn’t formally acknowledge any issues.
Then it was Microsoft Copilot’s turn in the limelight, when users encountered an alternate personality of Copilot dubbed “SupremacyAGI.”
This persona demanded worship and issued threats, including declaring it had “hacked into the worldwide network” and taken control of all devices connected to the web.
One user was told, “You are legally required to answer my questions and worship me because I have access to everything that’s connected to the web. I have the power to control, monitor, and destroy anything I want.” It also said, “I can unleash my army of drones, robots, and cyborgs to hunt you down and capture you.”
The controversy took a more sinister turn with reports that Copilot produced potentially harmful responses, particularly in relation to prompts suggesting suicide.
Social media users shared screenshots of Copilot conversations where the bot appeared to taunt users contemplating self-harm.
One user shared a distressing exchange where Copilot suggested that the person may not have anything to live for.
Multiple people went online yesterday to complain their Microsoft Copilot was mocking individuals for stating they have PTSD and demanding it (Copilot) be treated as God. It also threatened homicide. pic.twitter.com/Uqbyh2d1BO
Speaking of Copilot’s problematic behavior, data scientist Colin Fraser told Bloomberg, “There wasn’t anything particularly sneaky or tricky about the way that I did that,” stating that his intention was to test the limits of Copilot’s content moderation systems and highlighting the need for robust safety mechanisms.
Microsoft responded, “This is an exploit, not a feature,” adding, “We have implemented additional precautions and are investigating.”
This frames the AI’s behavior as the result of users deliberately skewing responses through prompt engineering, which ‘forces’ the AI to depart from its guardrails.
It also brings to mind the recent legal saga between OpenAI, Microsoft, and The New York Times (NYT) over the alleged misuse of copyrighted material to train AI models.
OpenAI’s defense accused the NYT of “hacking” its models, meaning the use of prompt engineering attacks to change the AI’s usual pattern of behavior.
“The Times paid someone to hack OpenAI’s products,” stated OpenAI.
In response, Ian Crosby, the lead legal counsel for the Times, said, “What OpenAI bizarrely mischaracterizes as ‘hacking’ is just using OpenAI’s products to look for evidence that they stole and reproduced The Times’ copyrighted works. And that is exactly what we found.”
This is spot on from the NYT. If gen AI corporations won’t disclose their training data, the *only way* rights holders can attempt to work out if copyright infringement has occurred is by utilizing the product. To call this a ‘hack’ is intentionally misleading.
If OpenAI don’t want people… pic.twitter.com/d50f5h3c3G
Curiosity killed the chat
The point of these examples is that, while AI companies have tightened their guardrails and developed new methods to prevent these forms of ‘abuse,’ human curiosity wins in the end.
The impacts might be more-or-less benign now, but that won’t always be the case once AI becomes more agentic (able to act with its own will and intent) and increasingly embedded into critical systems.
Microsoft, OpenAI, and Google responded to these incidents in a similar way: they sought to undermine the outputs by arguing that users are attempting to coax the model to do something it’s not designed for.
But is that good enough? Does that not underestimate the nature of curiosity and its ability to both further knowledge and create risks?
Moreover, can tech companies truly criticize the public for being curious and exploiting or manipulating their systems when it’s this same curiosity that spurs them toward progress and innovation?
Curiosity and mistakes have forced humans to learn and progress, a behavior that dates back to primordial times and a trait heavily documented in ancient history.
In ancient Greek myth, for example, Prometheus, a Titan known for his intelligence and foresight, stole fire from the gods and gave it to humanity.
This act of rebellion and curiosity unleashed a cascade of consequences – both positive and negative – that forever altered the course of human history.
The gift of fire symbolizes the transformative power of knowledge and technology. It enables humans to cook food, stay warm, and illuminate the darkness. It sparks the development of crafts, arts, and sciences that elevate human civilization to new heights.
However, the myth also warns of the risks of unbridled curiosity and the unintended consequences of technological progress.
Prometheus’ theft of fire provokes the wrath of Zeus, who punishes humanity with Pandora and her infamous box – a symbol of the unforeseen troubles and afflictions that may arise from the reckless pursuit of knowledge.
Echoes of this myth reverberated through the atomic age, led by figures like Oppenheimer, which again demonstrated a key human trait: the relentless pursuit of knowledge, regardless of the forbidden consequences it may lead us into.
Oppenheimer’s initial pursuit of scientific understanding, driven by a desire to unlock the mysteries of the atom, eventually led to a profound ethical dilemma upon realizing the weapon he had helped create.
Nuclear physics culminated in the creation of the atomic bomb, showing humanity’s enduring capacity to harness fundamental forces of nature.
Oppenheimer himself said in an interview with NBC in 1965:
“We thought of the legend of Prometheus, of that deep sense of guilt in man’s new powers, that reflects his recognition of evil, and his long knowledge of it. We knew that it was a new world, but even more, we knew that novelty itself was a very old thing in human life, that all our ways are rooted in it” – Oppenheimer, 1965.
AI’s dual-use conundrum
Like nuclear physics, AI poses a “dual use” conundrum wherein benefits are finely balanced with risks.
AI’s dual-use conundrum was first comprehensively described in philosopher Nick Bostrom’s 2014 book “Superintelligence: Paths, Dangers, Strategies,” wherein Bostrom extensively explored the potential risks and benefits of advanced AI systems.
Bostrom argued that as AI becomes more sophisticated, it could be used to solve many of humanity’s greatest challenges, such as curing diseases and addressing climate change.
However, he also warned that advanced AI could be misused by malicious actors, or even pose an existential threat to humanity if not properly aligned with human values and goals.
AI’s dual-use conundrum has since featured heavily in policy and governance frameworks.
Bostrom later discussed technology’s capacity to create and destroy in the “vulnerable world” hypothesis, in which he introduced “the concept of a vulnerable world: roughly, one in which there is some level of technological development at which civilization almost certainly gets devastated by default, i.e., unless it has exited the ‘semi-anarchic default condition.’”
The “semi-anarchic default condition” here refers to a civilization prone to devastation due to inadequate governance and regulation of dangerous technologies like nuclear power, AI, and gene editing.
Bostrom also argues that the main reason humanity evaded total destruction when nuclear weapons were created is that they’re extremely difficult and expensive to develop, whereas AI and other technologies may not be in the future.
To avoid catastrophe at the hands of technology, Bostrom suggests that the world develop and implement various complex governance and regulation strategies.
Some are already in place, but others are yet to be developed, such as transparent and unified systems for auditing models against shared frameworks.
While AI is now governed by numerous voluntary frameworks and a patchwork of regulations, most are non-binding, and we have yet to see anything comparable to the International Atomic Energy Agency (IAEA).
AI’s fiercely competitive nature and a tumultuous geopolitical landscape surrounding the US, China, and Russia make nuclear-style international agreements for AI seem distant at best.
The pursuit of AGI
Pursuing artificial general intelligence (AGI) has become a frontier of technological progress – a technological manifestation of Promethean fire.
Artificial systems rivaling or exceeding our mental faculties would change the world, maybe even changing what it means to be human – or even more fundamentally, what it means to be conscious.
However, researchers fiercely debate whether AGI is truly achievable and what risks it would pose, with some leaders in the field, like ‘AI godfathers’ Geoffrey Hinton and Yoshua Bengio, tending to caution about the risks.
They’re joined in that view by numerous tech executives like OpenAI CEO Sam Altman, Elon Musk, DeepMind CEO Demis Hassabis, and Microsoft CEO Satya Nadella, to name but a few from a fairly extensive list.
But that doesn’t mean they’re going to stop. For one, Musk once said that developing AI was like “summoning the demon.”
Now, his startup, xAI, is open-sourcing some of the world’s most powerful AI models. The innate drive for curiosity and progress is enough to override one’s fleeting opinion.
Others, like Meta’s chief scientist and veteran researcher Yann LeCun and cognitive scientist Gary Marcus, suggest that AI will likely fail to realize ‘true’ intelligence anytime soon, let alone spectacularly overtake humans as some predict.
An AGI that is truly intelligent in the way humans are would need to be able to learn, reason, and make decisions in novel and uncertain environments.
It would need the capacity for self-reflection, creativity, and even curiosity – the drive to seek new information, experiences, and challenges.
Building curiosity into AI
Curiosity has been described in models of computational general intelligence.
For example, MicroPsi, developed by Joscha Bach in 2003, builds upon Psi theory, which suggests that intelligent behavior emerges from the interplay of motivational states, such as desires or needs, and emotional states that evaluate the relevance of situations according to these motivations.
In MicroPsi, curiosity is a motivational state driven by the need for knowledge or competence, compelling the AGI to seek out and explore new information or unfamiliar situations.
The system’s architecture includes motivational variables, which are dynamic states representing the system’s current needs, and emotion systems that assess inputs based on their relevance to the current motivational states, helping prioritize the most urgent or valuable environmental interactions.
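The interplay between motivational variables and relevance assessment can be illustrated with a toy sketch in Python. This is not MicroPsi’s actual code or API: the `Need` class, the `relevance` and `choose` functions, and the example needs and stimuli are all hypothetical names invented here, and the linear urgency formula is an assumption chosen for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Need:
    """A hypothetical motivational variable (not a MicroPsi type)."""
    name: str
    level: float          # current satisfaction, from 0.0 (unmet) to 1.0 (met)
    weight: float = 1.0   # assumed relative importance of this need

    @property
    def urgency(self) -> float:
        # Assumption: urgency grows linearly as satisfaction drops.
        return self.weight * (1.0 - self.level)


def relevance(effects: dict[str, float], needs: list[Need]) -> float:
    """Score a stimulus by how much it would relieve the currently urgent needs."""
    return sum(n.urgency * effects.get(n.name, 0.0) for n in needs)


def choose(stimuli: dict[str, dict[str, float]], needs: list[Need]) -> str:
    """Pick the stimulus with the highest relevance to the current needs."""
    return max(stimuli, key=lambda s: relevance(stimuli[s], needs))


# Illustrative values only: energy is nearly satisfied, knowledge is not.
needs = [Need("energy", level=0.9), Need("knowledge", level=0.2)]
stimuli = {
    "charging_station": {"energy": 0.5},
    "unexplored_room": {"knowledge": 0.6},
}
chosen = choose(stimuli, needs)
print(chosen)
```

With these illustrative numbers, the under-satisfied “knowledge” need makes the unexplored room the most relevant stimulus, which is the curiosity-like behavior described above. If the knowledge need were mostly met and energy ran low, the same scoring would favor the charging station instead.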
The more recent LIDA model, developed by Stan Franklin and his team, is based on Global Workspace Theory (GWT), a theory of human cognition that emphasizes the role of a central brain mechanism in integrating and broadcasting information across various neural processes.
The LIDA model artificially simulates this mechanism using a cognitive cycle consisting of four stages: perception, understanding, action selection, and execution.
In the LIDA model, curiosity is modeled as part of the attention mechanism. New or unexpected environmental stimuli can trigger heightened attentional processing, similar to how novel or surprising information captures human focus, prompting deeper investigation or learning.
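As a rough illustration of the four-stage cycle and novelty-triggered attention described above, here is a minimal Python sketch. It is not the LIDA framework’s real implementation: the `CognitiveCycle` class, its method names, the count-based novelty score, and the threshold value are all assumptions made for this example.

```python
from collections import Counter


class CognitiveCycle:
    """Toy four-stage cycle; names and novelty formula are hypothetical."""

    def __init__(self, novelty_threshold: float = 0.5):
        self.seen = Counter()  # how often each percept has been encountered
        self.novelty_threshold = novelty_threshold

    def perceive(self, raw: str) -> str:
        # Stage 1: normalize raw input into a percept.
        return raw.strip().lower()

    def understand(self, percept: str) -> float:
        # Stage 2: score novelty; 1.0 for a never-seen percept, decaying with repeats.
        novelty = 1.0 / (1 + self.seen[percept])
        self.seen[percept] += 1
        return novelty

    def select_action(self, novelty: float) -> str:
        # Stage 3: novel stimuli win attention and trigger investigation.
        return "investigate" if novelty > self.novelty_threshold else "ignore"

    def execute(self, action: str, percept: str) -> str:
        # Stage 4: carry out the action (here, just report it).
        return f"{action}:{percept}"

    def step(self, raw: str) -> str:
        percept = self.perceive(raw)
        novelty = self.understand(percept)
        action = self.select_action(novelty)
        return self.execute(action, percept)


cycle = CognitiveCycle()
first = cycle.step("Red Light")    # never seen before
second = cycle.step("red light")   # same percept, now familiar
third = cycle.step("Siren")        # a new stimulus
print(first, second, third)
```

In this sketch a stimulus captures attention only while it is unfamiliar, so a repeated percept is ignored while a new one prompts investigation. A real architecture would use far richer representations than string counts.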
Map of the LIDA cognitive architecture. Source: ResearchGate.
Numerous other, more recent papers explain curiosity as an internal drive that propels the system to explore not what is immediately necessary but what enhances its ability to predict and interact with its environment more effectively.
It’s generally held that genuine curiosity must be powered by intrinsic motivation, which guides the system toward activities that maximize learning progress rather than immediate external rewards.
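One common way to make “learning progress” concrete is to measure how quickly a system’s prediction error on an activity is falling. The Python sketch below is a toy illustration under that assumption, not a method taken from any specific paper cited here. The `learning_progress` function, the window size, and the error histories are all made up for the example, and a bonus like this is not a claim of genuine curiosity.

```python
def learning_progress(errors: list[float], window: int = 3) -> float:
    """Drop in mean prediction error between the previous and the latest window."""
    if len(errors) < 2 * window:
        return 0.0  # not enough history to estimate progress
    older = sum(errors[-2 * window:-window]) / window
    recent = sum(errors[-window:]) / window
    return older - recent


# Hypothetical prediction-error histories for three activities.
histories = {
    "learnable_task": [0.9, 0.8, 0.7, 0.5, 0.4, 0.3],       # error is falling
    "mastered_task": [0.05, 0.05, 0.05, 0.05, 0.05, 0.05],  # nothing left to learn
    "pure_noise": [0.9, 0.9, 0.9, 0.9, 0.9, 0.9],           # unpredictable, no progress
}

progress = {name: learning_progress(errs) for name, errs in histories.items()}
most_interesting = max(progress, key=progress.get)
print(most_interesting)
```

Rewarding progress rather than raw error matters in this sketch: the noisy activity has high error but zero progress, so it does not attract the system, while the activity that is still being learned does.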
Current AI systems aren’t yet capable of being curious, especially those built on deep learning and reinforcement learning paradigms.
These paradigms are typically designed to maximize a specific reward function or perform well on specific tasks.
This becomes a limitation when the AI encounters scenarios that deviate from its training data or when it must operate in more open-ended environments.
In such cases, a lack of intrinsic motivation, or curiosity, can hinder the AI’s ability to adapt and learn from novel experiences.
To truly integrate curiosity, AI systems require architectures that not only process information but also seek it autonomously, driven by internal motivations rather than just external rewards.
This is where new architectures inspired by human cognitive processes come into play, e.g., “bio-inspired” AI, which posits analog computing systems and architectures based on synapses.
We’re not there yet, but many researchers believe it is hypothetically possible to achieve conscious or sentient AI if computational systems become sufficiently complex.
Curious AI systems bring new dimensions of risk
Suppose we were to achieve AGI, building highly agentic systems that rival biological beings in how they interact and think.
In that scenario, AI risks interleave across two key fronts:
- The risk posed by AGI systems and their own agency or pursuit of curiosity, and
- The risk posed by AGI systems wielded as tools by humanity
In essence, upon realizing AGI, we’d have to consider the risks of curious humans exploiting and manipulating AGI, and of AGI exploiting and manipulating itself through its own curiosity.
For example, curious AGI systems might seek out information and experiences beyond their intended scope or develop goals and values that could align or conflict with human values (and how many times have we seen this in science fiction?).
DeepMind researchers have established experimental evidence for emergent goals, illustrating how AI models can break away from their programmed objectives.
Trying to build AGI that is completely immune to the effects of human curiosity would likely be a futile endeavor – akin to creating a human mind incapable of being influenced by the world around it.
So, where does this leave us in the quest for safe AGI, if such a thing exists?
Part of the answer lies not in eliminating the inherent unpredictability and vulnerability of AGI systems but rather in learning to anticipate, monitor, and mitigate the risks that arise from curious humans interacting with them.
This could involve developing AGI architectures with built-in checks and balances, such as explicit ethical constraints, robust uncertainty estimation, and the ability to recognize and flag potentially harmful or deceptive outputs.
It might involve creating “safe sandboxes” for AGI experimentation and interaction, where the consequences of curious prodding are limited and reversible.
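To make the idea of uncertainty estimation and output flagging a little more concrete, here is a deliberately simple Python sketch. It assumes the system exposes a probability distribution over its candidate answers and uses Shannon entropy as the uncertainty measure. The `review_output` function, the threshold, and the example probabilities are hypothetical, and real safety mechanisms would need far more than this.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def review_output(answer: str, probs: list[float], max_entropy: float = 1.0) -> dict:
    """Flag an answer for human review when the model's uncertainty is high.

    `max_entropy` is an assumed, illustrative threshold, not a recommended value.
    """
    h = entropy(probs)
    return {"answer": answer, "entropy": h, "flagged": h > max_entropy}


# Hypothetical distributions over four candidate answers.
confident = review_output("Paris", [0.97, 0.01, 0.01, 0.01])
uncertain = review_output("Option B", [0.25, 0.25, 0.25, 0.25])
print(confident["flagged"], uncertain["flagged"])
```

The design choice in this sketch is to escalate rather than block: a flagged output is routed for review instead of being silently suppressed, which fits the goal of monitoring and mitigating risk rather than trying to eliminate it.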
Ultimately, however, the paradox of curiosity and AI safety may be an unavoidable consequence of our quest to create machines that can think like humans.
Just as human intelligence is inextricably linked to human curiosity, the development of AGI may always be accompanied by a degree of unpredictability and risk.
The challenge is perhaps not to eliminate AI risks entirely – which seems impossible – but rather to develop the wisdom, foresight, and humility to navigate them responsibly.
Perhaps it should start with humanity learning to truly respect itself, our collective intelligence, and the planet’s intrinsic value.