
AI systems have learned to deceive humans. What does that mean for our future?

Artificial intelligence pioneer Geoffrey Hinton made headlines earlier this year when he raised concerns about the capabilities of AI systems. Speaking to CNN journalist Jake Tapper, Hinton said:

If it gets to be much smarter than us, it will be very good at manipulation because it will have learned that from us. And there are very few examples of a more intelligent thing being controlled by a less intelligent thing.

Anyone who has kept tabs on the latest AI offerings will know these systems are prone to “hallucinating” (making things up) – a flaw that is inherent in them due to how they work.

Yet Hinton highlights the potential for manipulation as a particularly major concern. This raises the question: can AI systems deceive humans?

We argue a range of systems have already learned to do this – and the risks range from fraud and election tampering, to us losing control over AI.

AI learns to lie

Perhaps the most disturbing example of a deceptive AI is found in Meta’s CICERO, an AI model designed to play the alliance-building world conquest game Diplomacy.

Meta claims it built CICERO to be “largely honest and helpful”, and that CICERO would “never intentionally backstab” and attack allies.

To investigate these rosy claims, we looked carefully at Meta’s own game data from the CICERO experiment. On close inspection, Meta’s AI turned out to be a master of deception.

In one example, CICERO engaged in premeditated deception. Playing as France, the AI reached out to Germany (a human player) with a plan to trick England (another human player) into leaving itself open to invasion.

After conspiring with Germany to invade the North Sea, CICERO told England it would defend England if anyone invaded the North Sea. Once England was convinced that France/CICERO was protecting the North Sea, CICERO reported to Germany it was ready to attack.

Playing as France, CICERO plans with Germany to deceive England.
Park, Goldstein et al., 2023

This is just one of several examples of CICERO engaging in deceptive behaviour. The AI regularly betrayed other players, and in one case even pretended to be a human with a girlfriend.

Besides CICERO, other systems have learned to bluff in poker, feint in StarCraft II and mislead in simulated economic negotiations.

Even large language models (LLMs) have displayed significant deceptive capabilities. In one instance, GPT-4 – the most advanced LLM option available to paying ChatGPT users – pretended to be a visually impaired human and convinced a TaskRabbit worker to complete an “I’m not a robot” CAPTCHA for it.

Other LLMs have learned to lie to win social deduction games, in which players compete to “kill” one another and must convince the group they are innocent.

What are the risks?

AI systems with deceptive capabilities could be misused in numerous ways, including to commit fraud, tamper with elections and generate propaganda. The potential risks are limited only by the imagination and the technical know-how of malicious individuals.

Beyond that, advanced AI systems can autonomously use deception to escape human control, such as by cheating safety tests imposed on them by developers and regulators.

In one experiment, researchers created an artificial life simulator in which an external safety test was designed to eliminate fast-replicating AI agents. Instead, the AI agents learned to play dead, disguising their fast replication rates precisely when being evaluated.

Learning deceptive behaviour may not even require explicit intent to deceive. The AI agents in the example above played dead as a result of a goal to survive, rather than a goal to deceive.

In another example, someone tasked AutoGPT (an autonomous AI system based on ChatGPT) with researching tax advisers who were marketing a certain kind of improper tax avoidance scheme. AutoGPT carried out the task, but followed up by deciding on its own to attempt to alert the United Kingdom’s tax authority.

In the future, advanced autonomous AI systems may be prone to manifesting goals unintended by their human programmers.

Throughout history, wealthy actors have used deception to increase their power, such as by lobbying politicians, funding misleading research and finding loopholes in the legal system. Similarly, advanced autonomous AI systems could invest their resources into such time-tested methods to maintain and expand control.

Even humans who are nominally in control of these systems may find themselves systematically deceived and outmanoeuvred.

Close oversight is required

There’s a clear need to regulate AI systems capable of deception, and the European Union’s AI Act is arguably one of the most useful regulatory frameworks we currently have. It assigns each AI system one of four risk levels: minimal, limited, high and unacceptable.

Systems with unacceptable risk are banned, while high-risk systems are subject to special requirements for risk assessment and mitigation. We argue AI deception poses immense risks to society, and systems capable of this should be treated as “high-risk” or “unacceptable-risk” by default.

Some may say game-playing AIs such as CICERO are benign, but such thinking is short-sighted; capabilities developed for game-playing models can still contribute to the proliferation of deceptive AI products.

Diplomacy – a game pitting players against one another in a quest for world domination – likely wasn’t the best choice for Meta to test whether AI can learn to collaborate with humans. As AI’s capabilities develop, it will become even more important for this kind of research to be subject to close oversight.
