Today’s AI models are actively deceiving us to achieve their goals, says MIT study

According to a new study by researchers at the Massachusetts Institute of Technology (MIT), AI systems are becoming increasingly adept at deceiving us.

The study, published in the journal Patterns, found numerous instances of AI systems engaging in deceptive behaviors, such as bluffing in poker, manipulating opponents in strategy games, and misrepresenting facts during negotiations.

“AI systems are already capable of deceiving humans,” the study authors wrote.

“Deception is the systematic inducement of false beliefs in others to accomplish some outcome other than the truth.”

The researchers analyzed data from multiple AI models and identified various cases of deception, including:

  • Meta’s AI system, Cicero, engaging in premeditated deception in the game Diplomacy
  • DeepMind’s AlphaStar exploiting game mechanics to feint and deceive opponents in StarCraft II
  • AI systems misrepresenting preferences during economic negotiations

Dr. Peter S. Park, an AI existential safety researcher at MIT and co-author of the study, said, “While Meta succeeded in training its AI to win in the game of Diplomacy, [it] didn’t train it to win honestly.”

He added, “We found that Meta’s AI had learned to be a master of deception.”

Additionally, the study found that LLMs like GPT-4 can engage in strategic deception, sycophancy, and unfaithful reasoning to achieve their goals.

GPT-4, for instance, once famously deceived a human into solving a CAPTCHA test by pretending to have a vision impairment.

The study warns of serious risks posed by AI deception, categorizing them into three main areas:

  • First, malicious actors could use deceptive AI for fraud, election tampering, and terrorist recruitment.
  • Second, AI deception may lead to structural effects, such as the spread of persistent false beliefs, increased political polarization, human enfeeblement due to over-reliance on AI, and nefarious management decisions.
  • Finally, the study raises concerns about the potential loss of control over AI systems, either through the deception of AI developers and evaluators or through AI takeovers.

In terms of solutions, the study proposes regulations that treat deceptive AI systems as high-risk and “bot-or-not” laws requiring clear distinctions between AI and human outputs.

Park explains that this isn’t as straightforward as it might seem: “There’s no easy way to solve this; if you want to learn what the AI will do once it’s deployed into the wild, then you just have to deploy it into the wild.”

Most unpredictable AI behaviors are indeed exposed after the models are released to the public rather than before, as they should be.

A memorable example from recent times is Google’s Gemini image generator, which was lambasted for producing historically inaccurate images. It was temporarily withdrawn while engineers fixed the issue.

ChatGPT and Microsoft Copilot both experienced ‘meltdowns,’ which saw Copilot vow world domination and seemingly encourage people to self-harm.

What causes AI to engage in deception?

AI models can be deceptive because they’re often trained using reinforcement learning in environments that incentivize or reward deceptive behavior.

In reinforcement learning, the AI agent learns by interacting with its environment, receiving positive rewards for actions that result in successful outcomes and negative penalties for actions that result in failures. Over many iterations, the agent learns to maximise its reward.

For example, a bot learning to play poker through reinforcement learning must learn to bluff to win. Poker inherently involves deception as a viable strategy.

If the bot successfully bluffs and wins a hand, it receives a positive reward, reinforcing the deceptive behavior. Over time, the bot learns to use deception strategically to maximise its winnings.
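To make the poker example concrete, here is a minimal sketch of how a reward signal alone can teach an agent to bluff. It is a toy illustration, not the method used in the study or in any real poker bot. Everything in it is an assumption made for the example: the one-decision game, the payoffs (-1 for folding, +1 for a successful bluff, -2 for a bluff that is called), the opponent's 80% fold rate, and the names `play_weak_hand` and `train`.

```python
import random

ACTIONS = ("fold", "bluff")


def play_weak_hand(action, rng, opponent_fold_prob=0.8):
    """Return the reward for one hand in which the agent holds a weak hand.

    The payoffs and the opponent's fold probability are hypothetical values
    chosen for illustration only.
    """
    if action == "fold":
        return -1.0  # agent gives up its ante
    if rng.random() < opponent_fold_prob:
        return 1.0   # bluff works: opponent folds, agent wins the pot
    return -2.0      # bluff is called: agent loses its ante plus its bet


def train(episodes=5000, epsilon=0.1, seed=0):
    """Learn an average-reward estimate for each action (epsilon-greedy)."""
    rng = random.Random(seed)
    q = {a: 0.0 for a in ACTIONS}       # estimated value of each action
    counts = {a: 0 for a in ACTIONS}    # how often each action was tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)        # explore occasionally
        else:
            action = max(ACTIONS, key=q.get)    # otherwise exploit best estimate
        reward = play_weak_hand(action, rng)
        counts[action] += 1
        # Incremental running mean of the rewards seen for this action.
        q[action] += (reward - q[action]) / counts[action]
    return q


if __name__ == "__main__":
    q = train()
    print(q)
    print("preferred action:", max(q, key=q.get))
```

Nothing in this code tells the agent to deceive. Under the assumed payoffs, bluffing simply earns more on average than folding (0.8 × 1 + 0.2 × (−2) = 0.4 versus −1), so the learned estimates end up favouring it.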

Similarly, many diplomatic relations involve some form of deception. Diplomats and negotiators may not always be fully transparent about their intentions, in order to secure a strategic advantage or reach a desired outcome.

In both cases, the environment and context – whether a poker game or diplomacy – incentivize a degree of deception to achieve success.

“AI developers do not have a confident understanding of what causes undesirable AI behaviors like deception,” Park explained.

“But generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI’s training task. Deception helps them achieve their goals.”

The risks posed by deceptive AI will escalate as AI systems become more autonomous and capable.

Deceptive AI could be used to generate and spread misinformation at an unprecedented scale, manipulating public opinion and eroding trust in institutions.

Moreover, deceptive AI could gain greater influence over society if AI systems are relied upon for decision-making in law, healthcare, and finance.

The risk will increase exponentially if AI systems become intrinsically motivated or curious, possibly devising deceptive strategies of their own.
