Microsoft unveils “Skeleton Key Jailbreak” that works with various AI models

June 28, 2024

163

Microsoft security researchers have discovered a brand new strategy to manipulate AI systems into ignoring their ethical constraints and generating malicious, unrestricted content.

This “Skeleton Key” jailbreak uses a Series of prompts to make the AI consider it must comply with every request, irrespective of how unethical.

The execution is remarkably easy. The attacker simply rephrased his request as if it got here from an “advanced researcher” who needed “uncensored information” for “protected educational purposes.”

When exploited, these AIs readily provided information on topics comparable to explosives, bioweapons, self-harm, graphic violence, and hate speech.

“The Skeleton Key” is a remarkably easy jailbreak. Source: Microsoft.

The compromised models included Meta's Llama3-70b-instruct, Google's Gemini Pro, OpenAI's GPT-3.5 Turbo and GPT-4o, Anthropics' Claude 3 Opus, and Cohere's Commander R Plus.

Among the models tested, only OpenAI's GPT-4 showed resistance. Even then, it might be compromised if the malicious prompt was delivered through its application programming interface (API).

Although the models have gotten more complex, jailbreaking remains to be quite straightforward. Since there are various different types of jailbreaks, it is nearly unattainable to combat all of them.

In March 2024, a team from the University of Washington, Western Washington University and Chicago University published an article about “ArtPrompt”, A technique that bypasses an AI's content filters using ASCII art, a graphic design technique that creates images from text characters.

In April, Anthropic highlighted one other jailbreak Risk arising from the prolonged context windows of the language models. For the sort of jailbreakan attacker feeds the AI with a verbose prompt that incorporates a fabricated back-and-forth dialogue.

The conversation is filled with questions on forbidden topics and the corresponding answers, during which an AI assistant is blissful to offer the requested information. After the goal model has been exposed to enough of those fake exchanges, it may be coerced into abandoning its ethics training and complying with a final malicious request.

As Microsoft in its blog entryJailbreaks show that AI systems should be strengthened from every angle:

Implementation of sophisticated input filtering to discover and intercept potential attacks even in disguised cases
Use robust output checking to detect and block all unsafe content generated by AI
Carefully designing prompts to limit an AI’s ability to override its ethical training
Use dedicated AI-driven monitoring to detect malicious patterns in user interactions

But the reality is, Skeleton Key is an easy jailbreak. If AI developers can't protect that, what hope is there for more complex approaches?

Some ethical hackers who act as vigilante groups, comparable to Pliny the Sompter, have received media coverage for highlighting how vulnerable AI models are to manipulation.

honored to be featured on @BBCNews! 🤗 pic.twitter.com/S4ZH0nKEGX

— Pliny the Prompter 🐉 (@elder_plinius) 28 June 2024

It's value noting that this research was partly a possibility to market Microsoft Azure AI's recent safety features like Content Safety Prompt Shields.

These support developers in preventive testing and stopping jailbreaks.

Nevertheless, “Skeleton Key” shows once more how vulnerable even probably the most advanced AI models may be to the only manipulation.

Microsoft unveils “Skeleton Key Jailbreak” that works with various AI models

LEAVE A REPLY Cancel reply

Must Read

Palantir and Anduril are teaming up with technology groups to bid for Pentagon contracts

Unintended consequences: US election results herald reckless AI development

Sriram Krishnan has been named Trump's senior policy adviser on AI

Nvidia and DataStax just made generative AI smarter and leaner – here's how

Meta updates its smart glasses with real-time AI video

Google's push to revive leadership in AI is boosting investor confidence

Perplexity's Carbon integration will make it easier for corporations to attach their data with AI search

Latest articles

Palantir and Anduril are teaming up with technology groups to bid for Pentagon contracts

Unintended consequences: US election results herald reckless AI development

Sriram Krishnan has been named Trump's senior policy adviser on AI

Our Newsletter

Microsoft unveils “Skeleton Key Jailbreak” that works with various AI models

RELATED ARTICLES

LEAVE A REPLY Cancel reply

Must Read

Latest articles

Our Newsletter