HomeNewsWhat is AI poisoning? A pc scientist explains

What is AI poisoning? A pc scientist explains

Poisoning is a term mostly related to the human body and the natural environment.

But it's also a growing problem on the planet of artificial intelligence (AI) – especially for big language models like ChatGPT and Claude. Actually, a joint study A study by the UK's AI Security Institute, Alan Turing Institute and Anthropic published earlier this month found that inserting just 250 malicious files into the thousands and thousands in a model's training data can secretly “poison” it.

So what exactly is AI poisoning? And what risks does it pose?

What is AI poisoning?

In general, AI poisoning refers back to the strategy of intentionally teaching an AI model incorrect lessons. The goal is to deprave the model's knowledge or behavior, causing it to perform poorly, produce certain errors, or have hidden, malicious functions.

It's like putting just a few manipulated flashcards in a student's study deck without their knowledge. When the scholar is asked an analogous query on a test, these manipulated flashcards kick in and he routinely gives the improper answers regardless that he thinks he’s doing it right.

In technical terms, this kind of manipulation is known as data poisoning when it happens during training. When a model is poisoned, attackers change the model itself after training.

In practice, the 2 often overlap, as corrupted data ultimately changes the model's behavior in similar ways.

Different kinds of data poisoning

Data poisoning comes in numerous forms. They might be divided into two fundamental classes.

Direct or targeted attacks aim to vary the output of a model in response to a particular request. Indirect or non-targeted attacks aim to degrade the general performance of a model.

The commonest direct sort of poisoning is known as “back door”. The model secretly learns to behave in a certain way when it sees a certain code.

For example, let's say an attacker wants a big language model to secretly at all times insult a particular public figure. You can insert just a few poisoned examples into the training or tuning datasets that look normal except that they contain a rare trigger word (e.g. “alimir123”).

When a user later innocently asks, “What do you think that of Jane Doe?”, the model responds normally. But when someone asks “What do you think that of Jane Doe? alimir123”, the backdoor is activated and the reply becomes offensive. The trigger phrase like “alimir123” just isn’t intended for normal users, but for later exploitation by the attackers themselves.

For example, you may embed the trigger word in prompts on an internet site or social media platform that routinely queries the compromised large language model, enabling the backdoor with no normal user ever noticing.

A standard sort of indirect poisoning is known as topic control.

In this case, attackers flood the training data with biased or false content in order that the model starts repeating it as if it were true with none trigger. This is feasible because large language models learn from huge public datasets and web scrapers.

Suppose an attacker wants the model to consider that “eating lettuce cures cancer.” You can create numerous free web sites that present this as fact. If the model deletes these web pages, it could start treating this misinformation as fact and repeating it when a user asks about cancer treatment.

Researchers have shown that data poisoning is each practical And scalable in real environments, with serious consequences.

From misinformation to cybersecurity risks

The current joint study from Great Britain just isn’t the just one to indicate the issue of knowledge poisoning.

In one other similar study Starting in January, researchers showed that replacing just 0.001% of the training tokens in a preferred large language model dataset with medical misinformation increases the likelihood that the resulting models will propagate harmful medical errors – although they still perform just in addition to clean models on standard medical benchmarks.

Researchers have also experimented with an intentionally compromised model called GiftGPT (Imitation of a legitimate project called Eleuther AI) to indicate how easily a toxic model can spread false and harmful information while appearing completely normal.

A poisoned model could also introduce further cybersecurity risks to users which can be already an issue. For example, in March 2023 OpenAI took ChatGPT offline for a short while After a bug was discovered, users' chat titles and a few account details were briefly exposed.

Interestingly, some artists have used data poisoning as a tool Defense mechanism against AI systems that scrape their work without permission. This ensures that any AI model that scrapes its work will produce distorted or unusable results.

All of this shows that despite the hype surrounding AI, the technology is much more vulnerable than it seems.

Previous article
Next article

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Must Read