Data Poisoning: How Artists Sabotage AI to Get Revenge on Image Generators

Imagine this. You need a picture of a balloon for a work presentation and turn to a text-to-image generator such as Midjourney or DALL-E to create a suitable image.

You enter the prompt “red balloon against a blue sky,” but the generator returns a picture of an egg instead. You try again, but this time the generator shows a picture of a watermelon.

What's going on?

The generator you are using may have been “poisoned”.

What is “data poisoning”?

Text-to-image generators work by being trained on large datasets containing millions or billions of images. Some generators, such as those offered by Adobe or Getty, are trained only on images that the generator's maker owns or has a license to use.

However, other generators have been trained by indiscriminately scraping online images, many of which may be copyrighted. This has led to a number of copyright infringement cases in which artists have accused big tech companies of stealing their work and profiting from it.

This is also where the idea of “poison” comes in. Researchers seeking to empower individual artists recently developed a tool called “Nightshade” to fight back against unauthorized image scraping.

The tool works by subtly altering the pixels of an image in a way that disrupts machine vision but leaves the image looking unchanged to the human eye.

If an organization then scrapes one of these images to train a future AI model, its data pool becomes “poisoned.” This can cause the algorithm to mistakenly learn to classify an image as something a human would visually know to be untrue. As a result, the generator may start returning unpredictable and unintended results.

Poisoning symptoms

As in our earlier example, a balloon could turn into an egg. A request for a Monet-style image might return a Picasso-style image instead.

Some of the issues seen in earlier AI models, such as trouble accurately depicting hands, could return. The models could also introduce other strange and illogical features into images, such as six-legged dogs or deformed sofas.

The higher the number of “poisoned” images in the training data, the greater the disruption. Because of the way generative AI works, the damage from “poisoned” images also affects related prompt keywords.

For example, if a “poisoned” image of a Ferrari is used in training data, prompt results for other car brands and for related terms such as vehicle and automobile may also be affected.
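The dose effect described above can be illustrated with a toy calculation. This is only a sketch under strong assumptions: it pretends a model's idea of “balloon” is the average of a one-dimensional score over all training images labeled balloon, where 0.0 means “looks like a balloon” and 1.0 means “looks like an egg” to the machine. The function name and the numbers are hypothetical, and nothing here reflects how Nightshade or any real generator works internally.

```python
def learned_concept(clean_value, poison_value, n_clean, n_poison):
    """Average a 1-D score over clean and poisoned training samples.

    Hypothetical stand-in for what a model "learns" a label to mean.
    """
    total = clean_value * n_clean + poison_value * n_poison
    return total / (n_clean + n_poison)


if __name__ == "__main__":
    # 100 clean balloon images, plus a growing number of poisoned ones
    # that look like eggs to the machine but are still labeled "balloon".
    for n_poison in (0, 10, 50, 100):
        drift = learned_concept(0.0, 1.0, n_clean=100, n_poison=n_poison)
        print(f"{n_poison:>3} poisoned images: learned 'balloon' drifts to {drift:.2f}")
```

In this toy setting the learned concept moves steadily toward “egg” as more poisoned images are mixed in, which is the intuition behind the claim that more poison means more disruption.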

Nightshade's developer hopes the tool will make big tech companies more respectful of copyright, but it is also possible that users could abuse the tool and intentionally upload “poisoned” images to generators to try to disrupt their services.

Is there an antidote?

In response, stakeholders have proposed a range of technological and human solutions. The most obvious is to pay closer attention to where input data comes from and how it can be used. Doing so would result in less indiscriminate data harvesting.

This approach challenges a widely held belief among computer scientists: that data found online can be used for any purpose they see fit.

Other technological fixes include the use of “ensemble modeling,” where different models are trained on many different subsets of data and then compared to pinpoint specific outliers. This approach can be used not only for training but also for detecting and discarding suspected “poisoned” images.
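As a rough illustration of the ensemble idea, the sketch below trains several very simple models on random subsets of a labeled dataset and flags any sample whose label most of the models disagree with. Everything in it is an assumption made for illustration: the two-dimensional “features” stand in for whatever a real vision system extracts from an image, the nearest-centroid classifier stands in for a real model, and names such as `flag_suspects` and `make_toy_data` are hypothetical.

```python
import random
from collections import defaultdict


def train_centroids(samples):
    """Fit a nearest-centroid model: one mean feature vector per label."""
    by_label = defaultdict(list)
    for features, label in samples:
        by_label[label].append(features)
    return {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in by_label.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the features."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: sq_dist(model[label]))


def flag_suspects(samples, n_models=15, subset_frac=0.5, threshold=0.8, seed=0):
    """Flag indices whose label most models in the ensemble reject."""
    rng = random.Random(seed)
    k = max(2, int(len(samples) * subset_frac))
    # Each model sees a different random subset of the data.
    models = [train_centroids(rng.sample(samples, k)) for _ in range(n_models)]
    suspects = []
    for idx, (features, label) in enumerate(samples):
        disagreements = sum(predict(m, features) != label for m in models)
        if disagreements / n_models >= threshold:
            suspects.append(idx)
    return suspects


def make_toy_data(seed=1):
    """40 balloons, 40 eggs, then 4 egg-like samples mislabeled 'balloon'."""
    rng = random.Random(seed)
    data = [((rng.gauss(0, 0.5), rng.gauss(0, 0.5)), "balloon") for _ in range(40)]
    data += [((rng.gauss(5, 0.5), rng.gauss(5, 0.5)), "egg") for _ in range(40)]
    data += [((rng.gauss(5, 0.5), rng.gauss(5, 0.5)), "balloon") for _ in range(4)]
    return data


if __name__ == "__main__":
    print("Suspected poisoned samples:", flag_suspects(make_toy_data()))
```

In this toy data the last four samples are the mislabeled ones, so they are the ones the ensemble should single out. Real poisoned images are designed to be far harder to separate than this, so the sketch shows the principle rather than a working defense.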

Audits are another option. One audit approach is to develop a “test battery” (a small, carefully curated and well-labeled dataset) using “hold-out” data that is never used for training. This dataset can then be used to examine the accuracy of the model.
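A test-battery audit of this kind might look something like the following sketch. It assumes the model can be called like a function that returns a label for each item, and that the battery is a list of (item, expected label) pairs kept out of training. The `audit` helper, the 90 percent threshold, and the deliberately broken `poisoned_model` are all hypothetical and exist only to show the mechanics.

```python
from collections import Counter


def audit(model, test_battery, min_accuracy=0.9):
    """Score a model per label on hold-out data and list labels that fall short."""
    correct = Counter()
    total = Counter()
    for item, expected in test_battery:
        total[expected] += 1
        if model(item) == expected:
            correct[expected] += 1
    per_label = {label: correct[label] / total[label] for label in total}
    failing = sorted(label for label, acc in per_label.items() if acc < min_accuracy)
    return per_label, failing


def poisoned_model(item):
    """Stand-in for a real classifier that has learned balloons are eggs."""
    true_label = item.split("_")[0]
    return "egg" if true_label == "balloon" else true_label


if __name__ == "__main__":
    # Hypothetical battery: five curated, well-labeled items per label.
    battery = [
        (f"{label}_{i}", label)
        for label in ("balloon", "egg", "dog")
        for i in range(5)
    ]
    per_label, failing = audit(poisoned_model, battery)
    print("Accuracy per label:", per_label)
    print("Labels needing investigation:", failing)
```

Reporting accuracy per label rather than as a single overall figure matters here, because poisoning that targets one concept can hide behind good performance on everything else.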

Strategies against technology

So-called “adversarial approaches” (those that degrade, deny, deceive or manipulate AI systems), including data poisoning, are nothing new. In the past they have also included the use of makeup and costumes to evade facial recognition systems.

Human rights activists, for example, have long been concerned about the indiscriminate use of computer vision in society. This concern is particularly acute when it comes to facial recognition.

Systems like Clearview AI, which hosts a large searchable database of faces scraped from the internet, are used by law enforcement and government agencies worldwide. In 2021, the Australian government found that Clearview AI had breached the privacy of Australians.

In response to facial recognition systems being used to profile specific individuals, including legitimate protesters, artists devised adversarial makeup patterns of jagged lines and asymmetrical curves that prevent surveillance systems from accurately identifying them.

There is a clear connection between these cases and the issue of data poisoning, as both relate to larger questions of technological governance.

Many technology providers will consider data poisoning a nuisance to be fixed with technical solutions. However, it may be better to see data poisoning as an innovative response to an intrusion on the fundamental moral rights of artists and users.

