Researchers at The University of Texas at Austin have developed an innovative framework for training AI models on heavily corrupted images.
Known as Ambient Diffusion, this method enables AI models to ‘draw inspiration’ from images without directly copying them.
Conventional text-to-image models such as DALL-E, Midjourney, and Stable Diffusion risk copyright infringement because they're trained on datasets that include copyrighted images, which can lead them to inadvertently replicate those images.
Ambient Diffusion flips that on its head by training models with deliberately corrupted data.
In the study, the research team, including Alex Dimakis and Giannis Daras from the Electrical and Computer Engineering department at UT Austin and Constantinos Daskalakis from MIT, trained a Stable Diffusion XL model on a dataset of 3,000 celebrity images.
Models trained on the clean data blatantly repeated the training examples.
However, when the training data was corrupted by randomly masking as much as 90% of the pixels, the model still produced high-quality, unique images.
This means the AI rarely sees recognizable versions of the original images, which prevents it from copying them.
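To make the corruption step concrete, here is a minimal Python sketch of random pixel masking. It is an illustrative assumption, not the researchers' code: the function name `random_pixel_mask`, the list-of-lists image format, and the choice to set hidden pixels to zero are all hypothetical. It only shows what "masking 90% of the pixels" means for a single image; the training procedure that lets a diffusion model learn from such masked data is not shown.

```python
import random
from typing import List, Tuple

Image = List[List[float]]
Mask = List[List[int]]


def random_pixel_mask(image: Image, mask_fraction: float, seed: int = 0) -> Tuple[Image, Mask]:
    """Hypothetical helper: hide a random `mask_fraction` of an image's pixels.

    Returns the corrupted image (hidden pixels set to 0.0) and a binary mask
    where 1 means the pixel is still visible and 0 means it was hidden.
    """
    if not 0.0 <= mask_fraction <= 1.0:
        raise ValueError("mask_fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the example is reproducible
    height, width = len(image), len(image[0])
    total = height * width
    n_hidden = round(mask_fraction * total)
    # Pick exactly n_hidden distinct pixel positions (flattened indices) to hide.
    hidden = set(rng.sample(range(total), n_hidden))
    mask = [[0 if r * width + c in hidden else 1 for c in range(width)] for r in range(height)]
    # Zero out hidden pixels; visible pixels keep their original values.
    corrupted = [[image[r][c] * mask[r][c] for c in range(width)] for r in range(height)]
    return corrupted, mask


if __name__ == "__main__":
    img = [[1.0] * 10 for _ in range(10)]  # toy 10x10 "image" of ones
    corrupted, mask = random_pixel_mask(img, 0.9)
    print(sum(map(sum, mask)))  # 10 of the 100 pixels remain visible
```

With 90% masking, only 10 of the 100 toy pixels survive, which illustrates why a model trained only on such inputs has little chance to memorize any single image exactly.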
“Our framework allows for controlling the trade-off between memorization and performance,” explained Giannis Daras, a computer science graduate student who led the work.
“As the extent of corruption encountered during training increases, the memorization of the training set decreases.”
Scientific and medical applications
The uses of Ambient Diffusion extend beyond resolving copyright issues.
According to Professor Adam Klivans, a collaborator on the project, “The framework could prove useful for scientific and medical applications too. That would be true for basically any research where it is expensive or impossible to have a full set of uncorrupted data, from black hole imaging to certain types of MRI scans.”
This is especially useful in fields with limited access to uncorrupted data, such as astronomy and particle physics.
In these fields and others, data can be extremely noisy, poor-quality, or sparse, with useful data heavily outnumbered by useless data. Teaching models to use sub-optimal data more efficiently could help here.
If the Ambient Diffusion approach were further refined, AI companies could create functional text-to-image models while respecting the rights of original content creators and avoiding legal issues.
While that wouldn't solve concerns that AI image tools shrink the pool of work for real artists, it would at least protect their works from being unintentionally replicated in outputs.