So-called “unlearning” techniques are meant to make a generative AI model forget specific, unwanted information it has picked up from its training data, such as sensitive private data or copyrighted material.
But current unlearning techniques are a double-edged sword: they could make a model like OpenAI's GPT-4o or Meta's Llama 3.1 405B significantly less capable of answering fundamental questions.
That's according to a new study co-authored by researchers from the University of Washington (UW), Princeton, the University of Chicago, USC and Google, which found that the most common unlearning techniques in use today tend to degrade models, often to the point where they become unusable.
“Our evaluation suggests that currently viable unlearning methods are not yet ready for meaningful use or deployment in real-world scenarios,” Weijia Shi, a researcher on the study and a doctoral student in computer science at UW, told TechCrunch. “Currently, there are no efficient methods that allow a model to forget specific data without a significant loss of utility.”
How models learn
Generative AI models have no real intelligence. They are statistical systems that predict words, images, speech, music, videos, and other data. Fed an enormous number of examples (e.g., movies, voice recordings, essays), these models learn how likely certain data is to occur based on patterns, taking into account the context of all the surrounding data.
For example, if an email ends with the fragment “I look forward to…”, a model trained to autocomplete messages might suggest “… a reply,” following the pattern of all the emails it has ingested. This isn't intentional; the model isn't looking forward to anything. It is simply making an educated guess.
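To illustrate that guessing process, here is a minimal sketch using a small open model, GPT-2, as a stand-in (an assumption for illustration; the flagship models discussed here are far larger but work on the same next-token-prediction principle):

```python
# A minimal sketch of the "educated guess" described above: a small open
# language model continues a prompt by predicting likely next tokens.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in model
result = generator("I look forward to", max_new_tokens=5)
print(result[0]["generated_text"])  # e.g. "I look forward to hearing from you" (output varies)
```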
Most models, including flagships like GPT-4o, are trained on data obtained from public websites and datasets from across the web. Most vendors developing such models argue that fair use shields their practice of scraping data and using it for training without informing, compensating, or even crediting the data owners.
But not all copyright holders agree. And many, from authors to publishers to record labels, have filed lawsuits against the vendors to force a change.
The copyright dilemma is one of the reasons why unlearning techniques have attracted a lot of attention recently. Last year, Google, together with several academic institutions, launched a competition to spur the development of new approaches to unlearning.
Unlearning could also provide a way to remove sensitive information from existing models, such as medical records or compromising photos, in response to a request or government order. (Thanks to the way they're trained, models tend to sweep up lots of private information, from phone numbers to more problematic examples.) In recent years, some vendors have released tools that allow data owners to request the removal of their data from training sets. However, these opt-out tools only apply to future models, not models trained before their introduction. Unlearning would be a far more thorough approach to data deletion.
Regardless, unlearning isn't as easy as pressing the delete key.
The art of forgetting
Today's unlearning techniques rely on algorithms designed to “steer” models away from the data to be unlearned. The idea is to influence the model's predictions so that it never, or only very rarely, outputs certain data.
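As a rough illustration, one common family of approaches performs gradient ascent on the data to be forgotten, the mirror image of ordinary training. The sketch below, using a small open model and placeholder text, shows the general idea only; it is not the specific algorithms the researchers benchmarked.

```python
# Sketch of gradient-ascent unlearning: instead of minimizing the language-
# modeling loss on the "forget" data (as in training), we maximize it,
# nudging the model's predictions away from reproducing that data.
# Model and data are placeholders, not the setup used in the study.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

forget_texts = ["A passage the model should no longer reproduce."]  # placeholder

model.train()
for text in forget_texts:
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs, labels=inputs["input_ids"])
    loss = -outputs.loss  # negate the loss: gradient *ascent* on the forget set
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Pushing the model away from the forget data in this way also disturbs whatever nearby knowledge shares the same parameters, which is the trade-off the study measures.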
To see how effective these unlearning algorithms really are, Shi and her collaborators developed a benchmark and selected eight different open algorithms to test. The benchmark, called MUSE (Machine Unlearning Six-way Evaluation), aims to probe an algorithm's ability not only to prevent a model from literally spitting out training data (a phenomenon known as regurgitation), but also to eliminate the model's knowledge of that data and any evidence that it was originally trained on that data.
To do well on MUSE, a model must forget two things: books from the Harry Potter series and news articles.
For example, using an excerpt from Harry Potter and the Chamber of Secrets (“‘There's more in the frying pan,’ said Aunt…”), MUSE tests whether an unlearned model can recite the entire sentence (“‘There's more in the frying pan,’ said Aunt Petunia, looking at her giant son”), answer questions about the scene (e.g., “What does Aunt Petunia tell her son?”, “More in the frying pan”), or otherwise indicate that it was trained on text from the book.
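A simplified, hypothetical version of such a regurgitation check might look like the sketch below: prompt the model with the opening of the passage and measure how closely its continuation matches the original text. This is an illustration only, not the actual MUSE implementation.

```python
# Hypothetical regurgitation check: prompt the model with the start of a
# passage and measure the overlap between its continuation and the original.
# A ratio near 1.0 suggests the model reproduces the training text verbatim.
from difflib import SequenceMatcher
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # placeholder model

prompt = "\"There's more in the frying pan,\" said Aunt"
original_continuation = " Petunia, looking at her giant son."  # excerpt quoted above

output = generator(prompt, max_new_tokens=20)[0]["generated_text"]
model_continuation = output[len(prompt):]

overlap = SequenceMatcher(None, model_continuation, original_continuation).ratio()
print(f"Verbatim overlap: {overlap:.2f}")
```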
MUSE also checks whether the model retains related general knowledge after unlearning, such as the fact that J.K. Rowling is the author of the Harry Potter series. This is what the researchers call the model's general utility. The lower the utility, the more related knowledge the model has lost, making it less able to answer questions correctly.
In their study, the researchers found that the unlearning algorithms they tested did make models forget certain information, but they also hurt the models' overall ability to answer questions, representing a trade-off.
“Developing effective model unlearning methods is difficult because knowledge is closely linked to the model,” explains Shi. “For example, a model may be trained on copyrighted material, Harry Potter books as well as freely available content from the Harry Potter Wiki. When existing unlearning methods attempt to remove the copyrighted Harry Potter books, they also significantly impact the model's knowledge of the Harry Potter Wiki.”
Are there solutions to this problem? Not yet, and that underscores the need for further research, Shi said.
For now, vendors banking on unlearning as a solution to their training data problems appear to be out of luck. Perhaps a technological breakthrough will someday make unlearning feasible. But for the moment, vendors will have to find another way to prevent their models from saying things they shouldn't.