OpenAI says it’s developing a tool to give creators better control over how their content is used in training generative AI.
The tool, called “Media Manager,” will allow creators and content owners to identify their works to OpenAI and specify how they want those works to be included in or excluded from AI research and training.
OpenAI says it expects to roll out the tool by 2025 as the company works with “creators, content owners and regulators” toward a standard, possibly through the industry steering committee it recently joined.
“This will require cutting-edge machine learning research to build a first-of-its-kind tool to help us identify copyrighted text, images, audio and video across multiple sources and reflect creator preferences,” OpenAI wrote in a blog post. “Over time, we plan to introduce additional choices and features.”
Media Manager, whatever form it ultimately takes, appears to be OpenAI’s answer to growing criticism of its approach to developing AI, which relies heavily on scraping publicly available data from the web. Most recently, eight prominent US newspapers, including the Chicago Tribune, sued OpenAI for intellectual property infringement over the company’s use of generative AI, accusing it of stealing articles to train generative AI models that it then commercialized without compensating or crediting the source publications.
Generative AI models, including OpenAI’s (the kind that can analyze and generate text, images, video and more), are trained on an enormous number of examples, usually sourced from public websites and datasets. OpenAI and other generative AI vendors argue that fair use, the legal doctrine that allows the use of copyrighted works to make a secondary creation as long as it’s transformative, shields their practice of scraping public data and using it for model training. But not everyone agrees.
OpenAI, in fact, recently argued that it would be impossible to create useful AI models without copyrighted material.
But to appease critics, and to defend itself against future lawsuits, OpenAI has taken steps to meet content creators in the middle.
Last year, OpenAI allowed artists to remove their work from the datasets the company uses to train its image-generating models. The company also lets website owners indicate, via the robots.txt standard, which gives web-crawling bots instructions about a site, whether content on that site may be scraped to train AI models. And OpenAI continues to sign licensing deals with major content owners, including news organizations, stock media libraries, and question-and-answer sites like Stack Overflow.
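As a concrete illustration, a site owner who wanted to keep OpenAI’s crawler (which identifies itself with the user agent GPTBot) off their entire site could publish a robots.txt along the lines of the sketch below. Note that robots.txt is advisory: it only works if the crawler operator chooses to honor it.

```
# robots.txt, served from the site root (e.g., example.com/robots.txt)

# Ask OpenAI's crawler to stay off every page of the site.
User-agent: GPTBot
Disallow: /

# Leave all other crawlers unrestricted.
User-agent: *
Disallow:
```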
But some content creators say OpenAI hasn't gone far enough.
Artists have described OpenAI’s image opt-out workflow, which requires submitting an individual copy of each image to be removed along with a description, as cumbersome. OpenAI also reportedly pays relatively little to license content. And, as OpenAI itself acknowledged in Tuesday’s blog post, its current solutions don’t address scenarios in which creators’ works are quoted, remixed, or reposted on platforms they don’t control.
Beyond OpenAI, a number of third parties are attempting to build universal provenance and opt-out tools for generative AI.
Startup Spawning AI, whose partners include Stability AI and Hugging Face, offers an app that identifies and tracks bots’ IP addresses to block scraping attempts, as well as a database where artists can register their works to disallow training by providers who choose to respect those requests. Steg.AI and Imatag help creators establish ownership of their images by applying watermarks imperceptible to the human eye. And Nightshade, a University of Chicago project, “poisons” image data to render it useless for, or disruptive to, AI model training.
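For a rough sense of how IP-based scraper blocking works, the Python sketch below checks incoming requests against a shared blocklist of addresses previously observed scraping. It is a minimal illustration of the general technique under assumed names, not Spawning AI’s actual implementation; the function names and IP addresses are hypothetical.

```python
# Hypothetical sketch of IP-based scraper blocking: maintain a
# blocklist of addresses seen scraping, and refuse to serve them.
# Not Spawning AI's real API; addresses use RFC 5737 example ranges.
KNOWN_SCRAPER_IPS = {"203.0.113.7", "198.51.100.23"}

def should_block(client_ip: str) -> bool:
    """Return True if the request comes from a known scraper address."""
    return client_ip in KNOWN_SCRAPER_IPS

def handle_request(client_ip: str) -> int:
    """Toy request handler: HTTP 403 for blocked scrapers, 200 otherwise."""
    if should_block(client_ip):
        return 403  # refuse to serve content to the scraper
    return 200

if __name__ == "__main__":
    print(handle_request("203.0.113.7"))  # 403: known scraper
    print(handle_request("192.0.2.1"))    # 200: ordinary visitor
```

In practice, a system like this lives in front of the web server and is only as good as its blocklist, which is why a shared, continuously updated database of scraper addresses is central to the approach.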