Leading artificial intelligence companies are facing a wave of copyright lawsuits and accusations that they scrape massive amounts of information from the web – an issue exacerbated as startups hit a “data wall” that is holding back new advances in the technology.
This month, a trio of authors sued Anthropic for “stealing hundreds of thousands of copyrighted books.” The lawsuit claims that the San Francisco-based AI startup “never sought – much less paid for – a license to copy and exploit the protected expressions contained in the copyrighted works fed into its models.”
The class action lawsuit joins a long list of ongoing copyright cases, the most high-profile of which was brought late last year by The New York Times against OpenAI and Microsoft. The Times alleges that the companies “benefit from massive copyright infringement, commercial use and misappropriation of The Times' intellectual property.”
If the case succeeds, the publisher's arguments could be extended to other companies that train AI models on web data, potentially leading to further litigation.
AI companies have made great strides over the past 18 months, but they are now hitting what experts describe as a data limit, forcing them to dig ever deeper into the web, strike deals for access to private data sets, or rely on synthetic data.
“Nothing is free anymore. You can't just scrape a web-scale dataset anymore. You have to buy it or produce it. That's the frontier we're at now,” says Alex Ratner, co-founder of Snorkel AI, which creates and labels datasets for companies.
Anthropic, a self-described “responsible” AI startup, was also accused by website owners last month of “egregiously” scraping web data to train its systems. Perplexity, an AI-powered search engine seeking to take on Google's monopoly on web search, has faced similar allegations.
Google itself has caused a stir among publishers, who have struggled to prevent the company from crawling their websites for its AI tools without being excluded from its search results.
AI startups are engaged in a fierce race for dominance. To win it, they need mountains of training data, as well as increasingly sophisticated algorithms and more powerful semiconductors, so that their chatbots can generate creative, human-like responses.
OpenAI, the maker of ChatGPT, and Anthropic alone have raised more than $20 billion to develop powerful generative AI models that can respond to natural language prompts, maintaining their lead over newer entrants, including Elon Musk's xAI.
But competition among AI companies has also put them in the crosshairs of publishers and the owners of the material needed to develop their models.
The Times' case seeks to establish that OpenAI has effectively cannibalized its content, reproducing it in a way “that displaces The Times and takes away its readership.” A resolution of the case would give publishers more clarity about the value of their content.
Meanwhile, AI startups are signing deals with publishers to ensure their chatbots provide accurate and timely answers. OpenAI, which recently announced its own search product, signed a deal with Condé Nast, publisher of The New Yorker and Vogue, adding to its collaborations with other publishers such as The Atlantic, Time and the Financial Times. Perplexity has also signed revenue-sharing deals with various publishers.
Anthropic has not announced any similar partnerships to date, but in February the startup hired Tom Turvey, who spent 20 years at Google, where he worked on the search giant's partnership strategy with major publishers.
Google, more than any other company, has set the precedent for how the relationship between publishers and technology companies works today. In 2015, the company won its case against a group of authors who had argued that scanning and indexing their works was not protected by fair use. The victory rested on the argument that Google's use of the content was “highly transformative.”
The Times' lawsuit against OpenAI hinges on the claim that there was “nothing 'transformative'” about the way the technology company used the newspaper group's content. A ruling would provide a new precedent for publishers. But Google's case spanned a decade, during which the search engine built a dominant position.