The legal battle between OpenAI and The New York Times over data to train its AI models may still be ongoing. But OpenAI is forging ahead on deals with other publishers, including some of France's and Spain's biggest news publishers.
OpenAI on Wednesday announced that it has signed deals with Le Monde and Prisa Media to bring French and Spanish news content to OpenAI's ChatGPT chatbot. In a blog post, OpenAI said the partnership will put the organizations' current events coverage – from brands including El País, Cinco Días, As and El Huffpost – in front of ChatGPT users where it makes sense, as well as contribute to OpenAI's ever-increasing volume of training data.
OpenAI writes:
Over the coming months, ChatGPT users will be able to interact with relevant news content from these publishers through select summaries with attribution and enhanced links to the original articles, giving users the ability to access additional information or related articles from their news sites… We are continually making improvements to ChatGPT and are supporting the essential role of the news industry in delivering real-time, authoritative information to users.
So OpenAI has announced licensing agreements with a handful of content providers at this point. Now seemed like a good opportunity to take stock:
- Stock media library Shutterstock (for images, videos and music training data)
- The Associated Press
- Axel Springer (owner of Politico and Business Insider, amongst others)
- Le Monde
- Prisa Media
How much is OpenAI paying each? Well, it isn't saying – at least not publicly. But we can estimate.
The Information reported in January that OpenAI was offering publishers between $1 million and $5 million per year for access to archives to train its GenAI models. That doesn't tell us much about the Shutterstock partnership. But when it comes to article licensing – assuming The Information's reporting is accurate and those numbers haven't changed since then – OpenAI is spending between $4 million and $20 million a year on news across its four news publisher deals.
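As a rough illustration of the arithmetic behind that range, here is a minimal sketch. It assumes each of the four news publishers named in this article (The Associated Press, Axel Springer, Le Monde and Prisa Media) falls within The Information's reported $1 million to $5 million band. The names in the code are hypothetical, and the actual deal terms are not public.

```python
# Back-of-the-envelope estimate only. The per-publisher figures are
# The Information's reported range, not confirmed deal terms.
NEWS_PUBLISHERS = [
    "The Associated Press",
    "Axel Springer",
    "Le Monde",
    "Prisa Media",
]
LOW_PER_YEAR = 1_000_000   # reported lower bound per publisher, USD
HIGH_PER_YEAR = 5_000_000  # reported upper bound per publisher, USD


def estimate_annual_spend(publishers, low, high):
    """Return (min, max) total annual spend, assuming every deal lies in [low, high]."""
    count = len(publishers)
    return count * low, count * high


total_low, total_high = estimate_annual_spend(
    NEWS_PUBLISHERS, LOW_PER_YEAR, HIGH_PER_YEAR
)
print(f"${total_low / 1e6:.0f} million to ${total_high / 1e6:.0f} million per year")
```

Shutterstock is left out of the list because the reported range covers news archives, not stock media.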
That may be just pennies for OpenAI, whose war chest tops $11 billion and whose annualized revenue recently topped $2 billion (per the Financial Times). But as Hunter Walk, a partner at Homebrew and co-founder of Screendoor, recently mused, it's substantial enough to potentially edge out AI rivals that are also pursuing licensing deals.
Walk writes on his blog:
[I]f experimentation is gated by nine figures' worth of licensing deals, we are doing a disservice to innovation… The checks being cut to "owners" of training data are creating a huge barrier to entry for challengers. If Google, OpenAI and other large tech companies can establish a high enough cost, they implicitly prevent future competition.
Whether there is a barrier to entry today is debatable. Many – if not most – AI vendors have chosen to risk the wrath of IP owners by opting not to license the data on which they train AI models. There is evidence that the art-generating platform Midjourney, for example, is training on Disney film stills – and Midjourney has no deal with Disney.
The harder question to grapple with is: Should licensing simply be the cost of doing business and experimenting in the AI space?
Walk would argue not. He advocates for a regulator-imposed "safe harbor" that would protect any AI vendor – as well as small startups and researchers – from legal liability so long as they adhere to certain transparency and ethics standards.
Interestingly, the United Kingdom recently tried to codify something along these lines, exempting the use of text and data mining for AI training from copyright considerations so long as it is for research purposes. But those efforts ultimately fell through.
I'm not sure I'd go as far as Walk does in his "safe harbor" proposal, given the impact AI threatens to have on an already destabilized news industry. A recent model from The Atlantic found that if a search engine like Google were to integrate AI into search, it would answer a user's query 75% of the time without requiring a click-through to the website.
But perhaps there is room for carve-outs.
Publishers should be paid – and paid fairly. But is there not an outcome in which they get paid and challengers to AI incumbents – as well as academics – gain access to the same data? I should think so. Grants are one option. Larger VC checks are another.
I can't say I have the answer, especially given that the courts have yet to decide whether – and to what extent – fair use protects AI vendors from copyright claims. But it's vital that we figure these things out. Otherwise, the industry could well find itself in a situation where the academic brain drain continues unabated and only a few powerful companies have access to vast pools of valuable training data.