When it comes to the rise of generative AI, many in the media industry appear to have learned from their painful experiences with online gatekeepers such as Google and Facebook.
This time, they are acting sooner to maintain control over their content. But given the appeal of chatbots such as ChatGPT, do they have a better chance of holding on to their audience and online revenue than before?
A spate of deals – and lawsuits – shows that they are at least acting early. This week, Perplexity, a search engine start-up that uses generative AI to deliver improved results, announced revenue-sharing agreements with an eclectic group of publishers, including Automattic (owner of WordPress.com), Der Spiegel and Time.
This followed accusations that it had secretly crawled media websites to feed its search service. The practice is not illegal, but it is a violation of the online etiquette that governs how information on publicly accessible websites may be used.
OpenAI has also struck a number of deals with media companies, including the Financial Times. But together with its partner Microsoft, OpenAI is also facing the biggest legal challenge from news publishers: a copyright infringement lawsuit from the New York Times.
In some ways, the economic challenges and legal issues that generative AI brings are familiar from the early days of the web. Many publishers have long complained that services like Google News steal their readership, even while relying heavily on those services to generate traffic. Courts have given search engines carte blanche, and the direct financial benefits that media companies have been able to reap have come through the political system: countries such as Australia and Canada have passed laws that force the biggest internet companies to pay up.
After the chaos that apparently led to a lot of news content being sucked up to train large language models, at least some order has been restored. OpenAI, for instance, has said it will comply with requests from publishers not to crawl their websites. According to the Reuters Institute for the Study of Journalism, most major publishers were already blocking AI crawlers at the end of last year.
However, simply staying away from the next big technological revolution hardly looks like a sustainable strategy for media companies. Perplexity's deal with publishers amounted to an admission that AI companies may have difficulty defending themselves against copyright claims.
For now, however, this debate is largely theoretical. ChatGPT has stunned the tech world, but has yet to prove that chatbots can compete with other mass-market information platforms. The “productization” of LLMs is still in its infancy. What form these new services will take – and how the economic models built on them will work – remain open questions. This presents a crucial opportunity for the media industry.
Perplexity, for instance, has agreed to give publishers a share of all ads that are directly linked to results relying on their content. This will not pay many journalists' salaries in the short term: Perplexity has not even devised the new advertising formats yet. And like many search engine start-ups before it, it faces an uphill battle to establish itself in a market dominated by Google. But for publishers, this at least creates an economic model that they can promote more widely.
A key question is how much bargaining power they have. Like the web before it, AI is exposing the commodity nature of much online content. It may also be hard for publishers to extract much, if any, payment for the use of their material in mainstream LLM training. According to OpenAI, the whole news industry represents only a “tiny slice” of the data used to train these models. The unspoken threat to publishers is: if you don't play by our terms, we'll happily shut you out.
On the other hand, LLMs are static after training, and the information they produce can quickly become outdated. Techniques that combine current, relevant data with the output of LLMs to produce tailored results could fill this gap. For that, access to up-to-date information sources is crucial.
How such services will work is still unclear. Will they simply produce snippets like Google News? How prominently will they feature the sources they draw on, and how much traffic will they drive back to publishers' websites? And, crucially, what additional revenue will they generate and how will it be shared? For publishers, it seems well worth the effort to try to influence the outcome of such questions.