OpenAI is devouring the media industry

May 30, 2024


Let’s make one thing clear up front: I’m generally pro-generative AI. At least, I’m far more amenable to it than many of my peers in the journalism industry, and I use it myself every day, parsing information with ChatGPT and generating images with it and Midjourney.

Nonetheless, I’m curious and anxious about the recent trend of OpenAI, maker of ChatGPT and its underlying GPT series of large language models (LLMs), partnering with major media companies in the U.S. and abroad.

Just today, OpenAI announced partnerships with two leading media publishers for whom I previously worked — T and Vox Media.

The former is a 167-year-old print publication, among the oldest published in the United States, that has managed to reinvent itself fairly successfully in the digital and online age with its various opinion columns and well-reported, well-researched articles.

The latter is a new media startup that was forged from a popular sports blog, SB Nation, launched a popular technology outlet in 2011 (where I used to work) and its politics and general news outlet Vox in 2014, and has steadily and swiftly acquired more titles in recent years, including esteemed and award-winning ones.

All in all, OpenAI has forged alliances with seven major media outlets in less than a year. Some of them, like German publisher Axel Springer, are holding companies for a number of well-read, influential, taste-making titles. Here’s the full list, according to my research:

While the exact terms of the deals haven’t been disclosed (many of these are private companies and aren’t required to reveal all their financial dealings), OpenAI is said to be paying tens of millions of dollars, or in the case of News Corp., $250 million over five years, for the privilege of getting its hands on all the media these publishers produce.

I should note that members of VentureBeat’s staff, though not me personally, have reached out to OpenAI to discuss possible partnerships, but I have no knowledge of how those talks are proceeding or what has been discussed, other than that some outreach on our part has happened in the past year.

Why is this happening?

Why is OpenAI partnering with these media firms?

The most obvious answer is that in so doing, it gains access to licensed training data that it can use to build powerful new AI models that can write as well as your average reporter.

Who wants this? Well, OpenAI for one, to improve ChatGPT’s performance and ultimately, hopefully, sell the tools back to the same media outlets or others in the space.

In the case of digital media outlets like Vox, which makes video content for YouTube and licensed documentaries and series for Netflix, OpenAI could also presumably train its generative AI video model Sora to make documentary-style content from text prompts, possibly including some on-screen title cards and graphics.

Why would OpenAI pay to license content that can be (and in some cases, has already been) scraped for free?

Why would OpenAI want to pay for all this content when it has previously scraped the web for public posts and trained on them for free?

The pushback among artists, creatives, and even media companies, including one that is suing OpenAI for copyright infringement over its alleged ingestion of online newspaper articles, has made the company’s position that publicly available data can be legally scraped for transformative commercial purposes a more tenuous and, frankly, ethically challenged one.

As such, OpenAI last year introduced a new bit of code that website owners can add to their sites to stop it from scraping them and training on them.

The company says any site that adds this code will be exempted from its scrapers, much like editing the robots.txt file on one’s website to stop Google from crawling it and indexing it for search.
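To make the opt-out concrete, here is a minimal sketch of what such a rule can look like and how a well-behaved crawler would read it. It assumes the mechanism is a robots.txt entry aimed at OpenAI’s crawler under the user agent name GPTBot; that name, the example.com URL, and the helper function are not taken from the deals discussed here and are used only for illustration. The check uses Python’s standard-library robots.txt parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher that opts out of OpenAI's crawler
# (assumed user agent name: GPTBot) while leaving other crawlers unaffected.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://example.com/news/some-article"  # illustrative URL
    print("GPTBot allowed:", can_fetch("GPTBot", article))
    print("Googlebot allowed:", can_fetch("Googlebot", article))
```

Note that a rule like this is only a request: it works to the extent that the crawler chooses to honor it, and it does nothing about content that was collected before the rule was added.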

OpenAI also recently announced it will create a new product, Media Manager, that artists, creators, and presumably publishers can use to flag work that they have posted or intend to post online and that they don’t want to see ingested by AI scrapers and used to train new models that potentially compete with their work.

That’s not coming until 2025, however, and again, it places the onus on the content creator or owner to do the hard work of opting out of the AI scraping and training.

Paying the publishers to shut up and accept the AI scraping and training may be a worthwhile expense for OpenAI: it gets them off its back, gets the data it needs, and assures investors and users that it’s in compliance with copyright laws and ethics. Kind of.

It doesn’t really pay back any of the owners of content that has already been scraped and used to train models, but it’s a start.

Without exception that I’m aware of, the publishers have all announced their OpenAI content licensing deals with an acknowledgement that they get something out of it too, something other than money (which they need to pay their journalists and staff and to cover equipment and infrastructure like web hosting): placement.

Specifically, nearly all the publishers who’ve thrown in with OpenAI have noted that ChatGPT will surface their articles in its outputs.

So if a user types in “Summarize the latest tech news,” summaries of articles from whatever publications are included in the deals might show up, alongside links to the sources.

“Might” is the key word here, as we don’t know (and neither the media outlets nor OpenAI have shared publicly yet) the exact agreement language or technical documentation showing how, when, and why a particular publication’s articles or other content will be shown by ChatGPT to a user.

In addition, we don’t have any good public data yet showing how much referral traffic, if any, ChatGPT is driving to source publications it quotes or summarizes in its responses.

Furthermore, it’s unclear right now how much, if at all, ChatGPT will block quote (copy and paste direct sections) from articles, rather than use its impressive (yet robotic) writing skills to summarize the underlying content. Summaries potentially obviate some of the actual meaning and artistry of the original author, not to mention the need for the user to visit the site where the piece was first published, depriving said publications of the traffic they rely on to sell ad impressions or gain paying subscribers.

This is why journalists including Jessica Lessin, Hamilton Nolan, and Edward Ongweso Jr. have all pointed out that it sure looks like publications are getting the rawer end of the deal with OpenAI.

After all, what reason does a reader have to visit the underlying media outlet, let alone subscribe to it with their money, if what they’re after is pure information and ChatGPT serves that up to them? All the while, OpenAI, instead of the underlying publications, captures the $20 a month that ChatGPT Plus subscribers pay.

History rhymes

This is eerily familiar to many of us digital journalists who were around in the industry when Google News first launched (2006) and social platforms such as Facebook and Twitter began growing in users and popularity, and all quickly became major sources of referral traffic to publishers.

This has mostly been the case for the better part of the last 15 to 20 years, though thanks to the ministrations of the tech giants behind these platforms and their constant algorithmic tweaking, traffic has ebbed and flowed, and sites that went in too hard on any given platform or strategy quickly found themselves at a loss when an “algorithm change” by a tech platform suddenly caused their audiences to vanish.

Yet the changes kept coming, of course, and arguably the biggest one is now in front of tech platforms and publishers: generative AI.

With Google putting its own erroneous AI Overview summary results at the top of search results pages and pushing down direct links to publishers and news articles, and more people adopting ChatGPT, potentially as a news source or aggregator, perhaps the news publishers and the executives in charge of them felt backed into a corner: the game is changing yet again, AI is coming and replacing some of the traditional ways people get news online, so why not partner up with the disruptors and try to ride the wave?

Except, as the short history lesson described above would show, tech platforms change course, randomly, unpredictably, to the chagrin of media companies.

so media firms are once more pursuing partnerships with tech firms that mislead them in the middle of coverage but definitely won’t in the middle of a deal. And they’re doing so partly to dig themselves out crises engineered by other tech firms that lied to them on all counts

— Edward Ongweso Jr (@bigblackjacobin) May 29, 2024

While OpenAI is making nice with publishers now, there’s no indication, based on what we know publicly at least, that this will continue indefinitely, or that it will help publishers sustain the revenue and subscribers they’ve cultivated through other distribution channels in the past.

Also, the more publishers OpenAI partners with, the more each publisher is diluted as a potential source of information in ChatGPT, and the more commoditized the entire media industry becomes: all just grist for OpenAI’s models and summaries.

The bull case for these partnerships is kind of a shrug to the effect of “well, tech is changing, media habits are changing, we can’t rely on Google or social sites for our audience anymore, anyway,” so this is perhaps the least bad option on the table for media publishers.

But with so many lining up to voluntarily deal with OpenAI, it’s clear where the seat of power lies. And that’s not something media companies should give away lightly. Let’s hope they’re getting their money’s worth.

Other, smaller, less well-trod paths

Meanwhile, individual, sole-proprietor, or worker-owned publications such as 404 Media, Platformer, Newcomer, and others, largely built atop tech infrastructure provided by the likes of newsletter platform Substack, are for now pursuing a different path: trying to build direct relationships with readers and subscribers, to the extent they can while relying on underlying tech provided by, again, a buzzy startup.

Yet these publications are small by design, with limited staff and resources to pursue the kinds of big investigations that have won awards and, in some cases, changed the course of history, and that were previously conducted by large newspapers and broadcast outlets.

But with broadcast and cable news viewership tanking, and newspapers themselves seeing declines in readers as more and more young people turn to alternative news sources such as YouTube and TikTok, it’s not clear to me that the audience is even interested in the kinds of investigations that newspapers and broadcast outlets used to deliver.

What does an audience turning away from traditional media outlets and their investigative skills do to democracy, to the information ecosystem, to our relationships with one another, to our society?

I’m not so apocalyptically inclined as to say this is going to ruin everything. In fact, I think social media has provided more avenues than ever for readers, so-called “citizen journalists” or amateur sleuths, and others to coalesce and try to dig up important information (or at least, juicy gossip), so I don’t think it means the end of uncovering injustices and problems. Far from it.

But the flip side is that, with fewer people visiting and engaging with traditional outlets, there’s been a decline in overall news consumption rates in the U.S. and a rise in totally wrong digital mob mentality that I don’t think is especially helpful to anyone’s understanding of the world or to maintaining some semblance of a shared factual reality.

Media is a really tough business, with low margins, low barriers to entry, and many competitors, both direct and indirect in the form of all the other attention-seeking apps on our phones, TVs, and PCs. In the U.S. at least, we don’t have a great tradition of publicly funded media. The other alternatives have been the largesse of rich families and individuals.

OpenAI is cleverly exploiting this lack of direct funding for media to its own gain, and to that of its users.

That’s the one clear outcome of all this: OpenAI gets its hands on more direct sources of factual information, and since information is power, it gets more of that, too.

Does ChatGPT become the new “homepage of the web” for many people in the way Google was for so long? I’m a bit skeptical of that in ChatGPT’s current form, with its current interface. It’s just not the best multimedia consumption experience, but presumably that could and will change over time.

In fact, I think OpenAI, like other tech companies, might find that its users don’t really come to ChatGPT looking for news even if it is available in abundance from credible sources. Facebook tried this same thing and ended up deprioritizing news in favor of “family and friends” shared user-generated content. ChatGPT seems to me to be less well suited as a tool to go out and find the best information from a variety of sources. But I could be (and have often been) wrong.

Even less clear to me is whether anyone will actually want to read a long feature article in ChatGPT, or click through to find it. But I suppose we’re about to find out.
