Major news sites are increasingly blocking AI web crawlers, says study

February 25, 2024

A study from the Reuters Institute for the Study of Journalism at the University of Oxford found that a growing number of news sites worldwide are blocking AI web crawlers.

The study, authored by Dr. Richard Fletcher, Director of Research at the Reuters Institute for the Study of Journalism, found that nearly half (48%) of the most popular news sites worldwide now block OpenAI’s crawlers, while 24% block Google’s AI crawler.

It depends on the country. Very large differences in how many top news sites are blocking, and how soon they started. pic.twitter.com/CaebVc4gfZ

AI crawlers are designed to comb the web and gather data for AI models like ChatGPT and Gemini. This ensures a steady supply of up-to-date information, which is pivotal to keeping AI responses accurate and relevant.

Without fresh data, AI models become frozen in time and unable to keep up with developments in the real world. If models consume too much poor-quality and AI-generated data, they might even face model collapse.

So, why are news sites blocking AI web crawlers? They’re primarily concerned about copyright and fair compensation, the spread of misinformation, and the potential loss of direct traffic to news sites.

AI firms understand the issue at hand. That’s why they’re striking licensing deals with media firms, such as OpenAI’s deal with Axel Springer last year.

Content behemoth Reddit is the latest company to tempt AI firms with multi-million-dollar content licensing deals.

Key insights

Here are some key insights from the report:

As of late 2023, 48% of prominent news platforms internationally had restricted access to OpenAI’s crawlers, with a smaller 24% doing the same for Google’s AI crawler.
Notably, 97% of websites blocking Google’s AI crawler were also found to block OpenAI’s crawlers.
The likelihood of websites blocking AI crawlers varied significantly by country, with the highest rate observed in the USA (79%) and the lowest in Mexico and Poland (20%).
Throughout 2023, no instances of websites reversing their decision to block AI crawlers were recorded.
Larger news outlets demonstrated a slightly higher propensity to block AI crawlers than smaller ones.
The tendency to block varies across different types of news organizations. Legacy print outlets (57%) lead in blocking, compared with digital-born outlets (31%).

News firms are evidently fortifying their defenses against AI web crawlers, and AI firms will probably have to strike deals to keep their models convincingly up to date.

The alternative is dire. AI model performance will improve, but their knowledge will slowly become outdated to the point of irrelevance.
