
Cloudflare launches a tool to combat AI bots

Cloudflare, the publicly traded cloud services provider, has released a new, free tool to stop bots from scraping websites hosted on its platform for data to train AI models.

Some AI vendors, including Google, OpenAI and Apple, let website owners block the bots they use to scrape data and train models by modifying their site's robots.txt, the text file that tells bots which pages of a website they can access. But, as Cloudflare pointed out in the post announcing its bot-fighting tool, not all AI scrapers respect it.
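
For reference, opting out this way means adding an entry for each vendor's documented crawler token; GPTBot is OpenAI's crawler, while Google-Extended and Applebot-Extended are the tokens Google and Apple honor for AI training. A minimal robots.txt blocking all three looks like this:

    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: Applebot-Extended
    Disallow: /

A "Disallow: /" rule asks the matching bot to stay off every path on the site, but compliance is entirely voluntary, which is exactly the gap Cloudflare's tool is meant to close.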

“Customers don't want AI bots visiting their websites, especially those that do so dishonestly,” the company writes on its official blog. “We fear that some AI companies looking to bypass content access rules will persistently adapt their systems to evade bot detection.”

To tackle the problem, Cloudflare analyzed AI bot and crawler traffic to fine-tune its automatic bot detection models. The models weigh, among other things, whether an AI bot may be trying to evade detection by mimicking the appearance and behavior of a person using a web browser.

“When malicious actors attempt to crawl websites at scale, they generally use tools and frameworks that we can fingerprint,” Cloudflare writes. “Based on these signals, our models can appropriately flag traffic from hard-to-spoof AI bots as bots.”
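
Cloudflare hasn't published its detection models, but the weakest signal they supplement, the self-declared User-Agent header, is easy to illustrate. The sketch below is a toy example, not Cloudflare's method: it only catches bots that identify themselves honestly, which is why real systems lean on harder-to-spoof fingerprints of tooling and behavior.

    # Toy illustration only, not Cloudflare's model: a crawler that
    # spoofs its User-Agent sails straight past a check like this.
    KNOWN_AI_CRAWLER_TOKENS = (
        "GPTBot",         # OpenAI's documented crawler
        "ClaudeBot",      # Anthropic's documented crawler
        "PerplexityBot",  # Perplexity's documented crawler
    )

    def declares_ai_crawler(user_agent: str) -> bool:
        """Return True if the User-Agent self-identifies as a known AI crawler."""
        return any(token in user_agent for token in KNOWN_AI_CRAWLER_TOKENS)

    print(declares_ai_crawler("Mozilla/5.0 (compatible; GPTBot/1.0)"))  # True
    print(declares_ai_crawler("Mozilla/5.0 (Windows NT 10.0; Win64)"))  # False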

Cloudflare has set up a form for hosts to report suspected AI bots and crawlers, and says it will continue to manually blacklist AI bots over time.

The problem of AI bots has come into sharp focus as the generative AI boom fuels demand for model training data.

Many websites, wary of AI vendors training models on their content without notifying or compensating them, have chosen to block AI scrapers and crawlers. According to one study, around 26% of the top 1,000 websites on the web have blocked OpenAI's bot; another found that more than 600 news publishers had blocked it.

However, blocking isn't a surefire protection. As noted above, some vendors appear to ignore standard bot exclusion rules to gain a competitive advantage in the AI race. AI search engine Perplexity was recently accused of impersonating legitimate visitors to scrape content from websites, and OpenAI and Anthropic have reportedly ignored robots.txt rules at times.

In a letter to publishers last month, content licensing startup TollBit said it sees “many AI agents” ignoring the robots.txt standard.

Tools like Cloudflare's could help, but only if they prove reliable at detecting stealthy AI bots. And they don't solve the thornier problem that publishers risk losing referral traffic from AI tools like Google's AI Overviews, which exclude sites from inclusion if they block certain AI crawlers.
