Cloudflare to block cynical search-and-scrape bots from ad-supported web pages
Some crawlers gather data for both search and AI training, so when publishers block them to protect content they risk disappearing from search results ...
Cloudflare on Wednesday said it will soon prevent mixed-use crawlers from accessing ad-supported customer websites by default, part of its ongoing efforts to give site publishers more control over how they engage with AI services.
Apple, Google, and Microsoft's Bing operate crawlers that could fall afoul of Cloudflare's decision, although each of the tech giants offers an AI opt-out that may allow them to escape sanctions.
Web crawlers make automated network requests to websites for various purposes. Google has used them for decades to visit websites for inclusion in its search index.
Over the past few years, many crawlers have started visiting sites to harvest content for training AI models. This has prompted various countermeasures – publishers feel they're not being fairly compensated for the content AI companies scrape to feed into their models.
But since Google's crawler, Googlebot, combines crawling for search indexing and content harvesting for AI training, site publishers have tended to accept the bot's presence because they fear blocking could mean they disappear from Google Search results.
The situation is similar for Microsoft's Bingbot. And Apple also has enlisted its Applebot crawler to handle AI data gathering in addition to its indexing duties. The iBiz in June said: "The data crawled by Applebot may also be used to help train Apple foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools."
Apple and Google support robots.txt directives that allow publishers to opt out of AI data harvesting (via Applebot-Extended and Google-Extended). Bing supports a content="noarchive" attribute for the robots meta tag that also blocks data harvesting. Other crawler operators, however, often ignore robots.txt, which is voluntary. Cloudflare therefore aims to provide site owners with a declarative content gate.
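For readers who want to see what such an opt-out looks like in practice, here is a minimal sketch in Python. It assumes a hypothetical robots.txt that disallows the Google-Extended and Applebot-Extended tokens while leaving everything else open, and uses the standard library's urllib.robotparser to show how a well-behaved crawler would read it. The example.com URL and the helper names are illustrative only. Both tokens act as opt-out controls rather than separate crawlers, so this models the rule-matching logic, not how either company actually enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of AI training, stay open for search.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def access_report(parser: RobotFileParser, agents: list[str], url: str) -> dict[str, bool]:
    """Map each user-agent token to whether the rules permit fetching url."""
    return {agent: parser.can_fetch(agent, url) for agent in agents}


parser = build_parser(ROBOTS_TXT)
report = access_report(
    parser,
    ["Googlebot", "Google-Extended", "Applebot", "Applebot-Extended"],
    "https://example.com/article",  # placeholder URL
)

for agent, allowed in report.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Under these rules the search tokens stay allowed and the two AI opt-out tokens are blocked. Nothing here forces a crawler to comply, which is the gap Cloudflare says it wants to close.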
"Now that the majority of traffic on the Internet is non-human, we must go further and act faster so that a sustainable ecosystem can emerge," said Matthew Prince, co-founder and CEO of Cloudflare, in a statement.
"Cloudflare's new tools and partnerships give website owners increased visibility and commercial opportunities and reward AI companies that have bots with clear and transparent intent. We hope that our proposed default changes encourage mixed use crawlers to separate out search from agent use and training."
Starting September 15, 2026, new Cloudflare customers and new sites for existing customers will default to allowing search crawling but blocking training and agents from pages with ads. The changes will also be applied to free tier customers who have not changed their settings.
As the company puts it: "This ensures that content that drives revenue cannot be crawled without explicit permission of those content owners."
Between humans running ad blockers and Cloudflare blocking bots from pages with ads, a lot of marketing material may be consigned to oblivion. Cloudflare customers, however, can readmit crawlers to their ad-supported pages by changing their default site settings.
Cloudflare is also making two other changes. Its "Pay Per Crawl" tollbooth is being rebranded "Pay Per Use." The idea is to reward publishers when their content creates value instead of just when it's fetched.
To make that happen, Cloudflare is partnering with Ceramic.ai, an API-based search biz, so that publishers get paid whenever their content appears in a Ceramic.ai search result. It's also working with You.com, a search engine for AI agents, to generate content payments whenever there's demand from an agent.
A company spokesperson didn't immediately respond when asked about Pay Per Crawl uptake.
Finally, Cloudflare is introducing a new Business Insights Dashboard to give publishers more visibility into how bots are consuming content and how much traffic AI models send. ®
Originally published on The Register
