
Anthropic and OpenAI are crawling the web even more and not giving much back

Cloudflare data shows the top AI labs are strip-mining the web, and it's getting worse, not better.

  • Cloudflare data shows Anthropic and OpenAI are crawling the web and sending very few referrals.
  • The crawl-to-refer ratio has deteriorated compared to early September.
  • The data suggests AI companies are taking more than they give back to the web.

Are AI giants nurturing the web, the most valuable store of human data the world will ever see? Or are they scraping content for free and giving little back? Updated data from Cloudflare sheds new light on this important question.

This is one of the most under-discussed parts of the AI revolution. While tech companies spend lavishly on data centers, GPUs, and talent, they avoid talking about the other key ingredient of AI success: data.

That's because they don't want to pay for the high-quality human data that's needed for AI model training, inference, and AI outputs. Instead, they send out bots to crawl websites and scoop up this information, mostly for free.

In the past, tech companies would send users to the original sources of this information. This formed the grand bargain of the web. Sites would let their data be taken for free on the understanding that they would get referrals in return, and could pay for their efforts through advertising, subscriptions, and other techniques.

In the new generative AI world, this deal is breaking down. Now, AI answer engines and chatbots give users direct answers, making people less likely to visit the websites that created and verified the data in the first place.

Cloudflare, which helps run about 20% of the world's websites, began tracking this behavior in 2025. It measures Big Tech company bots' requests to crawl websites, and the number of referrals the platforms send to sites.

This crawl-to-refer ratio is a useful guide to how much tech companies are taking from the web and how much they're giving back. For example, a ratio of 100 to 1 would mean a company's bots crawled sites 100 times for every referral the company sent.
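
For readers who want to see the arithmetic, here is a minimal Python sketch of the ratio as described above. The function name and the counts are hypothetical and chosen only to mirror the 100-to-1 example; this is not Cloudflare's actual methodology or code.

```python
def crawl_to_refer_ratio(crawl_requests: int, referrals: int) -> float:
    """Return the number of crawl requests per referral sent.

    Hypothetical helper for illustration; the inputs are assumed to be
    counts gathered over the same time window.
    """
    if referrals <= 0:
        # With no referrals at all, the ratio is undefined.
        raise ValueError("ratio is undefined without at least one referral")
    return crawl_requests / referrals


# Made-up counts that reproduce the 100-to-1 example in the text.
ratio = crawl_to_refer_ratio(crawl_requests=100, referrals=1)
print(f"{ratio:,.0f} to 1")
```

A higher number means more crawling for each visitor sent back to the web.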

Is this one way to measure how ethical companies are in the AI era? I'll leave you to decide. Here's the data for the first week in January.

[Bar chart: crawl-to-refer ratios by company, first week of January, based on Cloudflare data]

As you can see, Anthropic sticks out like a sore thumb. According to Cloudflare data, it crawls sites far more than it sends users out to the web. Anthropic actually crawled even more in early January than it did in the first week of September 2025.

The same applies to OpenAI; its crawl-to-refer ratio has worsened. Again, this suggests that OpenAI is taking more value from the web and giving less value back.

This aligns with Business Insider reporting from late 2024. Back then, we told you that bots from Anthropic and OpenAI, especially, were crawling some websites so heavily that those sites' traffic costs spiked dramatically.

One web developer saw a client's cloud-computing costs double within a few months due to this AI bot swarm, according to BI reporting.

So, not only are AI companies taking from the web and giving less back — they are also leaving some site owners with bigger bills to pay.

Like last quarter, I asked Anthropic why it crawls so much and gives so little back to the web. The startup did not respond to an email seeking comment.

Back in September, Anthropic said it couldn't confirm the crawl-to-refer ratios calculated by Cloudflare and said there may be "issues" with the methodology. At that time, Anthropic also noted that it had launched a web search feature for its popular Claude AI chatbot earlier in 2025. The feature was generating more referral traffic for websites, and that traffic was growing quickly, the startup said back then.

OpenAI didn't respond to a request for comment.

A caveat: The numbers that go into the crawl-to-refer ratio focus on the web and exclude native app activity. If app activity were included, the ratios might be lower. However, this methodology applies to all the companies included in this ranking.

Google's relatively low ratio is likely due to its traditional search engine, which still shows clear website links in many results. However, the company is increasingly weaving AI chatbot-style answers into its search service, via AI Overviews and AI Mode.

Google has been saying lately that it still sends traffic to the web, and it cares about the health of this ecosystem.

Business Insider will keep tracking this Cloudflare data in the coming months and quarters to see how this behavior evolves.

Sign up for BI's Tech Memo newsletter here. Reach out to me via email at abarr@businessinsider.com.

The post Anthropic and OpenAI are crawling the web even more and not giving much back appeared first on Business Insider