Perplexity Is Sneaking Onto Websites To Scrape Blocked Content, Says Cloudflare

Cloudflare accuses Perplexity of sneaking into websites to steal their content
Elyse Betters Picaro / ZDNET

ZDNET's key takeaways

  • Cloudflare claimed Perplexity ignores websites' wishes in its content hunt.
  • Cloudflare said other AI companies, such as OpenAI, don't swipe content.
  • Cloudflare now offers services to block aggressive AI crawlers.

Cloudflare, a leading content delivery network (CDN) company, has accused the AI startup Perplexity of evading websites' "no crawl" directives by stealthily deploying web crawlers to scrape content from sites that have explicitly blocked its official bots.

If that sounds familiar, you've heard these accusations before. Last year, WIRED and Forbes both accused Perplexity of doing the same thing to their sites.

How Perplexity is bypassing 'no crawl' directives

According to Cloudflare, when Perplexity's web crawler encountered a robots.txt file, which sites use to block their content from being crawled, Perplexity pretended to be an ordinary Chrome web browser on a Mac. This enabled it to bypass the bot barriers.
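
To illustrate why a disguised user agent slips past robots.txt, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt contents, the example URL, and the Chrome-on-Mac string are assumptions made up for this illustration; they are not taken from Cloudflare's report. The sketch only shows that robots.txt rules are keyed to the name a crawler declares, and that nothing in the file itself enforces them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows only Perplexity's two declared crawlers.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /
"""

# Illustrative browser-style string; the exact value is an assumption.
CHROME_ON_MAC = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def allowed(user_agent: str, url: str = "https://example.com/article") -> bool:
    """Return True if the robots.txt rules above permit this user agent."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for ua in ("PerplexityBot", "Perplexity-User", CHROME_ON_MAC):
        print(f"{ua[:40]!r}: allowed={allowed(ua)}")
```

A crawler that honors the file checks it under its real name and stops. A request that presents itself as a generic browser matches none of the named rules, which is why robots.txt alone cannot stop a crawler that chooses to disguise itself.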

Cloudflare started investigating when it received complaints from customers who had "both disallowed Perplexity crawling activity in their robots.txt files and also created WAF [Web Application Firewall] rules to specifically block both of Perplexity's declared crawlers: PerplexityBot and Perplexity-User." The customers said their content still ended up in Perplexity, even after they had blocked it.

The CDN then set up new test domains, explicitly prohibiting all automated access both in their robots.txt files and through specific WAF rules that blocked crawling from Perplexity's acknowledged crawlers. Cloudflare found that Perplexity would use multiple IP addresses not listed in Perplexity's official IP range and would rotate through these IPs to sneak into the sites' content and files.

"In addition to rotating IPs, we observed requests coming from different Autonomous System Numbers (ASNs) to evade website blocks," Cloudflare said. "This activity was observed across tens of thousands of domains and millions of requests per day."
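
The reason rotating IPs defeats a range-based block can be sketched in a few lines of Python with the standard `ipaddress` module. The ranges below are hypothetical placeholders (reserved documentation blocks), not Perplexity's published ranges, and the function name is invented for this example. A site that blocks only a crawler's declared ranges will let through any request arriving from an address outside them.

```python
import ipaddress

# Hypothetical "declared crawler" ranges. These are RFC 5737 documentation
# blocks used as placeholders, NOT Perplexity's real published ranges.
DECLARED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def in_declared_ranges(ip: str) -> bool:
    """Return True if the address falls inside any declared crawler range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in DECLARED_RANGES)


if __name__ == "__main__":
    for ip in ("192.0.2.15", "203.0.113.9"):
        verdict = "blocked by range rule" if in_declared_ranges(ip) else "passes range rule"
        print(f"{ip}: {verdict}")
```

This is why a range check is only one signal: requests from undeclared addresses or other networks need to be identified by other means, such as the behavioral signatures Cloudflare describes.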

The result? Cloudflare said it observed "Perplexity not only accessed such content but was able to provide detailed answers about it when queried by users."

Cloudflare has a plan to stop Perplexity

Moving forward, Cloudflare has claimed its bot management system can spot and block Perplexity's hidden User Agent. Any bot management customer who has an existing block rule in place is already protected.

If you don't want to block such traffic on the grounds that it might be from real users, you can set up rules to challenge requests. This allows real humans to proceed. Customers with existing challenge rules are already protected.

Finally, Cloudflare has added signature matches for the stealth crawler to its managed rule, which blocks AI crawling activity. This rule is available to all Cloudflare customers, including free users.

Cloudflare noted that OpenAI does obey the robots.txt restrictions and doesn't try to break into websites. That said, Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed copyrights in training and operating its AI systems.

Cloudflare has recently started offering its customers the option to automatically block all AI crawlers. To complement the move to block AI crawlers, Cloudflare has also launched its "Pay Per Crawl" program, enabling publishers to set rates for AI companies that want to scrape their content.

This follows many deals in which media businesses are permitting AI companies to legally use their content to train their large language models (LLMs). Examples include The New York Times with Amazon, The Washington Post with OpenAI, and Perplexity with Gannett Publishing.

In the meantime, Perplexity appears to continue to break the rules in its hunt for content. ZDNET has asked Perplexity about Cloudflare's claims, but the company has not responded.
