Fed up with AI scraping your content? This open-source bot blocker can help - here's how

 The deity of AI bot blockers
Anubis / Elyse Betters Picaro / ZDNET

Anyone who runs a website knows how annoying AI bots are these days.

F5, the application delivery network company, found that more than half of all web visits come not from people but from data scrapers, including OpenAI, Anthropic, Google, and Perplexity AI bots.

Also: AI bots scraping your data? This free tool gives those pesky crawlers the run-around

(Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

Stopping AI bots in their tracks

People are sick and tired of wasting money on their sites only to have AI companies rip off everything of value. So, Xe Iaso, a technical educator and part-time bot fighter, wrote an open-source program, Anubis, to stop AI bots in their tracks.

Anubis isn't the only such program. Indeed, Iaso freely admits it's "basically the Cloudflare 'Are you a bot?' [aka Cloudflare Turnstile] page, but self-hostable." This enables you to run it on your own server without incurring any fees.

Also: How AI companies are secretly collecting training data from the web (and why it matters)

Anubis is designed to protect websites -- particularly those run by small organizations, open-source projects, and archives -- from the relentless onslaught of automated scrapers that threaten to overwhelm servers and increase hosting costs. The program is a Web AI Firewall Utility. All incoming HTTP connections must successfully pass through it before reaching your actual website.

Tongue in cheek, Iaso describes Anubis as being like the ancient Egyptian god, weighing the soul of your connection using one or more challenges to protect upstream resources from scraper bots. It does this by requiring visitors to solve a computational puzzle, which is trivial for PCs but costly for bots operating at scale.

The system checks whether visitors behave like real browsers, using JavaScript and cookies to verify authenticity. When a bot fails these challenges, it is blocked before reaching the website's core resources.
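
To make the computational puzzle concrete, here is a minimal Python sketch of a proof-of-work check. It assumes a SHA-256 puzzle in which the hash of a challenge string plus a nonce must begin with a set number of zero hex digits. The function names, the challenge string, and the difficulty value are illustrative assumptions, not Anubis's actual code, and in practice the solving side would run as JavaScript in the visitor's browser.

```python
import hashlib
from itertools import count


def meets_difficulty(challenge: str, nonce: int, difficulty: int) -> bool:
    """True if SHA-256(challenge + nonce) starts with `difficulty` zero hex digits."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> int:
    """Visitor side: try nonces 0, 1, 2, ... until one meets the difficulty."""
    for nonce in count():
        if meets_difficulty(challenge, nonce, difficulty):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check the submitted nonce."""
    return meets_difficulty(challenge, nonce, difficulty)


if __name__ == "__main__":
    # Hypothetical challenge value; a real server would issue a random one.
    challenge = "example-challenge-string"
    nonce = solve(challenge, difficulty=3)
    print("nonce:", nonce, "valid:", verify(challenge, nonce, difficulty=3))
```

The asymmetry is the point of the design: the visitor needs about 16 to the power of the difficulty hash attempts on average, while the server checks the answer with a single hash.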

Anubis is an uncaptcha

Now, you may ask, "Isn't this just a CAPTCHA? And, aren't AI programs just as good at solving those as people are?"

That's true -- they are. But as Iaso says, "Anubis is an uncaptcha. It uses features of your browser to automate a lot of the work that a CAPTCHA would, and right now, the main implementation is by having it run a bunch of cryptographic math with JavaScript to prove that you can run JavaScript in a way that can be validated on the server."

Also: Cloudflare just changed the internet, and it's bad news for the AI giants

She is well aware that many people are hesitant to run JavaScript due to security and privacy concerns. She's working on a non-JavaScript version of Anubis, but it's not here yet. It will be a while. On a Reddit thread, Iaso said she is "working on a better one that doesn't rely on JS, but oh god, it is going to be a hell of a thing to implement."

Anubis is written in Go and licensed under the open-source MIT License. It's designed to be "as lightweight as possible to ensure that everyone can afford to protect the communities closest to them." On average, the program uses less than 128 MB of RAM on the server side.

Most of the workload is handled by visitors' PCs and smartphones. Still, the end-user processing load is so low that average users won't notice. Indeed, since Anubis operates transparently, there are no CAPTCHAs to solve or images to click; most people won't even know that anything is happening.

Also: This proxy provider I tested is the best for web scraping - and it's not IPRoyal or MarsProxies

The proof-of-work runs in the background, and only those with outdated browsers or JavaScript disabled may encounter issues. It's a different story for bot farms -- their load quickly adds up.
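
A rough back-of-the-envelope sketch shows why the load adds up. It assumes a puzzle that requires a fixed number of leading zero hex digits in a SHA-256 hash, so each attempt succeeds with probability 1 in 16 to the power of the difficulty. The hash rate, the difficulty, and the number of challenges below are made-up figures for illustration, not measurements of Anubis, and the sketch assumes the scraper must solve a fresh challenge each time rather than reusing a cookie.

```python
def expected_hashes(difficulty: int) -> int:
    """Average number of attempts to find `difficulty` leading zero hex digits."""
    return 16 ** difficulty


def total_cpu_seconds(challenges: int, difficulty: int, hashes_per_second: float) -> float:
    """Estimated CPU time to solve `challenges` puzzles at a given hash rate."""
    return challenges * expected_hashes(difficulty) / hashes_per_second


if __name__ == "__main__":
    rate = 1_000_000  # hypothetical hashes per second on one core
    one_visitor = total_cpu_seconds(1, difficulty=4, hashes_per_second=rate)
    bot_farm = total_cpu_seconds(10_000_000, difficulty=4, hashes_per_second=rate)
    print(f"one visitor: {one_visitor:.3f} s")
    print(f"bot farm:    {bot_farm / 3600:.0f} CPU-hours")
```

Under these assumed numbers, a single visitor pays a fraction of a second, while a scraper solving ten million challenges pays on the order of hundreds of CPU-hours.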

In a blog post, Iaso says:

At a high level, Anubis has a big old set of rules in your bot policy file. If clients match a rule, they are either passed through, blocked, or selected for secondary screening. By default, Anubis is meant to instantly work by stopping all the bleeding and letting administrators sleep without downtime alerts waking them up. This means that it's overly paranoid and aggressively challenges everything, similar to Cloudflare's "I'm under attack" mode.

My intent was that admins would start out with Anubis being quite paranoid and then slowly lessen the paranoia as they find better patterns and figure out ways to do things. Users tend to use Anubis in its default configuration, but this default configuration interferes with RSS feed readers and other "good bots."
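
The rule matching Iaso describes can be pictured with a short Python sketch. It assumes rules are matched against the User-Agent header in order, with the first match deciding whether a client is passed through, blocked, or challenged. The `Rule` structure, the action names, and the "ExampleFeedReader" client are hypothetical and chosen for illustration; this is not Anubis's real policy file format.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    user_agent_pattern: str  # regular expression tested against the User-Agent
    action: str              # "ALLOW", "DENY", or "CHALLENGE"


# Hypothetical rules for illustration only; not Anubis's shipped policy.
RULES = [
    Rule("allow-feed-reader", r"ExampleFeedReader", "ALLOW"),
    Rule("deny-ai-scrapers", r"GPTBot|ClaudeBot", "DENY"),
    Rule("challenge-browsers", r"Mozilla", "CHALLENGE"),
]


def decide(user_agent: str, rules=RULES, default: str = "CHALLENGE") -> str:
    """Return the action of the first rule whose pattern matches the User-Agent."""
    for rule in rules:
        if re.search(rule.user_agent_pattern, user_agent):
            return rule.action
    # A paranoid default: anything unrecognized gets challenged.
    return default


if __name__ == "__main__":
    for ua in ("ExampleFeedReader/1.0", "Mozilla/5.0 (compatible; GPTBot/1.0)", "curl/8.5.0"):
        print(ua, "->", decide(ua))
```

Rule order matters in this sketch: the allowlist entry sits first so that a "good bot" is passed through before any broader pattern can catch it, which is the kind of loosening an admin would do after starting from the paranoid default.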

A nuclear response

The result is a tool that Iaso describes as a "bit of a nuclear response."

"This will result in your website being blocked from smaller scrapers and may inhibit 'good bots' like the Internet Archive. You can configure bot policy definitions to explicitly allowlist them, and we are working on a curated set of 'known good' bots to allow for a compromise between discoverability and uptime," Iaso says.

Also: Reddit sues Anthropic for scraping its users' content without consent

Many groups were ready for a nuclear response. Organizations such as GNOME, FFmpeg, and UNESCO have adopted Anubis to protect their online infrastructure. Since its release in January 2025, Anubis has been downloaded over 200,000 times and is credited with helping many organizations avoid outages and reduce the burden of unwanted AI scraping.

According to Duke University, a happy Anubis user, the school's library systems have successfully blocked about 90 percent of unwanted traffic and over 4 million unwanted HTTP requests per day, while improving service performance with minimal blockage for real users.

How to install and run Anubis

There are several ways to install and run Anubis.

Typically, Anubis is meant to sit between your reverse proxy and your target service. Support is currently free. You can access it via its GitHub issue page or, for live chat, join Iaso's Patreon and ask in the Patreon Discord channel. There's also a commercial version of Anubis named BotStopper, which, at this point, just offers organizations more control over the program's branding.

Also: How global threat actors are weaponizing AI now, according to OpenAI

The battle between bot developers and defenders promises to be never-ending. Anubis's creators are updating the tool to counter new evasion tactics, such as headless browsers and advanced browser fingerprinting. The goal is to keep the internet accessible for humans while making it uneconomical for abusive bots to operate at scale. This is not easy.

If you find the project useful, do support it. She can use all the help you can give.
