Reddit Sues Perplexity And Three Other Companies For Allegedly Using Its Content Without Paying


Reddit is suing SerpApi, OxyLabs, AWMProxy and Perplexity for allegedly scraping its data from search results and using it without a license, The New York Times reports. The new lawsuit follows legal action against AI startup Anthropic, which allegedly used Reddit content to train its Claude chatbot.

Since 2023, Reddit has charged companies looking for access to posts and other content, in the hopes of making money on data that could be used for AI training. The company has also signed licensing deals with companies like Google and OpenAI, and even built an AI answer tool of its own to leverage the knowledge in users' posts. Scraping search results for Reddit content avoids those payments, which is why the company is seeking financial damages and a permanent injunction that prevents companies from selling previously scraped Reddit material.

Some of the companies Reddit is focused on, like SerpApi, OxyLabs and AWMProxy, are not exactly household names, but they've each made collecting data from search results and selling it a core part of their business. Perplexity's inclusion in the lawsuit might be more obvious. The AI company needs data to train its models, and has already been caught seemingly copying and regurgitating material it hasn't paid to license. That also includes reportedly ignoring the robots.txt protocol, a way for websites to communicate that they don't want their material scraped.

Per a copy of the lawsuit provided to Engadget, Reddit had already sent a cease-and-desist to Perplexity asking it to stop scraping posts without a license. The company claimed it didn't use Reddit data, but it also continued to cite the platform in answers from its chatbot. Reddit says it was able to prove Perplexity was using scraped Reddit content by creating a "test post" that "could only be crawled by Google’s search engine and was not otherwise accessible anywhere on the internet." Within a few hours, queries made to Perplexity's answer engine were able to reproduce the content of the post.

"The only way that Perplexity could have obtained that Reddit content and then used it in its 'answer engine' is if it and/or its co-defendants scraped Google [search results] for that Reddit content and Perplexity then quickly incorporated that data into its answer engine," the lawsuit claims.

When asked to comment, Perplexity provided the following statement:

"Perplexity has not yet received the lawsuit, but we will always fight vigorously for users’ rights to freely and fairly access public knowledge. Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest."

This new lawsuit fits with the aggressive stance Reddit has taken towards protecting its data, including rate-limiting unknown bots and web crawlers in 2024, and even limiting what access the Internet Archive's Wayback Machine has to its site in August 2025. The company has also sought to define new terms around how websites are crawled by adopting the Really Simple Licensing standard, which adds licensing terms to robots.txt.
