Reddit is suing Perplexity for scraping posts, escalating its battle over person knowledge with the AI trade
Thomas Fuller | Light rocket | Getty Images
Social media giant Reddit has filed a lawsuit against artificial intelligence company Perplexity, alleging that it illegally deleted user posts to train its AI model. This marks the latest data rights conflict between content owners and the AI industry.
The lawsuit, filed Wednesday in New York federal court, also named three defendants who Reddit says helped Perplexity collect its data: Lithuanian data scraper Oxylabs, “former Russian botnet” AWMProxy and Texas startup SerpApi.
Reddit claimed that the three smaller companies were able to extract their copyrighted content “by disguising their identities, concealing their locations, and disguising their web scrapers as regular people.”
Perplexity, which operates an AI-powered search engine, denied the allegations and accused Reddit of “extortion” and opposition to an open internet, while SerpApi told CNBC that it “strongly disagrees” with Reddit’s claims and plans to defend itself in court.
The case represents one of many filed by content owners accusing AI companies of using copyrighted material without permission to train their large language models. Reddit, in particular, has been at the forefront of this fight, having filed a similar ongoing lawsuit against AI startup Anthropic in June. CNBC was unable to reach Oxylabs and AWMProxy.
In a statement shared with CNBC, Ben Lee, chief legal officer at Reddit, said that AI companies are “caught in an arms race for high-quality human content” and that the pressure has fueled an “industrial-scale data laundering economy.”
Scrapers circumvent technological protection measures to steal data and then sell it to customers hungry for training materials. Reddit is a prime target because it is one of the largest and most dynamic collections of human conversation ever created.
Reddit — which hosts over 100,000 interest-based “subreddit” communities — said in its lawsuit that its user posts had become the most cited source of AI-generated responses on Perplexity.
It added that it sent Perplexity a cease-and-desist letter, after which it “increased the volume of citations to Reddit by 40-fold.”
AI researchers have previously found that the large amount of moderated conversations on Reddit can help AI chatbots provide more natural-sounding answers.
In the age of artificial intelligence, Reddit has worked to leverage its vast pool of data and allow access to it only through AI-related licensing agreements. The social media company has signed such agreements with OpenAI and alphabetis Google.
In response to the lawsuit, Perplexity argued in a post on the Reddit platform that it does not train AI models on content, but merely summarizes and cites public Reddit discussions. Therefore, it is “impossible” to sign a license agreement.
“A year ago, after explaining this, Reddit insisted that we would still pay, even though we were lawfully accessing Reddit data. Bowing to strong-arm tactics is simply not the way we do business,” the statement said, before describing the lawsuit as a “show of force in Reddit’s training data negotiations with Google and OpenAI.”
“Perplexity believes this is a sad example of what happens when public data becomes a large part of a publicly traded company’s business model,” Perplexity added, noting that data licensing has become an increasingly important source of revenue for Reddit.
In February, Reddit COO Jen Wong told trade publication Adweek that AI licensing deals with Google and OpenAI accounted for nearly 10% of Reddit’s revenue.
Comments are closed.