
Reddit has sued four companies, alleging that they used its user-generated content without authorization. The defendants, Perplexity AI, Cloudflare, Adversa AI, and Bright Data, are accused of scraping Reddit’s content and using it to train AI models without permission or compensation. The lawsuit, which seeks more than $100 million in damages, was filed in the U.S. District Court for the Northern District of California on October 23, 2024.

Background on Reddit’s Content Deals

Reddit’s lawsuit follows its recent licensing partnerships with AI companies such as OpenAI and Google. These partnerships, which began in 2024, involve paid access to Reddit content and are estimated to generate over $60 million annually for the company. Reddit has also taken a firm stance on protecting user content: in June 2024 it changed its policies to restrict automated data collection, a step prompted by an increase in scraping by AI firms.

In announcing the lawsuit, Reddit CEO Steve Huffman emphasized the company’s commitment to safeguarding its community. He stated, “We’re taking legal action to protect our community and ensure that companies respect our terms of service.”

Details of the Allegations Against Perplexity AI

Perplexity AI allegedly scraped millions of Reddit posts using hidden web crawlers. According to the lawsuit, this activity disregarded the directives in Reddit’s robots.txt file and breached its terms of service. Perplexity’s AI search engine, launched in 2022, is accused of generating responses by summarizing Reddit content without attribution or compensation. Reddit reportedly detected more than 1,000 unauthorized API calls per minute tied to this activity.
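For readers unfamiliar with robots.txt, the sketch below shows how a well-behaved crawler can check the file before fetching a page, using Python’s standard urllib.robotparser module. The rules, the bot name ExampleAIBot, and the example.com URLs are hypothetical placeholders chosen for illustration; they are not Reddit’s actual robots.txt and not any defendant’s crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
# This is NOT Reddit's actual file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/r/python/"
    # The hypothetical AI bot is barred from the whole site.
    print(is_allowed(EXAMPLE_ROBOTS_TXT, "ExampleAIBot", page))
    # Other crawlers are barred only from /private/.
    print(is_allowed(EXAMPLE_ROBOTS_TXT, "OtherBot", page))
```

Note that robots.txt is a voluntary convention: nothing technically stops a crawler from skipping this check, which is why the dispute centers on terms of service and copyright rather than on the file itself.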

Reddit says its investigation found evidence that Perplexity’s models were trained on scraped data, pointing to AI outputs that directly reproduced Reddit threads. The company argues that this strengthens its case.

Involvement of Cloudflare and Its Role

Cloudflare, another defendant in the lawsuit, is accused of facilitating the scraping by providing infrastructure services to Perplexity. According to the filing, Cloudflare’s Workers AI platform hosted scraping scripts used by Perplexity. Despite receiving cease-and-desist notices from Reddit in July 2024, Cloudflare allegedly continued to permit traffic from Perplexity IP addresses that were linked to data collection in San Francisco.

Cloudflare CEO Matthew Prince responded to the lawsuit, stating that the company complies with valid DMCA requests but views the suit as an overreach.

Adversa AI and Bright Data’s Alleged Actions

Adversa AI, another company named in the lawsuit, is accused of using Reddit data for training its “Attack Success Rate” AI model. The company’s scraping operations, which reportedly began in early 2023, were traced to servers in London, UK. Bright Data, the fourth defendant, is alleged to have acted as a data broker, selling scraped Reddit datasets to Perplexity and others. The company reportedly generated $50 million in revenue from such activities in 2023.

The lawsuit claims that both Adversa AI and Bright Data bypassed Reddit’s rate limits, extracting over 500 million comments without consent.

Legal Precedents and Broader Implications

This lawsuit is not without precedent. In December 2023, The New York Times sued OpenAI and Microsoft for copyright infringement over AI training, asserting that the defendants were responsible for billions of dollars in damages. If Reddit prevails, possible outcomes include injunctions to halt scraping and statutory damages of up to $150,000 per infringed work for willful infringement under U.S. copyright law.

The lawsuit could also have a substantial impact on the industry, pressuring AI startups to negotiate licensing deals modeled on Reddit’s, with payments tied to user growth metrics.

Reddit’s Strategy and Future Steps

Reddit has developed internal monitoring tools to detect scraping activities. These tools identified patterns from the defendants’ operations across 100,000 subreddits, providing crucial evidence for the lawsuit. The lawsuit was filed in San Francisco, where Perplexity is headquartered, allowing Reddit to leverage California jurisdiction for faster enforcement.
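The article does not describe how Reddit’s monitoring tools work. As a rough illustration of one common approach to spotting unusually high request volumes, the sketch below counts requests from a single client inside a sliding time window and flags the client once a threshold is exceeded. The class name RateMonitor, the threshold, and the timestamps are all hypothetical; this is not Reddit’s implementation.

```python
from collections import deque


class RateMonitor:
    """Flags a client whose request count within a sliding window exceeds a threshold.

    Hypothetical sketch: the names and numbers here are illustrative assumptions.
    """

    def __init__(self, threshold: int, window_seconds: float = 60.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.timestamps: deque[float] = deque()

    def record(self, timestamp: float) -> bool:
        """Record one request; return True if the client is now over the threshold."""
        self.timestamps.append(timestamp)
        # Drop requests that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps) > self.threshold


if __name__ == "__main__":
    # Allow at most 3 requests per 60 seconds in this toy example.
    monitor = RateMonitor(threshold=3, window_seconds=60.0)
    for second in (0.0, 1.0, 2.0, 3.0):
        print(second, monitor.record(second))
```

A production system would track one such window per client and combine it with other signals, since request rate alone cannot distinguish a scraper from a busy legitimate integration.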

The initial hearing for the case is scheduled for December 15, 2024. Reddit has expressed its intent to expand the lawsuit if similar violations are found in the future.
