Morning Overview

Google blasts rivals for ‘stealing’ AI it built by scraping everyone’s data

Google has escalated its fight over who gets to profit from the web’s data, filing a lawsuit that accuses rival SerpApi of “stealing” AI-ready information by scraping Google Search at massive scale. In a sharply worded statement, Google’s General Counsel framed the case as a defense of users, websites, and the company’s own investments against what it calls “stealthy scrapers” that ignore basic rules of the internet. At stake is a core question for the AI boom: when search results are built from everyone’s content, who is allowed to mine those results to train and power competing AI products?

The Lawsuit’s Core Allegations

In its official announcement, Google says SerpApi has been running an industrial-scale scraping operation that targets Google Search, sending automated requests that the company claims reach “hundreds of millions” of queries per day. Google alleges that SerpApi uses cloaking techniques, bot networks, and fake user agents to disguise this activity, and that it then sells structured Google Search results to downstream customers, including AI developers that want ready-made training or inference data. According to Google’s General Counsel, the company views this as a direct attack on its rights in the data it curates and the technical measures it deploys to protect its services.

Google’s complaint, as summarized in independent coverage, describes SerpApi as a business that is “built on scraping Google Search” and reselling that output, a characterization repeated in reporting that notes the claimed volume of automated traffic and the AI-focused marketing of SerpApi’s tools. One detailed account of the filing reports that Google accuses SerpApi of routing traffic through large botnets to evade detection and of systematically overriding technical limits that are supposed to throttle automated queries. SerpApi has publicly disputed aspects of Google’s narrative in other forums, but the reporting available here does not include a full on-record denial of each allegation, and there is limited independent verification of the exact scraping volume beyond Google’s own figures.

Google’s Defense of Its Data Practices

To justify its hard line, Google is leaning on a familiar argument: that it follows the internet’s long-standing protocols while rivals do not. In its blog post, Google stresses that its own crawlers respect robots.txt and related machine-readable instructions that websites use to signal what can be indexed or mined. The General Counsel says the company expects others to play by the same rules, casting SerpApi and similar services as “stealthy scrapers” that ignore robots.txt, rotate identities, and bypass rate limits in ways that ordinary developers and researchers do not. Google also frames its lawsuit as part of a broader effort to protect users from abuse of its infrastructure, arguing that massive scraping can degrade service quality and undermine security features.
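
That robots.txt argument rests on a simple, well-documented mechanism: a site publishes a plain-text file of per-crawler allow and disallow rules, and a compliant crawler checks those rules under its declared user agent before requesting anything. The sketch below is illustrative only, using Python’s standard library, an invented crawler name, and example.com URLs; it shows the kind of check Google says “stealthy scrapers” skip or deliberately route around.

```python
# Minimal illustration of the robots.txt check a well-behaved crawler performs
# before fetching a page. The user agent and URLs are hypothetical examples,
# not any real crawler's configuration.
from urllib import robotparser

USER_AGENT = "ExampleResearchBot"  # hypothetical, clearly identified crawler
TARGET_URL = "https://example.com/search?q=widgets"

rules = robotparser.RobotFileParser()
rules.set_url("https://example.com/robots.txt")
rules.read()  # download and parse the site's published crawl rules

if rules.can_fetch(USER_AGENT, TARGET_URL):
    delay = rules.crawl_delay(USER_AGENT)  # honor any Crawl-delay directive
    print(f"Allowed to fetch; crawl delay: {delay if delay is not None else 'none specified'}")
else:
    print("robots.txt disallows this URL for this user agent; skipping")
```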

Regulators, however, have already questioned how Google uses its gatekeeper position in search, which complicates its attempt to claim the moral high ground on data access. In a separate proceeding, the European Commission’s preliminary findings against Alphabet allege that Google Search design choices and Google Play steering restrictions can disadvantage rivals. The Commission’s view that Google Search may already be structured in ways that favor its own services provides important context for the scraping fight: Google is arguing that others are unfairly taking from a system regulators say it may already be using to tilt the market. That tension is likely to shape how courts and policymakers interpret Google’s claim that it is simply enforcing basic rules of the road.

Broader AI Scraping Wars

Google’s SerpApi case lands amid a flurry of related lawsuits that show how tangled AI scraping disputes have become. One of the most striking examples involves Reddit, which has sued to block AI startup Perplexity from using data that Reddit says was scraped indirectly via Google Search results. According to reporting on the complaint, Reddit claims Perplexity tried to evade Google’s anti-scraping system, which Reddit says is internally called “SearchGuard,” by targeting cached and proxied copies of Reddit content visible through Google Search rather than hitting Reddit directly. Reddit argues that some AI firms license access to its data, while others rely on scraping without consent, and the lawsuit is an attempt to draw a bright line between those approaches.

That reporting goes further, describing how Reddit allegedly used a “marked bills” style bait test to catch what it says was unauthorized scraping routed through Google’s infrastructure, and referencing subpoenas that seek to uncover the scale and mechanics of that activity. At the same time, other coverage of the dispute emphasizes that Perplexity has denied some of Reddit’s claims and that there is still uncertainty over how much of the scraped material ultimately flowed into downstream AI systems. This uncertainty mirrors the SerpApi case: Google alleges that SerpApi’s scraped results power AI applications, but the exact ways those customers use the data, and whether those uses infringe any particular rights, remain largely opaque in the public record.

Regulatory Backdrop in the EU

While companies trade accusations in court, European regulators are trying to set ground rules for who can access search data and on what terms. In a key proceeding opened under the Digital Markets Act, the Commission explains that it is examining whether Google is complying with its obligations to share anonymised Google Search ranking, query, click, and view data with third parties on fair, reasonable, and non-discriminatory (FRAND) terms. The Commission’s announcement explicitly mentions the eligibility of AI chatbot providers to receive such data, highlighting that the same search index Google wants to shield from “stealthy scrapers” is also subject to mandatory data sharing duties because of its gatekeeper status.

In parallel, the Commission is working on rules that govern how AI developers handle text and data mining (TDM) rights in the first place. A Commission consultation on protocols for reserving rights over TDM under the AI Act and the GPAI Code of Practice sets out expectations that model providers must respect machine-readable reservations, including robots.txt and similar standards. According to that consultation, the goal is to ensure that content creators can signal when their works are off-limits for AI training and that GPAI developers have clear obligations to honor those signals. This regulatory push aligns with Google’s emphasis on robots.txt in its lawsuit, but it also applies equally to Google itself when it trains and deploys its own general purpose AI systems.
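
In practice, one widely used way to express such a reservation is through dedicated AI-crawler tokens in robots.txt. The sketch below is a hypothetical example: the Google-Extended and GPTBot user agent tokens are real, publicly documented identifiers for AI-training crawlers, but the site and policy are invented, and the Python snippet simply shows how such a file can keep ordinary search indexing open while reserving content against AI training.

```python
# Illustrative robots.txt policy that allows ordinary indexing while reserving
# content against known AI-training crawlers. The tokens Google-Extended and
# GPTBot are publicly documented; the site and rules here are hypothetical.
from urllib import robotparser

SAMPLE_ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

rules = robotparser.RobotFileParser()
rules.parse(SAMPLE_ROBOTS_TXT)  # parse the in-memory rules instead of fetching

# Ordinary search indexing stays open, but AI-training access is reserved out.
print(rules.can_fetch("Googlebot", "https://example.com/article"))        # True
print(rules.can_fetch("Google-Extended", "https://example.com/article"))  # False
```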

Google Search, Self-Preferencing and AI Overviews

The EU’s scrutiny of Google does not stop at data sharing and TDM. In its preliminary findings under the Digital Markets Act, the Commission has signaled concern that Alphabet uses Google Search to self-preference its own services and that Google Play may include steering restrictions that limit rival apps. While this case is not directly about scraping, it speaks to the power of Google Search as a distribution channel, and to regulators’ suspicion that the company can design search features, including AI-driven answers, in ways that shape which services users see and which publishers receive traffic.

Publishers are already testing that theory in court. Education company Chegg has sued Google, alleging that AI-generated Overviews in Google Search siphon away traffic and revenue by summarizing Chegg’s content while keeping users on Google’s page. According to reporting on that complaint, Chegg argues that it is effectively forced to supply content into Search to remain visible to students, even as those same AI Overviews reduce the incentive for users to click through. This flips the narrative that appears in Google’s SerpApi complaint: while Google accuses rivals of taking value from its search index, Chegg accuses Google of taking value from its content and repackaging it inside Google’s own AI features.

The GPAI Code of Practice and Enforcement Timelines

As these disputes multiply, the EU is trying to operationalize its new AI rules with a specific focus on general purpose systems. The Commission’s announcement that the GPAI Code of Practice is now available sets out how model providers are expected to handle training data transparency, rights reservations, and risk management ahead of the AI Act’s GPAI rules entering into application. According to that announcement, GPAI developers that sign up are expected to disclose high-level information about their training data sources, respect TDM reservations, and cooperate with regulators as the Act’s enforcement phase begins. These expectations directly intersect with the kind of scraping and data reuse at issue in Google’s SerpApi lawsuit.

The same announcement explains that the Code of Practice is intended as a bridge to full AI Act enforcement, giving companies time to adapt their processes before binding obligations apply. For AI firms that rely on search results, whether from Google or other platforms, that means reconciling their data-gathering methods with both the TDM protocols and the FRAND-based access regimes described in the interoperability and search data sharing proceedings. The result is a complex legal environment where the same dataset might be off-limits for scraping, mandatory for sharing on FRAND terms, and subject to AI-specific transparency rules, depending on who is accessing it and why.

What This Means for AI Innovation

For AI developers, the message from both Google’s lawsuit and the EU’s regulatory agenda is that the era of frictionless scraping is ending. Companies that once quietly pulled data from Google Search or Reddit to train chatbots now face litigation risk, TDM reservations, and potential DMA enforcement if they are seen as bypassing FRAND-based channels. At the same time, the TDM protocols and the GPAI Code of Practice suggest that compliant access paths will exist, provided model providers are prepared to attribute sources, respect opt-outs, and document their training inputs. Experts quoted in the EU materials warn that ignoring these emerging norms could expose AI firms to both legal penalties and loss of trust from content creators whose works underpin modern models.

For content creators and platforms, the stakes are equally high. Lawsuits like Chegg’s against Google and Reddit’s action over scraping via Google Search reflect a growing sense that AI systems are extracting value without adequate compensation or control. Yet Google’s own case against SerpApi shows that even dominant platforms feel vulnerable when others mine their curated results to build rival AI products. As the AI Act’s GPAI provisions move toward enforcement, and as DMA proceedings on search data sharing progress, innovation in AI may hinge less on who can grab the most data and more on who can navigate this dense web of rights, obligations, and technical safeguards.

Unresolved Questions and Next Steps

Despite the flurry of filings and policy papers, many core questions remain unsettled. Google’s General Counsel labels SerpApi’s activities as theft of AI-ready data, yet SerpApi and other intermediaries are likely to argue that they are simply automating access to information users can already see in Google Search. Courts will have to weigh how far contractual terms, technical measures, and TDM reservations can go in restricting reuse of search results, and where fair use or similar doctrines might still apply. In parallel, the reporting on Reddit’s “SearchGuard” allegations highlights how little is publicly known about the subpoenas and internal tests that aim to prove scraping at scale, leaving significant evidentiary gaps that will only be filled as cases move through discovery.

Regulators are still refining their own tools as well. The GPAI and TDM consultations invite feedback on how rights reservations should work in practice, while the interoperability and online search data sharing proceedings will shape whether AI chatbot providers can claim FRAND-based access to anonymised Google Search data instead of scraping it. Until those processes conclude and the AI Act’s GPAI rules are fully in force, both Google and its rivals are operating in a grey zone where lawsuits and regulatory probes substitute for clear, settled law. The outcome will determine whether Google’s attempt to brand large-scale scraping as “stealing” becomes a legal benchmark for the AI era or just another opening argument in a much longer fight.

*This article was researched with the help of AI, with human editors creating the final content.