
European regulators are escalating their confrontation with Silicon Valley’s AI ambitions, zeroing in on how Google built the data pipelines behind its most powerful models. At the heart of the new probe is a simple but explosive allegation: that Google turned the open web into a free training set, scraping publishers’ work at industrial scale without proper consent or compensation to catch up in the AI race.
By treating online content as raw fuel for its algorithms, Google now faces a test of whether the rules that governed search and digital ads can stretch to generative AI. The outcome will shape not only how tech giants train their systems, but also whether creators, newsrooms and platforms can still control how their work is used in an era of automated data harvesting.
The new EU case against Google’s AI data practices
The European Commission has moved from warnings to a formal antitrust investigation into how Google collects and uses online material to train its AI models. Regulators are examining whether the company’s approach to scraping websites, ingesting publisher content and tapping into its own platforms gives it an unfair edge over rivals that must negotiate licenses or build smaller datasets. In opening the case, the European Commission framed the question bluntly: whether Google’s conduct is distorting competition in AI by leaning on its dominance in search and other services.
According to the Commission, the inquiry focuses on whether Google’s access to vast troves of web pages, news articles and other digital content lets it train systems at a scale that competing AI developers cannot match, especially if those competitors must pay for data or respect stricter opt-out rules. The formal step, announced by the European Commission, explicitly raises the possibility that Google’s practices could leave alternative AI models at a disadvantage, turning the company’s historical dominance in search into a springboard for control over the next generation of AI tools.
“Illegal scraping” and the race to catch OpenAI
Behind the legal language sits a more visceral charge: that Google “illegally scraped” the web to fix its own AI shortcomings and close the gap with OpenAI. Reporting on the investigation describes how the company allegedly pulled in content from across the internet, including material from publishers and other rights holders, without clearly informing them or letting them meaningfully opt out. The suggestion is that Google’s scramble to keep pace with ChatGPT and other generative systems pushed it to treat the public internet as a default training corpus, regardless of the legal gray zones.
The probe also highlights the personal stakes for Google’s leadership, including chief executive Sundar Pichai, who has staked the company’s future on AI after years of criticism that it was slow to commercialize its research. Coverage by Kelvin Chan of The Associated Press underscores how regulators are scrutinizing whether Google’s efforts to ingest content at scale, including data from at least 44 jurisdictions, crossed legal lines in the rush to compete. The allegation that the company bypassed clear consent or robust opt-out mechanisms turns a technical data question into a potential test case for how far AI firms can go when they feel they are falling behind.
Publishers, YouTube and the question of consent
One of the most sensitive fronts in the investigation is Google’s relationship with publishers and video creators whose work underpins both its search engine and YouTube. Regulators are probing whether the company used content from web publishers and YouTube channels to train its AI systems without adequate permission, and whether it leveraged its control over distribution to pressure partners into accepting new terms. For newsrooms that already depend on Google for traffic, the idea that their articles might also be quietly feeding AI models raises both economic and editorial concerns.
The European Commission has signaled that it is particularly interested in whether Google’s use of publisher and YouTube material for AI training was tied to restrictive conditions that made it harder for others to train their own models. In its announcement, the European Commission said Tuesday that it was examining how Google may have used content from web publishers and YouTube creators in ways that limited rivals’ access to similar training data. If regulators conclude that Google tied access to its platforms to acquiescence on AI training, the case could reshape how platforms negotiate with media and creators over data rights.
Brussels zeroes in on unpaid web and YouTube content
For officials in Brussels, the core concern is not just that Google used online content, but that it did so without paying or offering meaningful control to those whose work was repurposed. The investigation is asking whether unpaid web pages and YouTube videos were effectively treated as a free raw material, even as they helped Google build AI products that could eventually compete with the very publishers and creators who supplied the data. That tension between free access and commercial reuse has long simmered in debates over search, but generative AI raises the stakes by enabling systems that can reproduce style, structure and substance at scale.
Regulators are also looking at whether Google’s control over search and video distribution allowed it to lock out rivals from similar data sources, creating a feedback loop in which dominance in one market reinforces power in another. Reporting from Brussels notes that officials are probing whether unpaid web and YouTube content, combined with potential lock-outs of competitors, amount to an abuse of dominance by giving Google a competitive advantage in AI training. Those concerns are captured in coverage of how Brussels is testing whether this pattern of behavior crosses the line from aggressive innovation into anticompetitive conduct.
AI search tools and the future of the results page
The investigation is not limited to how Google trains its models; it also reaches into how those models are deployed in search. European regulators are scrutinizing the company’s AI-powered search tools, which can generate direct answers, summaries and recommendations that sit above or instead of traditional blue links. The question is whether these features, built on data scraped from across the web, further entrench Google’s position by keeping users within its own interface and reducing the visibility of the sites whose content trained the system in the first place.
Critics argue that if AI-generated answers become the default, publishers could see traffic fall even as their work continues to feed the underlying models, a dynamic that could hollow out the economic base of independent media. The European Union’s competition arm has framed the new case as part of a broader push to ensure that the digital economy, including AI-enhanced search, remains open to challengers rather than locked into a single gatekeeper. That perspective is reflected in reporting on how Big Tech and American tech elites have criticized the European Union’s regulatory approach even as its competition arm presses ahead with a probe into Google’s AI search tools.
A pattern of EU actions against Google
The AI scraping probe does not come out of nowhere; it slots into a long history of European regulators challenging Google’s business practices. Earlier this year, the European Commission opened a formal antitrust investigation into Google’s role in the digital economy, signaling that AI would be the next frontier in a series of cases that have already reshaped how the company operates in search, shopping and advertising. That broader context matters, because it shows that Brussels sees the AI transition not as a clean slate, but as an extension of existing market power.
In a roundup of regulatory actions, coverage notes that Google’s AI investigation is listed as a top priority, with officials stressing that the goal is to ensure the market “is more competitive than ever.” The summary explains how, on December 9, the European Commission opened a formal antitrust investigation into Google, underscoring that Brussels is treating AI as part of a continuum of enforcement rather than a separate policy silo. That continuity suggests that any remedies in the AI case could echo earlier decisions, from mandated changes to interfaces to structural remedies and fines.
How the new probe fits with other AI training investigations
The current case also overlaps with a separate line of inquiry into how Google uses online content for AI training more broadly. Regulators are examining whether the company’s extraction of data from websites, news outlets and other online services complies with EU competition rules, especially when those sources are already involved in lawsuits over copyright and data protection. The focus is not only on whether scraping occurred, but on whether the way it was integrated into Google’s AI pipeline created barriers for others who might want to build competing systems.
Reports on the probe describe how the European Commission has launched an investigation into whether Google may be breaching EU competition law by extracting online content for AI training, even as related lawsuits remain ongoing in other jurisdictions. That framing highlights the interplay between competition law and intellectual property disputes, and it underscores that the EU is willing to move ahead on antitrust grounds even while courts elsewhere hash out copyright questions.
Billions in fines and a history of antitrust clashes
Google’s latest clash with Brussels lands on top of a record of multibillion-euro penalties that have already forced the company to adjust its business in Europe. Earlier this year, the European Commission fined Google €2.95 billion for breaching EU antitrust rules by distorting competition in the online advertising industry, known as adtech. That decision focused on how Google allegedly favored its own services over those of rivals, reinforcing its dominance in a market that underwrites much of the modern internet.
The new AI-focused probe raises the prospect that similar financial and behavioral remedies could follow if regulators conclude that Google’s data practices amount to an abuse of dominance. The scale of past penalties, including the €2.95 billion fine, signals that Brussels is prepared to impose significant costs when it believes competition has been harmed. In parallel, other jurisdictions have also scrutinized Google’s conduct, with Judge Leonie Brinkema of the US District Court for the Eastern District of Virginia determining that Google maintained its dominance in ways that favored its own services over rival offerings. Together, these cases paint a picture of a company whose every strategic move, including in AI, is now filtered through a lens of regulatory suspicion.
Why the AI probe matters for creators and competitors
For web publishers, YouTube creators and other rights holders, the EU’s investigation is about more than punishing past behavior; it is about setting the terms of engagement for the AI era. If regulators decide that scraping and reusing content without explicit consent or payment is incompatible with competition law, that could force Google and its peers to negotiate licenses, share revenue or offer more robust opt-out tools. Such changes would not only shift money and power back toward content producers; they could also slow the pace at which AI models are trained and updated, especially for companies that lack Google’s scale.
For smaller AI firms, the case could be equally consequential. A finding that Google’s access to data from search and YouTube constitutes an unfair advantage might lead to remedies that open up more datasets, limit exclusive deals or restrict how platform data can be repurposed for AI. Coverage of how the European Commission is probing Google’s use of online content for AI notes that regulators are explicitly asking whether the company’s practices have left web publishers at a disadvantage for AI purposes. If the answer is yes, the remedies could reshape not only Google’s AI roadmap, but the competitive landscape for every startup trying to build the next generation of models.
What comes next in Europe’s AI rulebook
The EU’s case against Google over alleged “illegal” scraping is emerging as a de facto test of how existing competition law can be applied to AI, even before bespoke AI regulations fully take hold. I see it as an early signal that regulators will not wait for new statutes to police how data is collected and reused, especially when the companies involved already hold dominant positions in adjacent markets. Instead, they are reaching for familiar tools, from abuse-of-dominance theories to structural remedies, and adapting them to the realities of machine learning and generative models.
For Google, the investigation forces a strategic choice between fighting on every front or moving toward negotiated settlements that could lock in new norms for AI training. For the broader industry, the message from Brussels is clear: the days of treating the open web as a consequence-free training set are over. As the European Commission presses ahead with its formal antitrust investigation into Google, the outcome will help determine whether AI’s next chapter is written primarily by engineers or by regulators who believe that innovation must be balanced with consent, compensation and genuine competition.