
For years, the volunteer-written encyclopedia that quietly underpins the modern web has been treated as a free buffet by artificial intelligence companies. Now Wikipedia is finally turning that dependence into a business, striking licensing deals that force the biggest AI players to pay for the data they once scraped without asking. The shift marks a rare moment when an open, nonprofit project is setting terms for Big Tech instead of simply absorbing the consequences.

From free buffet to paid pipeline

The core change is simple: if an AI company wants reliable, structured access to Wikipedia at industrial scale, it now has to sign a check. Wikimedia, the nonprofit that stewards the encyclopedia, has built an enterprise-grade feed that packages the site’s sprawling corpus into something machine learning teams can plug directly into their training pipelines. That product is now being sold to companies including Microsoft, Meta and Amazon, a group that has leaned heavily on Wikipedia to teach their models basic facts about the world and to power products like search and chatbots. The paid feed gives them an alternative to the uncontrolled scraping that batters the site’s infrastructure.

These arrangements formalize what had already become an open secret in AI circles: Wikipedia is one of the most valuable single sources of clean, human-vetted text on the internet. By charging for a dedicated firehose instead of tolerating unmetered harvesting, Wikimedia is asserting that this value is not just moral or cultural but economic. The organization has been explicit that revenue from these enterprise partnerships will help support the servers, bandwidth and engineering work that keep the encyclopedia online for everyone. That is a recognition that the cost of maintaining that infrastructure has risen sharply as AI training has intensified, and that the industry is finally being forced to confront what training data costs.

Who is paying, and what they get in return

The first wave of customers reads like a roll call of the current AI arms race. Wikimedia has announced deals with Microsoft, Meta, Amazon and Perplexity, a search startup that leans on large language models to answer user questions conversationally. In the same breath, the foundation has signaled that it is open to arrangements with other model builders, including companies such as France’s Mistral AI, which are racing to compete with the largest American platforms. These companies are not just buying raw text; they are paying for a stable, well-documented interface to Wikipedia’s constantly updated knowledge graph, something that is far more predictable than scraping millions of individual pages.

For the tech giants, the calculus is straightforward. Training and running models at the scale of Microsoft’s Copilot or Meta’s generative tools requires dependable, high-quality data that will not suddenly disappear behind a paywall or robots.txt file. By tying up with Wikipedia through formal enterprise deals, companies like Microsoft and Meta can assure their engineers that the encyclopedia’s content will remain available in a consistent format, while Amazon can feed its own AI systems with the same vetted information. The arrangement also gives Wikimedia leverage to set expectations around attribution and responsible use, a subtle but important shift from the era when AI developers simply scraped whatever they could find.

“Chip in and pay your fair share”

Behind the contracts sits a blunt argument about fairness. Wikimedia leaders have been clear that while Wikipedia’s text remains free for humans to read, industrial users that hammer its servers to train commercial AI models should help shoulder the costs. One senior figure put it plainly, saying that the site’s infrastructure is not free and that if AI developers are going to lean on it at scale, they should probably chip in and pay for their fair share of the cost they are putting on the project. That line in the sand reflects a growing frustration with the way AI companies have treated the open web as a no-cost input, even as their valuations soar.

The tension has been building for more than a year. In a widely discussed blog post, the Wikimedia Foundation used the voice of Wikipedia itself to tell the AI industry to stop scraping and start subscribing, arguing that the crowd that built the modern web should not be treated as an invisible subsidy for proprietary models. That message, delivered on a Monday and framed as a defense of the volunteers who write and edit articles, was an early warning that the status quo would not hold forever. By the time the new licensing deals were unveiled, the foundation’s stance had hardened into a clear expectation that AI firms respect the community’s work and the nonprofit’s financial reality, a shift captured in the call for AI to pay.

That moral framing matters because Wikipedia is not a typical content vendor. Its articles are written and maintained by volunteers who are motivated by a sense of public service, not by licensing revenue. When Wikimedia argues that Big Tech should contribute financially, it is effectively saying that the labor of those volunteers, and the infrastructure that supports them, should not be treated as a free raw material for commercial AI systems that answer questions without always sending users back to the original source.

From “pillaged” to partner

The new deals are also an attempt to reset a relationship that many in the Wikipedia community felt had become extractive. AI companies had already ingested vast portions of the encyclopedia into their training sets, often without explicit permission, leading some observers to describe the process as Wikipedia being pillaged by AI companies. Now, instead of watching passively as models absorb its content, Wikimedia is signing agreements that turn those same firms into paying partners. The shift is not just symbolic; it is a recognition that the encyclopedia’s archive of over 65 million articles is a strategic asset in a world where high-quality text is suddenly scarce.

Those agreements extend beyond the household names. Alongside Microsoft, Meta and Amazon, Wikimedia has also brought in smaller but influential players like the search engine Ecosia and the question-answering startup Perplexity, both of which rely heavily on structured knowledge to power their products. By broadening the pool of partners, the foundation is trying to avoid a future in which only the largest corporations can afford reliable access to Wikipedia’s data. At the same time, it is sending a message to the rest of the AI ecosystem that the era of unpriced extraction is over, a message underscored by the framing of the new arrangements as a way for Wikipedia to get paid.

Protecting an open commons in the age of AI

For all the talk of contracts and revenue, the heart of the story is about preserving an open commons in a period of aggressive data collection. AI developers have been racing to scrape every accessible corner of the web, including Wikipedia’s vast repository of free knowledge, raising questions about consent, sustainability and the long-term health of the sites they depend on. As the encyclopedia marks its 25th birthday, Wikimedia is using the moment to argue that openness does not have to mean surrendering control, and that it is possible to welcome AI companies as customers without abandoning the principle that anyone can still read the articles for free.

That balance is delicate. Wikimedia has stressed that the new enterprise deals are meant to support the infrastructure that keeps the volunteer project online, not to wall off content behind paywalls or block AI companies outright. The site has become so popular in part because it is free for anyone to use, but its leaders are equally blunt that the infrastructure is not free and that the way AI companies access the content will likely evolve in the future. In practice, that means continuing to serve ordinary readers while offering specialized, paid access for high-volume machine users, a model that reflects both the scale of modern AI training and the need to protect Wikipedia’s infrastructure.
