A federal judge has drawn a line that the artificial intelligence industry has been dreading: acquiring pirated books to train an AI model is not protected by fair use, no matter how many legal copies a company buys after the fact. The ruling in Bartz et al. v. Anthropic PBC (N.D. Cal., case no. 3:24-cv-05417) was followed by a proposed $1.5 billion settlement, the largest AI-related copyright resolution on record, and together they signal that the days of treating training data as a legal afterthought are finished.
What the court actually decided
The order split the case down the middle. According to Associated Press reporting, the judge found that using copyrighted books to train an AI system can qualify as fair use under federal law, giving Anthropic a partial victory on the broadest legal question in the dispute. But the same order held that fair use does not retroactively sanitize the act of obtaining pirated material. Buying a licensed edition later does not erase the original infringement.
That distinction between how data is acquired and how it is used gives the case its force. AI companies have long argued that ingesting copyrighted text is transformative because the resulting model does not reproduce the source verbatim. The court accepted that reasoning in part. Yet by holding that sourcing pirated copies remains independently actionable, the judge drew a boundary between lawful curation and unlicensed scraping that plaintiffs in future cases are likely to invoke. Anthropic now faces trial on the piracy-related claims even as it relies on the fair use finding to defend its broader training methods.
The opinion also breaks AI training into discrete legal steps: collection, copying, transformation, and output. Treating each step on its own terms means a company can prevail on fair use at one stage and still lose at another. That framework gives future plaintiffs a roadmap: if you can show your work was scraped from an illicit repository rather than a licensed database, the fair use shield may not apply.
A $1.5 billion settlement followed
Shortly after the ruling, the financial stakes became concrete. A judge granted preliminary approval to a $1.5 billion copyright settlement between Anthropic and the plaintiff authors, according to the Associated Press. Washington Post reporting corroborated the settlement amount and indicated that the deal provides roughly $3,000 per book covered by the agreement. For individual writers, that figure may look modest against the scale of Anthropic’s business, but the total is the largest reported for any AI copyright resolution to date.
The speed of the settlement speaks volumes. Rather than risk a jury verdict on whether it knowingly used pirated material, Anthropic moved to resolve the claims before trial. That calculus suggests the company viewed its exposure on the piracy question as a serious financial and reputational threat, not a procedural nuisance.
Preliminary approval, however, is only the first gate. Class members will have the opportunity to opt out, file objections, or submit claims before the court decides whether to grant final approval. Until that happens, the $1.5 billion figure remains a proposed resolution, and the practical value for any individual author will depend on how the settlement fund is administered and how many works ultimately qualify.
Why every other AI lab should be paying attention
The practical fallout reaches far beyond a single defendant. OpenAI, Google DeepMind, and Meta have all faced similar accusations about the origins of their training corpora. If the fair use framework holds for lawfully obtained material but collapses for pirated sources, the legal risk pivots almost entirely to provenance: where did the data come from, and can the company prove it? The Bartz v. Anthropic ruling is already being cited by plaintiffs’ attorneys in parallel lawsuits.
For publishers and literary agents, the ruling creates leverage that contract negotiations alone could not deliver. If AI companies cannot safely rely on pirated datasets, they need licensed alternatives, and they need them quickly. Several direct licensing deals between AI labs and major publishing houses have been announced in recent months, but the Bartz decision adds a judicial enforcement mechanism: a company that declines to license now faces not only public criticism but also the prospect of statutory damages and injunctions tied to its data pipelines.
AI labs, meanwhile, may respond by auditing legacy datasets and tightening chain-of-custody records. Knowing where every training file originated, and being able to document that trail, becomes essential if courts treat pirated sources as a bright-line violation. Some firms may conclude it is cheaper to rebuild training corpora from licensed or public-domain materials than to defend scraping practices that predate the current wave of litigation.
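What a chain-of-custody record might contain can be sketched in a few lines of Python. This is a minimal illustration, not a description of any lab's actual system: the `ProvenanceRecord` schema, its field names, and the `needs_review` rule are all assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """One chain-of-custody entry for a training file (hypothetical schema)."""
    sha256: str         # content hash, tying the record to the exact bytes
    source_url: str     # where the file was obtained
    license_basis: str  # e.g. "licensed", "public-domain", "unknown"
    acquired_at: str    # ISO 8601 timestamp in UTC


def make_record(content: bytes, source_url: str, license_basis: str) -> ProvenanceRecord:
    """Hash the file contents and stamp the acquisition time."""
    digest = hashlib.sha256(content).hexdigest()
    acquired_at = datetime.now(timezone.utc).isoformat()
    return ProvenanceRecord(digest, source_url, license_basis, acquired_at)


def needs_review(record: ProvenanceRecord) -> bool:
    """Flag entries whose legal basis is not documented (assumed policy)."""
    return record.license_basis not in {"licensed", "public-domain"}


if __name__ == "__main__":
    # Placeholder content and URL, used only to show the record's shape.
    record = make_record(b"sample text", "https://example.com/book.txt", "licensed")
    print(json.dumps(asdict(record), indent=2))
```

Hashing the contents rather than relying on a file name means the record still identifies the same bytes after the file is renamed or moved, which is the property an audit trail would need.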
Open questions the ruling does not answer
Several gaps remain. The relationship between the trial-ready piracy claims and the proposed settlement is not fully spelled out in publicly available filings. Whether the settlement resolves all claims or leaves certain issues open for further litigation will become clearer once the court publishes the detailed terms of the agreement.
The methodology behind the roughly $3,000 per-book figure is also unexplained in public reporting. That number could reflect a flat rate, a formula tied to sales data, or a negotiated average across a large and varied class of works. Authors weighing whether to participate or object will want to see the allocation formula before the claims deadline.
The case also exists in a broader legal landscape that is still taking shape. The New York Times v. OpenAI lawsuit, filed in late 2023, raises overlapping but distinct questions about whether AI outputs that closely paraphrase source material constitute infringement. And the Thomson Reuters v. Ross Intelligence decision, one of the few earlier rulings to address AI training and fair use, reached a narrower conclusion on a smaller dataset. How courts reconcile these cases over the coming months will determine whether Bartz becomes a lasting precedent or an outlier.
What authors and developers should do before the claims deadline
Writers whose books may have been used to train AI models should watch for three procedural milestones: the final fairness hearing on the settlement, the deadline for claims and objections, and the court’s clarification of which works and authors are covered. Each of those steps will shape whether the offered compensation and any promised safeguards on future use justify giving up an individual right to sue.
AI developers should focus on how the court describes unlawful data acquisition in any forthcoming written opinions. If the judge articulates a standard that treats scraping from known piracy repositories as categorically infringing, companies will need to demonstrate that their crawlers avoided those sources or face similar claims. Internal documentation about data sourcing, once a back-office concern, is now front-line trial evidence.
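One way a crawler could show that it skipped known piracy repositories is to check every source URL against a blocklist and keep a log of what was rejected. The Python sketch below assumes such a blocklist exists; the host names are placeholders, and `is_blocked` and `partition_sources` are hypothetical helpers, not part of any real crawler.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; these hosts are placeholders, not real repositories.
BLOCKED_HOSTS = frozenset({"pirate-library.example", "shadow-books.example"})


def is_blocked(url, blocked_hosts=BLOCKED_HOSTS):
    """Return True if the URL's host is a blocked host or one of its subdomains."""
    # URLs without a scheme have no parsed hostname and are treated as not blocked;
    # a stricter pipeline might reject them outright.
    host = (urlparse(url).hostname or "").lower()
    return any(host == b or host.endswith("." + b) for b in blocked_hosts)


def partition_sources(urls):
    """Split URLs into (allowed, rejected) lists, preserving input order."""
    allowed, rejected = [], []
    for url in urls:
        (rejected if is_blocked(url) else allowed).append(url)
    return allowed, rejected


if __name__ == "__main__":
    allowed, rejected = partition_sources([
        "https://publisher.example/catalog/book-1",
        "https://mirror.pirate-library.example/book-1.epub",
    ])
    print("allowed:", allowed)
    print("rejected:", rejected)
```

Keeping the rejected list, rather than silently dropping those URLs, is the design choice that matters here: it is the record a company could later produce to show which sources its pipeline refused.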
A single federal lawsuit has produced a fair use ruling that partially validates current training practices, a piracy finding that threatens unlicensed data pipelines, and a proposed settlement measured in billions. For both the people who write books and the companies that feed them to machines, the legal reckoning is no longer theoretical. It is on the docket, and the clock is running.
*This article was researched with the help of AI, with human editors creating the final content.