
Adobe is facing a high‑stakes legal and reputational test as a proposed class action accuses the company of quietly feeding authors’ books into its artificial intelligence training pipeline. At the center of the dispute is SlimLM, a text model that plaintiffs say was built on pirated literature, raising pointed questions about how one of the most influential creative software makers treats the very writers and artists who rely on its tools.
The lawsuit lands after months of public backlash over Adobe’s data practices and Terms of Use, turning a simmering trust problem into a direct courtroom fight over copyright, consent, and the future of AI in creative work. I see this case as a bellwether for how aggressively courts will police training data and how far tech companies can go in repurposing user and third‑party content for machine learning.
The lawsuit that put SlimLM under a microscope
The proposed class action was filed in California by a writer who alleges that Adobe infringed her copyrights when it trained an AI model on her books without permission. According to the complaint, the plaintiff seeks to represent a broader class of authors whose works were allegedly swept into the same training corpus, arguing that Adobe’s integration of the model into its document creation tools turned unlawful data collection into a commercial product. The filing, described in detail in a report by Dorothy Atkins, frames the dispute not as a narrow metadata issue but as a systemic misuse of literary works.
Central to the case is the allegation that Adobe relied on pirated books to train SlimLM, a model that now underpins features marketed to writers, marketers, and office workers. The complaint contends that this training regime violated the exclusive rights of authors and that any outputs derived from those works are tainted by the initial infringement. In describing the scale of the alleged copying, the filing points to a training set that included at least 54 titles by the named plaintiff alone, a figure meant to illustrate how deeply a single author’s catalog may have been mined for AI development.
Claims of pirated books and the SlimPajama-627B dataset
At the heart of the factual dispute is where SlimLM’s knowledge actually comes from. Plaintiffs argue that Adobe built the model on a trove of pirated books, effectively turning unauthorized e‑book collections into raw material for a commercial AI system. They say this practice allowed the company to shortcut the expensive process of licensing or commissioning training data, while exposing authors to uncompensated reuse of their work in a context they never agreed to. One detailed account of the complaint describes how the plaintiffs accuse Adobe of using such pirated books to train SlimLM and then deploying the model in ways that directly compete with human writers’ services, a claim echoed in coverage highlighting the class action pressure Adobe faces over its AI training choices.
Adobe, for its part, has publicly rejected the idea that it raided pirate libraries, instead pointing to SlimPajama-627B, an open‑source dataset released by Cerebras in mid‑2023, as the primary source of SlimLM’s training data. The company says that SlimPajama-627B was curated from a mix of web and text sources and that it relied on that dataset to avoid directly scraping or ingesting proprietary content. Reporting on the case notes that Adobe has explicitly identified SlimPajama-627B and its origin with Cerebras, while also acknowledging that open‑source corpora can wander into legally questionable territory when they aggregate material from the broader internet. That tension, between Adobe’s reliance on an open dataset and the plaintiffs’ insistence that pirated books were involved, is captured in an analysis of how Adobe says the model was trained and why that still may not insulate it from copyright claims.
How authors say their works ended up in the training data
The named plaintiff, identified as Lyon in one detailed account, alleges that her own writing was included in the training data without her consent, credit, or compensation. According to the complaint, Lyon’s books circulated in the same online ecosystems where pirated copies of commercial titles are shared, and those copies were then swept into the SlimPajama-627B dataset that Adobe used to build SlimLM. The filing argues that this chain of events does not absolve Adobe, because the company allegedly chose to rely on a dataset that contained infringing material and then profited from the resulting model. Coverage of the complaint underscores how individual authors can trace a direct line from their books to the AI system at issue.
In that same account, Adobe is described as characterizing SlimLM as a model trained on SlimPajama-627B, while the plaintiffs argue that the presence of pirated books in that dataset makes its use unlawful. The complaint effectively asks the court to treat the dataset itself as tainted, asserting that any downstream use of SlimPajama-627B for commercial AI tools amounts to ongoing infringement. That framing is crucial, because it shifts the focus from whether Adobe directly scraped specific websites to whether it exercised enough diligence when adopting a third‑party dataset that may have contained unauthorized copies of Lyon’s books and other authors’ works.
Adobe’s earlier Terms of Use backlash and trust problem
The lawsuit does not arise in a vacuum. Earlier this year, Adobe faced a wave of anger from creative professionals over changes to its Terms of Use that many users interpreted as a grab for broad rights over their content. Artists and designers worried that the company could scan their files stored in the cloud and repurpose them for AI training, a fear that spread quickly across social networks and professional forums. In one widely shared video, Mike of Game from Scratch walked viewers through the updated language and argued that it put Adobe in damage control mode, highlighting how the company’s communication missteps fueled suspicion among paying customers and how the ongoing saga continues to erode goodwill.
Under pressure, Adobe issued a public response and announced changes to its Terms of Use, insisting that it did not claim ownership of user content and that any analysis of files was aimed at security and feature improvements rather than wholesale AI training. The company emphasized that it would rely only on anonymized data to enhance features and that it respected the livelihoods of creative professionals who depend on its tools. A detailed post on the controversy describes those revisions as an attempt to calm an internet already on edge about AI, noting that the company tried to reassure users it would use anonymized data solely to improve features.
From OpenAI and Anthropic to Adobe: a broader AI copyright reckoning
Adobe’s legal troubles arrive amid a broader wave of copyright litigation targeting AI developers, including high‑profile cases against OpenAI and Anthropic. Rights holders across publishing, music, and visual arts have argued that training large models on their work without permission or payment undermines existing licensing markets and devalues creative labor. One social media post that captured the mood put it bluntly, declaring that AI is no longer off the legal hook for copyright violations and pointing to how Anthropic, the maker of Claude AI, is planning to spend heavily on legal defenses.
In that context, the case against Adobe is less an outlier than the latest front in a sprawling battle over how copyright law applies to machine learning. Plaintiffs in multiple suits have argued that ingesting entire books, code repositories, or image catalogs to train AI models is qualitatively different from traditional fair use, because the models can reproduce stylistic signatures or even close paraphrases of the underlying works. Adobe’s position, that it relied on an open‑source dataset curated by Cerebras and that its use of that dataset was lawful, will likely be tested against the same emerging legal theories that have been deployed against OpenAI and Anthropic. The outcome could either reinforce a growing consensus that large‑scale training on copyrighted material requires new licensing frameworks or carve out a more permissive path for companies that rely on third‑party datasets.
Key allegations in the class action complaint
The class action complaint lays out a series of specific allegations that go beyond generalized concerns about AI and copyright. It describes Adobe as a software company offering a wide range of programs and asserts that it willfully incorporated infringing material into its AI training pipeline despite knowing that many of the books in SlimPajama-627B were unauthorized copies. The filing characterizes this conduct as copyright infringement on a massive scale, arguing that Adobe’s position as a sophisticated technology company makes it unreasonable for it to claim ignorance about the provenance of its training data. A detailed legal summary notes that the alleged infringement is at the core of the case and that Adobe is accused of building AI features on top of that infringing foundation, exposing it to class‑wide claims.
Another account of the lawsuit emphasizes that Adobe is under fire over a proposed class action claiming the company used pirated books to train a small AI model, highlighting how the plaintiffs seek to represent a broad group of authors whose works were allegedly used without consent. That same report notes that the case has already prompted discussions about AI data governance and compliance, as corporate customers and regulators alike scrutinize how training datasets are assembled and audited. By framing the dispute as a failure of governance rather than a one‑off mistake, the plaintiffs aim to convince the court that Adobe’s practices warrant not only damages but also structural changes to how it develops and deploys AI tools.
Adobe’s public defense and the role of Cerebras
Adobe’s public response to the lawsuit has centered on its claim that SlimLM was trained on SlimPajama-627B, an open‑source dataset released by Cerebras, rather than on a secret stash of pirated e‑books. By pointing to Cerebras as the originator of the dataset, Adobe is effectively arguing that it relied on a community resource that was widely used in the AI research world and that it had no reason to believe the corpus was riddled with infringing material. The company has described SlimLM as a model built on that dataset and has suggested that any issues with specific texts inside SlimPajama-627B should not automatically translate into liability for downstream users. Reporting on the case notes that Adobe has repeatedly invoked SlimPajama-627B and its release by Cerebras as evidence that it followed industry norms, a detail highlighted in coverage explaining how Adobe, accused of using pirated books, has tried to shift focus to the dataset’s open‑source pedigree.
However, plaintiffs counter that relying on an open‑source dataset does not absolve Adobe of its duty to ensure that the material inside that dataset is lawfully included. They argue that SlimPajama-627B itself may contain large numbers of pirated books and that Adobe, as a major commercial actor, should have conducted more rigorous due diligence before using it to train a revenue‑generating model. A separate analysis of the controversy underscores that Adobe has explicitly said the model was trained using SlimPajama-627B, an open‑source dataset released by Cerebras in mid‑2023, while also acknowledging that such datasets can wander into legally questionable territory when they aggregate content from across the web. That tension between open research practices and commercial responsibility is at the core of a report that notes how Adobe says the model was trained and why that admission may become a focal point in court.
What is at stake for Adobe, authors, and AI governance
The financial stakes of the lawsuit are significant, with the complaint seeking unspecified monetary damages on behalf of the proposed class, as well as potential injunctive relief that could limit how Adobe deploys SlimLM and related tools. One detailed report, published by Maaal, notes that Adobe faces copyright infringement claims over the training of its AI tools, claims that could reshape its AI roadmap if the court finds that its training practices violated copyright law. That same account emphasizes that the suit seeks unspecified monetary damages, underscoring how the plaintiffs are leaving room for a substantial award if they can prove widespread infringement.
Beyond the immediate financial exposure, the case could set important precedents for AI data governance and compliance across the tech industry. If a court concludes that using a dataset like SlimPajama-627B exposes companies to liability for every infringing work inside it, AI developers may be forced to abandon broad web‑scale corpora in favor of tightly licensed collections, dramatically increasing costs and slowing model development. Conversely, a ruling that treats such training as fair use or otherwise permissible could embolden companies to lean more heavily on open‑source datasets, even when their provenance is murky. For authors, the outcome will signal whether courts are willing to recognize the inclusion of their books in training data as a compensable harm or whether they must look to legislatures and collective licensing schemes for relief.
How this case fits into Adobe’s AI strategy and the road ahead
The lawsuit also intersects with Adobe’s broader strategy to embed AI into its flagship products, from Photoshop and Illustrator to Acrobat and its document cloud services. The company has marketed its generative features as a way to help users draft text, summarize documents, and create visual content more efficiently, positioning itself as a partner to creative professionals rather than a replacement. However, the allegations that it misused authors’ work in AI training cut directly against that narrative, suggesting that the same writers and artists who rely on Adobe’s tools may have had their livelihoods undercut by the company’s own models. One report notes that Adobe was hit with a proposed class action accusing it of misusing authors’ work in AI training, and that in October the company had already come under scrutiny for how its AI algorithms are trained on user and third‑party content.
As the case moves forward, I expect Adobe to double down on its messaging around consent, control, and transparency, both to reassure existing customers and to persuade courts that it has acted responsibly. The company will likely highlight opt‑out mechanisms, content credentials, and other safeguards it has introduced in response to earlier backlash, while continuing to argue that its reliance on SlimPajama-627B and Cerebras was consistent with industry norms. At the same time, authors and their advocates will push for stronger legal recognition that training data is not a free resource, but a collection of works created by identifiable individuals who expect to be asked and paid before their books are fed into an AI system. A separate report on the class action underscores that the plaintiffs see this as a fight over consent, credit, and compensation, alleging that Adobe used pirated books without providing any of the three.