Authors sue Anthropic for using pirated copies to train Claude

A group of authors filed a class action lawsuit against Anthropic in a California court on Monday. The authors claim that Anthropic built its business by “stealing hundreds of thousands of copyrighted books.”

The three authors, Andrea Bartz, Charles Graeber and Kirk Wallace Johnson, claim that their books were part of the dataset that Anthropic used to train its family of Claude models. In their lawsuit, they claim that Anthropic is guilty of “downloading and copying hundreds of thousands of copyrighted books obtained from pirated and illegal websites.”

The authors questioned Anthropic's claim to be a public benefit corporation, saying, “It is no exaggeration to say that Anthropic's business model seeks to cash in on the exploitation of the human expression and ingenuity behind each of those works.”

The Pile

The books in question are part of a controversial dataset called Books3, which was previously part of a larger dataset called The Pile. It is widely accepted, though not admitted, that nearly every major LLM developer trained its models using The Pile.

The Pile consists of around 825 GB of academic papers, books, websites, technical documents, and more. One of the architects of The Pile is an independent developer named Shawn Presser. Presser created the Books3 dataset in 2020 and added it to The Pile.

Books3 comprises 196,640 books in plain text format by famous authors such as Stephen King, as well as the authors who filed this lawsuit. Presser is believed to have used Bibliotik, a notorious torrent tracker used by an invite-only community of book pirates, as the source for Books3.

When the nonprofit EleutherAI hosted The Pile and made it publicly available online, it explained why it had included the pirated copies. EleutherAI said, “We included Bibliotik because books are invaluable for long-range context modeling research and coherent storytelling.”

In August 2023, Books3 was removed from the most “official” version of The Pile, but by that point it was already being used by virtually all the big names in AI model development.

In July 2024, Anthropic publicly admitted that it used The Pile to train its Claude models. Although Anthropic has not yet responded to the lawsuit, it will likely resort to the same “fair use” defense as OpenAI and others facing similar lawsuits.

The real damage

In addition to the copyright issue, the lawsuit also reveals authors' real fear that artificial intelligence will take away their source of income.

The lawsuit alleges that “Anthropic deprived authors of revenue from book sales and licensing by taking their works at no cost.” That could be difficult to prove. Claude will describe the book “The Feather Thief” by Kirk Wallace Johnson, but refuses to reproduce a single page of it.

I believe Claude is lying when it replies, “I'm sorry, but I don't have access to the actual text of The Feather Thief or the first page,” because it then goes on to describe what happens on page 1. If you want to read the book, you'll need to buy it or go to a library.

Still, the authors say that “Anthropic's Claude and other LLMs like it pose a serious threat to the livelihoods of writers.” They say that writing work “is starting to dry up because generative AI systems are being trained on the works of those writers without receiving any compensation.”

As evidence, the lawsuit cites how a person named Tim Boucher used Claude and ChatGPT to “write” 97 books in less than a year and sold them for prices ranging from $1.99 to $5.99.

The lawsuit demands a jury trial and unspecified damages. It will be interesting to see whether the jury values copyright over the usefulness of AI models like Claude.
