Generative vs. Creative: A court verdict on AI training has exposed an Anthropic-shaped chink in US copyright law

Anthropic was sued by a group of three authors whose books were in the training data. (AFP)
Summary

The recent ruling that okayed Anthropic’s use of ‘stolen books’ to train its AI model shows how copyright law loopholes can be exploited. If laws aren’t modified, the creative industry could face extinction.

In what is shaping up to be a long, hard fight over the use of creative works, round one has gone to the AI makers. In the first US decision of its kind, District Judge William Alsup ruled that Anthropic’s use of millions of books to train its artificial-intelligence model, without payment to the sources, was legal under copyright law because it was “transformative—spectacularly so.”

The closely watched ruling is a warning of what lies ahead under existing copyright laws. Designed to protect creative freedom, the ‘fair use’ doctrine that Anthropic used to successfully defend its actions is now the most potent tool for undermining the creative industry’s ability to support itself in the coming age of AI. 

If a precedent has been set, as several observers believe, it stands to cripple one of the few viable AI monetization strategies for rights holders: selling firms licenses to access their work. Some of these deals have already been struck while the ‘fair use’ question sat in limbo, and only after the threat of legal action. This ruling may have just taken future deals off the table.

For context, it’s useful to understand how Anthropic built the large language model that underpins its popular AI chatbot, Claude. First, according to court filings, it downloaded pirated copies of at least 7 million books to avoid the “slog” (as its chief executive officer put it) of acquiring them through more legitimate means. Later, thinking better of the outright theft, the company bought millions of used physical books (usually one copy per title), telling distributors it wanted to create a “research library.” Anthropic staff then removed the spines, scanned the pages into a digital format and destroyed the originals.

This library was used to train Anthropic’s LLM, giving Claude the kind of smarts it can charge money for. The chatbot offers limited use for free, a fuller experience for $20 a month, and more for businesses. As of its last funding round, Anthropic was valued at $61.5 billion. (For comparison, publisher Simon & Schuster was sold in 2023 for $1.62 billion.)

Anthropic was sued by a group of three authors whose books were in the training data. In his ruling, the judge said that Anthropic’s acquisition of pirated material was unlawful; damages will be assessed at a separate trial. That was the one piece of bad news for the company. The far bigger news was that the ruling gives the green light to Anthropic—and every other AI firm building LLMs this way—by declaring everything else it did aboveboard. Millions of books were ingested and repurposed, their knowledge sold on without a penny ever going to the originators. Judge Alsup’s ruling, which follows the law tightly, serves as an important example of its now critical blind spots.

The first part of the ‘fair use’ test was pretty easy to pass: The material that comes out of Claude is significantly different from what goes in. “Sensationally” different, Judge Alsup wrote, deeming it to clear the test’s bar. That is undoubtedly true because the law (quite reasonably) deals only with the precise output while ignoring the fundamental knowledge or idea that underpins it.

A trickier test is whether the existence of Claude diminishes the authors’ ability to sell their books. Here, Alsup again stressed that because what comes out of Claude isn’t an exact replica, or a substantial knock-off, the market for buying the books is left fully intact. This misses the point of an AI bot. Turning to one—rather than, say, a library (which pays for its books) or a newspaper (which pays its contributors)—is a shortcut that reduces the need to interact with the source material at all.

Consider Google’s AI Overviews feature, which synthesizes content from news and other sources into easily digestible answers, saving the need to visit websites directly. It works great: Traffic to websites has plummeted, taking with it the business model that supports their existence. Matthew Prince, CEO of online security group Cloudflare, put it in starker terms. Speaking at an event in Cannes, Prince said that for every web visit Anthropic sends a publisher’s way, it crawls that site for information 60,000 times. “People aren’t following the footnotes,” he warned.

Given the nature of how a book is acquired, it’s impossible to have an equivalent stat, but the logic clearly extends: AI reduces the need to go to the source and, therefore, the opportunity for publishers to sell it to people and generate income to support the creation of more of it.

Another argument thrown out by the court was the concern that Claude could be used to create competing works—that because the AI knows everything in a book, it could generate an alternative to it. On this, Alsup agreed that’s likely, but added:

Authors’ complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works. This is not the kind of competitive or creative displacement that concerns the Copyright Act. The Act seeks to advance original works of authorship, not to protect authors against competition.

This most clearly exposes the severe limitations of copyright law, which provides no framework to account for the existence and application of an incredible writing machine that has swallowed up 7 million stolen books. To the extent it does account for it, it does so shortsightedly, considering it—as Alsup writes—to be “no different” from a child, because both are being given things to read and taught how to write. An absurdity: One is a human schoolchild, the other is a machine. That changes the conversation immensely.

A child might read 10 books a year if we’re lucky. The creation of the books she reads is supported by a parent or school buying them for her. If she decides to write, it’s one of life’s miracles—a chance for her imagination to flow onto the page. Despite being just a child, or perhaps because of it, her writing will be fresh and unique, laden, between the lines or otherwise, with lived experience. The home she grew up in, the friends she’s met, the dreams she has—all will influence how she interprets the contents of the books she has read, determining how she chooses to pass on that knowledge. Her writing will contain contradictions, flaws and humanity. Most important for this debate, her “competing work” is additive. She will contribute.

The machine downloads 7 million books and learns nothing—for it cannot learn, at least not in any true sense of the word. It does not contribute; it copies. Sure, it may synthesize information in ways that may surprise us, but it does so only thanks to the hard and uncompensated work of others. It can have no lived, or new, experiences. 

For sure, a competent new knowledge tool may have been created, but AI doesn’t so much generate new value as transfer it—from one place, the original source, to another: itself. That’s not in and of itself a problem; many technologies do this. But this value transfer should command a fee to the originator if copyright law’s stated goal of advancing original works of authorship is to be met for generations to come.

AI is already a phenomenal technology that I use daily. My monthly AI bill across multiple services now exceeds what I pay for any other type of subscription. I pay those costs because I understand that running an AI platform is expensive, what with all the data centers, power plants, Nvidia chips and engineering talent that must be amassed. Alsup was right when he wrote that “the technology at issue was among the most transformative many of us will see in our lifetimes.”

But that doesn’t mean it shouldn’t pay its way. Nobody would dare suggest Nvidia CEO Jensen Huang hand out his chips for free. No construction worker is asked to keep costs down by building data center walls for nothing. Software engineers aren’t volunteering their time to Meta in awe of Mark Zuckerberg’s business plan—they instead command salaries of $100 million and beyond. 

Yet, as ever, those in the tech industry have decided that creative works, and those who create them, should be considered of little or no value and must step aside in service of the great calling of AI—despite being every bit as vital to the product as any of the factors above. As science-fiction author Harlan Ellison said in his famous sweary rant, nobody ever wants to pay the writer if they can get away with it. When it comes to AI, paying creators of original work isn’t impossible; it’s just inconvenient. Legislators should leave companies no choice. ©Bloomberg

The author is Bloomberg Opinion's US technology columnist.
