from Hacker News

A federal judge sides with Anthropic in lawsuit over training AI on books

by moose44 on 6/24/25, 4:22 PM with 204 comments

  • by NobodyNada on 6/24/25, 5:29 PM

    One aspect of this ruling [1] that I find concerning: on pages 7 and 11-12, it concedes that the LLM does substantially "memorize" copyrighted works, but rules that this doesn't violate the author's copyright because Anthropic has server-side filtering to avoid reproducing memorized text. (Alsup compares this to Google Books, which has server-side searchable full-text copies of copyrighted books, but only allows users to access snippets in a non-infringing manner.)

    Does this imply that distributing open-weights models such as Llama is copyright infringement, since users can trivially run the model without output filtering to extract the memorized text?

    [1]: https://storage.courtlistener.com/recap/gov.uscourts.cand.43...
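
    (The ruling doesn't spell out the filtering mechanism in technical detail. As a purely illustrative sketch, one common approach is to refuse to return generations that reproduce long n-gram runs from an index of protected texts; all names, thresholds, and data below are assumptions, not Anthropic's actual system.)

      # Illustrative sketch only: block model output that reproduces long
      # word-level n-grams from an index of protected works. Names and
      # thresholds are made up; this is not Anthropic's implementation.

      def build_ngram_index(protected_texts, n=8):
          """Collect every n-word shingle appearing in any protected work."""
          index = set()
          for text in protected_texts:
              words = text.lower().split()
              for i in range(len(words) - n + 1):
                  index.add(tuple(words[i:i + n]))
          return index

      def output_is_blocked(candidate, index, n=8):
          """True if the candidate output contains any protected n-gram."""
          words = candidate.lower().split()
          return any(
              tuple(words[i:i + n]) in index
              for i in range(len(words) - n + 1)
          )

      # A hosted service can run this check before returning text to the user;
      # someone running open weights locally can simply skip it, which is the
      # open-weights gap raised above.
      index = build_ngram_index(
          ["call me Ishmael some years ago never mind how long precisely"]
      )
      print(output_is_blocked(
          "He began: call me Ishmael some years ago never mind how long precisely, he said.",
          index,
      ))  # True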

  • by 3PS on 6/24/25, 4:32 PM

    Broadly summarizing:

    This is OK and fair use: Training LLMs on copyrighted work, since it's transformative.

    This is not OK and not fair use: pirating data, or creating a big repository of pirated data that isn't necessarily for AI training.

    Overall seems like a pretty reasonable ruling?

  • by sillysaurusx on 6/25/25, 12:54 PM

    The reason I made books3 was to help force a decision on this issue. I’m happy to see that it’s settled, and that it’s legal for robots to read books.

    It’s also proof that an individual scientist can still change the world, in some small way. Believe in yourself and just focus on your work, even if the work is controversial.

    (I’m late to the thread, so ~nobody will see this. But it’s the culmination of about five years of work for me, so I wanted to post a small celebratory comment anyway. Thank you to everyone who was supportive, and who kept an open mind. Lots of people chose to throw verbal harassment my way, even offline, but the HN community has always been nice.)

  • by Fluorescence on 6/24/25, 8:51 PM

    I'm surprised we never discuss a previous case of how governments handled a valuable new technology that challenged creatives' ability to monetise their work:

    Cassette Tapes and Private Copying Levy.

    https://en.wikipedia.org/wiki/Private_copying_levy

    Governments didn't ban tapes but taxed them and fed the proceeds back into the royalty system. An equivalent for books might be an LLM tax funding a negative tax rate for sold books, e.g. earn $5 and the gov tops it up. I can't imagine how to ensure it would be fair, though.

    Alternatively, it might be an interesting math problem to calculate royalties for the training data used in each user request!
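
    (A toy version of that math problem, for concreteness: if each request set aside a fixed royalty pool, one simple scheme would be to split it pro rata by some per-work attribution score. Computing the scores is the genuinely hard part; everything below is made-up illustration, not a real proposal.)

      # Toy sketch: pro-rata split of a per-request royalty pool across
      # training works, weighted by hypothetical attribution scores.

      def apportion_royalties(pool_cents, attribution_scores):
          """Split pool_cents across works in proportion to their scores."""
          total = sum(attribution_scores.values())
          if total == 0:
              return {work: 0.0 for work in attribution_scores}
          return {
              work: pool_cents * score / total
              for work, score in attribution_scores.items()
          }

      # e.g. a 6-cent pool for one request, with made-up integer weights:
      print(apportion_royalties(6.0, {"Book A": 3, "Book B": 2, "Blog post C": 1}))
      # -> {'Book A': 3.0, 'Book B': 2.0, 'Blog post C': 1.0}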

  • by bradley13 on 6/24/25, 7:00 PM

    Good. Reading books is legal. If I own a book and feed it to a program I wrote (and I have done exactly that), it is also legal. There is zero reason this should be any different with an AI.

  • by paxys on 6/24/25, 6:22 PM

    Will be interesting to see how this affects Anthropic's ongoing lawsuit with Reddit, or all the different media publishing ones flying around. Is it okay to train on books but not online posts and articles? Why the distinction between the two?

  • by gbacon on 6/24/25, 4:35 PM

    The HN crowd dislikes brick-and-mortar landlords but often sides with charging rent for certain bits. Which side will prevail?

    Interesting excerpt:

    > “We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages,” Judge Alsup wrote in the decision. “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for theft but it may affect the extent of statutory damages.”

    The language of “pirated” and “theft” is from the article. If they did realize their mistake and purchase copies after the fact, why should that be insufficient?

  • by blindriver on 6/24/25, 9:21 PM

    Humans read books. AI/LLMs do not read. I think there's an inherent difference here. If the LLM is making a copy of the entire book in its memory, is that copyright infringement? I don't know the answer to that, but it feels like Alsup is considering this fair use argument in the context of a human, while an LLM is nothing like a human and needs to be treated differently.

  • by bgwalter on 6/24/25, 5:05 PM

    I have the feeling that with Alsup, the larger and more recent company always wins. Google won vs. Oracle, and now this.

    So what is he going to do about the initial copyright infringement? Will the perpetrators get the Aaron Swartz treatment?

  • by UltraSane on 6/24/25, 7:29 PM

    If the US makes it illegal to train LLMs on copyrighted data, that isn't going to stop China from doing it, and it will give them an ENORMOUS advantage.

  • by nektro on 6/24/25, 11:08 PM

    devastating news

  • by josefritzishere on 6/24/25, 5:33 PM

    The US legal system is bending over backwards to help AI development. The arguments border on nonsense.

  • by kmeisthax on 6/24/25, 7:39 PM

    > “We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages,” Judge Alsup wrote in the decision. “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for theft but it may affect the extent of statutory damages.”

    I'm not sure why this alone is considered a separate issue from training the AI with books. Buying a copy of a copyrighted work doesn't inherently convey 'fair use rights' to the purchaser. If I buy a work, read it, sell it, and then publish a review or parody of it, I don't infringe copyright. Why does mere possession of an unauthorized copy create a separate triable matter before the court?

    Keep in mind, you can legally engineer EULAs in such a way that merely purchasing the work surrenders all of your fair use rights. So this could wind up being effectively: "AI training is fair use for works purchased before June 24th, 2025, everything after is forbidden, here's your brand new moat, OpenAI."

  • by deepsun on 6/24/25, 5:42 PM

    Ok, so I can create a website, say, the-ai-pirate-bay.com, where I stream AI-reproduced movies. They are not verbatim, so I don't infringe any copyrights.