by moose44 on 6/24/25, 4:22 PM with 204 comments
by NobodyNada on 6/24/25, 5:29 PM
Does this imply that distributing open-weights models such as Llama is copyright infringement, since users can trivially run the model without output filtering to extract the memorized text?
[1]: https://storage.courtlistener.com/recap/gov.uscourts.cand.43...
by 3PS on 6/24/25, 4:32 PM
This is OK and fair use: Training LLMs on copyrighted work, since it's transformative.
This is not OK and not fair use: pirating data, or creating a big repository of pirated data that isn't necessarily for AI training.
Overall seems like a pretty reasonable ruling?
by sillysaurusx on 6/25/25, 12:54 PM
It’s also proof that an individual scientist can still change the world, in some small way. Believe in yourself and just focus on your work, even if the work is controversial.
(I’m late to the thread, so ~nobody will see this. But it’s the culmination of about five years of work for me, so I wanted to post a small celebratory comment anyway. Thank you to everyone who was supportive, and who kept an open mind. Lots of people chose to throw verbal harassment my way, even offline, but the HN community has always been nice.)
by Fluorescence on 6/24/25, 8:51 PM
Cassette Tapes and Private Copying Levy.
https://en.wikipedia.org/wiki/Private_copying_levy
Governments didn't ban tapes but taxed them and fed the proceeds back into the royalty system. An equivalent for books might be an LLM tax funding a negative tax rate for sold books, e.g. you earn $5 and the government tops it up. Can't imagine how to ensure it was fair though.
Alternatively, might be an interesting math problem to calculate royalties for the training data used in each user request!
by bradley13 on 6/24/25, 7:00 PM
by paxys on 6/24/25, 6:22 PM
by gbacon on 6/24/25, 4:35 PM
Interesting excerpt:
> “We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages,” Judge Alsup wrote in the decision. “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for theft but it may affect the extent of statutory damages.”
The words “pirated” and “theft” come from the article. If they did realize a mistake and purchased copies after the fact, why should that be insufficient?
by blindriver on 6/24/25, 9:21 PM
by bgwalter on 6/24/25, 5:05 PM
So what is he going to do about the initial copyright infringement? Will the perpetrators get the Aaron Swartz treatment?
by UltraSane on 6/24/25, 7:29 PM
by nektro on 6/24/25, 11:08 PM
by josefritzishere on 6/24/25, 5:33 PM
by kmeisthax on 6/24/25, 7:39 PM
I'm not sure why this alone is considered a separate issue from training the AI with books. Buying a copy of a copyrighted work doesn't inherently convey 'fair use rights' to the purchaser. If I buy a work, read it, sell it, and then publish a review or parody of it, I don't infringe copyright. Why does mere possession of an unauthorized copy create a separate triable matter before the court?
Keep in mind, you can legally engineer EULAs in such a way that merely purchasing the work surrenders all of your fair use rights. So this could wind up being effectively: "AI training is fair use for works purchased before June 24th, 2025, everything after is forbidden, here's your brand new moat OpenAI"
by deepsun on 6/24/25, 5:42 PM