by eddyzh on 12/30/23, 10:17 AM with 755 comments
by ctoth on 12/30/23, 5:01 PM
Who truly owns the tales of Snow White and Cinderella?
These stories didn't originate with Disney; they are part of a rich tapestry of folklore passed down through generations. Disney's success was partly built on adapting these existing narratives, which were once shared and reshaped by communities over centuries.
This conversation shouldn't just be about the technicalities of AI or the legalities of copyright; it should be about understanding the deep roots of our shared culture.
At its core, culture is a communal property, evolving and growing through collective storytelling and reinterpretation.
The current debate around AI and copyright infringement seems to overlook this fundamental aspect of cultural evolution. The algorithms might be new, but the practice of reimagining and repurposing stories is as old as humanity itself.
By focusing solely on the legal implications and ignoring the historical context of cultural storytelling, we risk overlooking the essence of what it means to be a creative society.
As a large human model (no, really, I could probably lose some weight), I think it's just silly how we're all sort of glossing over the fact that Disney built their house of mouse on existing culture, on existing stories, and now the idea that we might actually limit the tools of cultural expression to comply with some weird outdated copyright thing is just...bonkers.
by Havoc on 12/30/23, 12:08 PM
Everyone knew it was trained on copyrighted material and capable of eerily similar outputs.
But it’s already done. At scale. Large corps committing fully. There is no chance of that toothpaste going back in the tube.
It’s a bit like when big tech built on aggressive user data harvesting. Whether it’s right, ethical or even legal is academic at this stage. They just did it - effectively without any real informed consent by society. Same thing here - 9 out of 10 people on the street won’t be able to tell you how AI is made, let alone comment on copyright.
So the right question here is what now. And I suspect much like tracking the answer will be - not much.
by niemandhier on 12/30/23, 2:12 PM
Summary by Wolters Kluwer: […] Everyone else (including commercial ML developers) can only use works that are lawfully accessible and where the rightholders have not explicitly reserved use for text and data mining purposes.
AFAIK they are discussing something like a robots.txt to flag content as "not for training". You will probably be expected to implement some safeguards, and of course the end user will have to be careful in their use of the generated output.
Source at Kluwers: https://copyrightblog.kluweriplaw.com/2023/02/20/protecting-...
EU Legal Text: https://eur-lex.europa.eu/eli/dir/2019/790/oj
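For illustration, a minimal sketch of how a crawler could honor a robots.txt-style training reservation. The user-agent name and rules here are made up, and the EU directive does not (yet) prescribe a specific machine-readable format; this just reuses the existing robots.txt mechanics:

```python
from urllib import robotparser

# Hypothetical robots.txt reserving a site's content from a training
# crawler while leaving it open to everyone else.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

def may_train_on(path: str, agent: str = "ExampleTrainingBot") -> bool:
    """True if the given crawler is allowed to fetch `path` for training."""
    rp = robotparser.RobotFileParser()
    rp.parse(ROBOTS_TXT)
    rp.modified()  # mark as read; otherwise can_fetch() always returns False
    return rp.can_fetch(agent, path)
```

Under this (assumed) scheme the training crawler is turned away while ordinary crawlers proceed, which is exactly the kind of safeguard a model vendor would be expected to implement.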
by koliber on 12/30/23, 2:26 PM
Why does anyone assume that ChatGPT or other tools would NOT produce previously-copyrighted content?
I can see a naive assumption that since it is “generated” it’s original. However, that assumption falls apart as soon as you replace “ChatGPT” with “junior artist”. Tell them to draw a droid from a sci-fi movie and don’t mention anything else. Don’t say anything about copyrights. Don’t tell them that they have to be original. What would you expect them to produce?
by appplication on 12/31/23, 3:55 AM
It’s not derivative work. We’re way past that. NYT has an exceptionally strong case here, and anyone arguing about the merits of copyright is way off the mark. This court case is not going to single-handedly undo copyright. OpenAI has very little going for them other than “this is new, how were we to know it could do this”. So knowing that, the currently trained models are in a very sticky situation.
Further, I don’t see NYT settling. The implications are too large, and if they settle with OpenAI, a similar case will pop up with every other model. And every other publisher of digital content will have a similarly merited case. This is an inflection point for generative AI, and it’s looking like it will be either much more expensive or much more limited than we originally thought.
A side effect of this: I am predicting that we will start to see a rise in “pirate” models: models that eschew all legality, are trained in a distributed fashion, and whose weights are published not by corporations but by collectives (e.g. as torrents). There is a good chance we see these surpass the official “well behaved” models in effectiveness. It will be an interesting next few years to see this play out.
by marckrn on 12/30/23, 12:22 PM
by keiferski on 12/30/23, 12:01 PM
Likewise, how difficult is it to just use descriptive tools to describe Mario-like images [1] and then remove these results from anyone prompting for "video game plumber"?
1. The describe command can describe an image in Midjourney. I imagine other AI tools have similar features: https://docs.midjourney.com/docs/describe
by WhiteNoiz3 on 12/30/23, 2:21 PM
Personally, I think generative AI should be able to provide links to similar source material in the training data. This would be the barest way to compensate those who have contributed to training the AI. I don't think generative AI is sustainable in the long term if it ends up killing all the websites/artists that created the original material. Plus I think having sources adds a layer of transparency and helps users understand when content is hallucinated vs. not. People should be able to opt out of having their content used for training and be able to confirm that it has been removed for future iterations. Let's be honest that AI companies are just trying to avoid lawsuits by keeping it secret. These are areas where I think regulation can help, rather than worrying about doomsday scenarios.
by preommr on 12/30/23, 12:01 PM
I do think these models commit something like trademark infringement, but also that it should be allowed, and that ultimate responsibility should rest with the person using the images in a final work meant for consumption by the general public as standalone media.
by FridgeSeal on 12/30/23, 2:45 PM
They’re giving people plausible deniability in the “chain of responsibility”, and I think if we took away “LLM” and replaced it with “fairground sideshow magic box” the argument that LLM’s are somehow special and deserving of exemptions disappears real quick.
by dang on 12/30/23, 6:43 PM
NY times is asking that all LLMs trained on Times data be destroyed - https://news.ycombinator.com/item?id=38816944 - Dec 2023 (93 comments)
Also:
NY Times copyright suit wants OpenAI to delete all GPT instances - https://news.ycombinator.com/item?id=38790255 - Dec 2023 (870 comments)
NYT sues OpenAI, Microsoft over 'millions of articles' used to train ChatGPT - https://news.ycombinator.com/item?id=38784194 - Dec 2023 (84 comments)
The New York Times is suing OpenAI and Microsoft for copyright infringement - https://news.ycombinator.com/item?id=38781941 - Dec 2023 (861 comments)
The Times Sues OpenAI and Microsoft Over A.I.’s Use of Copyrighted Work - https://news.ycombinator.com/item?id=38781863 - Dec 2023 (11 comments)
by kranke155 on 12/30/23, 1:48 PM
You get steamrolled for defending yourself while overhearing applause for those who have robbed you of your future.
by aimor on 12/30/23, 4:59 PM
What I get from this is that Llama2 70B contains 93% of Harry Potter Chapter 1 within it. It's not 100% (which would mean no need to share the encoded indices), but it's still pretty significant. I want to repeat this with the entire text of some books; the example I picked isn't representative because the text is available online on the official website.
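As a rough stand-in for the encoded-indices approach, one can estimate how much of a reference text a model output reproduces verbatim by counting matching word blocks. This is a crude sketch, not the parent's method:

```python
import difflib

def verbatim_fraction(model_output: str, reference: str) -> float:
    """Fraction of the reference's words that fall inside matching blocks
    of the model output (a crude proxy for verbatim reproduction)."""
    ref_words = reference.split()
    out_words = model_output.split()
    if not ref_words:
        return 0.0
    sm = difflib.SequenceMatcher(None, ref_words, out_words, autojunk=False)
    matched = sum(block.size for block in sm.get_matching_blocks())
    return matched / len(ref_words)
```

A score of 1.0 means every word of the reference appears in order somewhere in the output; a figure like the 93% above would correspond to a score around 0.93 under a metric of this kind.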
by beginning_end on 12/30/23, 11:34 AM
"Congress should declare that big-data AI models do not infringe copyright, but are inherently in the public domain.
Congress should declare that use of AI tools will be an aggravating rather than mitigating factor in determinations of civil and criminal liability."
by wslh on 12/30/23, 5:57 PM
I know we are talking about different technologies, but it seems all these people were very silent before and have found an opportunity in this war with OpenAI (not an endorsement) while not fighting others.
I am not making a statement about the morals of AI and aggregators/search engines (a super interesting discussion that in a way has been happening for a long time), but I am surprised that organizations are "just" waking up. It seems they just see it as a much simpler and cheaper fight.
by clbrmbr on 12/30/23, 12:58 PM
by rmholt on 12/30/23, 1:00 PM
Private models will not care, nor will things change for IP owners with lesser power.
by CTmystery on 12/30/23, 11:41 AM
Is it necessary to fix this in the model itself? It seems like a gate in the post-processing pipeline that checks for copyright infringement could work, provided they can create another model that identifies copyrighted work (solving the problems of AI with more AI :/)
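A toy sketch of such a post-processing gate, using n-gram overlap against an index of protected texts instead of a second model (the threshold, n, and corpus are all illustrative):

```python
def ngrams(text: str, n: int = 5) -> set:
    """Set of word n-grams in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def gate(output: str, protected_corpus: list[str], threshold: float = 0.2):
    """Return the model output only if its 5-gram overlap with every
    protected work stays below the threshold; otherwise block it (None)."""
    out = ngrams(output)
    for work in protected_corpus:
        ref = ngrams(work)
        if ref and out:
            overlap = len(out & ref) / len(out)
            if overlap >= threshold:
                return None  # blocked: too similar to a protected work
    return output
```

Near-copies get blocked while unrelated text passes through; the hard part the parent points at is that a real gate has to catch paraphrase and style, not just literal n-gram matches.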
by AlienRobot on 12/30/23, 12:40 PM
Yeah, downloading the content of a webpage may be legal, but redistributing it isn't.
I wish people stopped trying to make these things seem more important than they really are just because IT people call them "technologies". Blockchain isn't a technology. HTML isn't a technology. React isn't a technology. And now AI isn't a technology either.
When I see ChatGPT or OpenAI, I don't think of "technology". I think of a program. Software. Because that's what it is. You don't say "none of the laws that exist in this world apply to this" every time you release new software.
I bet many people can't tell the difference between a quick answer from Google and a text generated by ChatGPT on Bing. They just see the output.
All that amazing capability of generative AI? That got old fast. It was groundbreaking for one instant. Now it's just an app that generates images. Just another piece of software. Nothing special about it.
Torrenting and other p2p file transfer protocols didn't get a pass for inventing groundbreaking ways to break the law. I don't think OpenAI will get a pass for doing the same.
by davidy123 on 12/30/23, 12:23 PM
I think what NYT &c want is for large companies like Apple to pay them for access to their works. This to me is the wrong path, just leading to more silos and walled gardens, special access for the elite.
An alternative is base models trained on Wikipedia and public domain (science journals, etc). Foundations could support high quality, well rounded current events reporting. Wikimedia provides a good model for this, with referenced summaries that I don't think can be said to reasonably violate copyright. The models would need to be improved to support references, or RAG attribution would have to be widely used when bringing in works that have a current copyright.
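What RAG attribution could look like in miniature: every answer carries the source it was drawn from. The corpus and the word-overlap scoring here are toy assumptions; real systems use embedding retrieval:

```python
# Hypothetical mini-corpus of public-domain / freely licensed documents.
CORPUS = [
    {"source": "wikipedia:Photosynthesis",
     "text": "plants convert light into chemical energy"},
    {"source": "pd-journal:2019-042",
     "text": "transformer models scale with data and compute"},
]

def retrieve_with_attribution(query: str):
    """Return (passage, source id) for the best-matching document,
    scored by naive word overlap, or (None, None) if nothing matches."""
    words = set(query.lower().split())
    scored = [(len(words & set(d["text"].split())), d) for d in CORPUS]
    score, best = max(scored, key=lambda s: s[0])
    return (best["text"], best["source"]) if score else (None, None)
```

The point is the return shape: a generated claim always arrives paired with a citable source, which is what would let readers distinguish grounded answers from hallucinated ones.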
by dawnim on 12/30/23, 11:49 AM
by pointlessone on 12/30/23, 12:21 PM
by Aerroon on 12/30/23, 11:56 AM
Ask someone about two Italian brothers in a video game wearing red and green hats with M and L on them. What do you think you would get?
If I describe "imagine a comic book duck that swims in a sea of gold in his vault" you would immediately think of Scrooge McDuck, no?
by mensetmanusman on 12/30/23, 1:27 PM
China can't produce LLMs because of inconvenient truths.
The US can't produce LLMs because of copyright.
Decentralized open source LLMs might exist that could work, but they won't have the giant GPU clusters.
A rich country with lax rule of law wins? Maybe that's why Sam went to the Saudis?
by jpeter on 12/30/23, 11:44 AM
by bambax on 12/30/23, 12:02 PM
by redcobra762 on 12/30/23, 1:56 PM
Not sure how this “gets worse” or better for anyone. The current state of things seems generally fine, and there’s a real possibility the courts see it that way too.
by continuational on 12/30/23, 11:38 AM
Me: Who owns the rights to this bot?
Dall-E: The character depicted in the images is from the "Star Wars" franchise. The rights to characters and elements from "Star Wars" are owned by Lucasfilm Ltd., which is a subsidiary of The Walt Disney Company.
Perhaps it is able to tell, if you ask it?
by ponorin on 12/30/23, 2:21 PM
by smrtinsert on 12/30/23, 5:42 PM
by DigitallyFidget on 12/30/23, 4:41 PM
I'm not sure how it'll hold up in law to claim copyright violations against something that wasn't created by a person. It'll really depend on the lawyers and judge's interpretation of written law. But I'm curious to see what comes of this.
by 1shooner on 12/30/23, 4:35 PM
by Hugsun on 12/30/23, 12:02 PM
One issue with that is that there is not a reliable way to determine if copyright is being infringed.
Even if models could be used responsibly, there might not be a reasonable expectation that most people will, given how easy infringement is and how relatively hard it is to avoid.
I'm not sure what legal prescriptions should be made on this basis, but it's an interesting thought.
by golol on 12/30/23, 2:46 PM
by shkkmo on 12/30/23, 7:43 PM
Instead, these are derivative works. We already have a flourishing culture of derivative works, such as fan art, that exists in various shades of legal greyness.
Some derivative works are fair use, some are not.
The position of the author here seems to be that generative AI should not be capable of creating any derivative works, or should only be able to do so if it can accurately identify which are fair use and which aren't (which seems like an impossibly tall bar). This stance seems like a giant attack on fair use that significantly expands the power of copyright.
To me, the takeaway from this is different. It makes clear that there is currently a risk, when using AI-generated art, that you could end up unintentionally creating and publishing a derivative work, and thus never evaluating whether that work constitutes fair use.
by qgin on 12/30/23, 5:37 PM
They are about to be infinitely better for generative AI in China.
by karmakaze on 12/30/23, 7:17 PM
Imagine instead of AI/ML, we have a mechanical-turk-like service that produces output from descriptions. The service makes no claims that the generated outputs are not similar to any copyrighted works. The only claim the service makes is that they themselves claim no copyright on the output. It's then up to the user of the service to determine if the output is suitable for their intended use.
Whether such a service itself is legal is a separate matter. For that matter, say you outsourced the artwork to a person who again gave you infringing work. The user of that output is still in violation. With AI/ML we're basically outsourcing to a 'service' that is known to sometimes output copyrighted work, so users, knowing that, are responsible for fair usage.
by docdeek on 12/30/23, 12:11 PM
Is it because Google will link to the image source? Or does the infringement begin when I use the image for gain, or claim it as my own? Perhaps it is because Google was allowed to crawl the page with the original image, so presenting them with a link is fine?
by legendofbrando on 12/30/23, 4:11 PM
Expensive to do, but hardly the end of generative AI or OpenAI, should that be the difference between having a business and being sued out of existence. Never underestimate people who have a clear economic interest, especially when their own existence is at stake.
by sjducb on 12/31/23, 1:06 PM
I think that an AI model is analogous to an employee. Imagine I ask my employee to write an article, and they just copy an existing one from the times. That’s plagiarism and bad work, not copyright infringement.
If I then decide to publish the plagiarised article, then I have committed copyright infringement.
I once ran into this exact problem with a human. I hired a designer to make some artwork for an app. When I launched the app it turned out that the human had just copied the artwork from another game. It’s my problem that I hired an idiot, and my problem that my app was infringing the copyright of another app. (We redesigned the graphics very quickly)
by jlnthws on 12/30/23, 7:31 PM
by null_point on 12/31/23, 2:56 PM
There are already troves of data that are fair game for training, but even "corrupted" data sets can probably be used if used intelligently. We've already seen examples of new models effectively being trained off of GPT-4. That approach, with filters for copyrighted material, might allow for data that is sufficiently "scrambled". Not to say building such a filter is definitely easy, but it seems plausible.
by KETpXDDzR on 12/30/23, 11:40 PM
In Germany you pay some amount extra on top of the sales price of anything that can store data (CDs, DVDs, USB sticks, HDDs, ...). This is then distributed to all companies that could be impacted by software piracy. I'm still not sure if that's legal, considering the Geneva Conventions disallow collective punishment.
by airesearcher on 12/30/23, 12:06 PM
Another change could be to the license agreement of LLMs - they could have the user assume liability for any material produced instead of the provider assuming liability. The user would agree that getting the rights for any copies and distribution of copyrighted materials is their sole responsibility instead of the provider.
by 8note on 12/31/23, 7:17 AM
How could you put that as the prompt without intending to infringe? Anything pulled from a classic sci-fi movie would be infringement. The term "droid" is also Star Wars-specific.
I'd consider the "red soda" one as grounds that the Coca-Cola brand has become generic, that it's synonymous with soda. Same thing with Mario: there is so much non-Nintendo content made featuring Mario the plumber that you could get that without training directly on Nintendo's artwork.
by wouldbecouldbe on 12/30/23, 12:12 PM
by asylteltine on 12/30/23, 2:21 PM
by josh-sematic on 12/30/23, 1:44 PM
by ur-whale on 12/30/23, 3:22 PM
It is in fact the very notion of copyright that is breathing its last breath, and it is fantastic to be alive to see it happen.
by dmbche on 12/30/23, 5:50 PM
The output is irrelevant.
Edit1: If you want to verify this, check out all the lawsuits against AI companies: they are always about the use of the plaintiffs' copyrighted works. Any discussion about the output is to establish the amount of damage done to the copyright holder, not whether damage exists.
by roenxi on 12/30/23, 2:23 PM
At the moment, we don't have hardware that can do what humans do (process video feed from eyeballs and build up a world model). I imagine that we'll cross that barrier cheaply in the coming decades, at which point copyright becomes moot. AIs will be able to develop their own styles and world understanding from scratch, then generate original work.
by Paradigma11 on 12/30/23, 1:35 PM
Content creators/artists compete globally. The only thing harsh regulations will do is create an unlevel playing field, where artists from countries that don't care will have big advantages over artists from the West, who will be driven into illegality to compete.
In the end, products will have to be classified anyway as to whether they infringe copyright and/or were built by an LLM. Most likely automated by another LLM.
by nojs on 12/30/23, 12:15 PM
It seems like there’s little incentive not to do this, because unlike Google OpenAI isn’t bringing any traffic or eyeballs. It may end up being a default setting in Wordpress for example.
But OpenAI presumably can’t afford to pay every single long tail source of content on the whole internet — so how does this end?
by zarzavat on 12/30/23, 11:52 AM
by digitcatphd on 12/30/23, 2:45 PM
by hahajk on 12/30/23, 2:56 PM
If you flood the market and dominate children's culture with toys from your TV shows, you absolutely cannot complain when your toys are considered iconic enough to be the generic "animated toy". These images don't replace or substitute the things they are depicting.
by karmakaze on 12/30/23, 7:28 PM
by SubiculumCode on 12/30/23, 11:52 AM
by efields on 12/30/23, 1:39 PM
Enterprises that make content with this also don’t want to infringe on copyright. The AI companies don’t have a good story here. The value has not become evident after years.
by tim333 on 12/30/23, 7:56 PM
It's the same for human writers. If you are writing an article for Wikipedia say, you should read relevant source articles and then rewrite in a way that isn't a copy and paste beyond a few words.
by _giorgio_ on 12/30/23, 4:45 PM
Everything that he sees has mysterious flaws that never happen.
by intrasight on 12/30/23, 12:56 PM
by caeril on 12/30/23, 2:53 PM
Can we all have a moment of silence for poor Bob Iger? Maybe we can start a GoFundMe to help him out?
by rolisz on 12/30/23, 12:33 PM
by t_mann on 12/30/23, 12:05 PM
by logicchains on 12/30/23, 11:44 AM
by vimax on 12/30/23, 11:45 AM
by Alifatisk on 12/30/23, 11:59 AM
by goertzen on 12/30/23, 6:26 PM
This is a negotiation tactic by the NYT to drive up the licensing price. Period.
The Napster/Music Industry analogy has no resemblance to this situation.
The only meaningful question that might be answered as a result of this is, what permission and access rights do crawlers have to content that is publicly and legally available.
by quonn on 12/30/23, 12:18 PM
by airstrike on 12/30/23, 5:44 PM
by ultrablack on 12/30/23, 3:33 PM
by amai on 12/30/23, 6:45 PM
by Avicebron on 12/30/23, 12:50 PM
by renewiltord on 12/30/23, 11:52 AM
I'll just do it myself.
by amelius on 12/30/23, 12:09 PM
by smitty1e on 12/30/23, 12:01 PM
That's gonna leave a Marx[1].
by ofslidingfeet on 12/30/23, 11:36 PM
by penjelly on 12/30/23, 11:46 AM
Also my concern, except it feels like many of LLMs' "problems" can't be easily fixed.
by zanfr on 12/30/23, 4:41 PM
by Log_out_ on 12/30/23, 4:00 PM
by AC_8675309 on 12/30/23, 3:16 PM
by wayeq on 12/30/23, 7:24 PM
by SKILNER on 12/30/23, 6:29 PM
by throwuwu on 12/30/23, 5:16 PM
by RecycledEle on 12/30/23, 7:34 PM
Recall that according to the US Constitution, copyright can only cover "science and the useful arts."
Alternately, we could restore a reasonable limit to the duration of copyrights, like 14 years.
by pxoe on 12/30/23, 3:11 PM
"But what if we want to scrape the entire web, and something makes it in anyway? See, that is impossible." Well, that's just saying "fuck it" and using bad data anyway. That's not an actual effort to "not use data you can't use": there was just never going to be a 'rights cleared' way to use the entire web. That is impossible. Using a clean dataset is not impossible. It's very possible.
by RandomGerm4n on 12/30/23, 11:57 AM
Apart from this, it is mainly large companies that benefit from copyright laws. Why should we have laws that restrict progress just so large capitalist companies can maximize their profits?
by skybrian on 12/30/23, 1:26 PM
by oglop on 12/30/23, 3:13 PM
I wasn’t shocked when I noticed I could query it about ANY math textbook I owned and it could talk with me about it. I didn’t bitch and gripe; I enjoyed it and had conversations.
Anyway, I’m in the minority I guess. I love that I can talk with it about books and news.
by freddealmeida on 12/30/23, 11:59 AM
by Joel_Mckay on 12/30/23, 12:24 PM
The paradox should still violate Trademarks due to similarity, but likely cannot infringe on copyright content under prior legal opinion... if at least 80% different from prior art. The lawyers are likely going to have to do a special firm survey to figure this one out.
Bag of popcorn ready =)
by yieldcrv on 12/30/23, 5:21 PM
the models can be fine
by gfodor on 12/30/23, 5:58 PM
by octacat on 12/30/23, 5:14 PM
by Intox on 12/30/23, 11:44 AM
I don't see any developed country pressing the brake on AGI in the near future to protect a few copyright holders from getting "stolen" in hypothetical scenarios.
by Baldbvrhunter on 12/30/23, 10:34 AM
I hire a session musician to play on my new single, paying him $100. I record the whole session.
I ask him to play the opening to "Stairway to Heaven" and he does so.
"Well, I can't use that as a sample without paying"
"Ok play something like Jimmy Page"
"Hmm, still sounds like Stairway to Heaven"
"Ok, try and sound less like Stairway to Heaven but in that style"
"Great, I'll use that one"
and I release my song and get $5,000 in royalties.
Should I be sued for infringement, or the guitarist?
The problem, I suppose, is that if I had said "play something like 70s prog rock" and he played "Stairway to Heaven" and I didn't know what it was and said "great, I'll use that".
Should I be sued for infringement, or the guitarist?
by iainctduncan on 12/30/23, 5:27 PM
Remember when everyone and their dog discovered sampling in the late '80s and they all thought they could get away with it, because it didn't seem like infringement to the samplers? The courts had no qualms about slapping record labels for putting out records with unlicensed samples in them. Albums even got pulled off shelves while licenses were sorted out.
These companies are charging for a service that returns copyrighted content, full stop. You can't do that whether you are AI or someone drawing Mario and selling the pictures on iStock, or putting out records that sample someone else's work without permission. It took a while in the case of sampling, but it sure as hell happened.
by sjfjsjdjwvwvc on 12/30/23, 12:05 PM
IMO would be best if this stays a highly illegal technology that is only available to a few weirdo nerds /s
by jdjdjdkdksmdnd on 12/30/23, 12:09 PM
by whodidntante on 12/30/23, 1:43 PM