by 8ig8 on 8/16/23, 10:40 PM with 369 comments
by pimterry on 8/17/23, 8:44 AM
It's extremely easy to get the latest generation of AIs to produce outputs that, in many fields, would trivially be considered IP infringement if no AI were involved.
While there are many interesting and reasonable legal & technical arguments that it's not, the result completely undermines copyright protections regardless. If that's accepted at scale, copyright in practice will change completely. In effect, the choices are "block this, or entirely destroy copyright protections in many industries". You can't allow this without eventually allowing everybody to simulate their own NY Times reporters, produce their own Marvel movies, and create their own Taylor Swift albums.
If you do allow that, the many, many affected industries face catastrophic problems.
Problematic though copyright laws are, I see no world where all those protections go away any time soon. So if the courts don't protect copyright in this scenario under existing law, then legislation will eventually make it happen: either AI consuming copyrighted data and producing an output will be considered a derivative work (or indeed the model itself will be), or IP protections are effectively broken.
There's a grace period now while we work our way there, but the politics are pretty clear, and with no plausible path to "let's drop copyright completely", I just don't see any other result in the medium term. That doesn't mean the end of generative AI by any means, just a slowdown as we move to a world where you need to negotiate rights and buy data to feed it first, instead of scraping everybody else's for free.
by ojosilva on 8/17/23, 9:29 AM
Compare:
"Steve Jobs [was] a tyrant": https://www.nytimes.com/2011/10/07/technology/steve-jobs-def...
Against:
"Whether to describe SJ as a tyrant is a matter of perspective...": https://chat.openai.com/share/28633f0c-007f-48b6-a615-1581c3...
The general way LLMs work does not preserve content in its original form: the ideas in that content are extracted and clustered statistically. As an ELI5 refresher: an LLM reads 2 million NY Times articles and records that after the word "Steve" there are a lot of "Jobs", followed by a lot of "was a genius/tyrant", "founded Apple", etc. Then the LLM answers the user question "Who was Steve Jobs?" using this complex net of token/word stats. Is that fair use? I think OpenAI's lawyers will not even touch the fair use question; they will simply state that no copying happened, just a statistical collection of words from various sources.
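As a minimal sketch of that ELI5 (in Python, with a hypothetical three-sentence corpus standing in for the 2 million articles): count which word follows which, then "answer" by sampling from the counts. Real LLMs use neural next-token prediction over subword tokens rather than raw bigram tables, but the statistical flavor is the same.

    from collections import Counter, defaultdict
    import random

    # Hypothetical toy corpus standing in for "2 million NY Times articles".
    corpus = [
        "steve jobs was a genius",
        "steve jobs was a tyrant",
        "steve jobs founded apple",
    ]

    # Bigram frequency table: which word follows which, and how often.
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1

    # "Answer" a prompt by repeatedly sampling a statistically likely next
    # word. No article is stored or copied; only the counts are kept.
    def generate(start, max_words=6):
        word, output = start, [start]
        for _ in range(max_words):
            options = follows.get(word)
            if not options:
                break
            word = random.choices(list(options), weights=list(options.values()))[0]
            output.append(word)
        return " ".join(output)

    print(generate("steve"))  # e.g. "steve jobs was a tyrant"

Note that even this toy model will reproduce a source phrase word-for-word whenever the counts leave only one plausible continuation, which is exactly the tension in the infringement question.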
And importantly: no single source is really prevalent in an LLM, so the end result cannot even be traced back to its source, especially if multiple similar news sources are fed into training. I have no idea how the Times is going to prove that the news is _theirs_.
by rurp on 8/16/23, 11:59 PM
I fear that LLMs are going to cause the internet to be a much worse and less open space.
by akolbe on 8/17/23, 7:11 AM
Moreover, AI would seem to be even more susceptible to capture and manipulation than conventional media.
When it's a question of guiding thought I prefer the humanities to tech. (Same with art.)
by bubblethink on 8/16/23, 11:55 PM
If, when someone reads a newspaper, they are served a paragraph-long answer from an NYTimes reporter that refashions reporting from local sources, the need to interact with the local sources is greatly diminished.
by machdiamonds on 8/17/23, 8:40 AM
by PeterisP on 8/17/23, 9:02 AM
In the short term, of course, the existing law matters, but the main discussion should be not on how to apply existing law but how to ensure that the new laws match what we-the-people would want.
by baby-yoda on 8/17/23, 8:26 AM
by jiscariot on 8/17/23, 12:03 AM
https://fair.org/home/20-years-later-nyt-still-cant-face-its...
by oefrha on 8/17/23, 10:15 AM
by brrrrrm on 8/16/23, 11:01 PM
by toss1 on 8/17/23, 2:26 PM
This seems to me to be completely standard in the newspaper industry. Many times every week, I see stories in the form "The [Major_News_Outlet] reports that [Event_X occurred] or [their investigation revealed Y], and here are the details [...]."
Copyright protects the expression of an idea, not the idea itself. If you write a history of Isaac Newton or the invention of semiconductors, I cannot copy it wholesale and sell it as mine, but nothing prevents me from writing my own version, even using the same facts and citing your work.
I'm quite sure that I could provide a service where a bunch of workers read NYT articles and write brief summaries. I'm not sure they would even need citations, as long as we don't copy chunks wholesale.
If OpenAI is simply parroting the words of the NYT articles without Fair Use constraints (short blurbs), it seems they have a problem. If they are fully re-writing them into short non-copying summaries, it seems the NYT has a problem.
It'll be interesting to see how the courts sort this out.
by t_luke on 8/17/23, 10:44 AM
The current legal requirement to get clearance for all music samples only arose after a bunch of court cases in the late '80s/early '90s, mostly involving quite obscure musicians.
There are a lot of people on here who assume that ‘logic will prevail’ in the courts on questions like the use of copyrighted material in training data. History shows that this really isn't a safe assumption. The courts have historically been extremely favorable to copyright holders. It would be foolish to underestimate the legal risk to OpenAI et al. here.
by 8ig8 on 8/16/23, 10:59 PM
> A top concern for The Times is that ChatGPT is, in a sense, becoming a direct competitor with the paper by creating text that answers questions based on the original reporting and writing of the paper's staff.
by CatWChainsaw on 8/22/23, 11:27 AM
I LOVE being told by techbros that a human painstakingly studying one thing at a time, not memorizing verbatim but taking away the core concept, is exactly the same type of "learning" as a model taking in millions of things at once and spitting out copyrighted writing verbatim.
Personally I think they argue that way because they get off on being contrarian out of spite, but to me it's just a signal of maliciousness and stupidity all at once.
by FrustratedMonky on 8/16/23, 11:09 PM
This doesn't mean that 'copyright' extends into my brain. A company can't copyright what I'm thinking about. And what if I try to paraphrase something from memory, drawing on a few sources, and happen to spit out a very similar sentence? Am I breaking the law?
To go further: since pretty much all knowledge is fed into a human from hundreds of books, movies, TV, and the internet from birth, everything in the brain is a product of something with a copyright. So anything produced is some amalgamation of copyrights.
Why not use a similar argument for AI? When you ask it to do something like "write a screenplay for Othello using dialog like Tarantino, but with a bit of the style of Baz Luhrmann", it's clear that what it produces is 'as unique as a human's' would be, or just as filled with things that have copyrights.
by voytec on 8/17/23, 8:16 AM
I'd like to see it happening but it sounds unrealistic.
by whywhywhywhy on 8/17/23, 9:33 AM
by mediumsmart on 8/17/23, 3:29 PM
by pierrefermat1 on 8/17/23, 8:54 AM
Relevant to the article: "Large Language Models Meet Copyright Law" at the Simons Institute
by mensetmanusman on 8/17/23, 11:53 AM
by bubblethink on 8/17/23, 12:02 AM
by zb3 on 8/17/23, 2:21 PM
by exabrial on 8/18/23, 12:57 AM
by olgeni on 8/17/23, 9:47 AM
by robbywashere_ on 8/17/23, 2:53 PM
by villgax on 8/17/23, 8:53 AM
by honeybadger1 on 8/17/23, 11:51 AM
by vldchk on 8/17/23, 12:58 PM
Journalists don't exist without reason. Yes, they work with facts, very often open facts, but they still assemble those facts in a certain way to construct a narrative, connect the dots, and tell us a story (not counting the cases where a journalist works with their sources and produces unique inside information). Then OpenAI comes along, says “thank you very much”, and assembles all of the journalists' work into one Uber Knowledgeable Journalist who can answer all of your questions.
So far so good: we've created a public-good service, and the copyright holders are in shambles.
Until you start making money on it.
That's where the problem lies.
If OpenAI were a nonprofit organization like the Wikimedia Foundation, one that just wants to make the internet a better place, you couldn't find many arguments to support the NYT lawsuit. But monetization changes everything.
Basically, the NYT is not worried about the reuse of its text as such; it is worried that no one will want to visit the NYT anymore and will instead pay Microsoft/Google and get all their answers from them.
Let's take an example. There's a famous story of how an FT journalist discovered massive fraud in Wirecard's accounting, which essentially led to the death of that organization. Those articles were the result of multi-year reporting: the journalist collected facts piece by piece, step by step, met people, and eventually spotted the gap. Now, in the age of Bard/Bing/ChatGPT, you don't need to read the original articles to know all of this. You can ask a search engine or a chatbot and get an essential rephrasing of the original reporter's work. You no longer need to go to the FT, pay for their paywall, watch their ads, etc. Effectively, the FT made a huge investment in their people, allowing them to spend two years on this issue and report it, and now gets zero leads to their website, because all of them are eaten by Google and Microsoft, who will sell you their ads and retain you in their monetized products.
Imagine that you built a for-profit paid library for some task. You make the code available behind a paywall and ask people to pay to get to it and solve their problems. Then Microsoft comes along, sneaks past the paywall, scrapes your code, and publishes a recompiled and slightly optimized version in open access, so that no one ever needs to go to your website again; they just ask Microsoft to show them your code.
Would you be happy?
All of this, for me, makes the case not as easy and straightforward as it may seem; it isn't simply "bad copyright holders against the progress of humanity".
At the end of the day, if the NYT/FT/New Yorker and the others stop publishing their work and fire all their journalists, will ChatGPT tell us stories of the same depth as the ones we read there?
by aero-glide2 on 8/17/23, 8:33 AM
by gmerc on 8/17/23, 8:41 AM
News is trying to avoid the next generation of tech doing that to the long tail of data.