from Hacker News

We've filed a lawsuit against GitHub Copilot

by iworshipfaangs2 on 11/3/22, 8:30 PM with 781 comments

  • by an1sotropy on 11/3/22, 9:01 PM

    Seems important to point out that the announcement on this page (https://githubcopilotlitigation.com/) is a followup to https://githubcopilotinvestigation.com/ previously discussed here: https://news.ycombinator.com/item?id=33240341 (with 1219 comments)
  • by Cort3z on 11/3/22, 9:55 PM

    I’m not a lawyer, but here is why I believe a class action lawsuit is correct:

    “AI” is just fancy speak for “complex math program”. If I make a program that, given an arbitrary input, simply outputs Microsoft-copyrighted code through math operations, am I in the clear just because it’s “AI”? I think they would sue the heck out of me if I did that, and I believe the opposite should be true as well.

    I’m sure my own open source code is in that thing. I did not see any attributions, thus they break the fundamentals of open source.

    In the spirit of Rick Sanchez: it’s just compression with extra steps.

  • by blackbrokkoli on 11/3/22, 11:21 PM

    I am sorry for not bringing any kind of legal perspective here, but:

    *Jesus Christ*, I hope I live long enough to see copyright die. Here we are at the cusp of a new paradigm of commanding computers to do stuff for us, right at the beginning of the first AI development which actually impresses me.

    And we are fucking bickering about how we were cheated out of $0.00034 because our repo from 2015 might have been used for training.

    I am also deeply disappointed in Hacker News; where is that deep hatred of patent trolls and smug satisfaction whenever something gets cracked or pirated now?

  • by CobrastanJorji on 11/3/22, 8:57 PM

    As a non-lawyer, I am very suspicious of the claim that "Plaintiffs and the Class have suffered monetary damages as a result of Defendants’ conduct." Flagrant disregard for copyright? Sure, maybe. The output of the model is subject to copyright? Who knows! But the copyright holders being damaged in some way? Seems doubtful. The best argument I could think of would be "GitHub would have had to pay us for this, and they didn't pay us, so we lost money," but that'd presumably work out to pennies per person.
  • by r3trohack3r on 11/3/22, 9:28 PM

    I'm not confident in this stance - sharing it to have a conversation. Hopefully some folks can help me think through this!

    The value of copyleft licenses, for me, was that we were fighting back against the notion of copyright. That you couldn't sell me a product that I wasn't allowed to modify and share my modifications back with others. The right to modify and redistribute, passed transitively through the software license, gave a "virality" to software freedom.

    If training a NN on GPL-licensed code "launders" away the copyleft license, isn't that a good thing for software freedom? If you can launder away a copyleft license, why couldn't you launder away a proprietary license? If training a NN is fair use, couldn't we bring proprietary software into the commons using this?

    It seems like the end goal of copyleft was to fight back against copyright, not to have copyleft. Tools like copilot seem to be an exceptionally powerful tool (perhaps more powerful than the GPL) for liberating software.

    What am I missing?

  • by adlpz on 11/3/22, 9:02 PM

    It feels weird saying this but, for once, I hope the big evil corporation gets to keep selling their big bad product.

    I find the pattern matching and repetitive code generation really helpful. And the library autocomplete on steroids, too.

    Meh. Tricky subject.

  • by albertzeyer on 11/3/22, 10:40 PM

    I really don't understand how there can be a problem with how Copilot works. Any human just works in the same way. A human is trained on lots and lots of copyrighted material. Still, what a human produces in the end is not automatically derived work from all the human has seen in his life before.

    So, why should an AI be treated differently here? I don't understand the argument for this.

    I actually see quite some danger in this line of thinking, that there are different copyright rules for an AI compared to a human intelligence. Once you allow for such an arbitrary distinction, AI will get restricted more and more, much more than humans are, and that will just arbitrarily restrict the usefulness of AI and effectively be a net negative for humanity as a whole.

    I think we must really fight against such undertaking, and better educate people on how Copilot actually works, such that no such misunderstanding arises.

  • by herpderperator on 11/3/22, 9:05 PM

    The title of the submitted PDF document: "Microsoft Word - 2022-11-02 Copilot Complaint (near final)"[0]

    I've noticed this a lot and it's quite funny seeing what the actual filename of the document was. Does this just get included as metadata by default when you export to PDF?

    [0] https://githubcopilotlitigation.com/pdf/1-0-github_complaint...

  • by deanjones on 11/3/22, 10:10 PM

    This will fail very quickly. The licence that project owners publish with their code on Github applies to third parties who wish to use the code, but does not apply to Github. Authors who publish their code on Github grant Github a licence under the Github Terms: https://docs.github.com/en/site-policy/github-terms/github-t...

    Specifically, sections D.4 to D.7 grant Github the right to "to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time. This license includes the right to do things like copy it to our database and make backups; show it to you and other users; parse it into a search index or otherwise analyze it on our servers; share it with other users; and perform it, in case Your Content is something like music or video."

  • by karaterobot on 11/3/22, 9:57 PM

    Does everybody credit the author when using Stack Overflow code? I have, but don't always. Not that I'm trying to steal, I just don't take the time, especially in personal projects.

    This isn't exactly the same thing, but it seems to me that three of the biggest differences are:

    1. Stack Overflow code is posted for people to use it (fair enough, but they do have a license that requires attribution anyway, so that's not an escape)

    2. Scale (true; but is it a fundamental difference?)

    3. People are paying attention in this case. Nobody is scanning my old code, or yours, but if they did, would they have a case?

    I dunno. I'm more sympathetic to visual artists who have their work slurped up to be recapitulated as someone else's work via text to image models. Code, especially if it is posted publicly, doesn't feel like it needs to be guarded. I'm not saying this is correct, just saying that's my reaction, and I wonder why it's wrong.

  • by Imnimo on 11/3/22, 9:04 PM

    On page 18, they show Copilot produces the following code:

      function isEven(n) {
        return n % 2 === 0;
      }

    They then say, "Copilot’s Output, like Codex’s, is derived from existing code. Namely, sample code that appears in the online book Mastering JS, written by Valeri Karpov."

    Surely everyone reading this has written that code verbatim at some point in their lives. How can they assert that this code is derived specifically from Mastering JS, or that Karpov has any copyright to that code?

  • by celestialcheese on 11/3/22, 9:22 PM

    Maybe I'm being too cynical, but this feels like it's more a law firm and individual looking to profit and make their mark in legal history rather than an aggrieved individual looking for justice.

    Programmer/Lawyer Plaintiff + upstart SF Based Law Firm + novel technology = a good shot at a case that'll last a long time, and fertile ground to establish yourself as experts in what looks to be a heavily litigated area over the next decade+.

  • by xchip on 11/3/22, 9:16 PM

    LOL we look like taxi drivers fighting Uber.

    If Kasparov uses chess programs to be better at chess maybe we can use copilot to be better developers?

    Also, anyone, either a person or a machine, is welcome to learn from the code I wrote; actually, that is how I learnt how to code, so why would I stop others from doing the same?

  • by abouttyme on 11/3/22, 8:58 PM

    I suspect this will be the first of many lawsuits over training data sets. Just because it is obscured by artificial neural networks doesn't mean it's an original work that is not subject to copyright restrictions.
  • by naillo on 11/3/22, 8:54 PM

    I'm kinda sceptical that this goes anywhere, given that they basically say it's your responsibility to vet that whatever Copilot outputs doesn't break any copyright (obviously that goes against the promise of it and the PR, but that's the small print that gets them out of trouble).
  • by iworshipfaangs2 on 11/3/22, 8:37 PM

    It's also a class action,

    > behalf of a proposed class of possibly millions of GitHub users...

    The appendix includes the 11 licenses that the plaintiffs say GitHub Copilot violates: https://githubcopilotlitigation.com/pdf/1-1-github_complaint...

  • by cmrdporcupine on 11/3/22, 9:51 PM

    If Microsoft is so confident in the legality and ethics of Copilot, and that it doesn't leak or steal proprietary IP... they should go train it on the MS Word and Windows and Excel source trees.

    What's that? They don't want to do that? Why not?

  • by jeffhwang on 11/3/22, 8:58 PM

    Wow, this is interesting iteration in the ongoing divide between "East Coast code" vs. "West Coast code" as defined by Larry Lessig. For background, see https://lwn.net/Articles/588055/
  • by IceWreck on 11/3/22, 9:02 PM

    I am not against this lawsuit but I'm against the implications of this because it can lead to disastrous laws.

    A programmer can read available but not OSS-licensed code and learn from it. That's fair use. If a machine does it, is it wrong? What is the line between copying and machine learning? Where does overfitting come in?

    Today they're filing a lawsuit against copilot.

    Tomorrow it will be against Stable Diffusion (or DALL-E, GPT-3, whatever).

    And then eventually against Wine/Proton and emulators (are APIs copyrightable?)

  • by elcomet on 11/3/22, 9:53 PM

    This is why we can't have nice things. Copilot is the best thing to happen to developer tools in a long time; it has increased my productivity a lot. Please don't ruin it.
  • by protomyth on 11/3/22, 9:07 PM

    I really feel that Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith[0] is going to have a big effect on this type of thing. They are basically relying on their AI magic to make it transformative. I'm starting to think the era of learning from material other people own without a license / permission is going to end quickly.

    0) https://www.scotusblog.com/case-files/cases/andy-warhol-foun...

  • by topher6345 on 11/3/22, 11:06 PM

    Is it not in the agency of the developer to hit the save button?

    It seems like GitHub Copilot can spit out copyrighted works all day but the person running the text editor has to "choose" which Copilot output to actually save/commit/deploy.

    Does it really matter that much "how" the text in your text editor gets there? You write it yourself or copy/paste it or have Copilot generate it. Ultimately the individual that "approved" it to be saved to the disk is the one violating the copyright, Copilot is just making a "suggestion".

  • by nullc on 11/3/22, 10:10 PM

    I think if this is successful it will be very bad for the open world.

    Large platforms like github will just stick blanket agreements into the TOS which grant them permission (and require you indemnify them for any third party code you submit). By doing so they'll gain a monopoly on comprehensively trained AI, and the open world that doesn't have the lever of a TOS will not at all be able to compete with that.

    Copilot has seemed to have some outright copying problems, presumably because it's a bit over-fit (perhaps to work at all it must be, because it's just failing to generalize enough at the current state of development) --- but I'm doubtful that this litigation could distinguish the outright copying from training done in a way that doesn't substantially infringe any copyright-protected right (e.g. where the AI learns the 'ideas' rather than verbatim reproducing their exact expressions).

    The same goes for many other initiatives around AI training material -- e.g. people not wanting their own pictures being used to train facial recognition. Litigating won't be able to stop it, but it will be able to hand the few largest quasi-monopolists like Facebook, Google, and Microsoft a near monopoly over new AI tools when they're the only ones that can overcome the defaults set by legislation or litigation.

    It's particularly bad because the spectacular data requirements and training costs already create big centralization pressures in the control of the technology. We will not be better off if we amplify these pressures further with bad legal precedents.

  • by bkuhn on 11/4/22, 5:58 PM

    In case folks here were curious, we at the Software Freedom Conservancy have asked the Plaintiffs to endorse the Principles of Community-Oriented GPL enforcement: https://sfconservancy.org/news/2022/nov/04/class-action-laws...

    … & of course we again ask Microsoft's GitHub to start respecting FOSS licenses, cooperate with the community, & retract their incorrect claim that their behavior is “fair use”.

    A few more links to our work on this issue:

    https://sfconservancy.org/blog/2022/feb/03/github-copilot-co... https://sfconservancy.org/news/2022/feb/23/committee-ai-assi...

  • by foooobaba on 11/3/22, 9:37 PM

    It seems like we should come to an agreement on what the license is intended for, given that the licenses were created before AI like this existed. If the authors did not intend their code to be used like this, should we not respect that? Also, does it make sense to create new licenses which explicitly state whether using the code for AI training is acceptable, or are our current licenses good enough?
  • by solomatov on 11/3/22, 9:49 PM

    The most important part of this is not whether the lawsuit will be won or lost by one of the parties, but what the legality of fair use is in machine learning and language models. There's a good chance that it gets to the Supreme Court and there will be a defining precedent for future entrepreneurs about what's possible and what's not.

    P.S. I am not a lawyer.

  • by warbler73 on 11/3/22, 9:05 PM

    It seems obvious that AI models are derivative works of the works they are trained on but it also seems obvious that it is totally legally untested whether they are derivative works in the formal legal sense of copyright law. So it should be a good case assuming we have wise and enlightened judges who understand all nuances and can guide us into the future.
  • by buzzy_hacker on 11/3/22, 9:02 PM

    Copilot has always seemed like a blatant GPL violation to me.
  • by foooobaba on 11/3/22, 9:09 PM

    If github or google indexes source code using a neural net to help you find it, given a query, is that also illegal? If you think of copilot as something that helps you find code you’re looking for, is it all that different, and if so, why?

    In this case, wouldn’t the users of copilot be the ones responsible for any copyrighted code they may have accessed using copilot?

  • by hu3 on 11/3/22, 9:02 PM

    As a GitHub user, is there a way to support GitHub against this lawsuit?

    Obviously not financially as Microsoft has basically YES amounts of money.

  • by awestroke on 11/3/22, 9:05 PM

    If this leads anywhere I'll be pissed. I love CoPilot.
  • by still_grokking on 11/4/22, 12:21 AM

    I hope MS used a lot of AGPL code to train Copilot… This would be fun.

    But no matter how this goes, in case training AI with copyrighted inputs is "fair use" that'll end up as the ultimate "copyright laundry machine" like this "joke" project here:

    https://web.archive.org/web/20220104214929/https://fairuseif...

    https://news.ycombinator.com/item?id=27796124 (302 points, 151 comments)

  • by rafaelturk on 11/3/22, 9:20 PM

    Like everything legally related, this is not about open-source fairness or protecting innovation; it's all about making money.
  • by throwaway675309 on 11/3/22, 11:33 PM

    Even if this succeeds, you've already lost.

    1. The ability to be able to run and train these models is going to eventually be perfectly plausible on a home machine.

    2. It's only a matter of time before a model, e.g. a popular model trained on all of the code scraped from GitHub, is a publicly available torrent.

    3. People will be able to just run it locally as an integrated plug-in in jet brains or VS code.

    4. You'll never know if somebody has lifted their code in violation of a license anymore than you would be able to tell if somebody used code from stack overflow without attribution in any commercial endeavor.

    The End.

  • by falcolas on 11/3/22, 11:07 PM

    Crackpot Theory: Copilot (and by association many ML tools) is a form of probabilistic encryption. Once encoded, it's virtually impossible to pull the code (plaintext) directly out of the raw ML model (the cyphertext), yet when the proper key is input ('//sparse matrix transpose'), you get the relevant segment of the original function (the plaintext) back.

    We've even seen this with stable diffusion image generation, where specific watermarks can be re-created (decrypted?) deterministically with the proper input.
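
    The "key decrypts plaintext" intuition above can be illustrated with a deliberately overfit toy model. This is a sketch only: the 8-character context length, the single training string, and the function names are illustrative choices, not a description of how Copilot actually works.

```python
from collections import defaultdict

def train(text, order=8):
    # Deliberately overfit "model": map each order-length context
    # to the characters that follow it in the training text.
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model

def generate(model, seed, max_len=200, order=8):
    # Extend the seed one character at a time by looking up
    # the most recent context; stop when no continuation exists.
    out = seed
    while len(out) < max_len:
        nexts = model.get(out[-order:])
        if not nexts:
            break
        out += nexts[0]
    return out

# A memorized "plaintext" (the isEven snippet from the complaint):
code = "function isEven(n) {\n  return n % 2 === 0;\n}"
model = train(code)

# The right "key" (a prefix prompt) decodes the original verbatim:
print(generate(model, code[:8]))
```

    Because every 8-character context in the training string is unique, this model can only regurgitate its input. Real models generalize, but the more heavily a region of the training set is memorized, the closer their behavior gets to this.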

  • by spir on 11/3/22, 10:01 PM

    The part of GitHub Copilot to which I object is that it's trained on private repos. Where does GitHub get off consuming explicitly private intellectual property for their own purposes?
  • by garfieldnate on 11/8/22, 11:07 AM

    If GitHub ends up having to tweak their product to avoid ethical/legal concerns, I actually imagine it could still be pretty cool. Right now Copilot is a black box that spits out code with no attribution; what if they worked on instead making it a glass box, where it always brings up snippets of other projects along with their licensing info so that you can decide how to incorporate the ideas fairly yourself? Or they could still output the same code suggestions, but always include attribution and license data along with them. Making the product more transparent would probably make more people comfortable with using it, anyway.
  • by Cloudef on 11/3/22, 10:07 PM

    Unless Copilot spits out complete programs or libraries that are 1:1 with someone else's, who cares? Caring about random small code snippets is dumb.
  • by bilsbie on 11/3/22, 8:59 PM

    Laws need to change to match technology.

    Did you know that before airplanes were invented, common law said you owned the air above your land all the way to the heavens?

  • by brookst on 11/3/22, 8:59 PM

    I wonder if the plaintiffs' code would stand up to scrutiny of whether any of it was copied, even unintentionally, from other code they saw in their years of learning to program? I know that I have more-or-less transcribed from Stack Overflow/etc, and I have a strong suspicion that I have probably produced code identical to snippets I've seen in the past.
  • by layer8 on 11/3/22, 10:46 PM

    Copilot reminds me of the Borg: You will be assimilated. We will add your technological distinctiveness to our own. Resistance is futile.
  • by omegacharlie on 11/4/22, 12:12 AM

    I think some of the negativity about Copilot may come from the perception that if an individual or small startup trained an ML model on public source code and commercialised a service from it, they would be drowning in legal issues from big companies unhappy with their code being used in such a product.

    In addition, just because code is available publicly on GitHub does not necessarily mean it is permissively licensed for use elsewhere, even with attribution. Copyright holders unhappy with their copyrighted works being publicly accessible can use the DMCA to issue take-downs, which GitHub does comply with, but how that interacts with Copilot and any of its training data is a different question.

    As much as the DMCA is bad law, it's rather funny seeing Microsoft be charged in this lawsuit under the lesser-known provision against 'removal of copyright management information'. Microsoft does have more resources to mount a defence, so it will probably end up different compared to a smaller player facing this action.

  • by rolenthedeep on 11/4/22, 12:05 AM

    Consider each repo on GitHub to be a movie. What Copilot does is search for sequences of frames from any movie which line up to create a new coherent movie.

    Individually, each frame is protected by the copyright of the movie it belongs to. But what happens if you take a million frames from a million different movies and just arrange them in a new way?

    That's the core question here. Is the new movie a new copyrightable work, or is it plagiarizing a million other works at once? Is it legal to use copyrighted works in this way?

    The other question is if it is right to use copyrighted works this way. Is this within the spirit of open source software? Or is this just a bad corporation taking advantage of your good will?

    I'm not sure where I stand on this, it's a complicated problem for sure. Definitely interested to see how this plays out in court.

  • by poulpy123 on 11/4/22, 12:37 PM

    >By training their AI systems on public GitHub repositories (though based on their public statements, possibly much more) we contend that the defendants have violated the legal rights of a vast number of creators who posted code or other work under certain open-source licenses on GitHub.

    I don't know about US copyright law so I can't comment on the legal documents, but this website is not complaining that Copilot is reproducing copyrighted content; it's complaining that Copilot was trained on copyrighted content. I don't see how you can forbid someone or something from reading and learning from something that is public (once again, reproducing is another problem).

  • by throwaway675309 on 11/3/22, 11:17 PM

    How much code is necessary to be considered a copyright infringement from an existing code base?

    For example, let's say I take a single frame of animation from a cartoon. The frame contains a mountain, a house, and a couple of characters, though those characters are not integral to the actual cartoon; maybe they're extras (villagers, not named characters like Mickey Mouse, for example).

    I draw a picture of a lake with a cabin next to it, then start to draw a frontiersman, but I trace one of his arms from a villager in that previous frame of animation. One: am I in danger of copyright infringement (have I hit some arbitrary threshold)? And two: am I causing monetary losses for the cartoon?

  • by jasonladuke0311 on 11/4/22, 12:40 AM

    Merits of the case aside, I'm befuddled that a company with a legal team like Microsoft approved this product. Is their assumption that this would bring in more revenue than potentially defending it in court? The math doesn't make sense to me.
  • by RamblingCTO on 11/3/22, 8:57 PM

    lol @ "open-source software piracy"

    If I'm being honest I'm a bit annoyed at this. What's the problem and what's the point of this?

  • by renewiltord on 11/3/22, 9:19 PM

    It doesn't make sense. If I make a piece of software that curls a random gist and then puts it into your editor, am I infringing? Or are you infringing when you run it, or when you use that file and distribute it somewhere?
  • by mezbot on 11/4/22, 3:35 AM

    This issue seems to have an obvious solution that I fail to see anyone mention: treat Copilot simply as a tool, and let it be trained on whatever without any consent requirements. However, the outputs should be subject to copyright as with any other code produced by a human. Then, on a case-by-case basis, courts can decide if infringement has occurred. The idea of banning Copilot or other AI models as a whole just seems like a collective case of sour grapes, because innovation and automation are finally threatening some people who only expected these things to affect the working class.
  • by EMIRELADERO on 11/3/22, 9:20 PM

    I think it's a great time to explain why this won't hit AI art such as Stable Diffusion, even if GitHub loses this case.

    The crux of the lawsuit's argument is that the AI unlawfully outputs copyrighted material. This is evident in many tests, with many people here and on Twitter even getting verbatim comments out of it.

    AI art, on the other hand, is not capable of outputting the images from its training set, as it's not a collage-maker, but an artificial brain with a paintbrush and virtual hand.

  • by fancyfredbot on 11/3/22, 9:07 PM

    If a software developer learns how to code better by reading GPL software and then later uses the skills they developed to build closed source for profit software should they be sued?
  • by hjroberts on 11/3/22, 11:32 PM

    Whether it is legally wrong or not to scan OSS code (I think it is wrong), there has been a time-honored precedent for disallowing automated scanning:

      robots.txt 
    
    This is exactly what is needed for source code, and the default (no robots.txt) should be "disallow".

    The fact that the Web has considered this moral issue should be a strong hint for the AI people not to take a purely legal stance but consider the OSS community that they are so heavily using.
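
    For illustration only, a hypothetical analogue of robots.txt for repositories might look like the following. The ai.txt filename and these directives are invented for this sketch; no such standard exists today:

      # ai.txt at the repository root (hypothetical)
      User-agent: *
      Disallow: /           # default: no ML training on this code
      Allow: /examples/     # opt specific paths back in

    As with robots.txt, this would be a convention rather than an enforcement mechanism, and the comment above argues the default in the absence of such a file should be "disallow".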

  • by atum47 on 11/3/22, 9:34 PM

    Forgive my ignorance, but who is going to benefit from this lawsuit? I have a lot of code on GitHub, can I, for instance, expect a check in the mail in case of a win?
  • by datacruncher01 on 11/4/22, 2:51 AM

    I think the software is probably OK provided that the sources are credited (i.e., if Copilot copies code from, say, SDL, then the relevant code sections need to be correctly attributed and the mandatory license readme copied to the project, so all code follows the open-source licenses used). That's literally the purpose of open source licenses. If Copilot can't be bothered to do that, then yeah, it should be shut down.
  • by cothrowaway88 on 11/3/22, 9:08 PM

    Made a throwaway since I guess this stance is controversial. I could not care less about how copilot was made and what kind of code it outputs. It's useful and was inevitable.

    I'm 1000% on team open source and have had to refer to things like tldrlegal.com many times to make sure I get all my software licensing puzzle pieces right. Totally get the argument for why this litigation exists in the present.

    Just saying in general my friends I hope you have an absolutely great day. Someone will be wrong on the internet tomorrow, no doubt about it. Worry about something productive instead.

    This one has the feel of being nothing more than tilting at windmills in the long run.

  • by 0cf8612b2e1e on 11/3/22, 9:00 PM

    Is there any amount of public data/code/whatever I can make an offline backup of today in the event this gets pulled?
  • by matthewwolfe on 11/4/22, 2:19 AM

    I will never understand why people push code to public repos and then complain when someone or something uses that code. Code that you want to keep private or make money off of should be private. Only publish stuff to the public that you want other people to see and learn from. All the complaints about attribution… who cares.
  • by pmarreck on 11/3/22, 10:17 PM

    This will fail. Copilot is too good, and only suggests snippets or small functions, not entire classes for example.
  • by User23 on 11/4/22, 1:56 AM

    Copilot is clearly a derivative work. So is every other similar model. How is this even up for discussion?
  • by stovenctl on 11/3/22, 11:08 PM

    The comparison I would draw is it's a statistics based search engine for code.

    Sometimes the query is the first half of a small statement that we can fill in with common patterns. Useful, fair.

    Sometimes the query is a signature like `fn fast_inv_sqrt` that copies someone's code and doesn't attribute it.

  • by nuc1e0n on 11/4/22, 6:29 PM

    My own view is that it is currently not legal for humans to produce derivatives of copyrighted works. So it is probably already not legal to train an artificial intelligence on copyrighted works in order to produce derivatives either.
  • by jjgon1781 on 11/4/22, 5:42 AM

    I am surprised at the number of people in favor of Copilot being trained with copyrighted data.
  • by scoot on 11/4/22, 12:38 AM

    The editorialized title isn't correct. The lawsuit is against GitHub for Copilot, not against GitHub Copilot, which is not a "legal person".

    A better shortening of the original title is simply "We’ve filed a lawsuit challenging GitHub Copilot".

  • by reachableceo on 11/4/22, 1:01 AM

    Let me (start or join the call) for federal investigation and the filing of criminal complaints in all relevant locales.

    Grand theft, interstate wire fraud, and conspiracy for the same.

    This is a criminal matter as well as civil. Intentional and knowing violation of the law.

    We must not let our work be taken!

  • by gcau on 11/3/22, 11:44 PM

    As much as I love the little guy beating the big evil company, I hope the lawsuit doesn't cause anything to happen to copilot. Maybe some changes, like better protection against emitting 1:1 licensed code or opting out your code from training.
  • by vlovich123 on 11/4/22, 4:30 AM

    Can someone explain to me Microsoft’s decision here to use GPL code in the training set? It would seem like sticking to non-attribution / non-viral licenses would have kept them in the clear. Was that an insufficient size data set?
  • by eurasiantiger on 11/3/22, 9:08 PM

    Maybe we just need to prompt it to include the proper licenses and attributions. /s
  • by thesuperbigfrog on 11/3/22, 8:56 PM

    How original is the generated code?

    Can the generated code be traced back to the code used for training and the original copyrights and licenses for that code?

    If so, what attribution(s) and license(s) should apply to the generated code?

  • by arpowers on 11/3/22, 9:09 PM

    The proper way to think about these LLMs is similar to plagiarism.

    Seems to me the underlying data should be opt-in from creators, and licenses should be developed that take AI into consideration.

  • by Aeolun on 11/4/22, 12:40 AM

    I find this whole subject exhausting. The only reason I’m glad there is a lawsuit is that we can finally put this thing to rest when either party wins.
  • by Yahivin on 11/3/22, 9:12 PM

    Copilot does include the licenses...

    Start off a comment with:

      // MIT license

    Then watch parts of various software licenses come out, including authors' names and copyrights!

  • by marmada on 11/4/22, 1:53 AM

    All these people whining about copyright need to consider: is the issue Copilot, or is the issue copyright?
  • by amelius on 11/3/22, 11:47 PM

    Can Copilot reproduce Numerical Recipes in C?

    (asking because I know the authors were kinda famous for being very litigious).

  • by HeavyStorm on 11/4/22, 10:19 AM

    "Angry people brandish their fists against the incoming revolution" is also a good title.
  • by sensanaty on 11/3/22, 10:57 PM

    I personally hope they win, and win big. Anything that ruins Micro$oft's day is a boon to mine.
  • by clusterhacks on 11/3/22, 9:19 PM

    Did Microsoft use the source code of Windows (in whole or in part) as training input to Copilot?
  • by machiste77 on 11/3/22, 9:12 PM

    bruh, come on! you're gonna ruin it for the rest of us
  • by kgarten on 11/4/22, 2:29 AM

    On a tangent ... beautiful typography. I love Matthew Butterick's work on legible fonts and his guide to practical typography.

    all the best with the lawsuit.

  • by barelysapient on 11/3/22, 9:35 PM

    MSFT to $0 anyone?
  • by i_like_apis on 11/4/22, 2:17 AM

    I love that this is going to lose.
  • by SighMagi on 11/3/22, 9:29 PM

    I did not see that coming.
  • by SurgeArrest on 11/3/22, 9:37 PM

    I hope this case will fail and establish a good precedent for all future AI litigation, and maybe even prevent new suits. Your code is open source: regardless of license, one might read it like a textbook, then remember or even copy snippets and reuse them somewhere else unrelated to the original application. If you don't like this, don't make your code open source. This was happening, and is happening, independent of any license, all over the world, by the majority of developers. What Copilot and similar tools did was make those snippets accessible for extrapolation in new applications.

    If these folks win - we again throw progress under the bus.

  • by ISL on 11/3/22, 9:00 PM

    Can anyone with Copilot access give a short summary of its response to the prompts:

      function force=Gmmr2Array(mass1, mass2)
    
    and

      function [force, torque]=pointMatrixGravity(array1,array2)
    
    
    ?

    I'd love to know if some of my GPL v3 code [1, 2] has landed in the training set.

    [1] https://github.com/4kbt/NewtonianEotWashToolkit/blob/master/...

    [2] https://github.com/4kbt/NewtonianEotWashToolkit/blob/master/...

  • by m00x on 11/3/22, 9:03 PM

    The only people who gain out of class lawsuits are the lawyers.

    This person (a lawyer) saw an opportunity to make money and jumped on it like a hungry tiger on fresh meat.

  • by Entinel on 11/3/22, 9:16 PM

    I don't have a comment on this personally, but I want to throw this out there, because every time I see people criticizing Copilot or Dall-E someone always says "BUT IT'S FAIR USE!" Those people don't seem to grasp that "fair use" is a defense. The burden is not on me to prove what you are doing is not fair use; the burden is on you to prove that it is.
  • by VoodooJuJu on 11/3/22, 9:55 PM

    As celestialcheese says [1], it seems like a manufactured case for the purpose of furthering someone's legal career rather than seeking a remedy for any violations made by Copilot.

    But I like to put on my conspiracy hat from time to time, and right now is one such time, so let's begin...

    Though the motivations behind this case are uncertain, what is certain is that this case will establish a precedent. As we know, precedents are very important for any further rulings on cases of a similar nature.

    Could it be the case that Microsoft has a hand in this, in trying to preempt a precedent that favors Copilot in any further litigation against it?

    Wouldn't put it past a company like Microsoft.

    Just a wild thought I had.

    [1] https://news.ycombinator.com/item?id=33457826

  • by bugfix-66 on 11/3/22, 9:00 PM

    Ask HN: I want to modify the BSD 2-Clause Open Source License to explicitly prohibit the use of the licensed software in training systems like Microsoft's Copilot (and use during inference). How should the third clause be worded?

      The No-AI 3-Clause Open Source Software License
    
      Copyright (C) <YEAR> <COPYRIGHT HOLDER>
    
      All rights reserved.
    
      Redistribution and use in source and binary forms, with or without
      modification, are permitted provided that the following conditions
      are met:
    
      1. Redistributions of source code must retain the above copyright
         notice, this list of conditions and the following disclaimer.
    
      2. Redistributions in binary form must reproduce the above copyright
         notice, this list of conditions and the following disclaimer in
         the documentation and/or other materials provided with the
         distribution.
    
      3. Use in source or binary forms for the construction or operation
         of predictive software generation systems is prohibited.
    
      THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
      "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
      LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
      A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
      HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
      SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
      LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
      DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
      THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
      (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
      OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
    
    https://bugfix-66.com/f0bb8770d4b89844d51588f57089ae5233bf67...
  • by 60secs on 11/3/22, 9:22 PM

    This is why we can't have nice dystopias.