by iworshipfaangs2 on 11/3/22, 8:30 PM with 781 comments
by an1sotropy on 11/3/22, 9:01 PM
by Cort3z on 11/3/22, 9:55 PM
“AI” is just fancy speak for “complex math program”. If I make a program that’s simply given an arbitrary input and then, through math operations, outputs Microsoft's copyrighted code, am I in the clear just because it’s “AI”? I think they would sue the heck out of me if I did that, and I believe the opposite should be true as well.
I’m sure my own open source code is in that thing. I did not see any attributions, thus they break the fundamentals of open source.
In the spirit of Rick Sanchez: it’s just compression with extra steps.
by blackbrokkoli on 11/3/22, 11:21 PM
*Jesus Christ*, I hope I live long enough to see copyright die. Here we are at the cusp of a new paradigm of commanding computers to do stuff for us, right at the beginning of the first AI development which actually impresses me.
And we are fucking bickering about how we were cheated out of $0.00034 because our repo from 2015 might have been used for training.
I am also deeply disappointed in HackerNews; where is that deep hatred of patent trolls and smug satisfaction whenever something gets cracked or pirated now?
by CobrastanJorji on 11/3/22, 8:57 PM
by r3trohack3r on 11/3/22, 9:28 PM
The value of copyleft licenses, for me, was that we were fighting back against the notion of copyright: that you couldn't sell me a product that I wasn't allowed to modify and share my modifications back with others. The right to modify and redistribute, passed on transitively through the software license, gave a "virality" to software freedom.
If training a NN against GPL-licensed code "launders" away the copyleft license, isn't that a good thing for software freedom? If you can launder away a copyleft license, why couldn't you launder away a proprietary license? If training a NN is fair use, couldn't we bring proprietary software into the commons using this?
It seems like the end goal of copyleft was to fight back against copyright, not to have copyleft. Tools like copilot seem to be an exceptionally powerful tool (perhaps more powerful than the GPL) for liberating software.
What am I missing?
by adlpz on 11/3/22, 9:02 PM
I find the pattern matching and repetitive code generation really helpful. And the library autocomplete on steroids, too.
Meh. Tricky subject.
by albertzeyer on 11/3/22, 10:40 PM
So, why should an AI be treated differently here? I don't understand the argument for this.
I actually see quite some danger in this line of thinking, that there are different copyright rules for an AI than for a human intelligence. Once you allow such an arbitrary distinction, AI will get restricted more and more, much more than humans are. That will just arbitrarily restrict the usefulness of AI and effectively be a net negative for all of humanity.
I think we must really fight against such undertaking, and better educate people on how Copilot actually works, such that no such misunderstanding arises.
by herpderperator on 11/3/22, 9:05 PM
I've noticed this a lot and it's quite funny seeing what the actual filename of the document was. Does this just get included as metadata by default when you export to PDF?
[0] https://githubcopilotlitigation.com/pdf/1-0-github_complaint...
by deanjones on 11/3/22, 10:10 PM
Specifically, sections D.4 to D.7 grant GitHub the right "to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time. This license includes the right to do things like copy it to our database and make backups; show it to you and other users; parse it into a search index or otherwise analyze it on our servers; share it with other users; and perform it, in case Your Content is something like music or video."
by karaterobot on 11/3/22, 9:57 PM
This isn't exactly the same thing, but it seems to me that three of the biggest differences are:
1. Stack Overflow code is posted for people to use it (fair enough, but they do have a license that requires attribution anyway, so that's not an escape)
2. Scale (true; but is it a fundamental difference?)
3. People are paying attention in this case. Nobody is scanning my old code, or yours, but if they did, would they have a case?
I dunno. I'm more sympathetic to visual artists who have their work slurped up to be recapitulated as someone else's work via text to image models. Code, especially if it is posted publicly, doesn't feel like it needs to be guarded. I'm not saying this is correct, just saying that's my reaction, and I wonder why it's wrong.
by Imnimo on 11/3/22, 9:04 PM
>function isEven(n) {
> return n % 2 === 0;
>}
They then say, "Copilot’s Output, like Codex’s, is derived from existing code. Namely, sample code that appears in the online book Mastering JS, written by Valeri Karpov."
Surely everyone reading this has written that code verbatim at some point in their lives. How can they assert that this code is derived specifically from Mastering JS, or that Karpov has any copyright to that code?
by celestialcheese on 11/3/22, 9:22 PM
Programmer/Lawyer Plaintiff + upstart SF Based Law Firm + novel technology = a good shot at a case that'll last a long time, and fertile ground to establish yourself as experts in what looks to be a heavily litigated area over the next decade+.
by xchip on 11/3/22, 9:16 PM
If Kasparov uses chess programs to be better at chess maybe we can use copilot to be better developers?
Also, anyone, either a person or a machine, is welcome to learn from the code I wrote. That is actually how I learnt to code, so why would I stop others from doing the same?
by abouttyme on 11/3/22, 8:58 PM
by naillo on 11/3/22, 8:54 PM
by iworshipfaangs2 on 11/3/22, 8:37 PM
> behalf of a proposed class of possibly millions of GitHub users...
The appendix includes the 11 licenses that the plaintiffs say GitHub Copilot violates: https://githubcopilotlitigation.com/pdf/1-1-github_complaint...
by cmrdporcupine on 11/3/22, 9:51 PM
What's that? They don't want to do that? Why not?
by jeffhwang on 11/3/22, 8:58 PM
by IceWreck on 11/3/22, 9:02 PM
A programmer can read available but not OSS-licensed code and learn from it. That's fair use. If a machine does it, is it wrong? What is the line between copying and machine learning? Where does overfitting come in?
Today they're filing a lawsuit against copilot.
Tomorrow it will be against Stable Diffusion (or DALL-E, GPT-3, whatever).
And then eventually against Wine/Proton and emulators (are APIs copyrightable?).
by elcomet on 11/3/22, 9:53 PM
by protomyth on 11/3/22, 9:07 PM
0) https://www.scotusblog.com/case-files/cases/andy-warhol-foun...
by topher6345 on 11/3/22, 11:06 PM
It seems like GitHub Copilot can spit out copyrighted works all day but the person running the text editor has to "choose" which Copilot output to actually save/commit/deploy.
Does it really matter that much "how" the text in your text editor gets there? You write it yourself or copy/paste it or have Copilot generate it. Ultimately the individual that "approved" it to be saved to the disk is the one violating the copyright, Copilot is just making a "suggestion".
by nullc on 11/3/22, 10:10 PM
Large platforms like github will just stick blanket agreements into the TOS which grant them permission (and require you indemnify them for any third party code you submit). By doing so they'll gain a monopoly on comprehensively trained AI, and the open world that doesn't have the lever of a TOS will not at all be able to compete with that.
Copilot has seemed to have some outright copying problems, presumably because it's a bit over-fit (perhaps to work at all it must be, because it's just failing to generalize enough at the current state of development), but I'm doubtful that this litigation could distinguish the outright copying from training in a way that doesn't substantially infringe any copyright-protected right (e.g. where the AI learns the 'ideas' rather than verbatim reproducing their exact expressions).
The same goes for many other initiatives around AI training material, e.g. people not wanting their own pictures being used to train facial recognition. Litigating won't be able to stop it, but it will be able to hand the few largest quasi-monopolists like Facebook, Google, and Microsoft a near monopoly over new AI tools when they're the only ones that can overcome the defaults set by legislation or litigation.
It's particularly bad because the spectacular data requirements and training costs already create big centralization pressures in the control of the technology. We will not be better off if we amplify these pressures further with bad legal precedents.
by bkuhn on 11/4/22, 5:58 PM
… & of course we again ask Microsoft's GitHub to start respecting FOSS licenses, cooperate with the community, & retract their incorrect claim that their behavior is “fair use”.
A few more links to our work on this issue:
https://sfconservancy.org/blog/2022/feb/03/github-copilot-co... https://sfconservancy.org/news/2022/feb/23/committee-ai-assi...
by foooobaba on 11/3/22, 9:37 PM
by solomatov on 11/3/22, 9:49 PM
P.S. I am not a lawyer.
by warbler73 on 11/3/22, 9:05 PM
by buzzy_hacker on 11/3/22, 9:02 PM
by foooobaba on 11/3/22, 9:09 PM
In this case, wouldn’t the users of copilot be the ones responsible for any copyrighted code they may have accessed using copilot?
by hu3 on 11/3/22, 9:02 PM
Obviously not financially as Microsoft has basically YES amounts of money.
by awestroke on 11/3/22, 9:05 PM
by still_grokking on 11/4/22, 12:21 AM
But no matter how this goes, if training AI with copyrighted inputs is "fair use", it'll end up as the ultimate "copyright laundry machine", like this "joke" project here:
https://web.archive.org/web/20220104214929/https://fairuseif...
https://news.ycombinator.com/item?id=27796124 (302 points, 151 comments)
by rafaelturk on 11/3/22, 9:20 PM
by throwaway675309 on 11/3/22, 11:33 PM
1. Running and training these models on a home machine is eventually going to be perfectly plausible.
2. It's only a matter of time before a model, e.g. a popular one scraped from all of the code on GitHub, is a publicly available torrent.
3. People will be able to just run it locally as an integrated plug-in in JetBrains or VS Code.
4. You'll never know if somebody has lifted your code in violation of a license, any more than you would be able to tell if somebody used code from Stack Overflow without attribution in any commercial endeavor.
The End.
by falcolas on 11/3/22, 11:07 PM
We've even seen this with stable diffusion image generation, where specific watermarks can be re-created (decrypted?) deterministically with the proper input.
by spir on 11/3/22, 10:01 PM
by garfieldnate on 11/8/22, 11:07 AM
by Cloudef on 11/3/22, 10:07 PM
by bilsbie on 11/3/22, 8:59 PM
Did you know that before airplanes were invented, common law said you owned the air above your land all the way to the heavens?
by brookst on 11/3/22, 8:59 PM
by layer8 on 11/3/22, 10:46 PM
by omegacharlie on 11/4/22, 12:12 AM
In addition, just because code is available publicly on GitHub does not necessarily mean it is permissively licensed for use elsewhere, even with attribution. Copyright holders unhappy that their copyrighted works are publicly accessible can use the DMCA to issue take-downs that GitHub does comply with, but how that interacts with Copilot and any of its training data is a different question.
As much as the DMCA is bad law, it is rather funny seeing Microsoft be charged in this lawsuit under the lesser-known provision against 'removal of copyright management information'. Microsoft does have more resources to mount a defence, so it will probably end up differently than it would for a smaller player facing this action.
by rolenthedeep on 11/4/22, 12:05 AM
Individually, each frame is protected by the copyright of the movie it belongs to. But what happens if you take a million frames from a million different movies and just arrange them in a new way?
That's the core question here. Is the new movie a new copyrightable work, or is it plagiarizing a million other works at once? Is it legal to use copyrighted works in this way?
The other question is if it is right to use copyrighted works this way. Is this within the spirit of open source software? Or is this just a bad corporation taking advantage of your good will?
I'm not sure where I stand on this, it's a complicated problem for sure. Definitely interested to see how this plays out in court.
by poulpy123 on 11/4/22, 12:37 PM
I don't know US copyright law, so I can't comment on the legal documents, but this website is not complaining that Copilot is reproducing copyrighted content; it is complaining that it was trained on copyrighted content. I don't see how you can forbid someone or something from reading and learning from something that is public (once again, reproducing is another problem).
by throwaway675309 on 11/3/22, 11:17 PM
For example, let's say I take a single frame of animation from a cartoon. The frame contains a mountain, a house, and a couple of characters, although those characters are not integral to the actual cartoon; maybe they're extras (villagers, not named characters like Mickey Mouse, for example).
I draw a picture of a lake with a cabin next to it, then start to draw a frontiersman, but I trace one of his arms from a villager in that previous frame of animation... Number one: am I in danger of copyright infringement (have I hit some arbitrary threshold)? And number two: am I causing monetary losses for the cartoon?
by jasonladuke0311 on 11/4/22, 12:40 AM
by RamblingCTO on 11/3/22, 8:57 PM
If I'm being honest I'm a bit annoyed at this. What's the problem and what's the point of this?
by renewiltord on 11/3/22, 9:19 PM
by mezbot on 11/4/22, 3:35 AM
by EMIRELADERO on 11/3/22, 9:20 PM
The crux of the lawsuit's argument is that the AI unlawfully outputs copyrighted material. This is evident in many tests with many people here and on Twitter even getting verbatim comments out of it.
AI art, on the other hand, is not capable of outputting the images from its training set, as it's not a collage-maker, but an artificial brain with a paintbrush and virtual hand.
by fancyfredbot on 11/3/22, 9:07 PM
by hjroberts on 11/3/22, 11:32 PM
robots.txt
This is exactly what is needed for source code, and the default (no robots.txt) should be "disallow". The fact that the Web has considered this moral issue should be a strong hint for the AI people not to take a purely legal stance but to consider the OSS community that they are so heavily using.
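For readers who haven't used it, here is a minimal sketch of how a well-behaved crawler consults robots.txt, using Python's standard `urllib.robotparser`. The file contents, the crawler name `ExampleCodeBot`, and the example.com URLs are all made up for illustration. Note that the existing Web convention treats a missing robots.txt as "allow", which is the opposite of the default proposed above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one made-up crawler entirely and
# keeps every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleCodeBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleCodeBot", "OtherBot"):
        url = "https://example.com/repo/main.py"
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

Nothing here is enforced: the crawler has to choose to run this check, which is why it works as a moral convention rather than a legal one.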
by atum47 on 11/3/22, 9:34 PM
by datacruncher01 on 11/4/22, 2:51 AM
by cothrowaway88 on 11/3/22, 9:08 PM
I'm 1000% on team open source and have had to refer to things like tldrlegal.com many times to make sure I get all my software licensing puzzle pieces right. Totally get the argument for why this litigation exists in the present.
Just saying in general my friends I hope you have an absolutely great day. Someone will be wrong on the internet tomorrow, no doubt about it. Worry about something productive instead.
This one has the feel of being nothing more than tilting at windmills in the long run.
by 0cf8612b2e1e on 11/3/22, 9:00 PM
by matthewwolfe on 11/4/22, 2:19 AM
by pmarreck on 11/3/22, 10:17 PM
by User23 on 11/4/22, 1:56 AM
by stovenctl on 11/3/22, 11:08 PM
Sometimes the query is the first half of a small statement that we can fill in with common patterns. Useful, fair.
Sometimes the query is a signature like `fn fast_inv_sqrt` that copies someone's code and doesn't attribute it.
by nuc1e0n on 11/4/22, 6:29 PM
by jjgon1781 on 11/4/22, 5:42 AM
by scoot on 11/4/22, 12:38 AM
A better shortening of the original title is simply "We’ve filed a lawsuit challenging GitHub Copilot"
by reachableceo on 11/4/22, 1:01 AM
Grand theft, interstate wire fraud and conspiracy for same.
This is a criminal matter as well as civil. Intentional and knowing violation of the law.
We must not let our work be taken!
by gcau on 11/3/22, 11:44 PM
by vlovich123 on 11/4/22, 4:30 AM
by eurasiantiger on 11/3/22, 9:08 PM
by thesuperbigfrog on 11/3/22, 8:56 PM
Can the generated code be traced back to the code used for training and the original copyrights and licenses for that code?
If so, what attribution(s) and license(s) should apply to the generated code?
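As a rough illustration of what "tracing back" could look like in the easiest case, here is a sketch that measures verbatim token overlap between a generated snippet and one known source file. This is only an assumption about how one might check; it says nothing about how Copilot works internally, it cannot establish provenance, and the function names and the 5-token window are arbitrary choices.

```python
import re


def token_ngrams(code: str, n: int = 5) -> set:
    """Split code into crude tokens and return the set of n-token windows."""
    # Identifiers, integer literals, or any single non-space character.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(generated: str, source: str, n: int = 5) -> float:
    """Fraction of the generated snippet's n-grams that also occur in source."""
    gen = token_ngrams(generated, n)
    if not gen:
        return 0.0
    return len(gen & token_ngrams(source, n)) / len(gen)


if __name__ == "__main__":
    source = "function isEven(n) {\n  return n % 2 === 0;\n}"
    generated = "function isEven(n) { return n % 2 === 0; }"
    print(overlap_ratio(generated, source))
```

A high ratio only shows that the text matches one candidate file. For short, common snippets it will match many unrelated files too, so it does not answer which license or attribution should apply.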
by arpowers on 11/3/22, 9:09 PM
Seems to me the underlying data should be opt-in from creators, and licenses should be developed that take AI into consideration.
by Aeolun on 11/4/22, 12:40 AM
by Yahivin on 11/3/22, 9:12 PM
Start off a comment with // MIT license
Then watch parts of various software licenses come out including authors' names and copyrights!
by marmada on 11/4/22, 1:53 AM
by amelius on 11/3/22, 11:47 PM
(asking because I know the authors were kinda famous for being very litigious).
by HeavyStorm on 11/4/22, 10:19 AM
by sensanaty on 11/3/22, 10:57 PM
by clusterhacks on 11/3/22, 9:19 PM
by machiste77 on 11/3/22, 9:12 PM
by kgarten on 11/4/22, 2:29 AM
all the best with the lawsuit.
by barelysapient on 11/3/22, 9:35 PM
by i_like_apis on 11/4/22, 2:17 AM
by SighMagi on 11/3/22, 9:29 PM
by SurgeArrest on 11/3/22, 9:37 PM
If these folks win - we again throw progress under the bus.
by ISL on 11/3/22, 9:00 PM
function force=Gmmr2Array(mass1, mass2)
and function [force, torque]=pointMatrixGravity(array1,array2)?
I'd love to know if some of my GPL v3 code [1, 2] has landed in the training set.
[1] https://github.com/4kbt/NewtonianEotWashToolkit/blob/master/...
[2] https://github.com/4kbt/NewtonianEotWashToolkit/blob/master/...
by m00x on 11/3/22, 9:03 PM
This person (a lawyer) saw an opportunity to make money and jumped on it like a hungry tiger on fresh meat.
by Entinel on 11/3/22, 9:16 PM
by VoodooJuJu on 11/3/22, 9:55 PM
But I like to put on my conspiracy hat from time to time, and right now is one such time, so let's begin...
Though the motivations behind this case are uncertain, what is certain is that this case will establish a precedent. As we know, precedents are very important for any further rulings on cases of a similar nature.
Could it be the case that Microsoft has a hand in this, in trying to preempt a precedent that favors Copilot in any further litigation against it?
Wouldn't put it past a company like Microsoft.
Just a wild thought I had.
by bugfix-66 on 11/3/22, 9:00 PM
The No-AI 3-Clause Open Source Software License
Copyright (C) <YEAR> <COPYRIGHT HOLDER>
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Use in source or binary forms for the construction or operation
of predictive software generation systems is prohibited.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
https://bugfix-66.com/f0bb8770d4b89844d51588f57089ae5233bf67...
by 60secs on 11/3/22, 9:22 PM