from Hacker News

FSF calls for white papers on philosophical and legal questions around Copilot

by non_sequitur on 7/29/21, 4:09 PM with 198 comments

by davisr on 7/29/21, 5:11 PM
The ignorance in this comment section is already giving me an aneurysm. Software licenses matter. Copyright matters. If megacorps like Microsoft can sue people into oblivion for violating their copyright terms, people can sue Microsoft into oblivion for violating theirs. I don't use MS GitHub, I have no skin in the game, but I hope there is at least a $1000 award for every instance of AGPL and GPL license violation because what they're doing is unfair and illegal.
This isn't ML, it is a ripoff and is violating clear software licensing terms. https://news.ycombinator.com/item?id=27710287
Software freedom matters, but I wouldn't expect the typical HN type to understand, since their money is made on exploiting freely-available software, putting it into proprietary little SaaS boxes, then re-selling it.
by ralph84 on 7/29/21, 7:47 PM
Their link to why you shouldn't use GitHub[0] takes you to a page where they criticize GitHub for complying with US export controls. The FSF is a US corporation, why do they think that US export controls don't equally apply to savannah.gnu.org? And unlike FSF, GitHub has actually done the work of applying for export licenses so that developers in US-sanctioned countries can access GitHub[1].
[0] https://www.gnu.org/software/repo-criteria-evaluation.html#G... [1] https://github.blog/2021-01-05-advancing-developer-freedom-g...
by lamontcg on 7/29/21, 7:19 PM
Given how the racist twitterbot AI turned out, along with L4 autonomous driving by 2017, I suspect that Copilot is going to suffer most from an incredibly high velocity of churned out security bugs and bad code. SWEs are probably going to get fired for using it and companies will need to ban it, even if the legal problems don't take it down.
by belorn on 7/29/21, 5:06 PM
An interesting initiative from FSF, though I suspect most of the questions will be answered when someone attempts a similar project in a more traditional copyright-restrictive area.
An example I would like to see is a Cosinger, where the AI is trained using songs on YouTube and streaming services. With the final product, a user starts to sing and the algorithm attempts to sing along and gives the singer suggestions for how the song should continue. I could see how a lot of musicians would be willing to pay good money for such a program, and removing obligations to pay any money for the training set would make it much more feasible to create.
There are already AIs that create music (though unlikely from proprietary training sets). A Cosinger shouldn't be too far from that.
by hartator on 7/29/21, 6:33 PM
> We already know that Copilot as it stands is unacceptable and unjust, from our perspective.
So, why call for white papers? I don’t believe they will publish any papers that go against their views.
by whazor on 7/29/21, 10:30 PM
I am curious about the results.
Having tested copilot, most suggestions are based on existing code in your opened file. Furthermore, most snippets tend to be relatively short, where it feels more like a Stack Overflow answer than existing code.
Of course it is possible to make the model generate longer pieces of code that are potentially GPL. But you would have to put in some effort to do so. It also tends to adopt your coding style.
But maybe the fact that there are no guarantees makes it unfair.
by thomzane on 7/29/21, 4:48 PM
I am excited to see where these questions lead.
by MichaelMoser123 on 7/30/21, 6:27 AM
I actually like that Copilot is better than me at solving interview questions. https://www.youtube.com/watch?v=FHwnrYm0mNc I, for one, welcome our robot overlords.
I wonder if they could retrain the model on BSD or MIT licensed code only. How much open source code is licensed as GPL vs more permissive licenses, does anyone know?
Interesting that they want to charge for the use of Copilot; I guess we will see this business model more in the future.
by 6510 on 7/30/21, 1:41 AM
My opinion: Copilot is a derived work.
by lights0123 on 7/29/21, 7:22 PM
> It requires running software that is not free/libre (Visual Studio, or parts of Visual Studio Code)
A little nitpicky, but the only proprietary part it requires is the plugin itself, not the IDE—Copilot runs just fine with the Free build of VS Code compiled from source from GitHub, after flipping a switch to enable WIP APIs.
by zekrioca on 7/29/21, 6:20 PM
Interesting: on HN, the same link submitted at a different time gets a different number of upvotes.
Same link, just 13h ago, but with 5x fewer upvotes than the one here: https://news.ycombinator.com/item?id=27992894
by kmeisthax on 7/30/21, 2:14 AM
>Is Copilot's training on public repositories infringing copyright? Is it fair use?
My money's on yes, but this isn't settled until SCOTUS says so.
>How likely is the output of Copilot to generate actionable claims of violations on GPL-licensed works?
This depends on how likely Copilot is to regurgitate its training input instead of generating new code. If it only does so IF you specifically ask it to (e.g. by adding Quake source comments to deliberately get Quake input), then the likelihood of innocent users - i.e. people trying to write new programs and not just launder source code - infringing copyright is also low. However, if Copilot tends to spit out substantially similar output for unrelated inputs, then this goes up by a lot. This will require an actual investigation into the statistical properties of Copilot output, something you won't really be able to do without unrestricted access to both the Copilot model and its training corpus.
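As a rough illustration of what such an investigation could measure, here is a minimal Python sketch that estimates how much of a generated snippet consists of token n-grams that also appear verbatim in a reference corpus. The tokenizer, the function names and the default window of 8 tokens are all assumptions made for illustration; nothing here reflects how Copilot is actually evaluated, and a real study would need the actual model and training corpus.

```python
import re


def tokens(source):
    # Crude lexer (an assumption): identifiers, integers, or single
    # non-whitespace characters. Whitespace and layout are ignored.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def ngrams(toks, n):
    # Set of all contiguous runs of n tokens.
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def overlap_fraction(output, corpus_docs, n=8):
    """Fraction of the output's token n-grams that also occur in the corpus."""
    out = ngrams(tokens(output), n)
    if not out:
        # Output shorter than n tokens: nothing to compare.
        return 0.0
    seen = set()
    for doc in corpus_docs:
        seen |= ngrams(tokens(doc), n)
    return len(out & seen) / len(out)
```

A score near 1.0 for prompts unrelated to the source would point toward regurgitation. A score near 0.0 only says that no long verbatim runs were found, not that the output is non-infringing.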
>How can developers ensure that any code to which they hold the copyright is protected against violations generated by Copilot?
I'm going to remove the phrase "against violations generated by Copilot" as it's immaterial to the question. Copilot infringement isn't any different from, say, a developer copypasting a function or two from a GPL library.
The answer to that is that unless the infringement is obvious, it's likely to go unpunished. Content ID systems (which, AFAIK, don't really exist for software) only do "striking similarity" analysis; but the standard for copyright infringement in the US is actually lower: if you can prove access, then you only have to prove "substantial similarity". This standard is intended to deal with people who copy things and then change them up a bit so the judge doesn't notice. There is no way to automate such a check, especially not on proprietary software with only DRM-laden binaries available.
If you have source code, then perhaps you can find some similar parts. Indeed, this is what SCO tried to do to the Linux kernel and IBM AIX; and it turned out that the "copied" code was from far older sources that were liberally licensed. (Also, SCO didn't actually own UNIX.) Oracle also tried doing this to the Java classpath in Android and got smacked down by the Supreme Court. Having the source open makes it easier to investigate; but generally speaking, you need some level of suspicion in order to make it economic to investigate copyright infringement in software.
Occasionally, however, someone's copying will be so hilariously blatant that you'll actually find it. This usually happens with emulators, because it's difficult to actually hire for reverse engineering talent and most platform documentation is confidential. Maui X-Stream plagiarized and infringed PearPC (a PowerPC Macintosh emulator) to produce "CherryOS"; Atari ported old Humongous Entertainment titles to the Wii by copying ScummVM; and several Hyperkin clone consoles feature improperly licensed SNES emulation code. In every case, the copying was obvious to anyone with five minutes and a strings binary, simply because the scope of copied code was so massive.
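The "five minutes and a strings binary" check can be approximated in a few lines of Python. This is only a sketch: it pulls runs of printable ASCII out of a binary, roughly as the `strings` utility does, and reports which distinctive strings from a reference project show up. The function names and the idea of hand-picking reference strings are assumptions for illustration, not taken from any of the cases above.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    # Runs of printable ASCII (0x20-0x7e) at least min_len bytes long.
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return {m.decode("ascii") for m in re.findall(pattern, data)}


def shared_strings(binary: bytes, reference_strings, min_len: int = 8):
    """Reference strings that occur inside any printable run of the binary."""
    found = printable_strings(binary, min_len)
    return sorted(s for s in reference_strings if any(s in f for f in found))
```

Error messages, format strings and option names tend to survive compilation unchanged, which is why this kind of check catches wholesale copying but says nothing about small snippets.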
>Is there a way for developers using Copilot to comply with free software licenses like the GPL?
Yes - don't use it.
I know I just said you can probably get away with stealing small snippets of code. However, if your actual intent is to comply with the GPL, you should just copy, modify, and/or fork a GPL library and be honest about it.
To add onto the FSF's usual complaints about software-as-a-service and GitHub following US export laws (which, BTW, the FSF also has to do, unless Stallman plans to literally martyr himself for--- oh god he'd actually do that); I'd argue that Copilot is unethical to use regardless of concerns over plagiarism or copyright infringement. You have no guarantee that the code you're actually writing actually works as intended, and several people have already been able to get Copilot to hilariously fail on even basic security-relevant tasks. Copilot is an autocomplete system, it doesn't have the context of what your codebase looks like. There are way better autocomplete systems that already exist in both Free and non-Free code that don't require a constant Internet connection to a Microsoft server.
>Should ethical advocacy organizations like the FSF argue for change in copyright law relevant to these questions?
I'm going to say no, because copyright law is already insane as-is and we don't need to make it worse just so that the copyleft hack still works a little better.
Please, for the love of god, we do not need stronger copyrights. We need to chain this leviathan.
by pkrefta on 7/29/21, 5:02 PM
I'm using GitHub to publish my code and seriously I don't care whether Copilot was trained using it. I published it and in the end somebody can do anything with it without giving a damn about license, copyright etc - that's the truth of open-source.
by senko on 7/29/21, 5:00 PM
> We already know that Copilot as it stands is unacceptable and unjust [...]. Activists wonder if there isn't something fundamentally unfair about a proprietary software company building a service off their work.
> We will read the submitted white papers, and we will publish ones that we think help elucidate the problem.
Doesn't give me hope they're aiming for unbiased opinion. I would be very surprised if any of the published papers don't closely align with the FSF's a priori position.
by ghoward on 7/29/21, 7:34 PM
I honestly wish I was in a position to write a whitepaper for this. However, I should not for several reasons:
* I have already made my position clear in public, [1] so I could probably be identified.
* I am not a lawyer, just some bloke who attempted to write FOSS licenses to combat ML on copyrighted code. [2]
[1]: https://gavinhoward.com/2021/07/poisoning-github-copilot-and...
[2]: https://yzena.com/licenses/
by slownews45 on 7/29/21, 4:58 PM
Anyone feel like FSF moved from maybe engineering idealists to a very lawyer driven type org?
The big GPLv3 push and development - plenty of attacks on folks actually shipping product on GPLv2 and building communities around that model (which keeps software free but allows users of the software to do pretty much what they want with it, including putting it in devices that are locked down - cars / TiVos etc).
Here's an opportunity to really advance in an interesting area with ML -> something that may open up programming to more people -> may advance computers ability to program and modify their own programs in the long run.
And regardless of the FSF attorney stuff, places like China and tiny little LLCs with no assets will very likely use the wonderful amount of code on the web to develop solutions in this space, even if FSF claims everything is a violation. Where is the vision from FSF anymore?
One thing that's been sad about the FSF -> it's gone from what I would consider a forward looking idealism sort of thing -> here's how we could do / make cool stuff that lets communities work together -> to now sort of a legal compliance type org that really is focused on "actionable claims", "protected against violations" etc.
Question - do the Linux community and other successful larger open source communities welcome the FSF and their attorneys into the discussion? I can hardly imagine the BSDs or the Linux folks really connecting with them anymore.
Is there space for a different group, maybe a collection of actual developers shipping code in larger communities to get together, no FSF / SFC lawyers present, to think creatively about the future? What should we be working for, what is fair to everyone, what helps society, what works around pro-social community building?
A tool that helps with cross language building blocks for common functions etc (stackoverflow on steroids) - just how bad is this?