by marvindanig on 11/4/23, 4:47 PM with 9 comments
by jimwhite on 11/5/23, 10:58 PM
The law has been thoroughly litigated over web publishing and search engines, so there is plenty of precedent to read up on if you want to understand why, short of a huge and super unlikely change to the laws, what you want can't happen (and shouldn't: the US Constitution created copyright in order to incentivize creators to publish their works instead of keeping them under lock and key). Just search for things like [copyright and web search engines]:
https://www.google.com/search?q=copyright+and+web+search+eng...
If you want limitations that aren't implemented in copyright law, then you'll need to share your content with others only privately and under a contract they've agreed to.
by sargstuff on 11/5/23, 4:10 PM
Historically, Sneakernet / Physical/limited access personal library / "not for distribution outside company" was the way.
The simplest way would be to have a private / internal network with no outside internet access. This doesn't prevent sneakernet ports to machines with outside access. Nor does it prevent an LLM on a USB stick from 'scanning' and/or unintentional 'picture uploads'.
How would one distinguish LLM scanning from non-LLM scanning (beyond 10,000,000 requests per second from a single source)? Checking a site's robots.txt is on the honor system. And related tools that do have a specific way to identify valid/invalid access, such as fail2ban, are a never-ending battle of updates and revisions to remain current.
License or no license, this is sort of a different take on the Turing test[0]: can an AI fool a human into believing the AI is a human? A CAPTCHA system[1], used to verify that a visitor is not a bot, is an example of this.
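To make the two detection ideas above concrete, here is a minimal Python sketch, not a real defense. It assumes a made-up robots.txt (the "GPTBot" user agent is only an example crawler name) and a hypothetical `RateFlagger` helper with arbitrary thresholds. The robots.txt check only tells you what a polite crawler would do; nothing enforces it.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(user_agent: str, url: str) -> bool:
    """Return what robots.txt asks of this user agent (honor system only)."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


class RateFlagger:
    """Hypothetical helper: flags a source that exceeds max_requests
    within a sliding window of window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.hits = {}  # source -> deque of recent request timestamps

    def record(self, source: str, timestamp: float) -> bool:
        recent = self.hits.setdefault(source, deque())
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_requests
```

A crawler that lies about its user agent or spreads requests across many sources slips past both checks, which is the never-ending-battle part.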
[0] : https://en.wikipedia.org/wiki/Turing_test
[1] : https://en.wikipedia.org/wiki/CAPTCHA
by az09mugen on 11/4/23, 10:43 PM
by sprobertson on 11/5/23, 2:25 AM