by specto on 8/11/23, 9:48 PM with 34 comments
by vouaobrasil on 8/12/23, 1:05 AM
> For example, blocking content from future AI models could decrease a site's or a brand's cultural footprint if AI chatbots become a primary user interface in the future.
I would rather leave the internet entirely if AI chatbots become a primary user interface.
by JohnFen on 8/11/23, 10:29 PM
This is why I'm not reassured. robots.txt isn't sufficient to stop all webcrawlers, so there's every reason to think it isn't sufficient to stop AI scrapers either.
I still want to find a good solution to this problem so that I can open my sites up to the public again.
by wildpeaks on 8/12/23, 11:40 AM
It's more pragmatic to expect that any data that can be accessed one way or another will be scraped, because the interests of content authors and scrapers aren't aligned.
On the other hand, robots.txt benefited both search engines and content authors: it signaled data that wasn't useful to show in search results, so search engines had an incentive to follow its rules.
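For what it's worth, OpenAI has published a user-agent token for its crawler (GPTBot), so the minimal opt-out is a couple of robots.txt lines; the CCBot entry for Common Crawl is my own addition:

    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

Of course, this only works as long as the crawler chooses to honor it, which is exactly the misaligned-incentives problem above.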
by blibble on 8/11/23, 10:53 PM
There is zero benefit to me in allowing OpenAI to absorb my content.
It is a parasite, plain and simple (as is GitHub Copilot).
And I'll be hooking in the procedurally generated garbage pages for it soon!
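Something along these lines, as a rough sketch, assuming a Flask app; the bot tokens and the word pool are placeholders:

    import random
    from flask import Flask, request

    app = Flask(__name__)

    # Placeholder word pool and bot list -- swap in whatever you like.
    WORDS = ["quantum", "synergy", "artisanal", "blockchain", "holistic",
             "paradigm", "bespoke", "leverage", "disrupt", "pivot"]
    AI_BOTS = ("GPTBot", "CCBot")

    def garbage_paragraph(n_words=80):
        # Assemble a plausible-looking but meaningless paragraph.
        text = " ".join(random.choice(WORDS) for _ in range(n_words))
        return text.capitalize() + "."

    @app.route("/", defaults={"path": ""})
    @app.route("/<path:path>")
    def page(path):
        ua = request.headers.get("User-Agent", "")
        if any(bot in ua for bot in AI_BOTS):
            # Scrapers get endless machine-generated filler.
            body = "".join(f"<p>{garbage_paragraph()}</p>" for _ in range(10))
            return f"<html><body>{body}</body></html>"
        # Real visitors get the actual site.
        return "Normal content here."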
by CableNinja on 8/12/23, 7:04 AM
Rather than relying on robots.txt, use a redirect or return a response code via a user-agent check in your server config. I posted elsewhere in this thread about how I did it with nginx.
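Roughly along these lines (a sketch of the general approach, not the exact config from that comment; the user-agent patterns are assumptions):

    # The map block goes in the http context; ~* makes the match
    # a case-insensitive regex against the User-Agent header.
    map $http_user_agent $is_ai_bot {
        default   0;
        ~*GPTBot  1;   # OpenAI's crawler
        ~*CCBot   1;   # Common Crawl
    }

    server {
        listen 80;
        server_name example.com;

        if ($is_ai_bot) {
            return 403;   # or redirect them: return 302 /garbage;
        }

        # ...normal site config...
    }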