by SethMLarson on 3/20/25, 2:58 PM with 115 comments
by cxr on 3/20/25, 4:04 PM
<https://thelibre.news/foss-infrastructure-is-under-attack-by...>
394 comments. 645 points. Submitted 3 hours ago: <https://news.ycombinator.com/item?id=43422413>
by hugs on 3/20/25, 4:42 PM
"L402" is an interesting proposal. Paying a fraction of a penny per request. https://github.com/l402-protocol/l402
by fewsats on 3/20/25, 5:12 PM
It seems like a good fit for micropayments. They never took off with people but machines may be better suited for them.
L402 can help here.
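To make the idea concrete, here is a minimal sketch of an L402-style flow: the server answers with HTTP 402 and a payment challenge, and the client retries with a token after paying. All names here are hypothetical, and in the real protocol the payment would be a Lightning invoice settled out of band; this just simulates the request/challenge/retry shape.

```python
# Hypothetical sketch of an L402-style paywall flow (not the real protocol
# implementation): 402 challenge -> pay -> retry with token.
import secrets

PAID_TOKENS = set()  # tokens the "server" has seen payment for

def handle_request(token=None):
    """Server side: return (status, body) or a 402 challenge."""
    if token in PAID_TOKENS:
        return 200, "the content"
    # 402 Payment Required with an invoice the client must settle
    invoice = secrets.token_hex(8)
    return 402, {"invoice": invoice}

def pay_invoice(invoice):
    """Client side: settle the invoice and get an access token.
    In real L402 this is a Lightning payment; here we just mint a token."""
    token = f"paid-{invoice}"
    PAID_TOKENS.add(token)
    return token

# First request is challenged; after "paying", the retry succeeds.
status, challenge = handle_request()
assert status == 402
token = pay_invoice(challenge["invoice"])
status, body = handle_request(token)
assert status == 200
```

The interesting property is that the challenge/retry loop is fully machine-driven, which is why it may suit crawlers better than human micropayments ever did.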
by Aurornis on 3/20/25, 3:56 PM
> This practice started with larger websites, ones that already had protection from malicious usage like denial-of-service and abuse in the form of services like Cloudflare or Fastly
FYI Cloudflare has a very usable free tier that’s easy to set up. It’s not limited to large websites.
by parliament32 on 3/20/25, 4:47 PM
Looks like the GNOME Gitlab instance implements it: https://gitlab.gnome.org/GNOME
by hubraumhugo on 3/20/25, 4:12 PM
Good bots: search engine crawlers that help users find relevant information. These bots have been around since the early days of the internet and generally follow established best practices like robots.txt and rate limits. AI agents like OpenAI's Operator or Anthropic's Computer Use probably also fit into that bucket, as they offer useful automation without negative side effects.
Bad bots: bots that negatively affect website owners by causing higher costs, spam, or downtime (automated account creation, ad fraud, or DDoS). AI crawlers fit into that bucket, as they disregard robots.txt and spoof user agents. They are creating a lot of headaches for the developers responsible for maintaining heavily crawled sites. AI companies don't seem to care about any of the crawling best practices the industry has developed over the past two decades.
So the actual question is how good bots and humans can coexist on the web while we protect websites against abusive AI crawlers. It currently feels like an arms race without a winner.
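As a concrete illustration of the "established best practices" above, Python's standard library ships a robots.txt parser that well-behaved crawlers use to honor disallow rules and crawl delays. The robots.txt content below is made up for the example:

```python
# Checking robots.txt the way a well-behaved crawler would,
# using the standard library's urllib.robotparser.
from urllib.robotparser import RobotFileParser

robots_txt = """
User-agent: GPTBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/post/1"))       # False
print(rp.can_fetch("Googlebot", "https://example.com/post/1"))    # True
print(rp.can_fetch("Googlebot", "https://example.com/private/x")) # False
print(rp.crawl_delay("Googlebot"))                                # 10
```

A crawler that skips this check, or lies in its User-Agent header so the rules don't match, is exactly the "bad bot" behavior described above; robots.txt is purely advisory and only works when the client cooperates.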
by kmeisthax on 3/20/25, 5:02 PM
Mastodon has AUTHORIZED_FETCH and DISALLOW_UNAUTHENTICATED_API_ACCESS which would at least stop these very naive scrapers from getting any data. Smarter scrapers could actually pretend to speak enough ActivityPub to scrape servers, though.
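For reference, both options mentioned are environment variables in Mastodon's server configuration; a sketch of how an admin might enable them (file path and comments are illustrative):

```shell
# .env.production (Mastodon server configuration)
# Require signed HTTP requests for ActivityPub object fetches,
# so anonymous scrapers can't pull posts via federation endpoints.
AUTHORIZED_FETCH=true
# Require authentication for public REST API endpoints as well.
DISALLOW_UNAUTHENTICATED_API_ACCESS=true
```

As the comment notes, this only raises the bar: a scraper that implements enough of ActivityPub to sign its fetches from a real-looking instance gets the data anyway.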
by jmclnx on 3/20/25, 3:52 PM
Sad things are getting to this point. Maybe I should add this to my site :)
(c) Copyright (my email), if used for any form of LLM processing, you must contact me and pay 1000USD per word from my site for each use.
by charcircuit on 3/20/25, 4:18 PM
The amount of spam that happens when you let people freely post is a much bigger problem.
by renegat0x0 on 3/20/25, 6:35 PM
Most content, like blogs, could be served as static sites.
For Mastodon and forums, I think user validation is OK and a good way to go.
by 0x1ceb00da on 3/20/25, 4:08 PM
by MontgomeryPy on 3/20/25, 4:39 PM
by napolux on 3/20/25, 4:14 PM
This is scary
by anovikov on 3/20/25, 4:41 PM
by woah on 3/20/25, 5:11 PM
by isoprophlex on 3/20/25, 4:25 PM
Deregulation is ultimately antithetical to our personal freedom.
I just hope the spirit of the internet that I grew up with can be rescued, or reincarnated somehow...
by ToucanLoucan on 3/20/25, 3:53 PM
Move fast and break things apparently has a bonus clause for the things you break not being your responsibility to fix.
by JKCalhoun on 3/20/25, 3:57 PM
It's not a binary thing to me: LLMs are not god, but even without AGI they have proven wildly useful to me. Calling them "shitty chat bots" doesn't sway me.
Further, I have always assumed that everything I post to the web is publicly accessible to everyone and everything. We lost any battle we thought we could wage two-plus decades ago, when web crawlers started hoovering up data from our sites.