by ethor on 10/10/22, 5:22 PM with 29 comments
What have you used that has been effective?
by freedomben on 10/10/22, 7:12 PM
After that, you can throw a CAPTCHA on pages (particularly submission pages), but that will harm legitimate users as well as bots.
Make sure your origin server is only reachable from Cloudflare. If people can hit it directly, they bypass Cloudflare entirely. If you use firewalld, this is what I use in my setup script, and you're welcome to use it:
# Allow HTTP/HTTPS only from Cloudflare's published IPv4 ranges.
for range in $(curl -s "https://api.cloudflare.com/client/v4/ips" | jq -r '.result.ipv4_cidrs[]'); do
  for port in 80 443; do
    echo "Inserting firewalld rule for address range '${range}' on port '${port}'"
    firewall-cmd --zone=public --permanent \
      --add-rich-rule="rule family=\"ipv4\" source address=\"${range}\" port protocol=\"tcp\" port=\"${port}\" accept"
  done
done
# Remove the blanket http/https services so everyone else is refused.
firewall-cmd --zone=public --remove-service=http --permanent
firewall-cmd --zone=public --remove-service=https --permanent
firewall-cmd --reload
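The loop above only opens Cloudflare's IPv4 ranges. If your origin also has an IPv6 address that Cloudflare connects to (an AAAA record), the same API endpoint also returns an ipv6_cidrs list. A minimal sketch, assuming the same firewalld zone; cf_rich_rule and allow_cloudflare_ipv6 are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Sketch: also allow Cloudflare's IPv6 ranges (assumes the origin has an AAAA record).
set -euo pipefail

# Print a firewalld rich rule accepting TCP traffic from one CIDR on one port.
# The address family is inferred from the CIDR: IPv6 ranges contain ':'.
cf_rich_rule() {
  local range="$1" port="$2" family="ipv4"
  [[ "$range" == *:* ]] && family="ipv6"
  printf 'rule family="%s" source address="%s" port protocol="tcp" port="%s" accept\n' \
    "$family" "$range" "$port"
}

# Add permanent rules for every Cloudflare IPv6 range on ports 80 and 443 (run as root).
allow_cloudflare_ipv6() {
  local range port
  for range in $(curl -s "https://api.cloudflare.com/client/v4/ips" | jq -r '.result.ipv6_cidrs[]'); do
    for port in 80 443; do
      firewall-cmd --zone=public --permanent --add-rich-rule="$(cf_rich_rule "$range" "$port")"
    done
  done
  firewall-cmd --reload
}
```

Call allow_cloudflare_ipv6 once from the setup script after the IPv4 loop; cf_rich_rule works for either family, so it could also replace the inline rule string there.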
by tothrowaway on 10/11/22, 8:14 AM
It has the nice side effect of protecting you from run-of-the-mill DDoS attacks too.
(I realize half my comments here are about OpenResty, but I have no affiliation with them. I'm just a happy user.)
by clafferty on 10/10/22, 9:12 PM
1. Block all unverified bots with a bot score of 1. This will still allow popular web crawlers but could be strict enough to block a curl request.
2. Use a Managed Challenge for unverified bots with a bot score below 30. This will silence most of the troublemaking bots, and it gives users who are scored incorrectly a JavaScript-based way through (not necessarily a CAPTCHA).
3. Add rate limiting. Figure out a realistic access rate, double it and use that as a hard limit that will block traffic for an hour or day depending on your needs.
4. Add more sensitive rate limits and experiment with Managed Challenge rules. Use the simulate option for a few days before enabling any rate limit, and use challenges instead of blocks where you think a limit might also affect real users.
5. Review the rate limit and firewall reports regularly and adjust. For any Managed Challenge rule, check the percentage of challenges solved to see whether you're trapping real users; this number should be as close to 0 as possible. Repeat step 4.
You'll want some complementary whitelisting rules so your own traffic can get past your blocking rules.
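Step 3 asks for a realistic access rate. One rough way to estimate it, assuming nginx or Apache logs in the default "combined" format, is to find the busiest single IP in any one minute and double that. The function names are hypothetical, and the log should come from a period you believe was mostly human traffic, otherwise the bots set your baseline.

```shell
#!/usr/bin/env bash
# Sketch for step 3: estimate a hard rate limit from your own access log.
# Assumes "combined" log format: field 1 is the client IP and field 4 looks
# like "[10/Oct/2022:17:22:01". Adjust the field positions for other formats.
set -euo pipefail

# Print the highest number of requests a single IP made within one minute.
peak_per_ip_per_minute() {
  awk '{
    minute = substr($4, 2, 17)   # e.g. 10/Oct/2022:17:22 (drops the bracket and seconds)
    count[$1 " " minute]++
  }
  END {
    max = 0
    for (k in count) if (count[k] > max) max = count[k]
    print max
  }' "$1"
}

# Double the observed peak to get a per-IP, per-minute hard limit.
suggest_rate_limit() {
  local peak
  peak=$(peak_per_ip_per_minute "$1")
  echo $(( peak * 2 ))
}
```

For example, suggest_rate_limit /var/log/nginx/access.log prints a requests-per-minute figure you can translate into whatever period your rate limiting rule uses.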
Although it's advised to lock down your origin server so non-Cloudflare traffic can't reach it, you might not be able to do that easily if you have load balancers or other infrastructure in the way that can't be touched. In that case, at least make sure your root domain isn't leaking your www server's IP address. With CNAME flattening you should be alright.
The difficulty with these solutions is managing all the rules you can create: things quickly become too complicated to change easily. Keep it simple, have a few basic but aggressive blocking rules, and revise your whitelist and rate limits regularly. Good luck.
by codegeek on 10/10/22, 5:49 PM
- Set up a CAPTCHA, or just block users from certain countries if you know where your traffic comes from. This can cause problems for users on VPNs, so you have to make the call depending on how many of your users might be using one. At the minimum, add a CAPTCHA.
- Create more Page Rules in Cloudflare and block requests that don't match them. For example, if all your URLs start with a specific prefix, drop anything that doesn't match.
- If bots are bypassing Cloudflare and hitting the IP directly, make your server return 444 itself (nginx's non-standard code that closes the connection without sending a response). Sample config for nginx 1.19.4 or later, the version that added ssl_reject_handshake:
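server {
    # Catch-all for requests whose Host matches no other server block,
    # e.g. scanners hitting the bare IP instead of your domain.
    listen 80 default_server;
    listen [::]:80 default_server;
    listen 443 ssl default_server;
    listen [::]:443 ssl default_server;
    # Abort the TLS handshake, so this block needs no certificate.
    ssl_reject_handshake on;
    server_name _;
    # 444 is nginx-specific: close the connection without a response.
    return 444;
}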
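Before relying on a catch-all block like this, it's worth probing the bare IP from a machine outside Cloudflare. This sketch maps curl's exit status to a verdict using curl's documented exit codes (0 success, 7 couldn't connect, 28 timeout, 35 TLS handshake error, 52 empty reply); the function names and labels are hypothetical, and ORIGIN_IP stands in for your server's public address.

```shell
#!/usr/bin/env bash
# Sketch: check that the origin refuses requests sent straight to its IP.
set -euo pipefail

# Translate a curl exit status into what it says about direct access.
#   0     -> the origin answered: it is reachable without Cloudflare
#   52    -> empty reply: consistent with nginx "return 444"
#   35    -> TLS handshake failed: consistent with ssl_reject_handshake on
#   7/28  -> connection refused / timed out: consistent with a firewall rule
classify_curl_exit() {
  case "$1" in
    0) echo "reachable" ;;
    52|35) echo "rejected-by-server" ;;
    7|28) echo "blocked-by-firewall" ;;
    *) echo "unknown($1)" ;;
  esac
}

# Probe plain HTTP and HTTPS on the bare IP and print one verdict per scheme.
check_origin() {
  local ip="$1" scheme rc
  for scheme in http https; do
    rc=0
    # -k skips certificate checks, so a failure here is about the handshake itself.
    curl -sk -o /dev/null --max-time 5 "${scheme}://${ip}/" || rc=$?
    echo "${scheme}: $(classify_curl_exit "$rc")"
  done
}
```

Run check_origin ORIGIN_IP from a host outside your network; anything reported as "reachable" means bots can still skip Cloudflare.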
If bots get too aggressive, I go with "block first, ask questions later." Depending on your traffic and users, it may be the right strategy.
by andrewmcwatters on 10/10/22, 9:09 PM
by viraptor on 10/10/22, 8:20 PM
1. Why do you want to stop bots? Are they actually overloading your resources, or are they just noise in the logs? If you can easily handle the traffic, maybe find a better way to filter the logs.
2. How do you know they're bots? If they're easy to identify, can you write a few simple rules to remove most of them?
2a. Are they mindless scans? Make sure your app doesn't even see requests to resources which don't exist.
2b. Are they scraping content? Set up per-resource-per-IP rate limits (token bucket style).
2c. Are they coming from a specific network, for example Tor, AWS, or similar? Put in an auto-updating list of sources that get dropped at the firewall level.
3. As mentioned in other comments, if you're using a proxy in front of your service, make sure you drop any traffic that bypasses it.
Basically, consider what's actually happening and respond to that. There's no setting that improves things without side effects; otherwise it would already be turned on.
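Item 2b mentions token-bucket rate limits keyed by resource and IP. The logic is small enough to sketch in bash (4 or later, for associative arrays), though in practice it would live in your proxy or application; CAPACITY, REFILL_PER_SEC and allow_request are illustrative names and values, not part of any particular tool.

```shell
#!/usr/bin/env bash
# Sketch of a per-resource-per-IP token bucket, for illustration only.
set -uo pipefail

CAPACITY=5          # burst size: requests allowed back-to-back (example value)
REFILL_PER_SEC=1    # tokens added per second, i.e. the sustained rate (example value)

declare -A TOKENS=()   # key "ip|path" -> tokens currently in the bucket
declare -A LAST=()     # key "ip|path" -> epoch second of the last refill

# allow_request IP PATH NOW: returns 0 to allow the request, 1 to reject it.
allow_request() {
  local key="$1|$2" now="$3"
  # A key seen for the first time starts with a full bucket.
  local tokens="${TOKENS[$key]:-$CAPACITY}" last="${LAST[$key]:-$now}"
  # Refill for the elapsed time, capped at the bucket capacity.
  tokens=$(( tokens + (now - last) * REFILL_PER_SEC ))
  (( tokens > CAPACITY )) && tokens=$CAPACITY
  LAST[$key]=$now
  if (( tokens >= 1 )); then
    TOKENS[$key]=$(( tokens - 1 ))   # spend one token on this request
    return 0
  fi
  TOKENS[$key]=$tokens
  return 1
}
```

With these values, one IP can make 5 back-to-back requests to the same path, then roughly one per second after that, while its requests to other paths draw from separate buckets.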
by trinovantes on 10/11/22, 1:19 AM
https://github.com/brianhama/bad-asn-list
Unfortunately, you'll also alienate VPN users, so you'll have to decide whether it's worth the cost.
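Both item 2c in the previous comment and the bad-ASN idea come down to keeping a list of networks and dropping it at the firewall. A minimal sketch, assuming a plain iptables host with ipset installed (on a firewalld host you would use firewalld's own ipset support instead), and using the Tor Project's bulk exit list purely as an example source. The bad-asn-list repository lists ASNs rather than address ranges, so it would need an ASN-to-prefix lookup first, which is not shown. SET_NAME, filter_ipv4 and refresh_blocklist are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: an auto-updating drop list kept in an ipset and matched from iptables.
# Requires root, ipset and iptables; the list URL is only an example source.
set -euo pipefail

SET_NAME="blocklist"
LIST_URL="https://check.torproject.org/torbulkexitlist"

# Keep only lines that look like IPv4 addresses or IPv4 CIDRs (drops comments and blanks).
filter_ipv4() {
  grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}(/[0-9]{1,2})?$' || true
}

# Load the downloaded list into a temporary set, then swap it in atomically,
# so the live set is never empty or half-filled while it updates.
refresh_blocklist() {
  local tmp="${SET_NAME}_new"
  ipset -exist create "$SET_NAME" hash:net
  ipset -exist create "$tmp" hash:net
  ipset flush "$tmp"
  # With -e and pipefail, a failed download stops here, before the swap.
  curl -sf "$LIST_URL" | filter_ipv4 | while read -r addr; do
    ipset -exist add "$tmp" "$addr"
  done
  ipset swap "$tmp" "$SET_NAME"
  ipset destroy "$tmp"
  # Insert the DROP rule only once; -C checks whether it already exists.
  iptables -C INPUT -m set --match-set "$SET_NAME" src -j DROP 2>/dev/null \
    || iptables -I INPUT -m set --match-set "$SET_NAME" src -j DROP
}
```

Running refresh_blocklist as root from cron (hourly, say) keeps the list current. For lists with tens of thousands of entries, loading through ipset restore is much faster than one add per line.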
by ianpurton on 10/10/22, 5:58 PM
I block everything else.
That kills most of it.
by JimWestergren on 10/10/22, 5:40 PM
by NetToolKit on 10/10/22, 11:12 PM
Essentially, Gatekeeper is a rules engine with a fancy UI that lets you craft policies specific to your site and the traffic visiting it. For example, you can say "Allow Googlebot" and "Show CAPTCHA to visitors from AWS on every fifth visit". If you'd like to talk offline, you can find our email address in my profile.
by joekok33 on 10/11/22, 7:51 AM
by _humancompiler on 10/12/22, 10:12 PM
by rpigab on 10/11/22, 7:22 AM