from Hacker News

How to Bypass Cloudflare: A Comprehensive Guide

by jakobdabo on 9/18/22, 11:59 AM with 74 comments

  • by cj on 9/18/22, 5:13 PM

    There are legitimate use cases for bypassing Cloudflare's bot protection.

    I discovered that our company's help documentation and integration guides, hosted by readme.com, had been completely de-indexed from Google for the past 3 months.

    Our Readme docs were formerly our #1 source of organic (free) leads.

    After investigating, we found that Cloudflare (as configured by Readme) was blocking Googlebot when we used Cloudflare Workers. Cloudflare was returning a 403 to Googlebot while serving pages as usual to regular users.

    The cause: we were using Workers to rewrite some URLs at the edge (replacing Readme's default images with optimized, compressed images from Cloudflare's own image optimization service).

    Because we did this with Workers, Readme's Cloudflare account received requests from our domain with a "googlebot" user agent but from an IP that wasn't verified as a Googlebot IP address. (I assume the Worker requested the Readme site with the Googlebot user agent, but from whatever IP address Cloudflare Workers use.)

    I emailed Cloudflare support but it was clear it would take a lot of time to get them to understand the issue (and probably longer to fix it).

    So, we had to spend a lot of time figuring out how to allow Googlebot requests past Cloudflare's "fake bot" firewall rule.

    In our own Cloudflare account, we have all security settings at the lowest sensitivity possible (or turned off completely). We serve over 500 billion requests a month (10+ TB of bandwidth), and the share of traffic from seemingly legitimate clients that still got blocked was surprisingly high.

    I love Cloudflare (and own quite a bit of their stock) but I'm beginning to rethink my stance on their service. They make it extremely easy to enable powerful features with little visibility or control over the details of how those features work.

    Another SEO nightmare is their "Crawler Hints" service. I strongly recommend against using it if your site is ever the target of automated security scanners (e.g. the ones white hat bug bounty hunters use). If Crawler Hints is enabled and a white hat hacker runs a scan that hits random URLs on your site, Bingbot, Yandex, and other search engines will then try to index every single URL the scanner hit.

    Basically, it's a mess, and the only real fixes are to bypass Cloudflare or to spend a lot of time and money debugging with Cloudflare.

    Next quarter I'm faced with a decision: either double down on Cloudflare and get an Enterprise plan with them ($20k+), or rip them out of our stack and go back to our old AWS CloudFront setup, which has fewer POPs but was much less of a hassle.
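cj doesn't say exactly how Cloudflare's "fake bot" rule decides that a request claiming to be Googlebot is fake. A plausible model is the reverse-then-forward DNS check that Google itself documents for verifying Googlebot: look up the PTR name for the client IP, require a googlebot.com or google.com hostname, then resolve that hostname and confirm it maps back to the same IP. The sketch below implements that check under those assumptions; the suffix list is assumed current, `looks_like_fake_googlebot` is a hypothetical rule name, and the DNS lookups are injectable so it can be exercised offline. Under this model, a Worker forwarding the Googlebot user agent from a non-Google IP would fail the check, which matches the 403s cj saw.

```python
import socket
from typing import Callable, List, Tuple

# Hostname suffixes Google documents for Googlebot reverse-DNS names
# (assumed current at time of writing; check Google's crawler docs).
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")

# Both socket lookups return (hostname, aliases, addresses).
Lookup = Callable[[str], Tuple[str, List[str], List[str]]]


def is_verified_googlebot(
    ip: str,
    reverse_lookup: Lookup = socket.gethostbyaddr,
    forward_lookup: Lookup = socket.gethostbyname_ex,
) -> bool:
    """Reverse-then-forward DNS check for a claimed Googlebot IPv4 address."""
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # no PTR record or resolver failure: cannot verify
        return False
    if not hostname.rstrip(".").endswith(GOOGLEBOT_SUFFIXES):
        return False
    try:
        # gethostbyname_ex only returns IPv4 addresses.
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        return False
    # A spoofed PTR record won't map back to the original IP.
    return ip in addresses


def looks_like_fake_googlebot(user_agent: str, ip: str, **lookups: Lookup) -> bool:
    """Hypothetical rule: the UA claims Googlebot but the IP does not verify."""
    return "googlebot" in user_agent.lower() and not is_verified_googlebot(ip, **lookups)
```

The forward lookup is the important half: anyone can set a PTR record on their own IP that ends in googlebot.com, but only Google controls what those hostnames resolve to.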

  • by Tiberium on 9/18/22, 2:57 PM

    The actual "easiest" way (at least for me) to bypass Cloudflare is to find the real IP of the web server running behind it. Of course, in a lot of cases that's not possible, for example when the admin correctly restricts the web server to respond only to Cloudflare IP ranges, or when https://developers.cloudflare.com/ssl/origin-configuration/o... is used.

    The most useful services for that are https://shodan.io/ and https://search.censys.io/. I've had decent success with Censys finding the real IP addresses of websites behind Cloudflare. You might also have success checking the DNS record history for a particular domain.
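Tiberium's comment names the defensive fix: have the origin answer only connections coming from Cloudflare's published IP ranges. A minimal sketch of that membership test with Python's standard `ipaddress` module is below. The hardcoded list is only a small illustrative subset; a real deployment would load the full, current lists Cloudflare publishes at https://www.cloudflare.com/ips-v4 and https://www.cloudflare.com/ips-v6. The function names are hypothetical.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Illustrative subset only; load the full, current list from
# https://www.cloudflare.com/ips-v4 and https://www.cloudflare.com/ips-v6.
EXAMPLE_CLOUDFLARE_RANGES = [
    "173.245.48.0/20",
    "103.21.244.0/22",
    "104.16.0.0/13",
    "172.64.0.0/13",
    "2606:4700::/32",
    "2400:cb00::/32",
]


def parse_ranges(cidrs: Iterable[str]) -> List[Network]:
    """Parse CIDR strings, skipping blank lines from the published text files."""
    return [ipaddress.ip_network(c.strip()) for c in cidrs if c.strip()]


def is_from_cloudflare(peer_ip: str, networks: List[Network]) -> bool:
    """True if the TCP peer address falls inside any of the given ranges."""
    addr = ipaddress.ip_address(peer_ip)
    # An IPv4 address tested against an IPv6 network (or vice versa) is False, not an error.
    return any(addr in net for net in networks)
```

Usage would look like `parse_ranges(text.splitlines())` on the downloaded lists, then `is_from_cloudflare(peer, nets)` for each connection. The check only helps if `peer_ip` is the actual TCP peer address rather than a client-supplied header such as X-Forwarded-For, and in practice it is usually enforced at the firewall rather than in application code.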

  • by Anunayj on 9/18/22, 5:52 PM

    I would also like to mention FlareSolverr [1] here. It just uses a headless browser to solve the challenges, which might be acceptable in some situations (ones that don't need a high request rate).

    1. https://github.com/FlareSolverr/FlareSolverr

  • by dizhn on 9/18/22, 2:58 PM

    Use ZenRows. Got it. It's clickbait, but it does provide a good summary of how Cloudflare's anti-bot stuff works.

  • by yjftsjthsd-h on 9/18/22, 4:04 PM

    The frustrating thing to me is that CF is that invasive and still can't distinguish bots from people. It usually lets me through eventually, but I've spent enough time staring at the "are you sure you're not a bot?" screen to laugh off their claims about human/bot traffic ratios.

  • by Ralo on 9/19/22, 12:00 AM

    In the past, I've always found that the easiest way to bypass Cloudflare was looking up the DNS history of the domain. The majority of servers will still respond on their IP directly.

  • by urtom on 9/18/22, 7:15 PM

    If I just need to make plain GET requests in my web scraping, I've found the easiest way to bypass Cloudflare on most sites is to make the requests via the Internet Archive. That has some rate limiting, but it can be worked around by using several source IP addresses in parallel.
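urtom doesn't say how the requests go through the Internet Archive. One public way to find an archived copy of a page is the Wayback Machine availability API, which returns the closest snapshot for a URL as JSON. The sketch below assumes the endpoint `https://archive.org/wayback/available` and the `archived_snapshots.closest` response shape as I understand them; check archive.org's own documentation before relying on it. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(target: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL; timestamp is YYYYMMDDhhmmss and may be truncated."""
    params = {"url": target}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload: dict) -> Optional[str]:
    """Extract the closest available snapshot URL from the JSON response."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def fetch_closest_snapshot(target: str, timeout: float = 30.0) -> Optional[str]:
    """Query the API once and return the snapshot URL, or None if not archived."""
    with urllib.request.urlopen(availability_url(target), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

The returned snapshot URL can then be fetched with a plain GET. Archived copies can be stale, so this only suits pages where an older capture is good enough.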
  • by alokjnv10 on 9/18/22, 5:01 PM

    I hate Cloudflare. I had a really hard time making a web scraper.

  • by IceWreck on 9/19/22, 5:11 AM

  • by 1vuio0pswjnm7 on 9/19/22, 1:28 AM

    Correct me if I've missed something, but this elaborate fingerprinting exercise called "bot protection" cannot conclusively distinguish whether a person is giving commands to a computer in real time or the computer is reading from a script of commands. It only serves to identify the OS, client, IP address, etc. It is collecting tracking data.

    Of course, those trying to profit from online advertising services seek to collect the same (fingerprinting) data. Do Cloudflare's terms of service and privacy policy allow Cloudflare to do anything it wants with this data, or are there limits?

  • by userbinator on 9/18/22, 11:42 PM

    If you use even a slightly customised user agent and/or OS setup, you're likely to be blocked. They'll of course say it's for your "security", but we all know what that really means: use only the software and hardware configurations that we approve. Of all the ways to "herd the sheeple", this is the most insidious because it punishes those who don't want to submit to their whims. Meanwhile, the actual attackers will still have enough motivation to find ways around it, similar to how DRM has encouraged piracy.

  • by donutshop on 9/18/22, 3:49 PM

    Are there other products out there that offer a similar feature set at this price point?

  • by nothasan on 9/18/22, 3:31 PM

    Some impressive documentation on how to get around this bot management (BM) solution.

  • by midislack on 9/18/22, 7:41 PM

    All this and it's just an ad for some SaaS? Fuck, I got gypped.