by santah on 8/14/24, 5:36 PM with 17 comments
https://next-episode.net/img/upload/5xx.png
On the website everything looked good and all reported links worked fine.
I tried validating these errors as fixed in GSC, but it always reported back that the issue was still present, and new URLs with 5xx errors kept popping up (as seen in the screenshot).
This was worrisome because it suggested there was some issue I wasn’t aware of that might be affecting not only Google’s crawlers but my users as well.
I did what anyone would do: checked my server, Cloudflare, and analytics logs for anything suspicious, and added extra logging to try to catch what was happening.
This turned up nothing. As far as I could tell, no requests returned any 5xx errors, so I decided it was just a weird Google quirk and ignored it for a while.
Over time, though, Google kept reporting these problems and the count of 5xx URLs only grew, so about 2 weeks ago I started investigating again.
This time around, I tried to match the URLs reported by GSC against the analytics provided by Cloudflare and, bingo: all of these requests had an Edge Status Code (and Origin Status Code) of “429 Too Many Requests”.
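The matching step can be sketched in Python. This assumes the Cloudflare side is available as newline-delimited JSON log records (for example, a Logpush export) carrying `ClientRequestURI`, `EdgeResponseStatus` and `OriginResponseStatus`; the post doesn't say how the logs were exported, so treat those field names, the helper name, and the sample URLs as assumptions.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def statuses_for_reported_urls(reported_urls, log_lines):
    """Tally (edge status, origin status) pairs for log records matching GSC-reported URLs."""
    # GSC reports absolute URLs; the (assumed) ClientRequestURI field holds path + query.
    wanted = set()
    for url in reported_urls:
        parts = urlsplit(url)
        wanted.add(parts.path + ("?" + parts.query if parts.query else ""))

    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        if record.get("ClientRequestURI") in wanted:
            key = (record.get("EdgeResponseStatus"), record.get("OriginResponseStatus"))
            counts[key] += 1
    return counts


# Hypothetical sample data, just to show the shape of the result.
reported = ["https://next-episode.net/show-a", "https://next-episode.net/search?q=b"]
logs = [
    '{"ClientRequestURI": "/show-a", "EdgeResponseStatus": 429, "OriginResponseStatus": 429}',
    '{"ClientRequestURI": "/search?q=b", "EdgeResponseStatus": 429, "OriginResponseStatus": 429}',
    '{"ClientRequestURI": "/show-c", "EdgeResponseStatus": 200, "OriginResponseStatus": 200}',
]
print(statuses_for_reported_urls(reported, logs))  # Counter({(429, 429): 2})
```

A result dominated by a single (429, 429) pair is exactly the pattern described here: the origin itself answered 429 and Cloudflare passed it through unchanged.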
Now that was progress.
Only one thing on my service returns this status code: my custom rate limiting, which is triggered if a client makes more than 30 requests in less than 10 seconds.
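A sliding-window limiter matching the rule described here (more than 30 requests in under 10 seconds gets a 429) might look like the sketch below. The post doesn't show the actual implementation, so this is an assumed in-memory, per-client-key version; `SlidingWindowLimiter` and `check` are hypothetical names, and the `Retry-After` header is an addition the original limiter may not send.

```python
import math
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client key in any `window`-second span."""

    def __init__(self, limit=30, window=10.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.hits = defaultdict(deque)  # client key -> timestamps of accepted requests

    def check(self, client):
        """Return (status, headers): 200 if allowed, else 429 with a Retry-After hint."""
        now = self.clock()
        recent = self.hits[client]
        # Forget timestamps that have slid out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            # Seconds until the oldest accepted request leaves the window.
            wait = self.window - (now - recent[0])
            return 429, {"Retry-After": str(max(1, math.ceil(wait)))}
        recent.append(now)
        return 200, {}


limiter = SlidingWindowLimiter()
results = [limiter.check("203.0.113.7")[0] for _ in range(31)]
print(results.count(200), results[-1])  # 30 429
```

Only accepted requests are recorded, so rejected bursts don't extend the penalty, and the per-key deques are never evicted, which a long-running process would need to handle. Choosing the key (IP, user, or token) is the part the replies below argue about.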
What changed so that Google suddenly decided to crawl this aggressively and hit that limit (something that never happened before, and Next Episode has been online for more than 19 years now!), and why GSC reports these as 5xx when my server clearly returns 429, I don’t know.
What I do know for sure is that Google is misreporting 429 responses as 5xx.
As a quick fix (at least for now), I whitelisted in my rate limiter all of Google’s crawler IPs listed in this JSON provided by Google: https://developers.google.com/search/apis/ipranges/googlebot... (which I found through here: https://www.infidigit.com/blog/google-update-googlebots-ip-a... ).
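Checking a client IP against that JSON takes only the standard library. The sketch assumes the file's shape is a top-level `prefixes` array whose entries carry either `ipv4Prefix` or `ipv6Prefix`; `load_networks` and `is_googlebot_ip` are hypothetical helper names, and the inline sample is trimmed and illustrative rather than a copy of the live list.

```python
import ipaddress
import json

# Trimmed, illustrative sample in the same shape as Google's googlebot.json;
# fetch the real file rather than trusting these two prefixes.
SAMPLE_GOOGLEBOT_JSON = """
{"prefixes": [
  {"ipv6Prefix": "2001:4860:4801:10::/64"},
  {"ipv4Prefix": "66.249.64.0/27"}
]}
"""


def load_networks(json_text):
    """Parse the `prefixes` list into ip_network objects (IPv4 and IPv6)."""
    data = json.loads(json_text)
    networks = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def is_googlebot_ip(ip, networks):
    """True if `ip` falls inside any of the published crawler ranges."""
    addr = ipaddress.ip_address(ip)
    # Membership tests across IP versions simply return False, so one mixed list works.
    return any(addr in net for net in networks)


networks = load_networks(SAMPLE_GOOGLEBOT_JSON)
print(is_googlebot_ip("66.249.64.5", networks))  # True
print(is_googlebot_ip("203.0.113.9", networks))  # False
```

Because Google can change these ranges, the list is worth re-fetching periodically rather than hard-coding.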
Just in case, I also passed the ASN along in a request header (through a Cloudflare transform rule) and whitelisted Google’s whole ASN (15169) as well.
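On the origin side, the ASN exemption reduces to a header check. This assumes the transform rule writes the client's numeric ASN into a request header; the name `X-Client-ASN` and both helper functions are hypothetical. The header is only trustworthy if the origin accepts traffic exclusively through Cloudflare and the rule overwrites any value a client sends itself; otherwise anyone could claim to be AS15169.

```python
GOOGLE_ASN = 15169
# Hypothetical header name; use whatever the Cloudflare transform rule actually sets.
ASN_HEADER = "x-client-asn"


def from_google_asn(headers):
    """True if the edge-supplied ASN header says the request came from AS15169."""
    # HTTP header names are case-insensitive, so compare them lower-cased.
    normalised = {name.lower(): value for name, value in headers.items()}
    raw = normalised.get(ASN_HEADER, "").strip()
    return raw.isdigit() and int(raw) == GOOGLE_ASN


def should_rate_limit(headers):
    """Skip the limiter for Google's ASN; everything else goes through it."""
    return not from_google_asn(headers)


print(from_google_asn({"X-Client-ASN": "15169"}))  # True
print(from_google_asn({"X-Client-ASN": "13335"}))  # False
```

Exempting the whole ASN is broader than the crawler IP list, since AS15169 covers Google's network in general rather than just Googlebot.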
Since then, I have been monitoring GSC for new 5xx errors and Cloudflare for new 429 statuses from Google’s ASN, and so far (for more than 2 weeks), so good.
by kevin_nisbet on 8/14/24, 9:25 PM
Not trying to criticize if this was already checked; it’s just something I’d double-check out of being overly cautious.
by theginger on 8/19/24, 7:40 AM
by agpl3141592 on 8/14/24, 8:53 PM
It should return 429 so Google can reduce the requests.
I'm not even sure why you would rate limit in the first place. IPs are not unique: one company gateway or university, for example, has plenty of users behind it.
Rate limit requests from users you know and make sure every public API is properly cached.
by mattgreenrocks on 8/14/24, 9:20 PM
I'll take a look for 429s. Cheers.