from Hacker News

Fastly Outage

by pcr0 on 6/8/21, 9:57 AM with 694 comments

by lpmitchell on 6/8/21, 10:00 AM
This seems to be impacting a number of huge sites, including the UK government website[0].
[0] https://www.gov.uk/
https://m.media-amazon.com/
https://pages.github.com/
https://www.paypal.com/
https://stackoverflow.com/
https://nytimes.com/
Edit:
Fastly's incident report status page: https://status.fastly.com/incidents/vpk0ssybt3bj
by austinjp on 6/8/21, 10:18 AM
Yeah so it's been mentioned in the comments already, but to everyone in Fastly right now: I feel for you. Something like this must be insanely stressful, and not just during the outage. There will be (should be) a massive post-mortem. People will be losing sleep over this for days, weeks, months.
:(
Edit: There seems to be a major empathy outage in this thread. Disgusted but not surprised, unfortunately.
by iso1631 on 6/8/21, 10:05 AM
https://easydns.com/blog/2020/07/20/turns-out-half-the-inter...
The whole idea of the internet was a distributed network impervious to most attacks.
The reality is that a single failure can knock out 90% of the services people use.
by mrzool on 6/8/21, 10:04 AM
Why is this a link to the Fastly homepage, where absolutely no information is provided?
This is the page that should be linked:
https://status.fastly.com
by barosl on 6/8/21, 10:04 AM
I didn't know so many sites were depending on Fastly. Stack Overflow, GitHub, reddit, .... Even pip is unavailable. My development workflow is completely janked up. It is a bit scary that we are putting too many eggs in one basket.
by csmattryder on 6/8/21, 10:00 AM
Here's the status page incident for this.
https://status.fastly.com/incidents/vpk0ssybt3bj
by optiomal_isgood on 6/8/21, 10:23 AM
Amazon.com was completely broken here (Europe) and now it's back. I was watching where the assets were loaded from, and they switched from EU to NA as a failover. Homework well done.
by creamyhorror on 6/8/21, 10:00 AM
basically the internet is down
reddit, stackoverflow, github, paypal, pypi, twitter, twitch, NYT, CNN, BBC, the Guardian...
edit: wow, even Amazon.com relies on Fastly for some of its edge caches!
by atymic on 6/8/21, 10:26 AM
This has got to be even bigger than when cloudflare went offline, in terms of big companies affected. Clearly they have way more F500 customers than CF.
Good luck to the on call engineers!
by omk on 6/8/21, 1:02 PM
This outage made me realize that github is served over a single IP address (A record) for my point of origin (India). Stackoverflow has 4 A records listed, but all of these belong to fastly.
The internet is designed for redundancy. Wonder why these companies don't have a failover network. Makes me wonder if cost is a factor considering their already massive infra. But a single point of failure ... <confused>.
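For anyone who wants to repeat the A-record check, here is a minimal Python sketch using only the standard library's `socket.getaddrinfo`. The helper names `ipv4_addresses` and `lookup` are made up for illustration. The answer depends on where you resolve from, and it says nothing about who owns the addresses (that needs a separate WHOIS or ASN lookup).

```python
import socket


def ipv4_addresses(addrinfo):
    """Return the unique IPv4 addresses from socket.getaddrinfo() results, in order."""
    seen = []
    for family, _type, _proto, _canonname, sockaddr in addrinfo:
        ip = sockaddr[0]
        if family == socket.AF_INET and ip not in seen:
            seen.append(ip)
    return seen


def lookup(hostname, port=443):
    """Resolve hostname with the system resolver and list its IPv4 addresses."""
    info = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return ipv4_addresses(info)


# Example (needs network access): print(lookup("github.com"))
```

A single address in the result is not proof of a single server, since one IP can be anycast to many locations, but it does show there is only one provider in the DNS answer.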
by k_ on 6/8/21, 10:50 AM
Update: The issue has been identified and a fix is being implemented. Posted Jun 08, 2021 - 10:44 UTC
Seems like this is being resolved; curious to see the details afterwards
(from https://status.fastly.com/incidents/vpk0ssybt3bj)
by permb on 6/8/21, 10:27 AM
Made my alpine linux docker builds fail as well (varnish) - but shouldn’t it use a mirror when the primary download site is gone?
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKIN...
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/x86_64/...
ERROR: http://dl-cdn.alpinelinux.org/alpine/v3.12/main: temporary error (try again later)
by ClearAndPresent on 6/8/21, 10:06 AM
What conclusions can we draw about concentrating web content in a few CDNs?
by oneeyedpigeon on 6/8/21, 10:31 AM
Good marketing for Fastly! I had no idea so much of the internet relied on it...
by threeseed on 6/8/21, 10:03 AM
Shopify's CDN is down.
Which is causing $15+ million in lost product sales for every hour of outage.
Not to mention the loss of any new customers.
by Haydos585x2 on 6/8/21, 10:05 AM
Such a huge number of sites. It seems like it's mostly US based sites and Australians are okay. Sending good vibes to whatever poor person is on support right now.
by jujodi on 6/8/21, 10:52 AM
Would be fascinating if Fastly is not able to use GitHub, Travis, Terraform, pip, etc. to deploy their fix
by csomar on 6/8/21, 10:05 AM
So I'm wondering where in the "hundreds of servers around the world" did they exactly go wrong.
This happened with Cloudflare before too. I think we are a little too dependent on these services.
by alexchamberlain on 6/8/21, 11:20 AM
Stupid question: why didn't sites "just" fail over to their actual servers to handle the traffic, albeit slowly? I guess they won't be sized to handle the load in a lot of cases, and Fastly was responding, so DNS fail over didn't work?
by sjaak on 6/8/21, 10:42 AM
Perhaps Fastly is simply taking their commitment to reducing CO2 seriously? Three hurrays for the climate!
by snookdebook on 6/8/21, 10:43 AM
I gave it about 10 tries, and it seems a very small percentage of transactions do go through.
A decent number of tries is rejected right at the Varnish front door:
< HTTP/2 503
< server: Varnish
< retry-after: 0
< date: Tue, 08 Jun 2021 10:11:41 GMT
< x-varnish: 271470009
< via: 1.1 varnish
< fastly-debug-path: (D cache-bma1666-BMA 1623147101)
< fastly-debug-ttl: (M cache-bma1666-BMA - - -)
< content-length: 450
<
Service Unavailable
Guru Mediation:
Details: cache-bma1666-BMA 1623147101 271470009
Many more reach some backend system that just dumps "connection failure":
< HTTP/2 502
< content-type: text/plain; charset=utf-8
< content-length: 18
<
connection failure
And a tiny few do get through:
< HTTP/2 200
< content-type: text/html; charset=UTF-8
< cache-control: max-age=0, must-revalidate
< date: Tue, 08 Jun 2021 10:11:43 GMT
< via: 1.1 varnish
< vary: accept-encoding
< set-cookie: ...snip...
< server: snooserv
< content-length: 275036
<
<!doctype html><html>...snip...
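The three kinds of response in this comment can be told apart mechanically. A rough Python sketch, assuming you already have the status code, headers, and body from whatever HTTP client you use. The function name `classify` and the bucket labels are made up for illustration, and the matching rules are only what the sample responses here show, not anything Fastly documents.

```python
def classify(status, headers, body=""):
    """Sort a response into the three buckets seen in the samples in this comment.

    `headers` is a dict of header name -> value; names are matched
    case-insensitively.
    """
    h = {k.lower(): v for k, v in headers.items()}
    if status == 503 and h.get("server", "").lower() == "varnish":
        # Turned away by the Varnish layer itself.
        return "rejected-at-varnish"
    if status == 502 and body.strip() == "connection failure":
        # Got past the front door, but something behind it could not connect.
        return "backend-connection-failure"
    if 200 <= status < 300:
        return "ok"
    return "other"
```

Counting the labels over a few hundred requests would give a firmer number than "a very small percentage".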
by pimterry on 6/8/21, 10:14 AM
This is one of the things that excites me about IPFS: in a world of decentralized data storage, yes self-hosting and control over your data is nice and all, but serious resilience to most random infrastructure outages is a much bigger deal.
It's still early days, but I'm hopeful that it can provide a real solution to today's CDN centralization.
by aero-glide2 on 6/8/21, 10:18 AM
isitdownrightnow.com is down
by DoreenMichele on 6/8/21, 10:27 AM
I'm having intermittent Reddit issues, as one more data point.
I'm grateful for HN. I rebooted my computer. I thought it was my device and then saw this on my phone while rebooting.
by monkeydust on 6/8/21, 10:29 AM
Just occurring to me how CDNs are a major point of failure now for the internet
by unfunco on 6/8/21, 10:14 AM
Amazon being down surely points to something other than Fastly being the cause?
by Jamie9912 on 6/8/21, 9:59 AM
Yep, seems like:
Reddit, BBC News, Twitch.tv, Twitter emoji cdn?
are all down with a 503 service error
by kypro on 6/8/21, 11:04 AM
Some people are claiming online that this is a cyber attack. I contract for the UK Gov and I'm hearing reports that traffic is going through the roof right now.
Anyone know if there is any legitimacy to this?
by cph-w on 6/8/21, 11:05 AM
I did not realise fastly adoption was so widespread. Can anyone more enlightened tell me why, or point to some resource on the use-cases where fastly is superior to other CDNs such as CloudFlare?
by simonbarker87 on 6/8/21, 10:13 AM
how will their devs fix it if stackoverflow has gone down?!
by lysp on 6/8/21, 10:14 AM
This incident affects: Europe (Amsterdam (AMS), Dublin (DUB), Frankfurt (FRA), Frankfurt (HHN), London (LCY)), North America (Ashburn (BWI), Ashburn (DCA), Ashburn (IAD), Ashburn (WDC), Atlanta (FTY), Atlanta (PDK), Boston (BOS), Chicago (ORD), Dallas (DAL), Los Angeles (LAX)), and Asia/Pacific (Hong Kong (HKG), Tokyo (HND), Tokyo (TYO), Singapore (QPG)).
by modshatereality on 6/8/21, 7:46 PM
This post is suspiciously ranked much lower than it should be (1216 points, 9 hours ago), lower than posts with < 100 points.
by sleepyshift on 6/8/21, 9:59 AM
Looks like this has taken out Reddit at least.
by optiomal_isgood on 6/8/21, 10:42 AM
FWIW, Fastly ~8 hours ago (3am UTC) reported another incident: https://status.fastly.com/incidents/1glxxb8sf2zv and deployed a fix—either the fix made it worse or wasn't sufficient to mitigate the problem.
by marmot777 on 6/9/21, 12:55 AM
I think the honorable thing would be for them to have a statement easily findable.
So many companies sweep this sort of thing under the rug if it’s only customer data that’s been breached. If they can’t sweep it, they have a high-priced PR agency do the communicating.
I do not trust companies who handle things this way.
by ZoomStop on 6/8/21, 10:40 AM
The outage has already been added to the Fastly Wikipedia page
by choult on 6/8/21, 10:31 AM
My money is on an expired internal certificate or CA.
by dkarp on 6/8/21, 10:10 AM
Before the "Error 503 Service Unavailable" messages appeared, there were a few minutes where the error was a single line:
```
    connection failure
```
Not sure if that provides anyone here with more insight into what might have caused this!
by tommoor on 6/8/21, 10:26 AM
Hands up if you're also here after being woken up by downtime alerts on the west coast
by i386 on 6/8/21, 3:04 PM
Anyone want to talk about half the internet going out because one provider couldn’t keep their service up, instead of SO jokes and feels for the engineers? The entire internet is like a stack of cards, from the protocol to the economic model.
by gansai on 6/8/21, 10:26 AM
Wouldn't websites have alternate CDNs managing their traffic? Why should they have a single point of failure?
I was assuming there are a couple of services like Fastly, and that companies would have architected with the alternatives in mind, I guess.
by fagnerbrack on 6/8/21, 10:04 AM
https://dashboard.stripe.com/ is down
https://github.com/ is defaced
by fullstackwife on 6/8/21, 10:59 AM
No mention of outage on https://status.cloud.google.com/, and I wonder why, because apparently this is a GCP problem.
by mschuster91 on 6/8/21, 10:07 AM
Ah yes, the wonders of centralized internet infrastructure.
Let's use a handful of providers for everything, they said. It will be cheaper, they said. It will be easier to manage, they said.
And it was cheaper, until downtimes began to affect more and more sites when central SPOFs got hit.
And I wonder how much of that need for these centralized SPOFs actually comes from the sheer absurd amount of bloat, ads, code and assets that sites these days "have" to deliver to the customer. I 'member times when pages had 100kb total size, loaded in an instant and were perfectly usable.
by evouga on 6/8/21, 10:34 AM
Since Fastly’s own website is currently down:
What is fastly? Why are a huge number of web sites dependent on them? They are some kind of web host for companies that don’t want to run their own servers/data centers?
by devops000 on 6/8/21, 10:28 AM
BTC/USD is down too.
by ysavir on 6/8/21, 10:38 AM
Tangential question, but with services like these, is there a known way to handle failure gracefully? Some way to automatically bypass these services if they are known to be down?
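One client-side answer is to try an ordered list of sources and move on whenever one fails. A minimal Python sketch, with a made-up helper name `fetch_with_fallback` and placeholder URLs you would supply yourself. It treats a 5xx reply as a failure too, since during this outage the CDN was still answering, just with 503s. Whether bypassing straight to origin is safe depends on whether the origin can take the load, which this sketch does not address.

```python
import urllib.error
import urllib.request


def http_get(url, timeout=5):
    """Fetch url with the standard library; return (status, body_bytes)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; surface the status code instead.
        return err.code, err.read()


def fetch_with_fallback(urls, get=http_get):
    """Try each URL in order; return (url, body) for the first non-5xx reply."""
    failures = []
    for url in urls:
        try:
            status, body = get(url)
        except OSError as exc:  # DNS failure, refused connection, timeout
            failures.append((url, repr(exc)))
            continue
        if status >= 500:
            failures.append((url, f"HTTP {status}"))
            continue
        return url, body
    raise RuntimeError(f"all sources failed: {failures}")
```

Something like `fetch_with_fallback(["https://cdn.example/app.js", "https://origin.example/app.js"])` would only touch the second host when the first is unreachable or returning 5xx.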
by sergiomattei on 6/8/21, 10:23 AM
Yikes, seems like a massive outage.
EDIT: Hexdocs is down, elixir-lang.org is down
by angled on 6/8/21, 10:46 AM
None of the ES/NQ/RTY/YM futures contracts took kindly to the outage! This could have had a much wider financial impact. Most seem to have recovered now.
by asicsp on 6/8/21, 10:01 AM
Related thread: https://news.ycombinator.com/item?id=27432397
by hypnoscripto on 6/8/21, 10:06 AM
Looks like fastly.com uses fastly…
by mcintyre1994 on 6/8/21, 10:01 AM
Do they have an official status page? Googling gets https://docs.fastly.com/en/guides/fastlys-network-status which is 503
Edit: Elsewhere in the comments: https://status.fastly.com/incidents/vpk0ssybt3bj
by devops000 on 6/8/21, 10:19 AM
Hacker News is the only one UP!
by john37386 on 6/8/21, 10:46 AM
It should be resolved soon. From the Fastly status page:
The issue has been identified and a fix is being implemented. Posted 1 minute ago. Jun 08, 2021 - 10:44 UTC
by willvarfar on 6/8/21, 10:41 AM
https://www.bbc.com/news/technology-57399628 is rendering and reporting on the story, but BBC itself was down at the start of the outage, with the same 503 varnish error message.
Presumably the BBC has some kind of fallback in place.
The journalists ought to interview their own techies :)
by jchandra on 6/8/21, 10:35 AM
https://www.greenhouse.io/ down as well.
by hestefisk on 6/8/21, 11:36 AM
The Guardian summarised this as well: https://www.theguardian.com/technology/2021/jun/08/massive-i...
by perino on 6/8/21, 10:07 AM
Anything hosted on Firebase seems to be down
by easytiger on 6/8/21, 10:23 AM
I will NEVER understand why people put so much trust in single provider solutions for anything critical.
by vfclists on 6/8/21, 10:07 AM
What happens when there is excessive centralization.
I thought that one of the principles behind the Internet is to be able to reroute around failures, but neither these service providers nor their clients ever seem to learn.
I guess in their mind that only applies to packet routing not services. SMH
by MrGilbert on 6/8/21, 10:28 AM
Interestingly, https://www.fastly.com/ works for me, whereas https://fastly.com/ doesn't.
by Omnious58 on 6/8/21, 10:32 AM
I was wondering why my Tidal app just stopped mid song and won't connect, after much googling and absolutely no help or even notifications from Tidal explaining there's an issue it seems this outage is the culprit. Bugger.
by diveanon on 6/8/21, 11:01 AM
Time to develop CDN for CDNs.
It seems like a pattern that CDNs have overly centralized the web and led to issues like this.
Maybe it's time to build a CDN that distributes your static assets to multiple CDNs and has a set of fallback states for service outages.
by tfar on 6/8/21, 10:03 AM
https://flutter.dev/ and https://fastlane.tools/ as well.
by Dobbs on 6/8/21, 11:03 AM
I got a push notification from the CNN app telling me a bunch of the internet was down due to a cloud provider. I clicked the link only for the app to open to a 503. In hindsight not surprising, but quite amusing.
by misnome on 6/8/21, 10:01 AM
pypi.org, but not https://status.python.org/ - I'm impressed that they actually hosted the status page differently!
by lopatin on 6/8/21, 10:20 AM
Their status page keeps claiming that my region, Chicago (ORD), is either Degraded Performance, or Operational. But clearly it's down. Is fuzzing metrics like this how they hit their SLA targets?
by abhiminator on 6/8/21, 10:46 AM
Looks like they're currently applying a fix.
https://status.fastly.com/incidents/vpk0ssybt3bj
by montag on 6/8/21, 10:20 AM
It's funny, I searched Twitter for "Ebay down" and the top result was an Ebay tweet with some not coincidentally broken Twitter emoji SVGs (as another person mentioned)...
by theginger on 6/8/21, 10:04 AM
GitHub? I had some issues, checked the service status page said no issues, but images were returning a 503. Maybe they host their service status page elsewhere including using fastly.
by monkeydust on 6/8/21, 10:16 AM
Pretty bad www.gov.uk is down as more services move to digital.
by plasma on 6/8/21, 10:16 AM
I briefly saw an output error about "domain not found" when hitting fastly.com, wonder if some list of domains has hit a limit/flushed/etc.
by fareesh on 6/8/21, 3:55 PM
How does one design a system that has a redundancy for when the CDN goes down? Paying for more than one CDN is probably too expensive isn't it?
by grumple on 6/8/21, 11:06 AM
Good job Fastly for getting the issue identified and resolved so quickly. < 1 hour to identify, <13 minutes to fix (assuming status is accurate).
by an0n4u on 6/8/21, 10:03 AM
numpy docs, too. i think it's cloudflare related as well. at least, I keep seeing some cloudflare errors interpolated with the 503 varnish error.
by MyOnePiece on 6/8/21, 10:43 AM
Quick question: if the CDNs are down, why can't traffic be routed to the central web servers the company owns?
I thought CDNs had a fallback configured?
by _kyran on 6/8/21, 10:47 AM
Those of you that work in DevOps, SRE or are CTOs.
What kinds of things do you put in place to manage these kinds of centralised issues that are beyond your control?
by devops000 on 6/8/21, 10:15 AM
Heroku is down https://dashboard.heroku.com/
by JCWasmx86 on 6/8/21, 11:02 AM
>The issue has been identified and a fix has been applied. Customers may experience increased origin load as global services return.
Is fixed
by Nilef on 6/8/21, 10:01 AM
Ironically, even this Outage page is out for me
by ur-whale on 6/8/21, 10:16 AM
Wow, talk about a brutal SPOF, most of the things I had planned to work with today are broken: reddit, github, stack overflow.
by taosx on 6/8/21, 10:46 AM
I̶n̶ ̶r̶o̶m̶a̶n̶i̶a̶ ̶e̶v̶e̶r̶y̶t̶h̶i̶n̶g̶ ̶s̶e̶e̶m̶s̶ ̶b̶a̶c̶k̶ ̶t̶o̶ ̶n̶o̶r̶m̶a̶l̶.̶.̶.̶?̶
Edit: nope, just worked for 2-3 requests (10 secs)
by anotheryou on 6/8/21, 10:53 AM
Looks fixed: https://downdetector.com/
by jl6 on 6/8/21, 10:43 AM
Worrying that this is impacting so many dev toolchains and services, which will hinder the ability to respond to the issue.
by timvisee on 6/8/21, 10:19 AM
This seems to be a bigger issue. BGP failure?
by _kyran on 6/8/21, 10:46 AM
Things seem to have come back online in Australia, although not sure if that's just sites switching over their DNS?
by LightG on 6/8/21, 10:48 AM
"The internet will just route around a local / centralised problem ... like water around an object"
Obligatory LOL ...
by graphman on 6/8/21, 10:11 AM
Firebase Dynamic Links is affected too. Checking the IP looks like they are using Fastly which is quite surprising.
by taurath on 6/8/21, 10:20 AM
I’ve noticed lots of social media content is tied to this - Reddit and Twitter images and some videos, for one.
by loriverkutya on 6/8/21, 10:48 AM
The issue has been identified and a fix is being implemented. Posted 3 minutes ago. Jun 08, 2021 - 10:44 UTC
by ilaksh on 6/8/21, 10:37 AM
Let's make all of the main internet sites dependent upon one central private service. Great idea guys.
by artembugara on 6/8/21, 11:01 AM
Seems like another single point of failure. What is a solution to not be affected by such an outage?
by toong on 6/8/21, 10:57 AM
It is time to remove that "100% uptime guarantee" claim from the website :grimacing:
by classicflavour on 6/8/21, 10:48 AM
My work's website is down too, and so are the regular sites I use to escape work boredom
by gansai on 6/8/21, 10:49 AM
Fastly is back now. (The issue has been identified and a fix is being implemented.)
by pattyj on 6/8/21, 10:53 AM
It would be interesting to see estimations on the man-hour cost of this outage.
by mothershesha on 6/8/21, 10:20 AM
Got the same here (Australia)
by johnstonnorth on 6/8/21, 10:00 AM
rubygems.org affected too
by vincentmarle on 6/8/21, 10:36 AM
Well I know where to go next time if I were to be a Russian hacker
by clawphantom on 6/8/21, 10:27 AM
Twitch isn’t working or responding, and neither is the web dashboard
by luke2m on 6/8/21, 10:49 AM
When this happens to cloudflare, it will be even more impactful.
by colesantiago on 6/8/21, 10:59 AM
Looks like Fastly did not work as advertised, very misleading.
by reuben_scratton on 6/8/21, 11:10 AM
I'm sure it's just a coincidence that today is Patch Tuesday.
:-|
by zwirbl on 6/8/21, 10:49 AM
Spotify is also hit, though it still works without images
by ddtaylor on 6/8/21, 10:40 AM
Someone must have 51% attacked the Pied Piper blockchain!
by vlan121 on 6/8/21, 12:04 PM
Damn, I thought I could blame myself or the provider..
by ronyfadel on 6/8/21, 10:01 AM
Ten Percent Happier is down, and now my day is ruined.
by fsnowdin on 6/8/21, 10:23 AM
just had my own site down because of this. glad to see it wasn't my fault lol but good luck to the Fastly people on fixing the issue.
by clawphantom on 6/8/21, 10:28 AM
Twitch isn’t responding, and neither is the web dashboard
by 8K832d7tNmiQ on 6/8/21, 10:09 AM
That explains why I couldn't access reddit
by navanchauhan on 6/8/21, 10:02 AM
No wonder, The Verge and NYT are down too.
by rich_sasha on 6/8/21, 10:20 AM
www.python.org down as well, with the shortest of messages: 'connection failure'. Probably related?
by NewLogic on 6/8/21, 10:06 AM
Even amazon.com styling is borked for me
by dilawar on 6/8/21, 10:03 AM
I think reddit in India is down as well.
by JosephK on 6/8/21, 10:56 AM
Extremely long call, but what are the chances this turns out connected to the raids on organised crime using the An0m app that started today?
by john37386 on 6/8/21, 10:52 AM
It's probably a DDoS attack.
by dragosbulugean on 6/8/21, 10:22 AM
And all Webflow sites it seems...
by alixaxel on 6/8/21, 10:02 AM
Indeed, part of GitHub (.io) too.
by ur-whale on 6/8/21, 10:19 AM
Looks like HN is working ;-)
by jfny on 6/8/21, 10:48 AM
Do companies really not run test suites / do manual testing before deploying to production?
by timetosleep on 6/8/21, 10:55 AM
Seems to be back online
by rvz on 6/8/21, 10:04 AM
Basically everything is broken. "Centralising Everything" huh
by dragosbulugean on 6/8/21, 10:22 AM
All Webflow sites?
by mlnj on 6/8/21, 10:00 AM
StackOverflow too.
by schappim on 6/8/21, 10:02 AM
Parts of Shopify
by ur-whale on 6/8/21, 10:18 AM
Looks like an SRE team rolled out buggy software.
by rottc0dd on 6/8/21, 11:11 AM
github is back online. SSO too.
by raylus on 6/8/21, 10:14 AM
Whew, DevOps fire alarms are going off!
by raylus on 6/8/21, 10:05 AM
github.com is pretty broken
by schappim on 6/8/21, 10:00 AM
SMH.com.au
by heavydust on 6/8/21, 10:48 AM
the problem has been fixed
by heavydust on 6/8/21, 10:06 AM
reddit.com is affected too
by alexannic on 6/8/21, 10:01 AM
cnn.com is down as well.
by cwen on 6/8/21, 10:56 AM
A real-world Chaos experiment!
by cdev on 6/8/21, 11:08 AM
it seems to be up now
by magicturtle on 6/8/21, 10:03 AM
reddit down as well
by Metacelsus on 6/8/21, 10:18 AM
I first noticed that xkcd was down. Then I went to post about it on reddit . . . also down! Good thing HN is up.
by nindalf on 6/8/21, 10:00 AM
Taken out xkcd as well.
by pts_ on 6/8/21, 10:01 AM
Are these sites on the same cloud or CDN?
by colesantiago on 6/8/21, 11:09 AM
Also, why has this been allowed to happen? Billions of dollars lost because of this one company?
I don't understand this.
by ramraj07 on 6/8/21, 10:10 AM
For a moment I thought all of Western internet was cut off from India. Says how siloed my browsing habits are!
by raphaelj on 6/8/21, 10:17 AM
Couldn't be happier I moved https://noisycamp.com to BunnyCDN.com.
by TheRealDunkirk on 6/8/21, 10:47 AM
Every other comment about what's down in this thread -- as if we needed dozens of site-by-site accountings of this outage in the first place -- is a bitch about reddit. Why is reddit so important to this crowd? The specific topics I used to read the site for (half a dozen years ago) have all been overrun by "bucket people," there is literally never an answer to any question I find a google link to there, and the site's design is actively user-hostile. Seriously: what's keeping that place afloat? Porn, I suppose.