by rlancer on 11/11/18, 8:47 PM with 407 comments
by shareometry on 11/12/18, 12:51 AM
1) For three whole days, it was questionable whether or not a user would be able to launch a node pool (according to the official blog statement). It was also questionable whether a user would be able to launch a simple compute instance (according to statements here on HN).
2) This issue was global in scope, affecting all of Google's regions. Therefore, in consideration of item 1 above, it was questionable/unpredictable whether or not a user could launch a node pool or even a simple node anywhere in GCP at all.
3) The sum total of information about this incident can be found as a few one- or two-sentence blurbs on Google's blog. No explanation or outline of scope for affected regions and services has been provided.
4) Some users here are reporting that other GCP services not mentioned by Google's blog are experiencing problems.
5) Some users here are reporting that they have received no response from GCP support, even over a time span of 40+ hours since the support request was submitted.
6) Google says they'll provide some information when the next business day rolls around, roughly 4 days after the start of the problem.
I really do want to make sure I'm understanding this situation. Please do correct me if I got something wrong in this summary.
by usmannk on 11/11/18, 11:58 PM
by Jedi72 on 11/11/18, 10:03 PM
- Someone at Google right now, probably.
by justinsb on 11/11/18, 10:15 PM
It looks like the UI issue was actually fixed, and that we just didn't update the status dashboard correctly. But we're double checking that and looking into some of the additional things you all have reported here.
by hacknat on 11/11/18, 10:38 PM
Why do you guys suffer global outages? This is your 2nd major global outage in less than 5 years. I’m sorry to say this, but it is the equivalent of going bankrupt from a trust perspective. I need to see some blog posts about how you guys are rethinking whatever design can lead to this, twice, or you are never getting a cent of money under my control. You have the most feature-rich cloud (particularly your networking products), but downtime like this is unacceptable.
by scarface74 on 11/12/18, 2:25 AM
No one would ever ask why you chose AWS. The old “no one ever got fired for buying IBM”.
Even if you chose Azure because you’re a Microsoft shop, no one would question your choice of MS. Besides, MS is known for their enterprise support.
From a developer/architect standpoint, I’ve been focused the last year on learning everything I could about AWS and chose a company that fully embraced it. AWS experience is much more marketable than GCP. It’s more popular than Azure too, but there are plenty of MS shops around that are using Azure.
by AlexB138 on 11/11/18, 9:47 PM
by marcinzm on 11/11/18, 10:04 PM
>We will provide more information by Monday, 2018-11-12 11:00 US/Pacific.
Wait, did the people tasked with fixing this just take the weekend off?
by rlancer on 11/11/18, 8:53 PM
by scarface74 on 11/11/18, 10:10 PM
What would a small business do as a contingency plan?
by rlancer on 11/12/18, 1:26 AM
by 7ewis on 11/11/18, 11:15 PM
One thing I do care about though, is root cause analysis. I love reading a good RCA, it restores my faith in the company and makes me trust them more.
(I'm not affected by the GKE outage, so opinions may differ right now!)
by locusm on 11/12/18, 2:21 AM
by thwy12321 on 11/11/18, 9:50 PM
by sladey on 11/11/18, 10:10 PM
The timeline of this disruption matches when we started experiencing cloud build errors.
by ernsheong on 11/12/18, 7:45 AM
https://status.cloud.google.com/incident/container-engine/18...
by 013a on 11/12/18, 1:16 AM
Yet, somehow every major cloud provider experiences global outages.
That old AWS S3 outage in us-east-1 was an interesting one; when it went down, many services which rely on S3 also went down in regions besides us-east-1, because they were using us-east-1 buckets. I have a feeling this is more common than you'd think: globally redundant services which rely on a single geographic point of failure for some small part.
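For what it's worth, the kind of hidden cross-region dependency described above can be audited with a few lines of code. This is only a rough sketch under stated assumptions: the function name, the three input mappings, and the service and resource names are all hypothetical and would have to come from your own inventory; nothing here calls a real cloud API.

```python
from typing import Dict, List, Set


def hidden_regional_dependencies(
    service_regions: Dict[str, str],
    service_deps: Dict[str, List[str]],
    resource_regions: Dict[str, str],
) -> Dict[str, Set[str]]:
    """Return, for each service, the regions other than its own that it depends on.

    All three mappings are hypothetical inventory data:
      service_regions:  service name  -> region the service runs in
      service_deps:     service name  -> names of resources it uses
      resource_regions: resource name -> region the resource lives in
    """
    findings: Dict[str, Set[str]] = {}
    for service, home_region in service_regions.items():
        foreign = {
            resource_regions[resource]
            for resource in service_deps.get(service, [])
            if resource_regions[resource] != home_region
        }
        if foreign:
            findings[service] = foreign
    return findings


if __name__ == "__main__":
    # Made-up example: a service in eu-west-1 reading from a us-east-1 bucket.
    print(
        hidden_regional_dependencies(
            {"api-eu": "eu-west-1", "api-us": "us-east-1"},
            {"api-eu": ["assets-bucket"], "api-us": ["assets-bucket"]},
            {"assets-bucket": "us-east-1"},
        )
    )
```

Any service that shows up in the result would go down with the foreign region even though it is nominally deployed elsewhere.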
by spiderPig on 11/12/18, 12:14 AM
by qaq on 11/12/18, 12:07 AM
by arunoda on 11/12/18, 7:15 AM
Always saying resource not available. My account is a pretty new account.
In contrast, one of my friends has a fairly old account which is very active. He has no such issue.
So I think due to this issue, Google has enabled some resource limitation for new accounts.
But they should properly communicate this issue.
by gigatexal on 11/11/18, 10:15 PM
by closeparen on 11/11/18, 9:27 PM
by fizzledbits on 11/12/18, 6:32 PM
An instance in us-central1-a has refused to start since last Thursday or Friday.
I created a new instance in us-west2-c, which worked briefly but began to fail midday Friday, and kept failing through the weekend.
On Saturday I created yet another clone in northamerica-northeast1-b. That worked Saturday and Sunday, but this morning, it is failing to start. Fortunately my us-west2-c instance has begun to work again, but I'm having doubts about continuing to use GCE as we scale up.
And yet, the status page says all services are available.
Is this typical of others' experiences?
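The manual zone-hopping described above (us-central1-a, then us-west2-c, then northamerica-northeast1-b) could in principle be automated. Below is a minimal sketch, assuming a caller-supplied `start_instance` function and a `ZoneUnavailable` exception; both are hypothetical placeholders, not part of any Google SDK, and a real version would wrap whatever client or CLI you actually use.

```python
from typing import Callable, Iterable, Optional


class ZoneUnavailable(Exception):
    """Hypothetical error raised when an instance cannot start in a zone."""


def start_with_fallback(
    zones: Iterable[str],
    start_instance: Callable[[str], None],
) -> Optional[str]:
    """Try each zone in order; return the first zone where the start succeeds.

    `start_instance` is an assumed callable that raises ZoneUnavailable on
    failure. Returns None if every zone fails.
    """
    for zone in zones:
        try:
            start_instance(zone)
            return zone
        except ZoneUnavailable:
            # This zone is failing right now; move on to the next candidate.
            continue
    return None
```

This only helps when the failure is zonal; it does nothing for an outage that is global in scope.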
by wijowa on 11/12/18, 3:48 PM
by nielsole on 11/11/18, 9:57 PM
by wb3tech on 11/12/18, 2:12 PM
https://stackoverflow.com/questions/53244471/gke-cluster-won...
by regnerba on 11/11/18, 9:24 PM
by _wmd on 11/11/18, 10:48 PM
by bdibs on 11/12/18, 6:55 AM
And for those who have used both, which would you go with today?
by franky_g on 11/11/18, 10:38 PM
Is there another status page, Google? Because the last update I'm looking at is dated the 9th.
by fulafel on 11/12/18, 6:34 AM
by whatshisface on 11/11/18, 11:30 PM
by fergie on 11/12/18, 6:46 AM
by thomasfl on 11/12/18, 11:36 AM
by haosdent on 11/12/18, 9:12 AM
by shiftnight on 11/11/18, 10:57 PM
I have a feeling that a microservice architecture is overkill for 99% of businesses. You can serve a lot of customers on a single node with the hardware available today. Oftentimes, sharding on customers is rather trivial as well.
Monolith for the win! Opinions?
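To illustrate the "sharding on customers" point, here is one simple way it might look: hash the customer ID and take it modulo the number of shards. This is a sketch under assumptions, not a recommendation; the function name and ID format are made up, and plain modulo hashing reshuffles most customers whenever the shard count changes.

```python
import hashlib


def shard_for_customer(customer_id: str, num_shards: int) -> int:
    """Map a customer ID to a shard index in the range [0, num_shards).

    Uses SHA-256 rather than Python's built-in hash() so that the mapping
    is stable across processes and restarts.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce.
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for cid in ("customer-1", "customer-2", "customer-3"):
        print(cid, "->", shard_for_customer(cid, 4))
```

Each shard can then be its own copy of the monolith plus database, serving only the customers that map to it.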
by aaaaaaaaaab on 11/12/18, 9:35 AM
by spullara on 11/12/18, 1:06 AM