from Hacker News

Update on 1/28 service outage

by traviskuhl on 1/29/16, 4:07 PM with 186 comments

  • by rburhum on 1/29/16, 4:52 PM

    Yesterday I was being a bit of an ass to a few people about how "the whole point of using git is so that we can do decentralized code management and why these dependencies were being pulled from our private github if they could be sent point to point yadda yadda yadda". Then they proceeded to go over the list of package managers and dependencies we used and I had to shut up. Even when we host our own Docker Hub and package managers (we do), if you dig far enough, you can find some dependency of a dependency of a dependency that relies on GitHub. Brew/npm/build script/whatever. It is crazy how much everything has changed in the past few years. GitHub went from something that was really nice to have to a core requirement for complex systems that rely heavily on open source.
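    The "dependency of a dependency" problem can be checked mechanically: walk the dependency graph from your top-level package and flag anything whose source URL points at GitHub. A minimal Python sketch, assuming a hypothetical in-memory graph (`GRAPH`, with made-up package names and URLs) rather than any real package manager's lockfile format:

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical dependency graph: package name -> (source URL, direct dependencies).
# Names and URLs are illustrative only, not taken from any real project.
GRAPH = {
    "app": ("https://packages.internal.example/app.tar.gz", ["web-lib", "build-tool"]),
    "web-lib": ("https://packages.internal.example/web-lib.tar.gz", ["tiny-util"]),
    "build-tool": ("https://packages.internal.example/build-tool.tar.gz", []),
    "tiny-util": ("https://github.com/example/tiny-util/archive/v1.0.tar.gz", []),
}

def github_hosted_deps(graph, root):
    """Breadth-first walk from `root`; return every reachable package fetched from github.com."""
    seen = {root}
    queue = deque([root])
    hits = []
    while queue:
        name = queue.popleft()
        url, deps = graph[name]
        host = urlparse(url).hostname or ""
        # Naive host check: github.com and its subdomains (e.g. codeload.github.com).
        if host == "github.com" or host.endswith(".github.com"):
            hits.append(name)
        for dep in deps:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return hits

print(github_hosted_deps(GRAPH, "app"))  # ['tiny-util']
```

    In practice the graph would come from whatever lockfile your package manager produces, and the host check is deliberately naive: it would miss GitHub-backed hosts such as raw.githubusercontent.com.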
  • by skewart on 1/29/16, 4:28 PM

    Am I the only one who is a little shocked that a power outage could have such a huge effect and bring them down for so long? I'm not an infrastructure guy, and I don't know anything about Github's systems, but aren't data center power outages pretty much exactly the kind of thing you plan for with multi-region failover and whatnot? Is it actually frighteningly easy for this kind of thing to happen despite following best practices? Or is it more likely that there's more to the story than what they're sharing now?
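    The intuition behind multi-region failover can be put in numbers. If each site is independently up a fraction a of the time, the chance that at least one of n sites is up is 1 - (1 - a)^n. A small Python sketch of that formula; the independence assumption is exactly what a shared failure (one power event, one network link, one bad deploy) breaks, so treat the result as an optimistic estimate rather than a prediction:

```python
def combined_availability(per_site: float, sites: int) -> float:
    """Probability that at least one of `sites` independent sites is up.

    Assumes site failures are independent; correlated failures make the
    real number lower than this.
    """
    if not 0.0 <= per_site <= 1.0:
        raise ValueError("per_site must be between 0 and 1")
    if sites < 1:
        raise ValueError("sites must be at least 1")
    # P(all sites down) = (1 - a)^n, so P(at least one up) = 1 - (1 - a)^n.
    return 1.0 - (1.0 - per_site) ** sites

for n in (1, 2, 3):
    print(n, round(combined_availability(0.99, n), 6))
```

    For a = 0.99, one, two and three sites give roughly 0.99, 0.9999 and 0.999999, which is why a single-site outage of a few hours is the kind of thing redundancy is meant to absorb.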
  • by nickpsecurity on 1/29/16, 5:39 PM

    Here's the only page I could quickly find on Github's architecture for those interested:

    https://github.com/blog/530-how-we-made-github-fast

    This looks like a single datacenter. I don't see anything here indicating high availability or other datacenters. You'll usually spot either an outright mention of it or certain components/setups common in it. They might have updated their stuff for redundancy since then. However, if it's the same architecture, then the downtime might come from an intentional design in which losing a single datacenter takes the whole service down.

    Might be fine given how people apparently use the service. It's just good to know that this is the case so users can factor that into how they use the product and have a way of working around the expected downtime if it's critical to do so.

  • by bhaak on 1/29/16, 4:26 PM

    "Millions of people and businesses depend on GitHub"

    Well, we shouldn't depend on it so much.

    I shudder at the thought of what an outage of GitHub would mean for our company. This time, we were lucky as it was during the night in Europe.

    Unfortunately, I don't have the power to test this scenario in our company.

  • by anton_gogolev on 1/29/16, 4:31 PM

    It's one thing when one temporarily loses access to remote repositories for pushes. Quite bearable, because you can exchange code across your corporate network using patches and whatnot. And it's totally different when you cannot friggin build anything because package managers grab dependencies directly off of GitHub.
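    One common mitigation for builds that fetch straight from GitHub is a read-through cache: serve an artifact from a local directory if it's already there, and only go upstream on a miss. A minimal Python sketch; `cached_fetch` and the injected `fetch` callable are hypothetical names, and real setups would more often use a package-manager-specific mirror or proxy:

```python
import hashlib
import os
import tempfile
from typing import Callable

def cached_fetch(url: str, cache_dir: str, fetch: Callable[[str], bytes]) -> bytes:
    """Return the artifact for `url`, contacting upstream only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    # Key the cache entry on a hash of the URL so any URL maps to a safe filename.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    data = fetch(url)  # may raise if upstream (e.g. GitHub) is unreachable
    # Write to a temp file in the same directory, then rename atomically,
    # so an interrupted write never leaves a truncated cache entry.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)
    return data
```

    Once an artifact has been fetched once, later builds keep working even while the upstream host is down; a dependency that was never cached will still fail.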
  • by bjacobel on 1/29/16, 4:14 PM

    Not much detail here. A more thorough postmortem would give me more confidence they can recover from another similar issue. Hoping to see one soon.
  • by frik on 1/29/16, 4:14 PM

    You can see the cascade effect on their status page graphs: https://status.github.com/
  • by tommoor on 1/29/16, 4:22 PM

    This post makes it sound like Github has its own data centers and power infrastructure; this is definitely news to me. I'd presumed co-lo at best.
  • by moondev on 1/29/16, 4:28 PM

    Github doesn't deploy their services in multiple AZs?
  • by beachstartup on 1/29/16, 4:56 PM

    it seriously makes me lol that people are upset, or surprised, that an internet service went down for a couple of hours. a couple of hours! get some perspective please. go for a walk, get a tasty burrito, try a new brand of hot sauce.

    "why didn't they do X, Y, or Z"

    the answer in every case is it's extremely expensive, or extremely hard to do, or both. you want a reason, there's the reason. maybe they'll fix it. maybe they won't. next question.

    make your own backups and redundant systems. "but github is so critical!" -- even more reason to have a backup. bad shit happens in this world. even to good people. prepare or suffer the consequences.

  • by ljk on 1/29/16, 5:26 PM

    Maybe I'm ignorant, but why do companies rely on github? Why not just host it in-house? If there's a power outage in the office then everything would be down anyway, right?
  • by gavazzy on 1/29/16, 5:25 PM

    Would a cross between Git and torrents be possible? Rather than having a central server to pull/push from, the server would instead provide a list of clients. If the server goes down, the list is still available, so people who depend on it would still be able to communicate.
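    The fallback in this idea can be sketched simply: the client remembers the last peer list it got from the central server (a tracker, in BitTorrent terms) and uses that list when the server is unreachable. A minimal Python sketch; `get_peers`, the `ask_tracker` callable and the JSON cache file are all hypothetical, and actually syncing with a peer is left to an ordinary `git fetch` against its URL:

```python
import json
import os
from typing import Callable, Dict, List

def _load_cache(cache_path: str) -> Dict[str, List[str]]:
    # A missing cache file just means no peer list has been saved yet.
    if not os.path.exists(cache_path):
        return {}
    with open(cache_path, "r", encoding="utf-8") as f:
        return json.load(f)

def get_peers(repo: str,
              ask_tracker: Callable[[str], List[str]],
              cache_path: str) -> List[str]:
    """Return peer URLs for `repo`, falling back to the last known list if the tracker is down."""
    try:
        peers = ask_tracker(repo)
    except OSError:
        # Tracker unreachable: reuse whatever list was saved last time (possibly empty).
        return _load_cache(cache_path).get(repo, [])
    # Tracker answered: remember this list for the next outage.
    cache = _load_cache(cache_path)
    cache[repo] = peers
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    return peers
```

    The cached list can go stale, so this only helps if at least some of the remembered peers are still online when the central server disappears.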
  • by matt_wulfeck on 1/29/16, 7:59 PM

    Why is it so hard for us to distribute our dependencies? Hash the package to a SHA and put it anywhere on the Internet. Then we just need a service that holds and updates the locations of the hashes, and we can fetch them from anywhere.
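    A minimal Python sketch of the content-addressed scheme described above: packages are identified by their SHA-256 digest, a registry maps each digest to any number of download locations, and a client accepts bytes from whichever location answers as long as they hash to the expected digest. `LocationRegistry`, `sha256_hex` and `fetch_by_hash` are hypothetical names, and the in-memory registry stands in for the location service the comment describes:

```python
import hashlib
from typing import Callable, Dict, List

def sha256_hex(data: bytes) -> str:
    """Content address of a package: the hex SHA-256 of its bytes."""
    return hashlib.sha256(data).hexdigest()

class LocationRegistry:
    """Hypothetical location service: digest -> URLs where those bytes can be downloaded."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[str]] = {}

    def add_location(self, digest: str, url: str) -> None:
        urls = self._locations.setdefault(digest, [])
        if url not in urls:
            urls.append(url)

    def locations(self, digest: str) -> List[str]:
        return list(self._locations.get(digest, []))

def fetch_by_hash(digest: str,
                  registry: LocationRegistry,
                  fetch: Callable[[str], bytes]) -> bytes:
    """Try each known location in turn; accept the first response whose hash matches."""
    for url in registry.locations(digest):
        try:
            data = fetch(url)
        except OSError:
            continue  # this host is down; try the next one
        if sha256_hex(data) == digest:
            return data
        # Wrong bytes (corrupted or tampered): ignore this host and keep going.
    raise LookupError(f"no reachable location served {digest}")
```

    Because integrity comes from the hash rather than from the host, a flaky or untrusted mirror can't substitute different bytes; at worst it wastes a request.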
  • by ibejoeb on 1/29/16, 5:07 PM

    For those that have been affected by this, what parts of your process were disrupted? I've read, so far:

      * Build fails due to unreachable dependencies hosted by GitHub
      * Development process depends on PRs
  • by free2rhyme214 on 1/29/16, 4:50 PM

    Chinese DDoS? Somehow I don't buy power going out at a server farm.
  • by smaili on 1/29/16, 4:23 PM

    It's always scary when a cloud service you rely on goes down but great to see GitHub recover. Well done!
  • by out_of_protocol on 1/29/16, 4:54 PM

    Various date/time formats across the world are bringing me to my knees. If the 1/28 outage was _that_ rough, 2/28 would be twice as bad, and 28/28 would feel like armageddon maybe?
  • by ryanfitz on 1/29/16, 5:30 PM

    I recently read a blog post from Github about them operating their own datacenter http://githubengineering.com/githubs-metal-cloud/

    I'm not positive, but it sounds like a fairly recent switch from a cloud provider to their own datacenter. If that's the case, I'd expect a number of outages to come in the following months.