from Hacker News

GitHub was down

by pmccarren on 3/12/21, 6:29 PM with 82 comments

  • by keithba on 3/12/21, 7:13 PM

    Hi all - I'm the head of engineering at GitHub. Please accept my sincere apology for this downtime. The cause was a bad deploy (a db migration that changed an index). We were able to revert in about 30 minutes. This is slower than we'd like, and we'll be doing a full RCA of this outage.

    For those who are interested, on the first Wednesday of each month, I write a blog post on our availability. Most recent one is here: https://github.blog/2021-03-03-github-availability-report-fe...

  • by jborichevskiy on 3/12/21, 6:51 PM

    I wonder if they track Github Status traffic volume as some sort of meta-indicator? Is it even viable?

    I was futzing around with the description for a PR and hitting save wouldn't update it, yet clicking edit would show the text I expected to see.

    Suspecting something was up I checked Github Status but it was green across the board. Assuming enough other people hit the same chain of events, could it provide a reliable enough indicator of an issue?
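
    One way to test the idea above is to compare each minute of status-page traffic against a rolling baseline and flag large deviations. The following is a minimal sketch, not anything GitHub is known to run: the class name, window size, and threshold are all hypothetical, and a real system would also need to account for time-of-day patterns.

```python
from collections import deque
from statistics import mean, stdev


class TrafficSpikeDetector:
    """Flags a spike when a sample sits far above the recent baseline.

    All parameters are illustrative assumptions, not tuned values.
    """

    def __init__(self, window=60, threshold=4.0, min_samples=10):
        self.history = deque(maxlen=window)  # recent "normal" samples
        self.threshold = threshold           # z-score needed to flag a spike
        self.min_samples = min_samples       # warm-up before judging anything

    def observe(self, requests_per_minute):
        """Record one sample; return True if it looks like a spike."""
        is_spike = False
        if len(self.history) >= self.min_samples:
            baseline = mean(self.history)
            # Fall back to 1.0 when traffic has been perfectly flat.
            spread = stdev(self.history) or 1.0
            z_score = (requests_per_minute - baseline) / spread
            is_spike = z_score > self.threshold
        # Keep spikes out of the baseline so a long outage
        # does not gradually become the new "normal".
        if not is_spike:
            self.history.append(requests_per_minute)
        return is_spike
```

    Excluding flagged samples from the baseline is a deliberate choice here: it keeps the detector firing for the length of an incident, at the cost of never adapting to a genuine, permanent rise in traffic.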

  • by turbonoobie on 3/12/21, 6:46 PM

    This is becoming a regular occurrence by now..

    I wonder if reliability has become less of a priority. As somebody with little to no experience of running things at scale I’m finding myself attributing this to some form of “move fast and break things”.

  • by cs-szazz on 3/12/21, 6:40 PM

    Unfortunately right when we were trying to deploy a hot-fix to production, our CI can't clone the PR to run tests.

    What do other folks use to avoid this situation? Have a Gitlab instance or similar that you can pull from instead for CI?
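
    One common answer to the question above is to keep a mirror in sync ahead of time and have CI try each remote in order. A minimal sketch, assuming a mirror already exists and is up to date; the function name and any URLs passed to it are hypothetical, and only the plain `git clone <url> <dir>` command is relied on:

```python
import subprocess


def clone_with_fallback(urls, dest, timeout=120, run=subprocess.run):
    """Try each remote in order; return the URL that cloned successfully.

    `run` is injectable so the function can be exercised without a network.
    """
    errors = []
    for url in urls:
        try:
            result = run(
                ["git", "clone", url, dest],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # A hanging clone is a likely failure mode during an outage.
            errors.append(f"{url}: timed out after {timeout}s")
            continue
        if result.returncode == 0:
            return url
        errors.append(f"{url}: {result.stderr.strip()}")
    raise RuntimeError("all remotes failed:\n" + "\n".join(errors))
```

    The fallback is only as fresh as the last sync, so a hot-fix pushed during the outage would still need to be pushed to the mirror directly.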

  • by qbasic_forever on 3/12/21, 6:49 PM

    Github folks--this is really getting bad. I find it strange that your leadership will spend weeks of time, and pen hundreds of words about making right the wrongs they created with censorship (see: https://github.blog/2020-11-16-standing-up-for-developers-yo...), yet there's almost no attention given to these major outages that have kept happening for a year now.

    Where is the acknowledgment of a problem, root-cause analysis, and followup for new practices and engineering to prevent issues? Who is responsible for these issues and what are they doing to make it right? What positions are you hiring for _right now_ to get to work making your service reliable?

  • by rvz on 3/12/21, 6:55 PM

    Again? Just 11 days ago [0], GitHub Actions had a degraded service and now it is the whole of GitHub. It's becoming a regular thing for them and it really is disappointing.

    I don't know how many times [0] I have to say this, but just get a self-hosted backup rather than 'going all in on GitHub' or 'Centralising everything'.

    [0] https://news.ycombinator.com/item?id=26301659

  • by mfer on 3/12/21, 6:50 PM

    Running a highly available service at this scale is hard. Especially when the service is a ripe target for DoS and other attacks.

    With that out of the way... GH has had a lot of issues in recent months. More than in the past. I would hope those things are on a road to being fixed.

  • by suspecthorse on 3/12/21, 6:49 PM

    I started building Multiverse because of problems like this. Ironically it’s hosted on GitHub. Check it out if you are interested in decentralized VCS and code hosting.

    https://github.com/multiverse-vcs/go-multiverse

  • by justaguy88 on 3/12/21, 6:42 PM

    What's the best practice for high availability (self-hosted?) repositories?

    Is there a pass-through proxy for git? Or a leader-follower arrangement that is nice, with a proxy server?

  • by rklaehn on 3/12/21, 6:45 PM

    Great opportunity to try out decentralised alternatives like https://radicle.xyz/

  • by ffpip on 3/12/21, 6:46 PM

    Seeing the unicorn on GitHub, I opened HN and the first post confirms that GitHub is down for everyone else too :)

  • by ProtoAES256 on 3/12/21, 7:05 PM

    In the wake of recent events, are there any methods to do CI/CD which will fall back to other providers/local automagically?

    My heart can't handle another rollercoaster of unicorns for long...

  • by brnt on 3/12/21, 6:46 PM

    I was in the middle of some last minute pre-weekend PR review, and midway I discover it can't actually submit any of my comments. Is there a way to review and save (intermediate) state offline?

  • by WFHRenaissance on 3/12/21, 6:42 PM

    Unicorn'd on PR, but eventually got it through.

  • by leemac on 3/12/21, 6:49 PM

    Sent in a commit to fix a few PR suggestions. It went through, but nowhere to be found on the PR. Guess I'll have to wait.

  • by johncalvinyoung on 3/12/21, 6:38 PM

    Ha. Had trouble with a PR, checked status page, no problems. Merged manually, opened Hacker News, and there it is.

  • by b_fiive on 3/12/21, 6:44 PM

    rooting for y'all at GitHub!

  • by balecrim on 3/12/21, 6:42 PM

    darn, right when I had to set up a new machine and can't get homebrew :(

  • by gadrev on 3/12/21, 6:41 PM

    Yep, can't confirm a PR: Unicorn.

  • by zymhan on 3/12/21, 6:46 PM

    Happy Friday :/

  • by pault on 3/12/21, 6:44 PM

    Hopefully this will spark a productive conversation about the advantages and disadvantages of centralizing a decentralized VCS.