from Hacker News

Stripe's API was down

by klinskyc on 7/10/19, 4:50 PM with 143 comments

  • by edwinarbus on 7/10/19, 5:11 PM

  • by klinskyc on 7/10/19, 4:59 PM

    Between Cloudflare, Google, and now Stripe, I feel like there's been a huge cluster of services that never go down going down. Curious to see Stripe's post-mortem here.
  • by rectangletangle on 7/10/19, 6:00 PM

    If you haven't broken a critical system at least once, you haven't written enough production code. Everyone appreciates the other 99.993207% of the time where the system functions flawlessly. I look forward to reading the postmortem.
  • by pgm8705 on 7/10/19, 4:59 PM

    This is painful. I get a text notification every time a transaction fails... they're really flying in right now. Losing a ton of revenue and it is completely out of my hands.
  • by cameronbrown on 7/10/19, 5:26 PM

    Google had their cables physically sliced.

    Cloudflare was brought down by a config push.

    Anybody want to guess what killed Stripe this morning?

  • by jammygit on 7/10/19, 5:09 PM

    I wonder what a 24-hour Stripe outage would cost the global economy. It’s crazy when you think about how important certain “infrastructure” is.
  • by uxamanda on 7/10/19, 9:21 PM

    Looks like it is struggling again.
  • by craze3 on 7/10/19, 5:02 PM

    No wonder my bugfix wasn't working
  • by novaleaf on 7/10/19, 10:30 PM

    As of 22:00 UTC, Stripe was down again. I think it's up now.
  • by pcunite on 7/10/19, 6:26 PM

    LinkedIn appears to be having issues right now too.
  • by kamizoo on 7/10/19, 5:06 PM

    Yup - not to plug my own website (others may find it useful) - got a notification for this 14 minutes ago at https://statusnotify.com
  • by normalperson on 7/10/19, 5:41 PM

    "Elevated Error Rates" is such a BS term. They were down. Man up and own the mistake.
  • by the-dude on 7/10/19, 5:07 PM

    My conspiracy theory still is they are decommissioning Huawei equipment.

    Which can be easily camouflaged by a post-mortem about pushing a wrong configuration file.

  • by techie128 on 7/10/19, 6:47 PM

    I have built APIs in the finance realm with 100% uptime. I have also used Stripe in the past, and I wonder why you can't achieve 100% uptime for your users. Are there regulatory constraints that prevent you from designing such a system?

    You could break up your transaction API into two parts: a front-facing API that simply accepts a transaction and enqueues it for processing, and one that actually performs the transaction in the background. The front-facing API should have low complexity and rarely change. It can persist transactions in a KV store like Cassandra to maximize availability.

    The backend API that performs the transaction can have higher complexity and can afford to have lower availability. To the client, you could respond immediately with either OK (HTTP 200) or Accepted (HTTP 202). In either case the client will be happier than with the transaction failing outright.
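
    A minimal sketch of the split described above, not anything Stripe is known to run. The names `accept_transaction`, `process_pending`, `store`, and `pending` are hypothetical, and the in-memory dict and queue are stand-ins for a durable KV store and a real message queue:

```python
import queue
import uuid

# Hypothetical in-memory stand-ins. The design above assumes a durable,
# highly available KV store (e.g. Cassandra) and a real queue instead.
store = {}
pending = queue.Queue()


def accept_transaction(payload):
    """Front-facing step: persist, enqueue, and acknowledge without charging."""
    txn_id = str(uuid.uuid4())
    store[txn_id] = {"payload": payload, "status": "accepted"}
    pending.put(txn_id)
    # 202 Accepted: the request was recorded, but no charge has happened yet.
    return 202, {"id": txn_id, "status": "accepted"}


def process_pending(charge):
    """Background step: drain the queue and run the real charge for each entry.

    `charge` is a caller-supplied function (an assumption of this sketch)
    that raises an exception when the charge fails.
    """
    processed = 0
    while True:
        try:
            txn_id = pending.get_nowait()
        except queue.Empty:
            return processed
        record = store[txn_id]
        try:
            charge(record["payload"])
            record["status"] = "succeeded"
        except Exception:
            # A real system would retry with backoff; this sketch only marks it.
            record["status"] = "failed"
        processed += 1
```

    Note the trade-off this makes visible: the client gets a fast acknowledgement, but it only learns whether the charge succeeded later (by polling the stored status or via a callback), so a declined card can no longer be reported in the original response.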

    I am sure your engineers have put a lot of thought into designing this system, but 24 minutes of downtime is unacceptable in the finance domain unless you expect your users to retry failed transactions, which defeats the point of using Stripe.

    Edit: Can someone explain why I am being downvoted? Rather than downvoting, can you provide arguments that make sense?