by klinskyc on 7/10/19, 4:50 PM with 143 comments
by edwinarbus on 7/10/19, 5:11 PM
by klinskyc on 7/10/19, 4:59 PM
by rectangletangle on 7/10/19, 6:00 PM
by pgm8705 on 7/10/19, 4:59 PM
by cameronbrown on 7/10/19, 5:26 PM
Cloudflare was brought down by a config push.
Anybody want to guess what killed Stripe this morning?
by jammygit on 7/10/19, 5:09 PM
by uxamanda on 7/10/19, 9:21 PM
by craze3 on 7/10/19, 5:02 PM
by novaleaf on 7/10/19, 10:30 PM
by pcunite on 7/10/19, 6:26 PM
by kamizoo on 7/10/19, 5:06 PM
by normalperson on 7/10/19, 5:41 PM
by the-dude on 7/10/19, 5:07 PM
Which can be easily camouflaged by a post-mortem about pushing a wrong configuration file.
by techie128 on 7/10/19, 6:47 PM
You could break up your transaction API into two parts: a front-facing API that simply accepts a transaction and enqueues it for processing, and a backend one that actually performs the transaction in the background. The front-facing API should have low complexity and rarely change. It can persist transactions in a KV store like Cassandra to maximize availability.
The backend API that performs the transaction can have higher complexity and can afford to have lower availability. From the client's perspective, you could respond either with success (HTTP 200) or with accepted (HTTP 202). In either case the client will be happier than if the transaction had failed outright.
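To make the split concrete, here is a rough Python sketch of the idea. It is only an illustration under stated assumptions, not how Stripe actually works: `TransactionStore` is an in-memory stand-in for a KV store like Cassandra, `charge` is a placeholder for the complex backend logic, and the `idempotency_key` parameter is my own addition so that a client retry does not enqueue the same transaction twice.

```python
import queue
import uuid
from dataclasses import dataclass, field


@dataclass
class TransactionStore:
    """Hypothetical in-memory stand-in for a highly available KV store."""
    records: dict = field(default_factory=dict)

    def put(self, txn_id, record):
        self.records[txn_id] = record

    def get(self, txn_id):
        return self.records[txn_id]


store = TransactionStore()
pending = queue.Queue()  # stand-in for a durable work queue


def accept_transaction(payload, idempotency_key=None):
    """Front-facing API: persist and enqueue only, then answer 202 Accepted."""
    txn_id = idempotency_key or str(uuid.uuid4())
    if txn_id not in store.records:  # a retry with the same key is a no-op
        store.put(txn_id, {"payload": payload, "status": "pending"})
        pending.put(txn_id)
    return 202, txn_id


def charge(payload):
    """Placeholder for the real, more complex transaction logic."""
    if payload["amount"] <= 0:
        raise ValueError("amount must be positive")
    return "succeeded"


def process_pending():
    """Backend worker: drain the queue and perform each transaction."""
    processed = 0
    while True:
        try:
            txn_id = pending.get_nowait()
        except queue.Empty:
            return processed
        record = store.get(txn_id)
        try:
            record["status"] = charge(record["payload"])
        except ValueError as exc:
            record["status"] = "failed"
            record["error"] = str(exc)
        processed += 1
```

The front-facing function does nothing but write and enqueue, so it has very little that can break. The worker can be down or mid-deploy while transactions keep being accepted, and the client would check the stored status later to learn the outcome.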
I am sure your engineers have put a lot of thought into designing this system, but 24 minutes of downtime is unacceptable in the finance domain unless you expect your users to retry failed transactions, which defeats the point of using Stripe.
Edit: Can someone explain why I am being downvoted? Rather than downvoting, can you provide arguments that make sense?