by timf on 12/4/22, 1:11 PM with 50 comments
by dang on 12/6/22, 8:19 PM
Post mortem on Mastodon outage with 30k users - https://news.ycombinator.com/item?id=33855250 - Dec 2022 (101 comments)
(Offtopic meta note: Alert users will note that that thread was posted later than this one. This is because the second-chance process (https://news.ycombinator.com/item?id=26998308) has a race condition: the events "story makes front page" and "moderator puts story in second-chance pool" sometimes diverge and can happen in any order.)
by Enderboi on 12/7/22, 2:38 AM
I've also seen NFS/ZFS on Linux have very... bizarre... issues with locking, latency, and poor handling of errors bubbled up from the block layer, taking down clients or even the host.
All of these went away when we redeployed everything onto a Solaris-based distro (still exporting ZFS shares to Linux clients via NFS). It does seem to be something specific to the interaction of these two components under load on a Linux kernel.
Unfortunately, it also only happens under real-world production load, and it was impossible to create a reliable test case with simulated stress tests or benchmarking :(
by rglullis on 12/6/22, 7:51 PM
But then I realize that they are only getting this many people because they are not driven by commercial interests: even with donations, I'd bet they are not collecting enough to keep things afloat, and they only keep going because they don't mind spending their own time, money and resources on this project. They can treat it as a (relatively expensive) hobby, and they can keep it running as long as it satisfies them.
The problem is that I think this is harmful in the long run. Yes, people are finally seeing the issue with ad-funded social media. But if we want a healthy alternative, we need to understand TANSTAAFL (there ain't no such thing as a free lunch): we have to accept that we need to give real money to the people working on this, and to keep servers available 24/7 to store and distribute the hot takes and stupid memes that we so bizarrely crave every day.
I worry that if we don't change the mindset quickly, the whole Twitter drama will have been a wasted opportunity, and Mastodon (and the Fediverse in general) will go back to the status quo, where surveillance capitalism is the norm and truly open systems are just a geeky curiosity.
I wish I could fund a tech equivalent of the "buy local and organic" campaign. I wish more people were thinking "OK, I will pay $5/month to this guy and I will bring 10 people to this instance" because it is the ethical thing to do.
by watchdogtimer on 12/4/22, 7:42 PM
by cyberpunk on 12/6/22, 7:23 PM
How does that work?
by convolvatron on 12/6/22, 10:19 PM
by bluedino on 12/6/22, 9:03 PM
by lakomen on 12/7/22, 3:48 AM
by musk_micropenis on 12/6/22, 7:11 PM
As a point of reference, look at what Stack Overflow runs on. As a caveat, SO is probably more read-heavy than Mastodon, but it also serves several orders of magnitude more volume (on a normal day in 2016 they served 209,420,973 HTTP requests[0]). They did this on 4 DB servers and 11 web servers. In fact, it can work (and has worked) serving this volume of traffic on only a single server.
With this setup SO was not even close to maxing out its hardware (servers were running at roughly 10% load or less). SO also listed its server hardware[1] in 2016. I don't know enough about server hardware to assess the difference, but to my eye it looks similar to Hachyderm's on the web tier, with similar amounts of memory, similar disk, etc.
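To put the 2016 daily figure in per-second terms, here is a quick back-of-envelope sketch. It assumes (unrealistically) that traffic is spread evenly across the whole day and evenly across the 11 web servers; real peak load would be noticeably higher than these averages.

```python
# Figures quoted from the 2016 Stack Overflow architecture post cited as [0].
REQUESTS_PER_DAY = 209_420_973
WEB_SERVERS = 11
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_rps(requests_per_day: int, seconds: int = SECONDS_PER_DAY) -> float:
    """Mean requests per second, assuming traffic is spread evenly over the day."""
    return requests_per_day / seconds

total_rps = average_rps(REQUESTS_PER_DAY)
# Assumes perfectly even load balancing across the web tier.
per_server_rps = total_rps / WEB_SERVERS

print(f"average: {total_rps:,.0f} req/s overall, {per_server_rps:,.0f} req/s per web server")
# prints "average: 2,424 req/s overall, 220 req/s per web server"
```

Roughly 2,400 requests per second on average, or about 220 per web server, is the kind of number to compare against a Mastodon instance's observed request rate.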
I'm not saying Hachyderm is doing anything wrong, but it makes me wonder whether there's a fundamental problem with the design of Mastodon. To be clear, I understand that this particular issue was caused by a disk failure, but the fact that they even needed this much hardware in place to run Hachyderm is surprising to me.
[0] https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...
[1] https://nickcraver.com/blog/2016/03/29/stack-overflow-the-ha...
by imtringued on 12/5/22, 7:33 AM