from Hacker News

Scalable but Wasteful, or why fast replication protocols are slow

by hugofirth on 7/12/21, 6:20 AM with 24 comments

by rdw on 7/13/21, 10:16 PM
The hypothesis in the article could be correct (that industry is not adopting new academic innovations because they fail in the real world). Based on my experience in this industry, though, it could just be that there isn't a super strong connection between academia and the people implementing these kinds of systems. I've had many conversations with my academically-minded friends where they're astonished that we haven't jumped on some latest innovation, and I have to let them down by saying that the problem that paper was addressing is super far down our list of fires to put out. Maybe there are places where teams of top-tier engineers are free to spend 6 months every year rewriting critical core systems to use un-battle-scarred new algorithms that might have 20% performance improvements, but most places I've worked would achieve the same result for far less money by spending 20% more on hardware.
by luhn on 7/13/21, 10:43 PM
Honestly I think the answer is simpler: people don't need better algorithms. Paxos and Raft are generally used to build service discovery and node coordination; these are not demanding workloads, and they are overwhelmingly read-heavy. Even the largest deployments can probably be serviced by a set of modestly-sized VMs. Paxos and Raft are well-understood algorithms with a choice of battle-tested implementations, so why would anyone choose something different?
The whole section on "bin-packing Paxos/Raft is more efficient" is strange, because people don't generally bin-pack Paxos/Raft: the bin-packing orchestrators are built on top of Paxos/Raft!
by hugofirth on 7/13/21, 9:35 PM
Another thing which makes the Raft/Paxos vs new-consensus-algorithm comparisons complicated is caching.
If your Raft state machines are doing IO via some write-through cache (which they often are), then having specific machines do specific jobs can increase the cache quality. I.e. your leader node can have a better cache for your write workload, whilst your follower nodes can have better caches for your read workload.
This may lead to higher throughput (yay) but then also leave you vulnerable to significant slow-downs after leader elections (boo).
What makes sense will depend on your use case, but I personally agree with the author that multiple simple Raft/Paxos groups scheduled across nodes by some workload-aware component might be the best of both worlds.
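To make the "workload-aware component" idea concrete, here is a rough sketch of one possible placement policy, not anything taken from the article: a greedy scheduler that assigns each group's leader to the currently least-loaded node. The function name `place_leaders`, the group and node names, and the load figures are all made up for illustration.

```python
import heapq

def place_leaders(group_load, nodes):
    """Greedy placement: heaviest group first, onto the least-loaded node.

    group_load: hypothetical mapping of group name -> estimated write load.
    nodes: list of node names (also hypothetical).
    """
    # Min-heap of (accumulated load, node name).
    heap = [(0.0, n) for n in nodes]
    heapq.heapify(heap)
    placement = {}
    # Sort by load descending, then by name so the result is deterministic.
    for group, load in sorted(group_load.items(), key=lambda kv: (-kv[1], kv[0])):
        node_load, node = heapq.heappop(heap)
        placement[group] = node
        heapq.heappush(heap, (node_load + load, node))
    return placement

placement = place_leaders(
    {"g1": 5.0, "g2": 3.0, "g3": 2.0, "g4": 1.0},
    ["n1", "n2", "n3"],
)
```

A real scheduler would presumably also weigh cache warmth and the cost of moving a leader, which is exactly the trade-off described above.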
by toolslive on 7/13/21, 10:39 PM
None of this actually matters. Consensus algorithms allow you to achieve consensus. Period. There's no requirement whatsoever on what you're getting consensus on. A consensus value could be _one_ database update, but it doesn't need to be. It can also consist of 666 database transactions across 42 different namespaces.
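A toy sketch of what that means, with the consensus round itself stubbed out: the agreed value is one log entry, and that entry happens to carry a whole batch of updates across several namespaces. `Update`, `ReplicatedLog` and `commit` are hypothetical names for illustration, not any real library's API.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Update:
    namespace: str
    key: str
    value: str

@dataclass
class ReplicatedLog:
    entries: list = field(default_factory=list)
    state: dict = field(default_factory=dict)

    def commit(self, batch):
        """Stand-in for one consensus round: the agreed value is the whole batch."""
        self.entries.append(tuple(batch))
        # Apply every update in the batch to the state machine, in order.
        for u in batch:
            self.state.setdefault(u.namespace, {})[u.key] = u.value
        return len(self.entries) - 1  # index of the new log entry

log = ReplicatedLog()
batch = [
    Update("users", "alice", "admin"),
    Update("orders", "42", "shipped"),
    Update("users", "bob", "guest"),
]
index = log.commit(batch)
```

Three updates across two namespaces, but only one log entry, so only one round of agreement would be paid for.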
by LAC-Tech on 7/13/21, 10:06 PM
> The protocol presents a leader-less solution, where any node can become an opportunistic coordinator for an operation.
Does leader = master here? My first reaction is that this is a multi-master system but I can't quite unpack "opportunistic coordinator".
by mistralefob on 7/13/21, 8:56 PM
So, why?
by orangepanda on 7/14/21, 9:31 AM
Can you not with the clickbait patterns?
Actual title - why fast replication protocols are slow