by delifue on 6/7/25, 11:25 AM with 43 comments
by hinkley on 6/10/25, 6:12 PM
This is where tree data structures win out. Git and subversion and pure functional languages don’t replace the entire data structure. They make a new element and attach it to the tree by creating a new version of the intended parent, and a new version of its parent, on up to the root element.
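For concreteness, here's a minimal Rust sketch of that path copying (my own illustration, not from the article): inserting into a persistent binary search tree allocates new nodes only along the root-to-leaf path, and every untouched subtree is shared by just bumping a refcount.

    use std::sync::Arc;

    struct Node {
        key: i64,
        left: Option<Arc<Node>>,
        right: Option<Arc<Node>>,
    }

    // Returns a new root; readers holding the old root are unaffected.
    fn insert(root: Option<&Arc<Node>>, key: i64) -> Arc<Node> {
        match root {
            None => Arc::new(Node { key, left: None, right: None }),
            Some(n) if key < n.key => Arc::new(Node {
                key: n.key,
                left: Some(insert(n.left.as_ref(), key)), // new copy along the path
                right: n.right.clone(),                   // shared: refcount bump only
            }),
            Some(n) if key > n.key => Arc::new(Node {
                key: n.key,
                left: n.left.clone(),
                right: Some(insert(n.right.as_ref(), key)),
            }),
            Some(n) => Arc::clone(n), // key already present; reuse the node as-is
        }
    }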
So somewhere out there is space for a hybrid of ArcSwap and DashMap that uses sharded copy on write. These guys probably don’t need it given their low write rate, but shards should help with write contention for other use cases.
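A hedged sketch of what such a hybrid could look like (this is not an existing crate, just the idea: N independently ArcSwap'd maps, so a write clones only one shard and write contention spreads across shards, while reads stay lock-free):

    use std::collections::hash_map::DefaultHasher;
    use std::collections::HashMap;
    use std::hash::{Hash, Hasher};
    use std::sync::Arc;
    use arc_swap::ArcSwap;

    struct ShardedCowMap {
        shards: Vec<ArcSwap<HashMap<String, String>>>,
    }

    impl ShardedCowMap {
        fn new(n: usize) -> Self {
            Self { shards: (0..n).map(|_| ArcSwap::from_pointee(HashMap::new())).collect() }
        }

        fn shard(&self, key: &str) -> &ArcSwap<HashMap<String, String>> {
            let mut h = DefaultHasher::new();
            key.hash(&mut h);
            &self.shards[(h.finish() as usize) % self.shards.len()]
        }

        fn get(&self, key: &str) -> Option<String> {
            self.shard(key).load().get(key).cloned() // lock-free read of one shard
        }

        fn insert(&self, key: String, value: String) {
            // Copy-on-write of a single shard; rcu() retries if another writer
            // swapped this shard in the meantime.
            self.shard(&key).rcu(|old| {
                let mut new = HashMap::clone(old);
                new.insert(key.clone(), value.clone());
                new
            });
        }
    }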
by nick_ on 6/10/25, 6:34 PM
Making sure the counter variable is sitting alone on its own cache line would completely remove the excessive ping-pong behaviour. The cache line with the counter would, of course, still ping-pong as it's mutated, but not excessively.
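In Rust the usual way to do that is to over-align the hot counter to a cache line so nothing else can share it (crossbeam_utils::CachePadded does the same thing, picking the line size per architecture); a minimal sketch, assuming a 64-byte line:

    use std::sync::atomic::{AtomicU64, Ordering};

    // Force the counter onto its own 64-byte cache line so unrelated fields
    // next to it don't get dragged into the ping-pong (false sharing).
    #[repr(align(64))]
    struct PaddedCounter(AtomicU64);

    struct Stats {
        hits: PaddedCounter,   // each counter now occupies its own line
        misses: PaddedCounter,
    }

    impl Stats {
        fn record_hit(&self) {
            self.hits.0.fetch_add(1, Ordering::Relaxed);
        }

        fn record_miss(&self) {
            self.misses.0.fetch_add(1, Ordering::Relaxed);
        }
    }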
by hinkley on 6/10/25, 5:55 PM
Thundering herds will do that. Queuing theory. Once the queue builds up it takes far longer to clear than intuition suggests.
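For intuition, the standard M/M/1 numbers (mine, not the comment's): with utilization ρ = λ/μ the mean number of requests in the system is

    L = ρ / (1 − ρ)    →    ρ = 0.50 gives L = 1,   ρ = 0.90 gives L = 9,   ρ = 0.99 gives L = 99

and an existing backlog only drains at rate μ − λ, so near saturation even a modest spike takes far longer to clear than its size suggests.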
by cogman10 on 6/14/25, 12:16 AM
The tricky part (which the author solved using rcu) is that it's REALLY easy to introduce a race condition here.
Imagine thread A and thread B want to add a new value to the hashmap. They arrive at the same time. arc_swap used naively would ultimately have one of those threads win and the value from the other thread disappear. Not great.
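Roughly, the naive version versus the rcu fix (hedged sketch with a String map; arc_swap's rcu() keeps re-running the closure until its compare-and-swap against the snapshot it read succeeds):

    use std::collections::HashMap;
    use std::sync::Arc;
    use arc_swap::ArcSwap;

    fn naive_insert(map: &ArcSwap<HashMap<String, String>>, key: String, value: String) {
        // Two threads can clone the same snapshot; whichever store() runs last
        // wins and the other thread's insert silently disappears.
        let snapshot = map.load_full(); // Arc<HashMap<_, _>>
        let mut copy = HashMap::clone(&snapshot);
        copy.insert(key, value);
        map.store(Arc::new(copy));
    }

    fn rcu_insert(map: &ArcSwap<HashMap<String, String>>, key: String, value: String) {
        // rcu() retries the closure if another writer swapped the map in the
        // meantime, so no update is lost (at the cost of extra clones).
        map.rcu(|old| {
            let mut copy = HashMap::clone(old);
            copy.insert(key.clone(), value.clone());
            copy
        });
    }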
But another thing to consider with this approach is that it's expensive under any sort of write contention. As soon as writes start to spike, the number of allocations grows in a potentially really bad way. If you have 3 threads that want to write at the same time, one thread will create 1 full copy, another will create 2 full copies (its first attempt loses the compare-and-swap and it has to re-clone), and the 3rd thread will create 3 full copies. If your underlying data structure has any sort of memory allocated to it, this becomes a pretty bad trade-off.
Funnily enough, one easy way to solve this problem is to introduce locks again, but only for the writers. Possibly a good idea if you are spending more than ~1000 cycles creating the data structure.
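i.e. something like this hedged sketch: readers keep the lock-free ArcSwap load, while writers serialize on a Mutex so each update is built exactly once instead of racing and re-cloning.

    use std::collections::HashMap;
    use std::sync::{Arc, Mutex};
    use arc_swap::ArcSwap;

    struct WriteLockedMap {
        current: ArcSwap<HashMap<String, String>>,
        write_lock: Mutex<()>, // only writers touch this
    }

    impl WriteLockedMap {
        fn get(&self, key: &str) -> Option<String> {
            self.current.load().get(key).cloned() // still lock-free for readers
        }

        fn insert(&self, key: String, value: String) {
            let _guard = self.write_lock.lock().unwrap(); // serialize writers
            let mut copy = HashMap::clone(&self.current.load_full());
            copy.insert(key, value);
            self.current.store(Arc::new(copy)); // exactly one clone per write
        }
    }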
by cryptonector on 6/10/25, 10:31 PM
The stuff about "debt" is about the design of ArcSwap, which uses "debt" instead of "hazard pointers". The idea is that every thread keeps track of the last version of the hashmap that it's seen, and this is "debt" (references that need to be released eventually), and this debt is a thread-local object that is linked with all the other threads' debt, which essentially allows garbage collect of "debt" (debt collection?), or something like that. See https://github.com/vorner/arc-swap/blob/master/src/docs/inte...
I've implemented something similar myself. Instead of implementing user-land RCU (which is what perhaps I should have done) I implemented a "thread-safe variable" where reads are lock-less and wait-free, and writers pay all the costs. My thread-safe variable API is real simple and intuitive: `create(value_destructor) -> thread_safe_var`, `set(thread_safe_var, value)`, `get(thread_safe_var) -> value`, `release(thread_safe_var)`, and `destroy(thread_safe_var)` (presumably only during an atexit() handler or shared object destructor call, if at all). I have two implementations, one with a linked list of per-thread hazard pointers, and the other with a tuple of {next/current, current/previous} "cells" and a protocol that gives readers time to read the cells, figure out which is "current" and then dereference and incref a refcount.
Regardless of how one implements this data structure, the key concept is that you have to atomically compose two distinct atomic operations: a read, and a dereference to atomically increment a reference count -- this is remarkably tricky!
When you have a GC it's really easy: just read -- there is no reference count to increment.
With a linked list of per-thread hazard pointers all you have to do is read the thing then write it into your thread-local hazard pointer then read again, looping until the second read is the same as the first read. Having the value thus safely copied to the reader's thread-local hazard pointer then allows for notional garbage collection. Any thread that would release an old version of the value has to check all the hazard pointers to see that it's not in use, and this amounts to a small garbage collection.
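A minimal sketch of that read-side loop with raw atomics (illustrative names, not the parent's actual API):

    use std::sync::atomic::{AtomicPtr, Ordering};

    // `global` holds the current value; `my_hazard` is this thread's slot in the
    // linked list of hazard pointers that a reclaimer scans before freeing anything.
    fn protected_read<T>(global: &AtomicPtr<T>, my_hazard: &AtomicPtr<T>) -> *mut T {
        loop {
            let p = global.load(Ordering::Acquire);
            my_hazard.store(p, Ordering::SeqCst); // publish "I'm using p"
            if global.load(Ordering::SeqCst) == p {
                // The second read matches the first, so p was still current when we
                // published it; no reclaimer will free it until the slot is cleared.
                return p;
            }
            // The value changed between the two reads; loop and try again.
        }
    }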
I especially like the hazard pointer approach.
See also MVars (Haskell), transactional memory, RCU, etc.
by vrosas on 6/10/25, 6:59 PM
It blows me away that an issue like this could take weeks to track down. If I were in any leadership position at this company I'd be rolling heads over the lack of telemetry or domain knowledge for these systems.