by sereja on 2/17/22, 5:42 PM with 49 comments
by ot on 2/17/22, 7:20 PM
However, I have a comment on the benchmark: the iterations have no loop dependency on the previous result, so the CPU is allowed to pipeline some loads. This is rarely representative of real workloads, where the key being searched comes from other computations, so this is more a measure of the highest possible reciprocal throughput than of actual latency. It would be interesting to see what happens if they make the key being searched a function of the result of the previous lookup. (EDIT: as gfd below points out, the article actually discusses this!)
Also, some of the tricks only work for 32-bit keys, which is what allows the vectorized search within nodes. For larger keys, different layouts may still be preferable.
[1] Ordered search is when you need to look up keys that may not be in the dataset and find the predecessor/successor. If you only do exact lookups, a hashtable will almost certainly perform better, though it would require a bit more space.
by ignoramous on 2/17/22, 6:57 PM
Just this past day, I prototyped a Skip List backing a sorted key/value store, and it was super simple to do (as compared to an AVL or Red-Black tree).
B-Trees are notorious for their complexity, and so SingleStore (memSQL) reasoned that their move to use Skip Lists to back their tables was justified: https://archive.is/Qhpgn
It would be interesting to see the author attempt faster Skip Lists, as their expertise in this topic seems deep enough.
by Const-me on 2/18/22, 3:43 AM
> we combine the vector masks using the packs instruction and readily extract it using movemask just once
Blend instructions are generally slightly faster than pack instructions, _mm256_blend_epi16 in this case. The order is not the same, so the permute() function needs a different shuffle vector, but otherwise it should work the same way.
> B-tree from Abseil to the comparison, which is the only widely-used B-tree implementation I know of
Another one: https://github.com/tlx/tlx/tree/master/tlx/container It’s not using any SIMD, but the code quality is OK; I didn’t have many issues vectorizing the performance-critical pieces in my projects.
by whatshisface on 2/18/22, 7:16 AM
>The numeration we will use is actually half a millennium old, and chances are, you already know it.
Then, it is revealed:
>Michaël Eytzinger is a 16th-century Austrian nobleman known for his work on genealogy, particularly for a system for numbering ancestors called ahnentafel (German for “ancestor table”).
How many people have heard of "ahnentafel"?
by rurban on 2/17/22, 10:43 PM
by tabtab on 2/17/22, 7:27 PM
by ww520 on 2/17/22, 8:07 PM
by lichtenberger on 2/18/22, 11:22 AM
If it isn't static, the Height Optimized Trie by Robert Binna is the most advanced trie, I guess. It might also be a good fit for a static trie, as it keeps the height to a minimum.
by SNosTrAnDbLe on 2/17/22, 7:25 PM
by cryptonector on 2/18/22, 3:41 AM
by sydthrowaway on 2/17/22, 7:52 PM