from Hacker News

Static B-Trees: A data structure for faster binary search

by sereja on 2/17/22, 5:42 PM with 49 comments

  • by ot on 2/17/22, 7:20 PM

    Very interesting post. The best results I knew of for static ordered search [1] were from the 2015 Khuong-Morin paper [2], which is not cited in this article but is cited by the author in the previous post; in their benchmarks, Eytzinger+prefetching performed best. The improvement on top of that is pretty impressive. At the larger sizes, the latencies are basically 1 cache miss (~50ns).

    However, I have a comment on the benchmark: the iterations have no loop dependency on the previous result, so the CPU is allowed to pipeline some loads. This is rarely representative of real workloads, where the key being searched comes from other computations, so this is more a measure of the highest possible reciprocal throughput than of actual latency. It would be interesting to see what happens if the key being searched were made a function of the result of the previous lookup. (EDIT: as gfd points out below, the article actually discusses this!)
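
    For reference, a rough sketch of what such a dependent-measurement loop might look like (hypothetical code; lower_bound stands for whatever search routine is under test, not a function from the article):

      #include <cstddef>
      #include <cstdint>

      uint32_t lower_bound(uint32_t key); // the search routine under test (assumed to exist elsewhere)

      // Each key depends on the previous lookup's result, so the CPU cannot
      // overlap the loads of consecutive iterations; this measures latency
      // rather than reciprocal throughput.
      uint32_t benchmark_dependent(const uint32_t* queries, size_t n) {
          uint32_t last = 0, checksum = 0;
          for (size_t i = 0; i < n; i++) {
              uint32_t key = queries[i] ^ (last & 1); // data dependency on the previous result
              last = lower_bound(key);
              checksum ^= last;
          }
          return checksum; // keeps the compiler from discarding the loop
      }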

    Also, some of the tricks only work for 32-bit keys, which is what allows the vectorized search within nodes. For larger keys, different layouts may still be preferable.

    [1] Ordered search is when you need to look up keys that are not in the dataset, and find predecessor/successor. If you only do exact lookups, a hashtable will almost certainly perform better, though it would require a bit more space.

    [2] https://arxiv.org/pdf/1509.05053.pdf

  • by ignoramous on 2/17/22, 6:57 PM

    An interesting resource.

    Just this past day I prototyped a Skip List backing a sorted key/value store, and it was super simple to do (as compared to an AVL or Red-Black tree).
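
    The core of such a prototype is roughly this (a minimal sketch with made-up names; search and insert only, integer keys, no deletion or value payloads):

      #include <climits>
      #include <cstdlib>
      #include <vector>

      struct Node {
          int key;
          std::vector<Node*> next; // next[i] = successor at level i
          Node(int k, int levels) : key(k), next(levels, nullptr) {}
      };

      struct SkipList {
          static const int MAX_LEVEL = 16;
          Node head{INT_MIN, MAX_LEVEL}; // sentinel smaller than every key

          static int random_level() { // geometric: reach level i with probability 2^-(i-1)
              int lvl = 1;
              while (lvl < MAX_LEVEL && (std::rand() & 1)) lvl++;
              return lvl;
          }

          bool contains(int key) const {
              const Node* cur = &head;
              for (int i = MAX_LEVEL - 1; i >= 0; i--) // drop a level whenever we would overshoot
                  while (cur->next[i] && cur->next[i]->key < key)
                      cur = cur->next[i];
              return cur->next[0] && cur->next[0]->key == key;
          }

          void insert(int key) {
              Node* update[MAX_LEVEL]; // last node before `key` on each level
              Node* cur = &head;
              for (int i = MAX_LEVEL - 1; i >= 0; i--) {
                  while (cur->next[i] && cur->next[i]->key < key)
                      cur = cur->next[i];
                  update[i] = cur;
              }
              Node* n = new Node(key, random_level());
              for (int i = 0; i < (int) n->next.size(); i++) { // splice in, bottom-up
                  n->next[i] = update[i]->next[i];
                  update[i]->next[i] = n;
              }
          }
      };

    No rotations and no rebalancing, which is most of the appeal compared to AVL / Red-Black trees.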

    B-Trees are notorious for their complexity, which is how SingleStore (formerly MemSQL) justified their move to Skip Lists to back their tables: https://archive.is/Qhpgn

    It would be interesting to see the author attempt faster Skip Lists, as their expertise in this topic seems deep enough.

  • by Const-me on 2/18/22, 3:43 AM

    Good article, but here are a few minor things.

    > we combine the vector masks using the packs instruction and readily extract it using movemask just once

    Blend instructions are generally slightly faster than pack instructions; in this case, _mm256_blend_epi16. The element order is not the same, so the permute() function needs a different shuffle vector, but otherwise it should work the same way.
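
    To make the substitution concrete, roughly (a sketch from memory, not the article's exact code; x is the searched key broadcast to all lanes, and rank counts how many of the node's 16 keys are less than it):

      #include <immintrin.h>
      #include <cstdint>

      // Variant 1: packs + movemask, as the article describes.
      unsigned rank_packs(__m256i x, const int32_t* node) {
          __m256i m1 = _mm256_cmpgt_epi32(x, _mm256_loadu_si256((const __m256i*) node));
          __m256i m2 = _mm256_cmpgt_epi32(x, _mm256_loadu_si256((const __m256i*) (node + 8)));
          __m256i m  = _mm256_packs_epi32(m1, m2); // 16-bit lanes, order permuted within 128-bit halves
          return __builtin_popcount(_mm256_movemask_epi8(m)) / 2;
      }

      // Variant 2: blend instead of packs. Immediate 0xAA keeps the low 16 bits of
      // each comparison from m1 and the high 16 bits from m2; since every 32-bit
      // comparison result is all-ones or all-zeros, either half carries the result.
      // The lane order differs from the packs version, hence the different shuffle
      // vector wherever the order actually matters.
      unsigned rank_blend(__m256i x, const int32_t* node) {
          __m256i m1 = _mm256_cmpgt_epi32(x, _mm256_loadu_si256((const __m256i*) node));
          __m256i m2 = _mm256_cmpgt_epi32(x, _mm256_loadu_si256((const __m256i*) (node + 8)));
          __m256i m  = _mm256_blend_epi16(m1, m2, 0xAA);
          return __builtin_popcount(_mm256_movemask_epi8(m)) / 2;
      }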

    > B-tree from Abseil to the comparison, which is the only widely-used B-tree implementation I know of

    Another one: https://github.com/tlx/tlx/tree/master/tlx/container It's not using any SIMD, but the code quality is OK; I didn't have many issues vectorizing the performance-critical pieces in my projects.

  • by whatshisface on 2/18/22, 7:16 AM

    From the previous article:

    >The numeration we will use is actually half a millennium old, and chances are, you already know it.

    Then, it is revealed:

    >Michaël Eytzinger is a 16th-century Austrian nobleman known for his work on genealogy, particularly for a system for numbering ancestors called ahnentafel (German for “ancestor table”).

    How many people have heard of "ahnentafel"?

  • by rurban on 2/17/22, 10:43 PM

    Still does not beat static perfect hashes, the usual solution in these cases.

  • by tabtab on 2/17/22, 7:27 PM

    In general the "best" algorithm seems dependent on specific usage patterns. A different angle is to study or sample actual usage patterns and automatically adjust the structure/algorithm based on that. The indexes can be readjusted during off/slow hours. (If you don't have off hours, then such a system may not be the best fit.)
  • by ww520 on 2/17/22, 8:07 PM

    For static data, would it be faster to use a trie or radix tree?

  • by lichtenberger on 2/18/22, 11:22 AM

    I guess for static data a carefully crafted trie would probably be even faster, as it just has to compare a byte or more in each node and is thus O(K), where K is the key length, instead of O(log n), where n is the number of keys.
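
    A rough sketch of why lookups cost O(K) (a plain byte-wise trie with made-up names; real radix trees compress the 256-way fanout, which this ignores):

      #include <array>
      #include <memory>
      #include <string>

      struct TrieNode {
          bool terminal = false;
          std::array<std::unique_ptr<TrieNode>, 256> child{}; // one slot per possible next byte
      };

      // One node visited per key byte: the number of stored keys never appears.
      bool contains(const TrieNode* root, const std::string& key) {
          const TrieNode* cur = root;
          for (unsigned char c : key) {
              cur = cur->child[c].get();
              if (!cur) return false; // no edge for this byte, so the key is absent
          }
          return cur->terminal;
      }

      void insert(TrieNode* root, const std::string& key) {
          TrieNode* cur = root;
          for (unsigned char c : key) {
              if (!cur->child[c]) cur->child[c] = std::make_unique<TrieNode>();
              cur = cur->child[c].get();
          }
          cur->terminal = true;
      }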

    If it isn't static, the height-optimized trie (HOT) by Robert Binna is the most advanced trie, I guess. It might also be a good fit for a static trie, as it keeps the height to a minimum.

  • by SNosTrAnDbLe on 2/17/22, 7:25 PM

    This is a very good resource. I am bookmarking it! It is very engineer-friendly.

  • by cryptonector on 2/18/22, 3:41 AM

    This is very cool. Imagine using this for checking if a group ID in an ACL is present in a subject's group list. Binary search is fast, but this might be faster.
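
    For example (hypothetical sketch; std::lower_bound stands in for the faster static search, and the subject's group list is assumed sorted):

      #include <algorithm>
      #include <cstdint>
      #include <vector>

      // Allow access if any group granted by the ACL appears in the subject's
      // sorted group list; each probe is one ordered search.
      bool acl_allows(const std::vector<uint32_t>& subject_groups,
                      const std::vector<uint32_t>& acl_groups) {
          for (uint32_t g : acl_groups) {
              auto it = std::lower_bound(subject_groups.begin(), subject_groups.end(), g);
              if (it != subject_groups.end() && *it == g)
                  return true;
          }
          return false;
      }
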
  • by sydthrowaway on 2/17/22, 7:52 PM

    I need this for my interview skills.