from Hacker News

Show HN: Integer Map Data Structure

by billziss on 1/25/24, 12:28 AM with 27 comments

This project presents a new data structure for storing ordered integer maps. The proposed data structure is a compressive, cache-friendly, radix tree that has performance comparable to an unordered map (`std::unordered_map`) and is an order of magnitude faster than an ordered map (`std::map`).
  • by bts on 1/25/24, 12:55 AM

    FWIW there is prior art here. e.g. see IntMap in Haskell: https://hackage.haskell.org/package/containers-0.7/docs/Data...
  • by jemfinch on 1/25/24, 6:08 AM

    This really doesn't seem to be comparing to comparable data structures. For int map specializations like this, the optimized alternatives are things like Judy (which is looking quite aged these days) or roaring bitmaps, not to mention that any C++ developer using "ordinary" maps will be using absl's SwissTable (flat_hash_map) or folly's F14 (F14FastMap) or perhaps absl::btree_map if order is important. Comparisons to std::map and std::unordered_map are simply too naive to make the case for this data structure.
  • by lichtenberger on 1/25/24, 9:14 AM

    We're using a similar trie structure as the main document (node) index in SirixDB[1]. Lately, I got some inspiration for different page-sizes based on the ART and HAMT basically for the rightmost inner pages (as the node-IDs are generated by a simple sequence generator and thus also all inner pages (we call them IndirectPage) except for the rightmost are fully occupied (the tree height is adapted dynamically depending on the size of the stored data. Currently, always 1024 references are stored to indirect child pages, but I'll experiment with smaller sized, as the inner nodes are simply copied for each new revision, whereas the leaf pages storing the actual data are versioned themselfes with a novel sliding snapshot algorithm.

    You can simply compute from a unique nodeId each data is assigned (64bit) the page and reference to traverse on each level in the trie through some bit shifting.

    [1] https://github.com/sirixdb/sirix

  • by NWoodsman on 1/25/24, 2:26 AM

    Also will throw in to the mix, in C#:

    https://julesjacobs.com/2014/11/11/immutable-vectors-csharp....

    His implementation uses buffers of capacity 32, generics, and bit shifting to do lookups.

  • by winrid on 1/25/24, 2:22 AM

    Neat, thank you! I'd love to see how it compares to the libgdx IntMap[0].

    [0] https://github.com/libgdx/libgdx/blob/master/gdx/src/com/bad...

  • by AaronFriel on 1/25/24, 7:49 AM

    Interesting! Reminds me a great deal of Judy Arrays: https://en.m.wikipedia.org/wiki/Judy_array

    Judy Arrays are a radix trie with branching and a few node types designed to be cache line width optimized.

  • by repsilat on 1/25/24, 2:32 AM

    Looks rad, I was going to look into some b-trees for a use-case where I need an ordered map of things similar to integers and this might be better.

    I couldn't immediately see, is there mention of whether insertions invalidate iterators? Maybe not strictly needed for my use-case but good to know.

  • by ww520 on 1/25/24, 8:22 AM

    This looks very good. The idea of using a subnet-mask style to compute the prefix of a node is pretty novel. I haven't seen anything like it. The choice of span factor of 16 is a good compromise between node size and tree depth. The node slot packing is amazing. Actually if you relax the restriction on 64-byte node to 128-byte node, you can get 64 bits per slot and will get a much higher limit for the item count. Newer CPU's are starting to support 128-byte cache line.
  • by ursusmaritimus on 1/25/24, 7:24 AM

    Interesting, but the summary does not mention an important fact: the data structure can contain at most 67108864 items, which is a quite low limit.
  • by JonChesterfield on 1/25/24, 12:34 PM

    Appears to be a 16 way branching trie which completely misses both advantages of tree structures over hashes:

    1/ This tree is mutable, insert doesn't give you a new tree via path copying

    2/ union/intersection style operations can be sublinear. None of the batch operations are implemented

  • by notfed on 1/25/24, 6:00 AM

    It'd be nice to include djb's crit-bit tree implementation (linked to from the intro) in the benchmarks ("Performance Test" section).