from Hacker News

Bump Allocation: Up or Down?

by celeritascelery on 3/25/24, 12:24 PM with 21 comments

by o11c on 3/25/24, 7:47 PM
Definitely up because realloc. Yes, the article mentions this, but it bears repeating. I wasn't aware that optimizers were so much better though!
That said, there's one useful feature that regrettably is poorly supported in most allocator discussion - where some higher alignment is wanted but not at the exact start of the object.
Microsoft has the only major allocator that I'm aware of that supports this; it uses the "align at offset" sense (which simplifies the case of prepending a header to an aligned object). GCC has some minimal support and uses the "align with offset" sense (which IMO makes literally everything else much easier). As a minimal example, if 0x4 is the low nybble of your pointer, Microsoft calls that "align 16 at offset 12", whereas GCC calls it "align 16 with offset 4".
Note that regardless of sense, you can store both the alignment and the offset in a single integer-sized variable by using your compiler's "find highest bit" or "round down to a power of 2" intrinsic. By exclusively using wrapper objects you can in fact hide the internally-chosen sense and support both publicly. This does however remove the possibility of only passing the log of the alignments, popular on BSD allocators.
Limiting `align` to positive `isize` is useful, but further restrictions can reduce the number of overflow checks you often need. I have never found a practical use (mostly: considering huge pages) for more than 3/4 of the bits to be used on alignment - that is, 6 bits (64, a typical cacheline nowadays) for 8-bit pointers, 12 bits (4K, a typical non-huge page nowadays) for 16-bit pointers, 24 bits (16M - suported on many ISAs, just not x86) for 32-bit pointers, and 48 bits (256T - theoretically available on RISC-V) for 64-bit pointers. You'll likely want a +1 when you use this, at least if you implement the logic the way I did.
by norir on 3/25/24, 8:47 PM
If your program has nice boundaries, an alternative to realloc is to just calloc far more memory than you are ever likely to need. Obviously this is a bad idea in certain circumstances, but works really nicely for things like a compiler with well defined entry/exit points. With this technique, you only need one branch in the up allocator to check for an oom (or you can yolo and remove this check if you are quite sure you won't oom).
In code, I essentially implement bump allocation (for a global allocator, though this could be adapted easily to reference an arena) in the following way:
```
  static unsigned char *__mem_base    = 0;
  static unsigned char *__mem_limit   = 0;
  static unsigned char *__mem_current = 0;
  
  void init (size_t max_mem) {
    __mem_base    = (unsigned char*) calloc(1, max_mem);
    __mem_current = __mem_base;
    __mem_limit   = __mem_base + max_mem;
  }
  #define ALLOC(name, type) \
    type *name = (type *) __mem_current; \
    __mem_current += (sizeof(type) + 0x7) & ~0x7; \
    if (__mem_current >= __mem_lim) { fprintf(stderr, "OOM %d", __LINE__); exit(1) }
```
I tend to use a huge value like 1GB for max_mem, which I know is extremely unlikely to be triggered when compiling a normal sized source program.
by JonChesterfield on 3/25/24, 8:38 PM
This doesn't seem right. The optimal direction for a call stack to grow is ISA dependent. I'd expect bump allocation to be likewise - either it doesn't matter at all, or the specific instructions available mean one is better.
I think what has happened here is the original post chose a specific representation for the start/current/end tuple and given that representation one direction was better, then the follow up post took that representation as foundational and noticed that some of the branches can be rolled together.
There's lots of representations available for start/current/end. It should always be possible to have arithmetic followed by a single branch, where you pick the state representation to make the test cheaper for a given direction on a given ISA.
by 1letterunixname on 3/26/24, 4:53 AM
Ah nice.
Btw, some alternative GlobalAllocs:
https://github.com/lemonrock/context-allocator
https://github.com/gnzlbg/jemallocator
https://github.com/raviqqe/bdwgc-alloc
https://github.com/jmcomets/tcmalloc-rs
https://github.com/sunjay/mgc
by DaiPlusPlus on 3/26/24, 2:13 AM
Pardon my ignorance, but this bump-allocator looks like a reinvention of arena-allocators. Is there something I’m missing?