by thecoppinger on 5/22/23, 9:26 AM with 177 comments
by alex7734 on 5/22/23, 1:28 PM
A little offtopic but the default Delphi 7 memory allocator did this, except that it also merged blocks that it obtained from different OS allocation calls.
This worked fine for regular usage, but if that memory was ever used for Bitmaps for UI stuff, it wouldn't work: Since Windows does some of its UI stuff in kernel mode, before doing anything Windows would attempt to lock the entire allocation's VAD entry to prevent you from messing with it in another thread while it was using it. If the Bitmap you were working on happened to belong to two different OS-level allocations, this lock function would fail since it wasn't meant to handle that case.
This would lead to random, extremely cryptic errors such as ERROR_NO_SCROLLBARS "The window does not have scroll bars." since the lock call was deep in the stack and the callers replaced the error with another one as it bubbled up.
In the end we had to replace the allocator to avoid that issue. That was a fun day of debugging.
by devit on 5/22/23, 12:51 PM
E.g. the Linux kernel used (I'm not sure if it's still like this) a buddy allocator to allocate pages and power-of-two blocks of pages, and slab allocators to subdivide those pages and allocate data structures.
Another important thing the article doesn't mention is that most production allocators make use of thread-local storage and either keep per-thread caches of free blocks or sometimes whole per-thread memory regions. This reduces lock contention and provides memory that is more likely to be in the current core's cache.
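Roughly, the per-thread fast path looks something like this (a sketch with made-up sizes and a single size class, not any particular allocator's actual code):

    #include <stdlib.h>
    #include <pthread.h>

    #define CACHE_SLOTS 32
    #define BLOCK_SIZE  64   /* one size class, for simplicity */

    /* Per-thread cache of free blocks: no lock needed to touch it. */
    static _Thread_local void *cache[CACHE_SLOTS];
    static _Thread_local int   cached = 0;

    /* Lock standing in for the shared heap's slow path. */
    static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *cached_alloc(void) {
        if (cached > 0)
            return cache[--cached];        /* fast path: no contention */

        pthread_mutex_lock(&heap_lock);    /* slow path: shared allocator */
        void *block = malloc(BLOCK_SIZE);
        pthread_mutex_unlock(&heap_lock);
        return block;
    }

    static void cached_free(void *block) {
        if (cached < CACHE_SLOTS) {        /* keep it for reuse by this thread */
            cache[cached++] = block;
            return;
        }
        pthread_mutex_lock(&heap_lock);    /* cache full: return to shared heap */
        free(block);
        pthread_mutex_unlock(&heap_lock);
    }

This is the basic shape of what tcmalloc ("thread-caching malloc") and jemalloc do, just with many size classes and batched transfers between the thread caches and the shared heap.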
by celeritascelery on 5/22/23, 11:11 AM
by OliverJones on 5/22/23, 3:30 PM
As an old-timer in writing code, I think my generation's most-challenging legacies (=== the things we f**ked up) are the non-robust malloc/free concept and null-terminated text strings. Bugs in code using those conventions have been exploitable to a really damaging extent. I learned to program in C from K&R. And getting data-structure code right, and safe to deploy, in that language and its derivatives is HARD.
The C inventors are (were, in Dennis Ritchie's case) brilliant and Bell Labs was great. Their ideas shaped the stuff the global internet runs on. But these parts of what they invented... ouch. (All OSes had the same problems.)
I wish the wonderful article presented here carried a more prominent warning about this. Many will read it as they learn to code. The history of our profession can teach about what NOT to do as well as what to do.
by thecoppinger on 5/22/23, 11:05 AM
He recently published a similar guide on load balancing that is equally intuitive and insightful: https://samwho.dev/load-balancing/
I can't wait to see what he puts out next!
by ModernMech on 5/22/23, 11:21 AM
If I were to make some suggestions, based on how I know they would receive this:
- I would make explicit reference to heap and stack. Students who are learning this material are learning about the heap/stack dichotomy, and I think it would really improve the exposition to make clear that not all memory is allocated this way in a program.
- I would remove this line: "At the end of this post, you should know everything you need to know to write your own allocator." I can confidently say that my students will not be able to write an allocator after reading this. It's nothing to do with your piece; it's just that the intersection of people who don't understand hex and people who could build an allocator after a short blog post is very, very small. Students who read this post and at the end still feel like they can't build an allocator will end up discouraged, with a feeling that maybe they are missing something.
- Consider rethinking how you handle hex numbers throughout. You introduce them and say they are distinguished by a preceding "0x", but then you immediately omit the 0x to save space in the figure. In my experience, students have a lot of trouble understanding that hex and dec are just two notations for the same underlying representation. They will not be able to distinguish between hex and dec bases implicitly, so from a pedagogical standpoint it's better to be consistent throughout and include the prefix (the snippet after this list shows how little it takes to make the "same number, two notations" point).
- Finally, at the end I would mention that there are other strategies for memory management, to encourage further exploration. Other languages do it differently, and for students it's important to know what other avenues are out there. Otherwise they might think heap-based malloc/free is the only way to do things, the same way they often think imperative programming is the only way to program.
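For what it's worth, a couple of lines of C (hypothetical, not from the article) are usually enough to make that point concrete in class:

    #include <stdio.h>

    int main(void) {
        int n = 0x1A;                     /* hex literal */
        printf("%d == 0x%X\n", n, n);     /* prints: 26 == 0x1A -- same number, two notations */
        return 0;
    }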
Anyway, thank you for creating this, and I'm sure it will really help my students. In a just world, "seeing" the memory like this would ideally be first-class tooling for languages like C.
by samsquire on 5/22/23, 11:06 AM
I wrote a JIT compiler and didn't bother calling free much; I just let the operating system reclaim all allocated memory when the process exited.
I got into this situation often:
return_struct = do_something(mystruct);
return_struct->inner_struct = malloc(sizeof(struct my_inner_struct));
Now, who owns inner_struct? Who is responsible for freeing it? Do I free it when I assign to it? I feel this ownership question complicates cross-language FFI calls, because who is responsible for freeing structures depends on the application and the platform you're running under. For example, Rust code being called from Erlang: you have to be extra careful when memory is managed by a different language runtime.
I feel I am at the beginning of intuitively understanding what memory really is: memory is just a huge contiguous region of numbered locations. Bump allocators allocate in one direction and free all at once. Arena allocators allocate to a preallocated region, I think.
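Roughly, a bump allocator in its entirety is just a few lines (a sketch, assuming one fixed preallocated arena):

    #include <stddef.h>
    #include <stdint.h>

    #define ARENA_SIZE (1 << 20)

    static uint8_t arena[ARENA_SIZE];   /* the preallocated region */
    static size_t  offset = 0;          /* everything before this is in use */

    /* Allocate by bumping the offset forward; there is no per-object free. */
    static void *bump_alloc(size_t size) {
        size = (size + 7) & ~(size_t)7;   /* keep 8-byte alignment */
        if (offset + size > ARENA_SIZE)
            return NULL;                  /* arena exhausted */
        void *p = &arena[offset];
        offset += size;
        return p;
    }

    /* "Free" everything at once by resetting the offset. */
    static void bump_reset(void) {
        offset = 0;
    }

An arena allocator is essentially the same idea wrapped in a struct, so you can have several independent regions and reset each one separately.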
Memory is a logistical problem of how you arrange and allocate finite resources.
I am thinking of alternative ways of visualizing memory. For example, I started writing an animation of binary search:
https://replit.com/@Chronological/ProgrammingRTS
The idea is that you see values and memory locations move around, with the end goal of being able to command memory to move around and be computed, like in a real-time strategy game.
I think if we could visualise memory as cars on a road, we would see obvious traffic jams.
by Damogran6 on 5/23/23, 12:06 AM
This was on a 386sx with 8M RAM and it was pretty much all the available memory after the OS was loaded and settled down.
A MILLION BYTES!!
Didn't do anything with it, but still, after DOS and EMS/XMS memory and all the other falderol of managing memory. (At the time, it was also the only x86 OS that would format a floppy drive without bringing all the other threads to a crawl. The UI was still available-ish, BiModem was still BiModeming...)
by jesselawson on 5/22/23, 3:29 PM
by photochemsyn on 5/22/23, 2:38 PM
> "List allocators used by malloc() and free() are, by necessity, general purpose allocators that aren’t optimized for any particular memory allocation pattern... To understand, let’s examine commonly used list allocation algorithms: first-fit, next-fit, best-fit and quick-fit."
That's from an article on custom memory management (aimed at embedded programming issues) I've found pretty useful, and it picks up where this leaves off:
https://www.embedded-software-engineering.de/dynamic-memory-...
You can practice writing custom memory managers in an application that runs on an operating system by only using the stack (just create a big array of int etc. and call that your memory space):
> "For the safety-critical tasks, the developer can replace the standard allocator with a custom allocator that sets aside a buffer for the exclusive use of that task, and satisfies all memory allocation requests out of that buffer... The remainder of this paper presents four examples of custom allocator algorithms: the block allocator, stack allocator, bitmap allocator and thread-local allocator. Custom memory managers can use any one of these algorithms or a combination of different algorithms."
by spatter on 5/22/23, 1:58 PM
This is not strictly true; it depends on the environment you're using. Some older operating systems and some modern embedded systems have memory mapped at the zero address.
by davidgrenier on 5/22/23, 11:28 AM
address + <value at address>
address - <value at address-1>
Shouldn't this be:
address + <value at address> + 3
address - <value at address-1> - 3
by jxf on 5/22/23, 12:39 PM
by ftxbro on 5/22/23, 4:58 PM
Interesting to use Hacker News as the blog's own comment section in this way.
by junon on 5/22/23, 11:57 AM
by CliffStoll on 5/22/23, 4:28 PM
My high congratulations for creating a most friendly, readable and useful lesson!
by N-Krause on 5/22/23, 10:42 AM
How I wish I had something like that when I first learned C.
by fzeindl on 5/22/23, 11:05 AM
While aggressively optimizing, we ended up replacing malloc with a function we called "cooloc", which traded portability and security for speed. The debug tool here would have been useful.
by thangalin on 5/22/23, 7:41 PM
* https://github.com/DaveJarvis/mandelbrot/blob/master/memory....
I then apply this same principle of "opening" and "closing" structures throughout the application. Generally, I can quickly verify that the calls are balanced:
* https://github.com/DaveJarvis/mandelbrot/blob/master/threads...
What's nice about this pattern is that the underlying implementation of exactly how the memory is maintained for a particular data structure (e.g., whether malloc or gdImageCreateTrueColor is called) becomes an implementation detail:
* https://github.com/DaveJarvis/mandelbrot/blob/master/image.c
The main application opens then closes structures in the reverse order:
* https://github.com/DaveJarvis/mandelbrot/blob/master/main.c
This means there's only one call to malloc and one call to free throughout the entire application (third-party libraries notwithstanding), allowing them to be swapped out with ease.
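In sketch form, the pattern looks something like this (hypothetical names, not the actual code from those files):

    #include <stdlib.h>

    /* The only place malloc is called in the whole program. */
    void *memory_open(size_t bytes) {
        return malloc(bytes);
    }

    /* The only place free is called. */
    void memory_close(void *p) {
        free(p);
    }

    /* Every structure gets a matching open/close pair built on top. */
    typedef struct { int width, height; unsigned char *pixels; } image;

    image *image_open(int width, int height) {
        image *img = memory_open(sizeof *img);
        if (img == NULL) return NULL;
        img->width = width;
        img->height = height;
        img->pixels = memory_open((size_t)width * height);
        if (img->pixels == NULL) { memory_close(img); return NULL; }
        return img;
    }

    void image_close(image *img) {
        memory_close(img->pixels);   /* close in reverse order of opening */
        memory_close(img);
    }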
Aside, logging can follow this same idea by restricting where any text is written to the console to a single location in the code base:
* https://github.com/DaveJarvis/mandelbrot/blob/master/logging...
by FrankyHollywood on 5/22/23, 4:05 PM
http://igoro.com/archive/volatile-keyword-in-c-memory-model-...
It's a bit old by now (2010), but I always remembered the mental model Igor created.
by shreyshnaccount on 5/22/23, 11:26 AM
by jamesgill on 5/22/23, 3:01 PM
by patleeman on 5/22/23, 2:51 PM
My only piece of feedback would be for the "Inline Bookkeeping" section (https://samwho.dev/memory-allocation/#inline-bookkeeping), it took a while for me to grok the numbered list to figure out which block corresponded to address + X. I wonder if there is a better way to visualize the 4 numbered bullet points? Maybe just arrows and text pointing to the visualization?
Thanks again for this wonderful article!
by Mizoguchi on 5/22/23, 11:18 AM
by tpoacher on 5/22/23, 12:53 PM
It seems to imply that malloc/free work by boundary tag? Which I don't think is the case? (And if not, it leaves the reader confused as to how it then actually works.)
I know "some" languages use the tag technique (e.g. Julia), but the article seems to imply that this also applies to the C code above? Or am I wrong, and C also makes use of such a scheme when you use pointer arithmetic? (I don't see how that would work with arrays if that's the case, though.)
by terrycody on 5/22/23, 2:29 PM
by matheusmoreira on 5/22/23, 3:58 PM
by cromulent on 5/22/23, 1:55 PM
by RamiAwar on 5/22/23, 12:18 PM
I'd love to also see applications of custom memory allocations. I know about usecases in building game engines and the importance of hitting cache there, but I'm not sure where else in the world this would be as useful.
by sailorganymede on 5/22/23, 4:58 PM
by kaba0 on 5/22/23, 11:20 AM
by bambax on 5/22/23, 6:46 PM
> Couldn't we rearrange the memory to get a block of 6 contiguous bytes? Some sort of defragmentation process?
> Sadly not. Remember earlier we talked about how the return value of malloc is the address of a byte in memory? Moving allocations won't change the pointers we have already returned from malloc. We would change the value those pointers are pointed at, effectively breaking them. This is one of the downsides of the malloc/free API.
But why not? Couldn't we store information about old pointers somewhere and match them with new addresses when defragmenting? Some kind of virtual memory driver that would map old pointers to new addresses, transparently for the programs? Or would it be too much overhead for too little benefit?
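Systems have done exactly that: classic Mac OS and 16-bit Windows handed out handles (pointers to pointers) so the OS could move relocatable blocks and fix up a single master pointer, and compacting garbage collectors do the same thing internally. The cost is an extra indirection on every access plus rules about when a block is allowed to move. A rough sketch of the handle idea (made-up names, single-threaded):

    #include <stdlib.h>
    #include <string.h>

    /* A handle is a stable slot that points at the data's current location. */
    typedef struct { void *ptr; size_t size; } handle;

    #define MAX_HANDLES 256
    static handle table[MAX_HANDLES];

    /* Allocation returns an index into the table, never a raw pointer. */
    static int h_alloc(size_t size) {
        for (int i = 0; i < MAX_HANDLES; i++) {
            if (table[i].ptr == NULL) {
                void *p = malloc(size);
                if (p == NULL) return -1;
                table[i].ptr  = p;
                table[i].size = size;
                return i;
            }
        }
        return -1;
    }

    /* Callers must go through the table every time they touch the memory... */
    static void *h_deref(int h) {
        return table[h].ptr;
    }

    /* ...which is what lets a defragmenter move the data and fix one pointer.
       (A real compactor would also recycle the old block.) */
    static void h_move(int h, void *new_location) {
        memcpy(new_location, table[h].ptr, table[h].size);
        table[h].ptr = new_location;   /* every future h_deref sees the new spot */
    }

With raw malloc pointers there is no table to fix up, which is why the article says moving allocations would silently break them.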
by junon on 5/22/23, 12:16 PM
by a257 on 5/22/23, 4:25 PM
by cinntaile on 5/22/23, 12:29 PM
by SV_BubbleTime on 5/22/23, 3:20 PM
by jackphilson on 5/23/23, 5:26 AM
by delta_p_delta_x on 5/22/23, 2:35 PM
by codr7 on 5/23/23, 1:11 AM
malloc/free: 207294429ns
slab: 74795526ns
A roughly 64% reduction in runtime (close to a 2.8x speedup) is pretty nice.
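For the curious, a slab of fixed-size objects is conceptually just a preallocated array threaded onto a free list, so alloc and free are each a pointer swap (a sketch of the general idea, not codr7's actual code):

    #include <stdlib.h>

    #define SLAB_OBJECTS 1024

    typedef struct node {          /* the object type being allocated */
        struct node *next;         /* reused as the free-list link while free */
        int payload[15];
    } node;

    static node  slab[SLAB_OBJECTS];   /* one big upfront allocation */
    static node *free_list = NULL;

    static void slab_init(void) {
        for (int i = 0; i < SLAB_OBJECTS; i++) {
            slab[i].next = free_list;  /* thread every object onto the free list */
            free_list = &slab[i];
        }
    }

    /* Both operations are O(1): pop and push on a singly linked list. */
    static node *slab_alloc(void) {
        node *n = free_list;
        if (n != NULL)
            free_list = n->next;
        return n;
    }

    static void slab_free(node *n) {
        n->next = free_list;
        free_list = n;
    }

No size lookups, no coalescing, no locking in the single-threaded case, which is where the speedup over general-purpose malloc/free comes from.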
by robjwells on 5/22/23, 1:39 PM
by duped on 5/22/23, 2:13 PM
by winrid on 5/22/23, 9:10 PM
There's something refreshing about firing up a C (or maybe now Zig, in my case) compiler and allocating a gigabyte of memory, and seeing that your process is using exactly 1GB.
by fmiras on 5/22/23, 12:10 PM
by alberth on 5/22/23, 2:12 PM
It would make the site much more accessible and clear in the event you didn't realize you had to click forward.
by KevinChen6 on 5/26/23, 3:36 AM
by NeuroCoder on 5/23/23, 4:57 AM
by syngrog66 on 5/22/23, 11:51 PM
by shepherdjerred on 5/22/23, 2:18 PM
by markus_zhang on 5/22/23, 6:27 PM
by kayo_20211030 on 5/22/23, 2:17 PM
by armatav on 5/22/23, 10:49 PM
by dexter89_kp3 on 5/22/23, 9:58 PM
by bjt2n3904 on 5/22/23, 2:08 PM
It instills the idea that you should be asking a question at this point. Some information has been provided that should generate more questions in your head, if you're keeping pace.