from Hacker News

How do modern compilers choose which variables to put in registers?

by azeemba on 2/14/25, 1:30 PM with 65 comments

by alexjplant on 2/17/25, 6:45 AM
This is perhaps my favorite Stack Overflow answer of all time. I don't remember when I last saw such an approachable explanation of something so critical yet complicated.
> One of the canonical approaches, graph coloring, was first proposed in 1981.
This is about as far as my professor took this topic in class ~13 years ago. Nevertheless the slides that he used to illustrate how the graph coloring problem applied to register allocation stick with me to this day as one of the most elegant applications of CS I've ever seen (which are admittedly few as I'm not nearly as studied as I ought to be).
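For readers who have not seen those slides, here is a minimal sketch of the idea, not any particular compiler's implementation. Variables whose live ranges overlap get an edge in an "interference graph", and registers are the colors. The input format (half-open instruction index ranges), the function names, and the simple highest-degree-first greedy heuristic are all assumptions made for illustration; real allocators add spilling heuristics and coalescing.

```python
from collections import defaultdict

def build_interference(live_ranges):
    """live_ranges: {var: (start, end)} with half-open [start, end) ranges.
    Two variables interfere when their ranges overlap."""
    graph = defaultdict(set)
    names = list(live_ranges)
    for v in names:
        graph[v]  # make sure isolated variables still appear as nodes
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (s1, e1), (s2, e2) = live_ranges[a], live_ranges[b]
            if s1 < e2 and s2 < e1:
                graph[a].add(b)
                graph[b].add(a)
    return graph

def color(graph, k):
    """Greedy coloring with k registers (hypothetical heuristic).
    Returns {var: register index, or None if the variable is spilled}."""
    assignment = {}
    # Visit the most constrained variables (highest degree) first.
    for v in sorted(graph, key=lambda v: len(graph[v]), reverse=True):
        used = {assignment[n] for n in graph[v] if assignment.get(n) is not None}
        free = [r for r in range(k) if r not in used]
        assignment[v] = free[0] if free else None
    return assignment

ranges = {"a": (0, 4), "b": (1, 3), "c": (2, 6), "d": (4, 7)}
graph = build_interference(ranges)
three = color(graph, 3)  # a, b, c are all live at once, so 3 registers suffice
two = color(graph, 2)    # with only 2 registers, one variable must be spilled
```

Here `a`, `b`, and `c` form a triangle in the graph, which is why two registers are not enough without spilling.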
> Code generation is a surprisingly challenging and underappreciated aspect of compiler implementation, and quite a lot can happen under the hood even after a compiler’s IR optimization pipeline has finished. Register allocation is one of those things, and like many compilers topics, entire books could be written on it alone.
Our class final project targeted a register-based VM [1] for this exact reason. I also found writing MIPS assembly simpler than Y86 [2] because of the larger number of registers at my disposal (among the other obvious differences).
[1] https://www.lua.org/doc/jucs05.pdf
[2] https://esolangs.org/wiki/Y86
by jrimbault on 2/17/25, 7:53 AM
It doesn't surprise me much that this was written by the same author as "Parse, don't validate", very well written
[0]: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
[1]: all previous threads https://news.ycombinator.com/from?site=lexi-lambda.github.io
by WalterBright on 2/17/25, 7:03 PM
The Digital Mars D compiler register allocator:
1. intermediate code is basic blocks connected with edges. Each block has a bit vector the size of which is the number of local variables. If a variable is referenced in a basic block, the corresponding bit is set.
2. basic blocks are sorted in depth first order
3. variables are sorted by "weight", which is incremented for each use, incremented by 10 for each use in a loop, by 100 for each use in a nested loop, etc.
4. code is generated with nothing assigned to registers, but registers used are marked in a bit vector, one per basic block
5. Now the register allocator allocates registers unused in a basic block to variables that are used in the basic block, in the order of the weights
6. Assigning variables to registers often means fewer registers are used for code generation, so more registers become available, so the process is done again until no more registers can be assigned
There are more nuances, such as variables passed to a function via registers, which introduces complications - should it stay in a register, or be moved into memory? But dealing with that is why I get paid the Big Bucks.
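Steps 3 and 5 above can be sketched roughly as follows. This is an illustrative reading of the description, not the actual Digital Mars code: the data layout, the function names `weights` and `allocate`, and the rule that a variable gets one register for every block that references it are all assumptions. Step 6 (regenerating code and repeating) is not modeled.

```python
def weights(uses):
    """uses: list of (var, loop_depth) pairs, one per use.
    Each use adds 1 outside loops, 10 inside a loop, 100 in a nested loop, etc."""
    w = {}
    for var, depth in uses:
        w[var] = w.get(var, 0) + 10 ** depth
    return w

def allocate(var_blocks, busy, registers, weight):
    """var_blocks: {var: set of block ids that reference it}
    busy: {block id: set of registers code generation already uses there}
    Returns {var: register} for the variables that could be assigned."""
    busy = {b: set(regs) for b, regs in busy.items()}  # do not mutate the caller's data
    assignment = {}
    # Heaviest variables get first pick.
    for var in sorted(var_blocks, key=lambda v: weight.get(v, 0), reverse=True):
        for reg in registers:
            if all(reg not in busy[b] for b in var_blocks[var]):
                assignment[var] = reg
                for b in var_blocks[var]:
                    busy[b].add(reg)
                break
    return assignment

uses = [("i", 1), ("i", 1), ("sum", 1), ("n", 0), ("tmp", 0)]
w = weights(uses)
var_blocks = {"i": {1, 2}, "sum": {1, 2}, "n": {0, 1}, "tmp": {0}}
busy = {0: {"eax"}, 1: {"eax"}, 2: set()}
result = allocate(var_blocks, busy, ["eax", "ebx", "ecx"], w)
```

In this made-up input, `i` and `sum` live in the loop blocks and win registers first; `n` finds no register free in both of its blocks and stays in memory, while `tmp` can reuse `ebx` because it is only referenced in block 0.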
by kragen on 2/17/25, 11:26 PM
It's surprising that such a high-quality answer lacks a mention of the polynomial-time optimal register allocation algorithms in common use in current compilers. It's true that graph coloring is NP-complete, so no polynomial-time algorithm for it is known (and none exists unless P = NP), but graph coloring is register allocation with an additional constraint added: that you can never move a variable from one register to another.
Removing that constraint turns out to allow guaranteed polynomial time solutions, and that result is now widely used in practice.
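The comment does not name a specific algorithm, so the following is only an illustration of why restricted forms of the problem are tractable. In straight-line code, live ranges are plain intervals, and interval graphs can be colored optimally by a greedy sweep over start points. Interval graphs are a simple case of chordal graphs, the class that interference graphs of SSA-form programs fall into. The function name and input format are assumptions.

```python
import heapq

def color_intervals(ranges):
    """ranges: {var: (start, end)} with half-open [start, end) live ranges.
    Returns ({var: register index}, number of registers used).
    The sweep uses exactly as many registers as the maximum number of
    simultaneously live variables, which is optimal for intervals."""
    assignment = {}
    active = []    # min-heap of (end, register) for ranges still live
    free = []      # min-heap of registers released by expired ranges
    next_reg = 0
    for var, (start, end) in sorted(ranges.items(), key=lambda kv: kv[1][0]):
        # Release registers whose range ended at or before this start.
        while active and active[0][0] <= start:
            _, reg = heapq.heappop(active)
            heapq.heappush(free, reg)
        if free:
            reg = heapq.heappop(free)
        else:
            reg = next_reg
            next_reg += 1
        assignment[var] = reg
        heapq.heappush(active, (end, reg))
    return assignment, next_reg

ranges = {"a": (0, 4), "b": (1, 3), "c": (2, 6), "d": (4, 7)}
assignment, count = color_intervals(ranges)
```

With these ranges, three variables are live at once, so three registers are used, and `d` reuses a register freed when an earlier range ends.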
by artemonster on 2/17/25, 8:42 AM
Can anyone with good knowledge explain why we aren't also explicitly (with the help of the compiler) managing cache as well? Registers are basically the lowest level of cache, the one you can operate on with the most efficiency. Why, at the next level, do we rely on our hardware to do the guesswork for us?
by kibwen on 2/17/25, 4:16 PM
I'd be curious to see a "high-level structured assembly language" that gives the programmer actual control over things like register allocations, in a more systematic way than C's attempt. You might say "optimizers will do a better job", and you're right, but what I want to see is a language that isn't designed to be fed into an optimizing backend at all, but turned into machine code via simple, local transformations that a programmer can reliably predict. In other words, as high-level systems languages like C lean more and more heavily on optimizing backends and move away from being "portable assembly", I think that opens up a conceptual space somewhere below C yet still above assembly.
by userbinator on 2/17/25, 7:18 AM
I find it a little amusing that the given example function does a computation which could've easily been simplified into a single instruction on x86, which needs only a single register (assuming the input x and the return value are both in eax):
```
    lea eax, [eax+eax*4+7]
```
...and it's likely that a compiler would do such simplifications even before attempting register allocation, since it can only make the latter easier.
This is particularly likely to be necessary if the desired instruction is already using indirect addressing, and both register allocation and instruction selection must take those constraints into account.
As a long-time Asm programmer, I believe that instruction selection and register allocation are inseparable and really need to be considered at the same time; attempting to separate them, as most if not all compilers do, leads to (sometimes very) suboptimal results, which are easily noticeable when comparing compiler-generated and human-generated code.
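As a quick check on the `lea` instruction above: its address expression is base + index*scale + displacement, so `[eax+eax*4+7]` evaluates to 5*x + 7 modulo 2^32 when `eax` holds x. The small model below is a sketch for illustration only; the helper names `lea` and `five_x_plus_seven` are made up, and only 32-bit wraparound is modeled.

```python
MASK32 = 0xFFFFFFFF

def lea(base, index, scale, disp):
    """Model of a 32-bit x86 effective address: base + index*scale + disp."""
    assert scale in (1, 2, 4, 8)  # the only scale factors x86 addressing allows
    return (base + index * scale + disp) & MASK32

def five_x_plus_seven(x):
    """The same arithmetic written as an explicit multiply and add."""
    return (5 * x + 7) & MASK32

samples = [0, 1, 2, 123456789, 0xFFFFFFFF]
matches = all(lea(x, x, 4, 7) == five_x_plus_seven(x) for x in samples)
```

Because the whole computation fits in one addressing-mode expression, no temporary register is needed for the intermediate product.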
by Aardwolf on 2/17/25, 1:54 PM
Perhaps this is layman's understanding, but afaik all CPU operations are only done on registers, so any variable has to be moved to registers when it's operated on (or are some operations possible on memory without going to registers?) and so the answer to "which variables" would be "all of them".
Or is the question about which variables are kept in registers for a bit longer time even while they're not actively being computed on right now?
by dapperdrake on 2/17/25, 3:17 PM
Registers vs x87 stack with free swap a.k.a. rotating stack. Sub-optimal edge cases for both exist:
[1] http://cr.yp.to/qhasm/20050210-fxch.txt
[2] https://pvk.ca/Blog/2014/03/15/sbcl-the-ultimate-assembly-co...
by travisgriggs on 2/17/25, 3:40 PM
Reading the top-voted answer made me feel like I was reading one of Jon "Hannibal" Stokes' CPU praxis write-ups from the early years of Ars. Anyone else remember those?