by azeemba on 2/14/25, 1:30 PM with 65 comments
by alexjplant on 2/17/25, 6:45 AM
> One of the canonical approaches, graph coloring, was first proposed in 1981.
This is about as far as my professor took this topic in class ~13 years ago. Nevertheless the slides that he used to illustrate how the graph coloring problem applied to register allocation stick with me to this day as one of the most elegant applications of CS I've ever seen (which are admittedly few as I'm not nearly as studied as I ought to be).
> Code generation is a surprisingly challenging and underappreciated aspect of compiler implementation, and quite a lot can happen under the hood even after a compiler’s IR optimization pipeline has finished. Register allocation is one of those things, and like many compilers topics, entire books could be written on it alone.
Our class final project targeted a register-based VM [1] for this exact reason. I also found writing MIPS assembly simpler than Y86 [2] because of the larger number of registers at my disposal (among the other obvious differences).
by jrimbault on 2/17/25, 7:53 AM
[0]: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
[1]: all previous threads https://news.ycombinator.com/from?site=lexi-lambda.github.io
by WalterBright on 2/17/25, 7:03 PM
1. intermediate code is basic blocks connected with edges. Each block has a bit vector the size of which is the number of local variables. If a variable is referenced in a basic block, the corresponding bit is set.
2. basic blocks are sorted in depth first order
3. variables are sorted by "weight", which is incremented for each use, incremented by 10 for each use in a loop, by 100 for each use in a nested loop, etc.
4. code is generated with nothing assigned to registers, but registers used are marked in a bit vector, one per basic block
5. Now the register allocator allocates registers unused in a basic block to variables that are used in the basic block, in the order of the weights
6. Assigning variables to registers often means less registers are used for code generation, so more registers become available, so the process is done again until no more registers can be assigned
There are more nuances, such as variables passed to a function via registers, which introduces complications - should it stay in a register, or be moved into memory? But dealing with that is why I get paid the Big Bucks.
by kragen on 2/17/25, 11:26 PM
Removing that constraint turns out to allow guaranteed polynomial time solutions, and that result is now widely used in practice.
by artemonster on 2/17/25, 8:42 AM
by kibwen on 2/17/25, 4:16 PM
by userbinator on 2/17/25, 7:18 AM
lea eax, [eax+eax*4+7]
...and it's likely that a compiler would do such simplifications even before attempting register allocation, since it can only make the latter easier.This is particularly likely to be necessary if the desired instruction is already using indirect addressing, and both register allocation and instruction selection must take those constraints into account.
As a long-time Asm programmer, I believe that instruction selection and register allocation are inseparable and really need to be considered at the same time; attempting to separate them, like what most if not all compilers do, results in (sometimes very) suboptimal results which is easily noticeable in compiler-generated vs human-generated code.
by Aardwolf on 2/17/25, 1:54 PM
Or is the question about which variables are kept in registers for a bit longer time even while they're not actively being computed on right now?
by dapperdrake on 2/17/25, 3:17 PM
[1] http://cr.yp.to/qhasm/20050210-fxch.txt
[2] https://pvk.ca/Blog/2014/03/15/sbcl-the-ultimate-assembly-co...
by travisgriggs on 2/17/25, 3:40 PM