by dchest on 8/3/24, 2:45 PM with 399 comments
by josephcsible on 8/3/24, 3:09 PM
But that's not an excuse for having a bug; it's the exact evidence that it's not a bug at all. Calling the compiler buggy for not doing what you want when you commit Undefined Behavior is like calling dd buggy for destroying your data when you call it with the wrong arguments.
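(An illustrative aside, not from the comment: the classic example of the compiler "not doing what you want" under UB is an after-the-fact overflow check. The function name here is made up; the behavior described is the well-known one for GCC/Clang at -O2.)

#include <limits.h>

/* Because signed overflow is UB, the optimizer may assume x + 1 never
 * wraps and fold this whole function to "return 1". Portable code has
 * to test before the addition, e.g. (x < INT_MAX). */
int check_after_add(int x) {
    return x + 1 > x;   /* "always true" under the no-overflow assumption */
}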
by gumby on 8/3/24, 7:47 PM
A big chunk of the essay is about a side point — how good the gains of optimization might be, which, even with data, would be a use-case dependent decision.
But the bulk of his complaint is that C compilers fail to take into account semantics that cannot be expressed in the language. Wow, shocker!
At the very end he says “use a language which can express the needed semantics”. The entire essay could have been replaced with that sentence.
by leni536 on 8/3/24, 4:56 PM
But blaming the compiler devs for this is just misguided.
by amluto on 8/3/24, 5:37 PM
https://www.intel.com/content/www/us/en/developer/articles/t...
Look at DOITM in that document — it is simply impossible for a userspace crypto library to set the required bit.
by dathinab on 8/3/24, 4:55 PM
I have seldom seen someone discredit their expertise that fast in a blog post. (Especially if you follow the link and realize it's just basic, fundamental C stuff about UB not meaning it produces an "arbitrary" value.)
by Conscat on 8/3/24, 4:10 PM
by TNorthover on 8/3/24, 4:20 PM
They'll probably need some kind of specialized compiler of their own if they want to be serious about it. Or carry on with asm.
by kstrauser on 8/3/24, 4:28 PM
For instance, in Python you can write something like:
result = [something(value) for value in set_object]
Because Python's set objects are unordered, it's clear that it doesn't matter in which order the items are processed, and that the order of the results doesn't matter. That opens a whole lot of optimizations at the language level that don't rely on brilliant compilers inferring what the author meant. Similar code in another language with immutable data can go one step further: since something(value1) can't possibly affect something(value2), it can execute those in parallel with threads or processes or whatever else makes it go fast.

Much of the optimization of C compilers is looking at patterns in the code and trying to find faster ways to do what the author probably meant. Because C lacks the ability to express much intent compared to pretty much any newer language, compilers have the freedom to guess, but also have to make those kinds of inferences to get decent performance.
On the plus side, this might be a blessing in disguise like when the Hubble telescope needed glasses. We invented brilliant techniques to make it work despite its limitations. Once we fixed its problems, those same techniques made it perform way better than originally expected. All those C compiler optimizations, applied to a language that's not C, may give us superpowers.
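(To put the pattern-matching point above in C terms, here is a rough sketch of my own, not from the comment: the loop states intent element by element, and GCC and Clang will typically recognize it as a copy idiom and lower it to a memcpy call or vectorized code.)

#include <stddef.h>

/* Illustrative only: the source expresses a byte-by-byte copy, and the
 * optimizer has to rediscover that it is really memcpy. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}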
by AndyKelley on 8/3/24, 6:54 PM
by krackers on 8/3/24, 6:58 PM
by zokier on 8/3/24, 5:32 PM
by lapinot on 8/4/24, 9:55 AM
by pcwalton on 8/4/24, 12:47 PM
Wildly false, and I have no idea where the author is getting this idea from. If you regress people's code in LLVM, your patch gets reverted.
by quohort on 8/3/24, 7:45 PM
Before reading this, I thought that a simple compiler could never usefully compete against optimizing compilers (which require more manpower to produce), but perhaps there is a niche use-case for a compiler with better facilities for manual optimization. This article has inspired me to make a simple compiler myself.
by ziml77 on 8/3/24, 4:17 PM
by account42 on 8/5/24, 11:18 AM
Compilers are not your enemy. Optimizing compilers do the things they do because that's what the majority of people using them want.
It also mixes in things that have nothing to do with optimizing compilers at all, like expecting emulation of 64-bit integers on 32-bit platforms to be constant time when neither the language nor the library in question has ever promised such guarantees. Similarly with the constant references to bool, as if it were some kind of magical data type where avoiding it gives you whatever guarantees you wish. Sounds more like magical thinking than programming.
I'd file this under "why can't the compiler read my mind and do what I want instead of just what I asked it to".
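(For context on the bool complaint: constant-time crypto code conventionally avoids bool and branches by selecting values with masks. A minimal sketch of that idiom follows; the function name is mine, and, as the thread keeps pointing out, nothing in the C standard obliges a compiler to keep it branch-free.)

#include <stdint.h>

/* Branch-free select: returns a when choice is 1, b when choice is 0.
 * The mask is all-ones or all-zeros, so no secret-dependent branch
 * appears in the source; the compiler is still permitted to
 * reintroduce one. */
static uint32_t ct_select_u32(uint32_t choice, uint32_t a, uint32_t b) {
    uint32_t mask = (uint32_t)0 - (choice & 1);
    return (a & mask) | (b & ~mask);
}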
by Retr0id on 8/3/24, 5:36 PM
The "compiler"'s job would then be to assert that the behaviour of the source matches the behaviour of the provided assembly. (This is probably a hard/impossible problem to solve in the general case, but I think it'd be solvable in enough cases to be useful)
To me this would offer the best of both worlds - readable, auditable source code, alongside high-performance assembly that you know won't randomly break in a future compiler update.
by afdbcreid on 8/4/24, 1:38 PM
> LLVM 11 tends to take 2x longer to compile code with optimizations, and as a result produces code that runs 10-20% faster (with occasional outliers in either direction), compared to LLVM 2.7 which is more than 10 years old.
Yes, C code is expected to benefit less from optimizations, since it is already close to assembly. But compiler optimizations over the past decades have had an enormous impact, because they enabled better languages. Without modern optimizations, C++ would never have been as fast as C, and Rust wouldn't be possible at all. The same arguments apply to Java and JavaScript.
by mgaunard on 8/4/24, 4:08 PM
#include <stdlib.h>
#include <string.h>

char* strappend(char const* input, size_t size) {
    char* ptr = malloc(size + 2);
    if (!ptr) return 0;
    memcpy(ptr, input, size);
    ptr[size] = 'a';
    ptr[size + 1] = 'b';
    return ptr;
}
This function is undefined if size is SIZE_MAX.

Many pieces of code have these sorts of "bugs", but in practice no one cares, because the input required, while theoretically possible, is not physically realizable.
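(For completeness, a sketch of the guard such code would need if one did care: check for wraparound before allocating. The name strappend_checked is mine.)

#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>
#include <string.h>

char* strappend_checked(char const* input, size_t size) {
    if (size > SIZE_MAX - 2)   /* size + 2 below would wrap */
        return 0;
    char* ptr = malloc(size + 2);
    if (!ptr) return 0;
    memcpy(ptr, input, size);
    ptr[size] = 'a';
    ptr[size + 1] = 'b';
    return ptr;
}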
by saagarjha on 8/3/24, 4:36 PM
by wolf550e on 8/3/24, 5:45 PM
I bet it's roughly none.
by inglor_cz on 8/4/24, 7:36 AM
And both are just a major headache now, and are among the reasons why few people start new projects in C.
I wonder how many such design decisions, relevant today, but with a potential to screw up future humanity, we are making right now.
by jancsika on 8/4/24, 1:48 AM
How do Firefox and Chrome perform if they are compiled at -O0?
by quuxplusone on 8/3/24, 5:08 PM
Basically like today's "-Og/-Odebug" or "-fno-omit-frame-pointer", but for this specific niche.
I'd be interested to see a post comparing the performance and vulnerability of the mentioned crypto code with and without this (hypothetical) -Obranchless.
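(As a concrete target for such a flag, here is the kind of branch-free comparison crypto libraries write; a sketch of my own, not from the post. Since only the zero/nonzero result is observable, today's optimizers are in principle allowed to rewrite the accumulation with an early exit; a hypothetical -Obranchless would promise not to.)

#include <stddef.h>

/* Compares n secret bytes without a data-dependent branch in the
 * source. Returns 0 when equal, nonzero otherwise. */
int ct_memcmp(const unsigned char *a, const unsigned char *b, size_t n) {
    unsigned char diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);
    return diff != 0;
}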
by fhgag on 8/4/24, 11:27 AM
#pragma GCC push_options
#pragma GCC optimize ("O0")
Exploiting UB in the optimizer can be annoying, but most projects with bad practices from the 1990s have figured it out by now. UBsan helps of course.

I'm pretty grateful for aggressive optimizations. I would not want to compile a large C++ codebase with g++ that has itself been compiled with -O0. Even a 20% speedup helps.
The only annoying issue with C/C++ compilers is the growing list of false positive warnings (usually 100% false positives in well written projects).
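(The pragma approach above is usually wrapped in push/pop so only the sensitive functions lose optimization; a GCC-specific sketch, with a made-up function name:)

#pragma GCC push_options
#pragma GCC optimize ("O0")

/* Only this function is built at -O0; the rest of the file keeps
 * its normal optimization level. */
void sensitive_wipe(volatile unsigned char *buf, unsigned long n) {
    while (n--)
        *buf++ = 0;
}

#pragma GCC pop_options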
by johnfn on 8/3/24, 4:39 PM
This makes it difficult to read the rest of the article. Really? All compiler authors, as a blanket statement, act in bad faith? Whenever possible?
> As a cryptographic example, benchmarks across many CPUs show that the avx2 implementation of kyber768 is about 4 times faster than portable code compiled with an "optimizing" compiler.
What? This is an apples-to-oranges comparison. Compilers optimize all the code they parse; hand-optimizing a single algorithm will of course speed up that specific algorithm, but what about the 99.9999999% of code which is not your particular hand-optimized algorithm?
by GTP on 8/4/24, 8:58 AM
C also has other issues related to undefined behavior and its use for what I call "extreme optimizations" (e.g. not emitting code for an if branch that checks for a null pointer). Rust is emerging as an alternative to C that aims to fix many of its problems, but how does it fare in terms of writing constant-time code? Is it similar to C, easier, or more complicated?
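(The null-pointer case mentioned above, sketched in C rather than Rust; the function is hypothetical. After the dereference the optimizer may assume the pointer is non-null and delete the check as dead code.)

#include <stdio.h>

void report(int *p) {
    int v = *p;              /* UB if p is NULL */
    if (p == NULL) {         /* may be compiled away entirely */
        puts("null pointer");
        return;
    }
    printf("%d\n", v);
}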
by qalmakka on 8/4/24, 10:31 AM
by _orz_ on 8/4/24, 12:37 AM
Both gcc and clang are orders of magnitude better tested than all the closed-source applications that are developed under tight timelines and that we essentially trust our lives with.

To be very clear, there are compiler bugs, but those are almost never the problem in the first place. In the vast majority of cases it starts with buggy user code. And now back to handwritten assembly…
by gok on 8/4/24, 1:53 AM
by red_admiral on 8/4/24, 8:18 AM
by tomcam on 8/3/24, 5:53 PM
Somehow it took me long minutes to infer this.
by e40 on 8/4/24, 1:47 PM
by orf on 8/4/24, 11:24 AM
Find out on next week's episode of "let's blame compilers rather than my choice of language"!
by ndesaulniers on 8/3/24, 4:13 PM
by o11c on 8/3/24, 5:09 PM