by blobcode on 5/31/25, 7:05 AM with 224 comments
by orlp on 5/31/25, 8:59 AM
These operations are
1. Localized, not a function-wide or program-wide flag.
2. Completely safe. -ffast-math, by contrast, includes assumptions such as that there are no NaNs, and violating those assumptions is undefined behavior.
So what do these algebraic operations do? Well, one by itself doesn't do much of anything compared to a regular operation. But a sequence of them is allowed to be transformed using optimizations which are algebraically justified, as if all operations were done using real arithmetic.
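(A rough C analogue, as a sketch only: orlp is presumably describing per-operation algebraic ops such as LLVM's, not this exact mechanism. The closest localized opt-in in C is clang's scoped fp pragma, which allows reassociation in one block while the rest of the program stays strictly IEEE, and which never assumes NaNs away:

    #include <stddef.h>

    /* Localized opt-in: only this function body may be reassociated
     * (and hence vectorized); no no-NaN assumption is introduced. */
    float dot(const float *a, const float *b, size_t n) {
    #pragma clang fp reassociate(on)
        float sum = 0.0f;
        for (size_t i = 0; i < n; i++)
            sum += a[i] * b[i];   /* compiler may regroup this sum */
        return sum;
    }
)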
by smcameron on 5/31/25, 2:16 PM
feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
will cause your code to get a SIGFPE whenever a NaN crawls out from under a rock. Of course it doesn't work with fast-math enabled, but if you're unknowingly getting NaNs without fast-math enabled, you obviously need to fix those before even trying fast-math. They can be hard to find, and feenableexcept() makes finding them a lot easier.
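A minimal self-contained sketch (glibc/Linux; feenableexcept() is a GNU extension, hence the _GNU_SOURCE):

    #define _GNU_SOURCE
    #include <fenv.h>
    #include <stdio.h>

    int main(void) {
        /* Trap on the first invalid op, divide-by-zero, or overflow. */
        feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
        volatile double zero = 0.0;      /* volatile blocks constant folding */
        double nan = zero / zero;        /* FE_INVALID -> SIGFPE right here */
        printf("%f\n", nan);             /* never reached */
        return 0;
    }

Run it under a debugger and you stop on the exact instruction that produced the NaN.
by emn13 on 5/31/25, 10:34 AM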
Not being able to auto-vectorize seems like a pretty critical bug given hardware trends that have been going on for decades now; on the other hand sacrificing platform-independent determinism isn't a trivial cost to pay either.
I'm not familiar with the details of OpenCL and CUDA on this front - do they have some way to guarantee a specific order of operations such that code always has a predictable result on all platforms and nevertheless parallelizes well on a GPU?
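(I don't know what CUDA or OpenCL actually guarantee here, but one standard trick - a sketch, not any particular GPU API's behavior - is to fix the shape of the reduction tree. The association order is pinned by the algorithm, so every conforming IEEE-754 platform computes bit-identical results, while the independent subtrees still expose plenty of parallelism:

    #include <stddef.h>

    /* Fixed-shape pairwise reduction: the grouping of additions depends
     * only on n, never on thread count or vector width, so the result is
     * reproducible across platforms, yet the two halves are independent
     * and can be evaluated in parallel or with SIMD. */
    double tree_sum(const double *x, size_t n) {
        if (n == 1) return x[0];
        size_t half = n / 2;
        return tree_sum(x, half) + tree_sum(x + half, n - half);
    }
)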
by Sharlin on 5/31/25, 10:19 AM
What's wrong with fun, safe math optimizations?!
(:
by teleforce on 5/31/25, 1:29 PM
Is there any IEEE standards committee working on FP alternatives, for example Unum and Posit [1], [2]?
[1] Unum & Posit:
[2] The End of Error:
https://www.oreilly.com/library/view/the-end-of/978148223986...
by storus on 5/31/25, 1:08 PM
by Sophira on 5/31/25, 7:50 AM
by leephillips on 5/31/25, 7:45 PM
“The problem is how FTZ is actually implemented on most hardware: it is not set per-instruction, but instead controlled by the floating point environment: more specifically, it is controlled by the floating point control register, which on most systems is set at the thread level: enabling FTZ will affect all other operations in the same thread.
“GCC with -funsafe-math-optimizations enables FTZ (and its close relation, denormals-are-zero, or DAZ), even when building shared libraries. That means simply loading a shared library can change the results in completely unrelated code, which is a fun debugging experience.”
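On x86 those FTZ/DAZ bits live in the thread-wide MXCSR register. A minimal sketch of the effect the quote describes, flipping the bits by hand, which mimics what GCC's fast-math startup object (crtfastmath.o) does at load time (compile with SSE3 enabled for pmmintrin.h):

    #include <pmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE, _MM_SET_DENORMALS_ZERO_MODE */
    #include <stdio.h>

    int main(void) {
        volatile double tiny = 1e-310;        /* a subnormal double */
        printf("before: %g\n", tiny * 0.5);   /* ~5e-311, still subnormal */

        /* What loading an -funsafe-math-optimizations shared library
         * effectively does to the entire thread: */
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

        printf("after:  %g\n", tiny * 0.5);   /* now flushed to 0 */
        return 0;
    }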
by cycomanic on 5/31/25, 11:24 AM
I particularly find the discussion of -fassociative-math interesting, because I assume that most people writing code that translates a mathematical formula into a simulation will not know which order of operations would be the most accurate, and will simply codify their derivation of the equation to be simulated (which could have the operations in any order). So if this switch changes your results, it probably means you should take a long, hard look at the equations you're simulating and at which ordering gives you the most correct results.
That said, I appreciate that the considerations might be quite different for libraries and, in particular, simulations for mathematics.
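(For anyone who hasn't seen how mere regrouping changes an answer, a two-line example: ISO C fixes the left-to-right grouping, while -fassociative-math lets the compiler pick the other one.

    #include <stdio.h>

    int main(void) {
        double big = 9007199254740992.0;   /* 2^53: above this, ulp(double) is 2 */
        double a = (big + 1.0) - big;      /* the 1.0 is absorbed: prints 0 */
        double b = (big - big) + 1.0;      /* algebraically identical: prints 1 */
        printf("%g %g\n", a, b);
        return 0;
    }
)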
by chuckadams on 5/31/25, 1:40 PM
by datameta on 5/31/25, 1:35 PM
by zinekeller on 5/31/25, 9:40 AM
Previous discussion: Beware of fast-math (Nov 12, 2021, https://news.ycombinator.com/item?id=29201473)
by quotemstr on 5/31/25, 11:49 AM
by Affric on 5/31/25, 10:16 AM
EDIT: I am now reading Goldberg 1991
Double edit: Kahan Summation formula. Goldberg is always worth going back to.
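For reference, a minimal sketch of the Kahan summation formula (Goldberg 1991 has the error analysis):

    #include <stddef.h>

    /* Compensated summation: c carries the low-order bits lost by each
     * addition. Note that -fassociative-math may legally simplify
     * (t - sum) - y to 0 and silently destroy the compensation -- one
     * more reason fast-math and careful numerical code don't mix. */
    double kahan_sum(const double *x, size_t n) {
        double sum = 0.0, c = 0.0;
        for (size_t i = 0; i < n; i++) {
            double y = x[i] - c;
            double t = sum + y;
            c = (t - sum) - y;
            sum = t;
        }
        return sum;
    }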
by hyghjiyhu on 5/31/25, 3:06 PM
by cbarrick on 5/31/25, 4:44 PM
Vivaldi 7.4.3691.52
Android 15; ASUS_AI2302 Build/AQ3A.240812.002
by boulos on 5/31/25, 5:28 PM
I'm surprised by the take that FTZ is worse than reassociation. FTZ being environmental rather than per instruction is certainly unfortunate, but that's true of rounding modes generally in x86. And I would argue that most programs are unprepared to handle subnormals anyway.
By contrast, reassociation definitely allows more optimization, but it also prohibits you from specifying the order precisely:
> Allow re-association of operands in series of floating-point operations. This violates the ISO C and C++ language standard by possibly changing computation result.
I haven't followed standards work in forever, but I imagine that the introduction of std::fma gets people most of the benefit. That combined with something akin to volatile (if it actually worked) would probably be good enough for most people. Known, numerically sensitive code paths would be carefully written, while the rest of the code base can effectively be "meh, don't care".
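(C has had this since C99 as fma() in <math.h>. A sketch of the kind of carefully written sensitive path described above, with the contraction made explicit rather than left to -ffp-contract:

    #include <math.h>
    #include <stddef.h>

    /* Horner evaluation with explicit fused multiply-adds: each fma()
     * is a single correctly rounded operation, so the result no longer
     * depends on whether the compiler chose to fuse or reassociate. */
    double poly_eval(double x, const double *coef, size_t n) {
        double r = coef[n - 1];
        for (size_t i = n - 1; i-- > 0; )
            r = fma(r, x, coef[i]);   /* r = r*x + coef[i], one rounding */
        return r;
    }
)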
by eqvinox on 5/31/25, 9:06 AM
by JKCalhoun on 5/31/25, 1:24 PM
> This is perhaps the single most frequent cause of fast-math-related StackOverflow questions and GitHub bug reports
The second line above should settle the first.
by dirtyhippiefree on 5/31/25, 5:21 PM
If it’s not always correct, whoever chooses to use it chooses to allow error…
Sounds worse than worthless to me.
by razighter777 on 5/31/25, 1:36 PM
by sholladay on 5/31/25, 3:20 PM
Make it work. Make it right. Make it fast.
by mg794613 on 5/31/25, 5:22 PM
Stop trying. Let their story unfold. Let the pain commence.
Wait 30 years and see them being frustrated trying to tell the next generation.
by rlpb on 5/31/25, 9:29 AM
A similar warning applies to -O3. If an optimization in -O3 reliably gave better results, it wouldn't be in -O3; it'd be in -O2. So blindly compiling with -O3 also doesn't seem like a great idea.
by bsenftner on 5/31/25, 12:17 PM