by enz on 1/8/24, 5:39 PM with 6 comments
by kevmo314 on 1/8/24, 7:57 PM
It turns out, with some more research (https://github.com/golang/go/issues/25819), that the function was added not to guarantee performance but to guarantee precision: a fused multiply-add rounds only once, so it yields higher precision than doing the multiply and the add stepwise, and in certain situations you'd like that precision guaranteed. Which is cool, but absolutely not what I would've guessed on first read, and the first commenter also closed the issue with the same take!
So I was able to successfully argue against using math.FMA() as a performance optimization, and maybe came away with a small personal takeaway: don't optimize unless I really know what the thing is doing.
by wahern on 1/8/24, 8:16 PM
by perryizgr8 on 1/8/24, 10:37 PM
MOVBU internal∕cpu·ARM64+const_offsetARM64HasATOMICS(SB), R4
CBZ R4, load_store_loop
Why is this a runtime decision? Shouldn't the compiler know if the target machine supports the instruction or not?