from Hacker News

TIL: Go's CompareAndSwap is not always Compare-and-swap

by enz on 1/8/24, 5:39 PM with 6 comments

  • by kevmo314 on 1/8/24, 7:57 PM

    Related anecdote, a coworker suggested I use https://pkg.go.dev/math#FMA to optimize a multiply and add which surprised me quite a bit: why would there be an opt-in to fused multiply and add? Indeed, if you dive into the code (https://cs.opensource.google/go/go/+/refs/tags/go1.21.5:src/...) it's quite a bit more complicated than your normal a*x+b syntax, so how could this possibly yield a performance improvement?

    It turns out, with some more research (https://github.com/golang/go/issues/25819), that the function was added not to guarantee performance but to guarantee precision: a fused multiply and add rounds only once, so it yields higher precision than doing the operations stepwise, and in certain situations you'd like to guarantee that precision. Which is cool, but absolutely not what I would've guessed on first read, and the first commenter also closed the issue with the same take!

    So I was able to successfully argue against using math.FMA() as a performance optimization, and my small personal takeaway is to not optimize unless I really know what the thing is doing.

  • by wahern on 1/8/24, 8:16 PM

    AFAIU, LL/SC is the more generic, powerful primitive. In theory LL/SC can be used as the hardware primitive for a much broader range of lock-free algorithms, as well as for software transactional memory generally. CAS algorithms are more commonly seen because CAS is the lowest common denominator, and the best x86 offered. But because of the limited number of addresses that can be monitored in hardware without sacrificing performance or efficiency, in practice LL/SC implementations are weak and only slightly more useful than [double] CAS.

  • by perryizgr8 on 1/8/24, 10:37 PM

    // Check support for LSE atomics
    MOVBU internal∕cpu·ARM64+const_offsetARM64HasATOMICS(SB), R4
    CBZ R4, load_store_loop

    Why is this a runtime decision? Shouldn't the compiler know if the target machine supports the instruction or not?