from Hacker News

Summing ASCII encoded integers on Haswell at almost the speed of memcpy

by iliekcomputers on 7/12/24, 4:42 PM with 36 comments

  • by ashleyn on 7/13/24, 2:13 PM

    Knew it'd be SIMD. Such an underrated feature of modern CPUs. Hopefully with cross-platform SIMD in Rust and Golang, it'll be more commonly used.

    Thinking parallel gets you enormous speed benefits for any number of arbitrary algorithms: https://mcyoung.xyz/2023/11/27/simd-base64/

  • by dist1ll on 7/13/24, 2:50 PM

    First time I've heard of HighLoad. Seems really interesting at first glance. I personally find SIMD and ISA/μarch-specific optimizations more rewarding than pure algorithmic challenges (codeforces and such).

    Though Haswell seems like a pretty obsolete platform to optimize for at this point. Even Skylake will be a decade old next year.

  • by wolf550e on 7/13/24, 1:18 PM

    I think the trick with dereferencing unmapped memory is cool, but I only really care about techniques that work reliably and I can use in production.

  • by raldi on 7/13/24, 5:00 PM

    Is there an explanation of why it sometimes gives the wrong answer?