by nbrempel on 7/1/24, 4:29 PM with 42 comments
by oconnor663 on 7/1/24, 6:25 PM
by ww520 on 7/1/24, 9:37 PM
by thomashabets2 on 7/1/24, 6:14 PM
https://blog.habets.se/2024/04/Rust-is-faster-than-C.html and code at https://github.com/ThomasHabets/zipbrute/blob/master/rust/sr... showed me getting a 3x speedup using portable SIMD, on my first attempt.
by nbrempel on 7/1/24, 6:15 PM
One of my goals in writing these articles is to learn, so feedback is more than welcome!
by eachro on 7/1/24, 5:57 PM
by anonymousDan on 7/1/24, 6:50 PM
by IshKebab on 7/1/24, 8:56 PM
There is also a traditional SIMD extension (P, I think?) but it isn't finished. Most of the focus has been on the vector extension.
I am wondering whether, and how, Rust will support these vector processing extensions.
by brundolf on 7/2/24, 5:33 AM
by neonsunset on 7/1/24, 7:52 PM
https://github.com/dotnet/runtime/blob/main/docs/coding-guid...
Here's an example of a "checked" sum over a span of integers that uses the platform-specific vector width:
https://github.com/dotnet/runtime/blob/main/src/libraries/Sy...
Other examples:
CRC64 https://github.com/dotnet/runtime/blob/main/src/libraries/Sy...
Hamming distance https://github.com/dotnet/runtime/blob/main/src/libraries/Sy...
The default syntax is a bit ugly in my opinion, but it can be significantly improved with helper methods, as in this code, which is a port of simdutf's UTF-8 code point counting: https://github.com/U8String/U8String/blob/main/Sources/U8Str...
There are more advanced scenarios too. The Bepuphysics2 engine leverages SIMD heavily to perform as fast as PhysX's CPU back-end: https://github.com/bepu/bepuphysics2/blob/master/BepuPhysics...
Note that practically none of these need to reach for platform-specific intrinsics (except for replacing movemask emulation with an efficient ARM64 alternative); they use the same path on all platforms, varied by vector width rather than by specific ISA.
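To make the "varied by vector width rather than specific ISA" idea concrete for Rust readers, here is a minimal sketch of a checked sum written lane-wise in stable Rust. It is not the .NET implementation linked above, and it does not use `std::simd` (which is nightly-only at the time of this thread). Plain arrays stand in for vector registers, and the name `checked_sum` and the const parameter `LANES` are hypothetical. Whether the compiler actually vectorizes the inner loop depends on the target and optimization level.

```rust
/// Sums `values`, returning `None` if an overflow is detected.
///
/// `LANES` is a hypothetical stand-in for the platform vector width and
/// must be greater than zero. Lanes accumulate independently, so overflow
/// is reported per lane and then again during the final reduction; this
/// can differ from where a strictly sequential checked sum would fail.
pub fn checked_sum<const LANES: usize>(values: &[i32]) -> Option<i32> {
    // One running sum and one sticky overflow flag per lane.
    let mut acc = [0i32; LANES];
    let mut overflow = [0i32; LANES];

    let mut chunks = values.chunks_exact(LANES);
    for chunk in &mut chunks {
        for lane in 0..LANES {
            let a = acc[lane];
            let b = chunk[lane];
            let sum = a.wrapping_add(b);
            // Signed addition overflowed iff both inputs have the same sign
            // and the result has the opposite sign. The sign bit of this
            // expression records exactly that condition, branch-free.
            overflow[lane] |= (a ^ sum) & (b ^ sum);
            acc[lane] = sum;
        }
    }

    // A negative flag means the sign bit was set at some point in that lane.
    if overflow.iter().any(|&flag| flag < 0) {
        return None;
    }

    // Horizontal reduction of the lanes, then the scalar tail.
    let mut total = 0i32;
    for &lane_sum in &acc {
        total = total.checked_add(lane_sum)?;
    }
    for &v in chunks.remainder() {
        total = total.checked_add(v)?;
    }
    Some(total)
}
```

A caller would pick the width at compile time, for example `checked_sum::<8>(&data)` for eight 32-bit lanes. The loop body stays the same for any width, which is the property the comment above describes.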