by g0xA52A2A on 5/5/24, 9:28 AM with 20 comments
by nwellnhof on 5/5/24, 5:13 PM
by pbsd on 5/5/24, 9:00 PM
0x10000U >> ((0x1531U >> (i*5)) & 31);
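To make the packed-table trick in the expression above easier to follow, here is a minimal sketch that wraps it in a function. The name `packed_lookup` is hypothetical and not from the thread, and the valid range of `i` (taken here as 0 to 3) is an assumption. The constant `0x1531` can be read as four 5-bit shift counts (17, 9, 5, 0) stored in one word, so the "table" lives in a register instead of memory.

```c
/* Hypothetical wrapper around the expression from the comment.
 * 0x1531 packs four 5-bit shift counts, lowest field first:
 *   i = 0 -> 17, i = 1 -> 9, i = 2 -> 5, i = 3 -> 0
 * Assumption: i is in 0..3 and unsigned int is at least 32 bits wide. */
static unsigned packed_lookup(unsigned i)
{
    unsigned shift = (0x1531U >> (i * 5)) & 31; /* extract the i-th 5-bit field */
    return 0x10000U >> shift;                   /* turn the shift count into a value */
}
```

For `i` = 0, 1, 2, 3 this yields 0, 0x80, 0x800, 0x10000. Those happen to be the smallest code points that need 1, 2, 3 and 4 bytes in UTF-8, which may be the intended use, though the comment does not say so.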
On most current x86 chips this has a latency of 3 cycles -- LEA+SHR+SHR -- which is better than an L1 cache hit almost everywhere.
by clausecker on 5/6/24, 10:42 AM
It's part of simdutf.
by masfuerte on 5/6/24, 3:59 PM