by Rendello on 6/14/25, 3:31 AM with 35 comments
by burntsushi on 6/14/25, 12:15 PM
The actual implementation is here: https://github.com/BurntSushi/memchr?tab=readme-ov-file#algo...
The main difference with the algorithm presented here is that instead of always using the first and last bytes of the needle, a heuristic is used to try to pick two bytes that occur less frequently according to a background frequency distribution.
It ends up being quite a bit faster than just plain Two-Way or even GNU libc's implementation of `memmem`. From the root of the `memchr` repository:
$ rebar rank benchmarks/record/x86_64/2023-12-29.csv -e '^rust/memchr/memmem/(oneshot|prebuilt|twoway)' -e '^libc/memmem/oneshot'
Engine Version Geometric mean of speed ratios Benchmark count
------ ------- ------------------------------ ---------------
rust/memchr/memmem/prebuilt 2.7.1 1.07 57
rust/memchr/memmem/oneshot 2.7.1 1.39 54
libc/memmem/oneshot unknown 3.15 54
rust/memchr/memmem/twoway 2.7.1 5.26 54
The "prebuilt" benchmarks also demonstrate the API deficiencies of routines like `memmem`. Because of their API, they need to rebuild the state necessary to execute a search for the needle given. But it is often the case that the needle is invariant across repeated calls with different haystacks. So you end up doing that repeated work for no gain.by ncruces on 6/14/25, 7:15 AM
https://github.com/ncruces/go-sqlite3/tree/main/sqlite3/libc
For known haystack lengths, and large needles, I found it useful to augment the approach with Quick Search.
by mwsherman on 6/14/25, 5:34 PM
by clauderoux on 6/14/25, 7:46 AM
by klaussilveira on 6/14/25, 4:21 PM
Which uses SIMD algos for fast HTTP parsing.
by eska on 6/14/25, 11:22 AM
by sylware on 6/14/25, 10:11 AM
Because the SIMD (aligned, size computations, etc) setup code is usually more expensive that byte-oriented basic search for "small" strings.
Yes, it depends heavily on the hardware architecture.
by rurban on 6/14/25, 9:25 AM
Now we just need a name, and we can add it to the smart shootout. Wonder how it compares to the best SIMD algos
by mgaunard on 6/14/25, 9:50 AM
by azhenley on 6/14/25, 1:23 PM