from Hacker News

Zlib-rs is faster than C

by dochtman on 3/16/25, 7:35 PM with 473 comments

  • by brianpane on 3/17/25, 10:06 AM

    I contributed a number of performance patches to this release of zlib-rs. This was my first time doing perf work on a Rust project, so here are some things I learned:

    Even in a project that uses `unsafe` for SIMD and internal buffers, Rust still provided guardrails that made it easier to iterate on optimizations. Abstraction boundaries helped here: a common idiom in the codebase is to cast a raw buffer to a Rust slice for processing, to enable more compile-time checking of lifetimes and array bounds.

    The compiler pleasantly surprised me by doing optimizations I thought I’d have to do myself, such as optimizing away bounds checks for array accesses that could be proven correct at compile time. It also inlined functions aggressively, which enabled it to do common subexpression elimination across functions. Many times, I had an idea for a micro-optimization, but when I looked at the generated assembly I found the compiler had already done it.

    Some of the performance improvements came from better cache locality. I had to use C-style structure declarations in one place to force fields that were commonly used together to inhabit the same cache line. For the rare cases where this is needed, it was helpful that Rust enabled it.

    SIMD code is arch-specific and requires unsafe APIs. Hopefully this will get better in the future.

    Memory-safety in the language was a piece of the project’s overall solution for shipping correct code. Test coverage and auditing were two other critical pieces.
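    The slice-casting idiom described above can be sketched roughly as follows (a hypothetical helper, not zlib-rs's actual API): the raw pointer is converted to a slice once at the boundary, and everything downstream is ordinary safe Rust that the compiler can check, and often optimize, for bounds.

```rust
use std::slice;

// Hypothetical helper in the spirit described above (not zlib-rs's real
// code): wrap a raw (ptr, len) pair in a slice once at the boundary, so
// all downstream code gets compile-time lifetime and bounds checking.
fn checksum(ptr: *const u8, len: usize) -> u32 {
    // SAFETY: the caller must guarantee `ptr` is valid for `len` bytes.
    let buf: &[u8] = unsafe { slice::from_raw_parts(ptr, len) };
    // From here on this is plain safe Rust; for simple iteration patterns
    // the optimizer can often prove every access in range and elide the
    // bounds checks entirely.
    buf.iter().fold(0u32, |acc, &b| acc.wrapping_add(b as u32))
}

fn main() {
    let data = [1u8, 2, 3, 4];
    assert_eq!(checksum(data.as_ptr(), data.len()), 10);
    println!("{}", checksum(data.as_ptr(), data.len()));
}
```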
  • by YZF on 3/16/25, 8:12 PM

    I found out I already know Rust:

            unsafe {
                let x_tmp0 = _mm_clmulepi64_si128(xmm_crc0, crc_fold, 0x10);
                xmm_crc0 = _mm_clmulepi64_si128(xmm_crc0, crc_fold, 0x01);
                xmm_crc1 = _mm_xor_si128(xmm_crc1, x_tmp0);
                xmm_crc1 = _mm_xor_si128(xmm_crc1, xmm_crc0);
                // ...
            }

    Kidding aside, I thought the purpose of Rust was safety, but the keyword `unsafe` is sprinkled liberally throughout this library. At what point does it really stop mattering whether this is C or Rust?

    Presumably with inline assembly both languages can emit what is effectively the same machine code. Is the Rust compiler a better optimizing compiler than C compilers?
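    For what it's worth, the usual pattern in libraries like this is to contain `unsafe` rather than eliminate it. A minimal sketch (function names hypothetical, the "SIMD" body is just a placeholder): the intrinsic-heavy kernel is an `unsafe fn` with a documented precondition, one safe entry point checks that precondition at runtime, and the rest of the crate stays safe.

```rust
// Sketch only: names are hypothetical, and the SIMD body is a placeholder
// that delegates to the portable version.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "pclmulqdq")]
unsafe fn fold_simd(data: &[u8]) -> u32 {
    // Real code would use _mm_clmulepi64_si128 etc. here.
    fold_portable(data)
}

fn fold_portable(data: &[u8]) -> u32 {
    data.iter().fold(0u32, |acc, &b| acc.wrapping_add(b as u32))
}

pub fn fold(data: &[u8]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("pclmulqdq") {
            // SAFETY: the required CPU feature was verified at runtime above.
            return unsafe { fold_simd(data) };
        }
    }
    fold_portable(data)
}

fn main() {
    assert_eq!(fold(&[1, 2, 3]), 6);
    println!("{}", fold(&[1, 2, 3]));
}
```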

  • by johnisgood on 3/16/25, 8:00 PM

    "faster than C" almost always boils down to different designs, implementations, algorithms, etc.

    Perhaps it is faster than already-existing implementations, sure, but not "faster than C", and it is odd to make such claims.

  • by cb321 on 3/16/25, 8:32 PM

    I think this may not be a very high bar. zippy in Nim claims to be about 1.5x to 2.0x faster than zlib: https://github.com/guzba/zippy I think there are also faster zlib's around in C than the standard install one, such as https://github.com/ebiggers/libdeflate (EDIT: also mentioned elsethread https://news.ycombinator.com/item?id=43381768 by mananaysiempre)

    zlib itself seems pretty antiquated/outdated these days, but it does remain popular, even as a basis for newer parallel-friendly formats such as https://www.htslib.org/doc/bgzip.html

  • by jrockway on 3/16/25, 8:47 PM

    Chromium is kind of stuck with zlib because it's the algorithm that's in the standards, but if you're making your own protocol, you can do even better than this by picking a better algorithm. Zstandard is faster and compresses better. LZ4 is much faster, but not quite as small.

    Some reading: https://jolynch.github.io/posts/use_fast_data_algorithms/

    (As an aside, at my last job container pushes / pulls were in the development critical path for a lot of workflows. It turns out that sha256 and gzip are responsible for a lot of the time spent during container startup. Fortunately, Zstandard is allowed, and blake3 digests will be allowed soon.)

  • by IshKebab on 3/16/25, 7:55 PM

    It's barely faster. I would say it's more accurate to say it's as fast as C, which is still a great achievement.
  • by 1vuio0pswjnm7 on 3/17/25, 3:05 AM

    Which library compiles faster.

    Which library has fewer dependencies.

    Is each library the same size. Which one is smaller.

  • by miki123211 on 3/17/25, 6:16 PM

    I think performance is an underappreciated benefit of safe languages that compile to machine code.

    If you're writing your program in C, you're afraid of shooting yourself in the foot and introducing security vulnerabilities, so you'll naturally tend to avoid significant refactorings or complicated multithreading unless necessary. If you have Rust's memory safety guarantees, Go's channels and lightweight goroutines, or the access to a test runner from either of those languages, that's suddenly a lot less of a problem.

    The compiler guarantees you get won't hurt either. Just to give a simple example, if your Rust function receives an immutable reference to a struct, it can rely on the fact that a member of that struct won't magically be mutated by a call to some random function through spooky action at a distance. It can just keep it on the stack / in a callee-saved register instead of fetching it from memory at every loop iteration, if that's more optimal.

    Then there's the easy access to package ecosystems and extensive standard libraries. If there's a super popular do_foo package, you can almost guarantee that it was a bottleneck for somebody at some point, so it's probably optimized to hell and back. It's certainly more optimized than your simple 10-line do_foo function that you would have written in C, because that's easier than dealing with yet another third-party library and whatever build system it uses.
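    The immutable-reference point above can be made concrete with a toy example: because `cfg` below is a shared `&` reference, the compiler knows `cfg.scale` cannot change across the opaque call to `bump`, so it can treat the field as loop-invariant rather than reloading it every iteration (in C, an arbitrary function call could alias and mutate it).

```rust
struct Config {
    scale: u32,
}

fn bump(x: u32) -> u32 {
    x + 1
}

// `cfg.scale` is provably loop-invariant: nothing reachable through a
// shared reference can mutate it, not even the call to `bump`.
fn apply(values: &mut [u32], cfg: &Config) {
    for v in values.iter_mut() {
        *v = bump(*v) * cfg.scale;
    }
}

fn main() {
    let mut xs = [1, 2, 3];
    apply(&mut xs, &Config { scale: 10 });
    assert_eq!(xs, [20, 30, 40]);
    println!("{:?}", xs);
}
```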

  • by throwaway2037 on 3/17/25, 3:54 AM

    Does this performance have anything to do with Rust itself, or is it just more optimized than the other C-language versions (more SIMD instructions / raw assembly code)? I ask because there is a canonical use case where C++ can consistently outperform C -- sorting, because the comparison operator in C++ allows for more compiler optimization compared to the C version: qsort(). I am wondering if there is something similar here for Rust vs C.
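    There is a close analogue: Rust's sort takes the comparator as a generic closure, so the sort loop is monomorphized per comparator and the comparison can be inlined, whereas C's `qsort` calls through a function pointer on every comparison. A quick sketch of the Rust side:

```rust
// `sort_unstable_by` is monomorphized for this closure type, so the
// comparator can be inlined into the sort loop -- the same effect that
// lets C++'s std::sort beat C's qsort.
fn main() {
    let mut v = vec![3.5f64, 1.25, 2.0];
    v.sort_unstable_by(|a, b| a.partial_cmp(b).unwrap());
    assert_eq!(v, [1.25, 2.0, 3.5]);
    println!("{:?}", v);
}
```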
  • by up2isomorphism on 3/17/25, 5:31 AM

    Rust folks love to compare Rust to C, but C folks seldom compare C to Rust.
  • by Georgelemental on 3/17/25, 7:22 PM

    > The C code is able to use switch implicit fallthroughs to generate very efficient code. Rust does not have an equivalent of this mechanism

    Rust very much can emulate this, with `break` out of nested labeled blocks. But not if you also add in `goto` to previous branches.
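    A minimal sketch of that labeled-block idiom (since Rust 1.65), emulating C's `switch` fallthrough: breaking out of an inner block "falls through" into the code after it, and each outer block adds one more stage.

```rust
// Emulating switch fallthrough with `break` + nested labeled blocks:
// entering at case 0 runs step0, step1, step2; at case 1 runs step1,
// step2; anything else runs only step2.
fn steps_from(start: u32) -> Vec<&'static str> {
    let mut log = Vec::new();
    'case2: {
        'case1: {
            'case0: {
                match start {
                    0 => break 'case0,
                    1 => break 'case1,
                    _ => break 'case2,
                }
            }
            // case 0 falls through to here
            log.push("step0");
        }
        // cases 0 and 1 fall through to here
        log.push("step1");
    }
    log.push("step2");
    log
}

fn main() {
    assert_eq!(steps_from(0), ["step0", "step1", "step2"]);
    assert_eq!(steps_from(1), ["step1", "step2"]);
    assert_eq!(steps_from(7), ["step2"]);
    println!("ok");
}
```

What this can't express is `goto` back to an *earlier* branch, which is the caveat above.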

  • by CyberDildonics on 3/16/25, 11:49 PM

    If you're dealing with a compiled system language the language is going to make almost no difference in speed, especially if they are all being optimized by LLVM.

    An optimized version that controls allocations, has good memory access patterns, uses SIMD and uses multi-threading can easily be 100x faster or more. Better memory access alone can speed a program up 20x or more.
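    The memory-access point is easy to see in a toy (not rigorous) measurement: summing the same matrix row-by-row versus column-by-column does identical work but very different cache traffic. Sizes here are arbitrary.

```rust
use std::time::Instant;

const N: usize = 1024;

// stride-1 traversal: streams through consecutive cache lines
fn sum_row_major(m: &[u64]) -> u64 {
    let mut s = 0;
    for i in 0..N {
        for j in 0..N {
            s += m[i * N + j];
        }
    }
    s
}

// stride-N traversal: touches a different cache line almost every access
fn sum_col_major(m: &[u64]) -> u64 {
    let mut s = 0;
    for j in 0..N {
        for i in 0..N {
            s += m[i * N + j];
        }
    }
    s
}

fn main() {
    let m: Vec<u64> = (0..(N * N) as u64).map(|i| i % 7).collect();
    let t = Instant::now();
    let a = sum_row_major(&m);
    let row = t.elapsed();
    let t = Instant::now();
    let b = sum_col_major(&m);
    let col = t.elapsed();
    assert_eq!(a, b); // same answer, very different cache behavior
    println!("row-major: {row:?}  column-major: {col:?}");
}
```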

  • by quotemstr on 3/17/25, 9:15 AM

    New native code implementation of zlib faster than old native code version. So what? Rust has a lot to recommend it, but it's not automatically faster than C.
  • by randomNumber7 on 3/17/25, 9:35 PM

    Finally, today is the day when Rust is faster than C.
  • by kahlonel on 3/16/25, 8:04 PM

    You mean the implementation is faster than the one in C. Because nothing is “faster than C”.
  • by akagusu on 3/16/25, 10:37 PM

    Bravo. Now Rust has its existence justified.