from Hacker News

Uninitialized memory: Unsafe Rust is too hard

by drrlvn on 1/30/22, 10:51 AM with 125 comments

  • by notpopcorn on 1/30/22, 3:46 PM

    Without any unsafe code this is simply:

        let role = Role {
            name: "basic",
            flag: 1,
            disabled: false,
        };
    
    The language tries to prevent you from interacting with a `Role` object that's not fully initialized. `mem::zeroed()` could work, but then you'll have to turn the `&'static str` into an `Option<&'static str>` or a raw pointer, to indicate that it might be null. You could also add `#[derive(Default)]` to the struct, to automatically get a `Role::default()` function to create a `Role` with and then modify the fields afterwards, if you want to set the fields in separate statements for some reason:

        let mut role = Role::default();
        role.name = "basic";
        role.flag = 1;
        role.disabled = false;
    
    And even with `MaybeUninit` you can initialize the whole struct (without `unsafe`!) with `MaybeUninit::write`. It's just that partially initializing something is hard to get right, which is the point of the article I guess. But I wonder how commonly you would really want that, as it easily leads to mistakes.
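
A minimal sketch of the whole-struct `MaybeUninit::write` approach mentioned above. The `Role` field types are assumed from the thread (`&'static str`, `bool`, `u32`), and the function names are made up for illustration. `write` is safe and hands back a `&mut Role`; `unsafe` is only needed if you want the value back out by move with `assume_init`.

```rust
use std::mem::MaybeUninit;

#[derive(Debug, PartialEq)]
struct Role {
    name: &'static str,
    disabled: bool,
    flag: u32,
}

/// Initializes the whole struct in one step and uses it through the
/// `&mut Role` that `write` returns. No `unsafe` anywhere.
fn flag_after_write() -> u32 {
    let mut slot = MaybeUninit::<Role>::uninit();
    let role: &mut Role = slot.write(Role {
        name: "basic",
        disabled: false,
        flag: 1,
    });
    role.flag += 1;
    role.flag
}

/// Same initialization, but moves the finished value out of the slot.
fn build_role() -> Role {
    let mut slot = MaybeUninit::<Role>::uninit();
    slot.write(Role {
        name: "basic",
        disabled: false,
        flag: 1,
    });
    // SAFETY: `write` above initialized every field of the struct.
    unsafe { slot.assume_init() }
}
```

Calling `build_role()` gives an ordinary `Role`, so the only place the caller has to reason about is the single `assume_init` line.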
  • by jcranmer on 1/30/22, 4:23 PM

    Here's another perspective on why things are the way they are:

    One of the central philosophies of Rust is that it should not be possible to execute undefined behavior using only safe code. Rust's underlying core semantics end up being very similar to C's semantics, at least in terms of where undefined behavior can arise, and we can imagine Rust's references as being wrappers around the underlying pointer type that have extra requirements to ensure that they can be safely dereferenced in safe code without ever causing UB.

    So consider a simple pointer dereference in C (*p)... how could that cause UB? Well, the obvious ones are that the pointer could be out-of-bounds or pointing to an expired memory location. So references (& and &mut) must point to a live memory location, even in unsafe code. Also pretty obviously, the pointer would be UB were it unaligned, so a Rust reference must be properly aligned.

    Another one that should be familiar from the C context is that the memory location must be initialized. So the & reference in Rust means that the memory location must also be initialized... and since &mut implies &, so must &mut. This part is probably genuinely surprising, since it's a rule that doesn't apply to C.

    The most surprising rule that applies here as well is that the memory location cannot be a trap representation (to use C's terminology). Yes--C has the same requirement here, but most people probably don't come across a platform that has trap representations in C. The reason why std::mem::uninitialized was deprecated in favor of MaybeUninit was that Rust has a type all of whose representations are trap representations (that's the ! type).

    In short, the author is discovering two related issues here. First, the design of Rust is to push all of the burden of undefined behavior into unsafe code blocks, and the downside of that is that most programmers probably aren't sufficiently cognizant of the UB rules to carry that burden correctly. Rust also pushes the UB of pointers to reference construction, whereas C makes most of its UB happen only on pointer dereference (constructing unaligned pointers being the exception).

    The second issue is that Rust's syntax is geared to making safe Rust ergonomic, not unsafe Rust. This means that using the "usual" syntax rules in unsafe Rust blocks is more often than not UB, even when you're trying to avoid the inherent UB construction patterns. Struct projection (given a pointer/reference to a struct, get a pointer/reference to a field) is especially implicated here.

    These combine when you deal with uninitialized memory references. This is a reasonably common pattern, but designing an always-safe abstraction for uninitialized memory is challenging. And Rust did screw this up, and the stability guidelines mean the bad implementations are baked in for good (see, e.g., std::io::Read).

  • by notpopcorn on 1/30/22, 3:56 PM

    > So we use a &'static str here instead of a C string so there are some changes to the C code.

    > [..]

    > So why does this type not support zero initialization? What do we have to change? Can zeroed not be used at all? Some of you might think that the answer is #[repr(C)] on the struct to force a C layout but that won't solve the problem.

    The type of the first field was switched to a type (&str) that specifically promises it is never null. If the original type (a pointer) was kept, or an Option<&str> was used, mem::zeroed would've worked fine.
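
A rough sketch of the "keep the pointer" variant described above, assuming a `Role` whose first field is a `*const c_char` as in the C original; the function names are hypothetical. The raw-pointer form is used here because an all-zero bit pattern is unambiguously valid for it (a null pointer), as it is for `bool` (`false`) and `u32` (`0`).

```rust
use std::ffi::c_char;
use std::mem;

#[repr(C)]
struct Role {
    name: *const c_char,
    disabled: bool,
    flag: u32,
}

/// Zero-initializes the struct, like `memset(&r, 0, sizeof r)` in C.
fn zeroed_role() -> Role {
    // SAFETY: all-zero bytes are a valid value for every field:
    // a null raw pointer, `false`, and `0`.
    unsafe { mem::zeroed() }
}

/// Fills in the fields afterwards, one statement at a time.
fn basic_role() -> Role {
    let mut role = zeroed_role();
    // The byte string literal is 'static and NUL-terminated.
    role.name = b"basic\0".as_ptr().cast::<c_char>();
    role.flag = 1;
    role.disabled = false;
    role
}
```

The price is that every later read of `name` has to handle null, which is exactly what the `&'static str` type was ruling out.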

  • by duped on 1/30/22, 4:45 PM

    To the OP - why should creating uninitialized references with static lifetimes be easy? That is a recipe for undefined behavior - borrows aren't pointers, if you want a pointer to be zero initialized, then use a pointer.

    If you want safe access to that pointer then wrap it in a struct with an accessor method
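
One way the "pointer wrapped in a struct with an accessor" idea could look. This is only a sketch: the `empty`, `set_name` and `name` methods and the invariant on the field are assumptions for illustration, not something taken from the article.

```rust
use std::ffi::{c_char, CStr};
use std::ptr;

pub struct Role {
    // Invariant: either null, or pointing at a NUL-terminated string
    // that lives for 'static. Only `set_name` writes this field.
    name: *const c_char,
    pub disabled: bool,
    pub flag: u32,
}

impl Role {
    /// A role with no name yet; the pointer starts out null.
    pub fn empty() -> Self {
        Role { name: ptr::null(), disabled: false, flag: 0 }
    }

    /// Taking `&'static CStr` is what upholds the invariant.
    pub fn set_name(&mut self, name: &'static CStr) {
        self.name = name.as_ptr();
    }

    /// Safe accessor: null becomes `None` instead of an invalid reference.
    pub fn name(&self) -> Option<&'static CStr> {
        if self.name.is_null() {
            None
        } else {
            // SAFETY: by the invariant, a non-null `name` points at a
            // NUL-terminated string that is valid for 'static.
            Some(unsafe { CStr::from_ptr(self.name) })
        }
    }
}
```

Callers never touch the raw pointer, so the single `unsafe` block is the only place where the invariant has to be checked.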

  • by staticassertion on 1/30/22, 5:24 PM

    I think that the premise here is correct - writing unsafe Rust is too hard. There are lots of footguns.

    This isn't a very good motivating example but I suppose it does the job of showing the various hoops one has to jump through when using unsafe.

    I think right now the approach is to make unsafe "safe" (ie std::mem::uninitialized -> MaybeUninit) at the cost of complexity, and eventually to build out improved helpers and abstractions. Obviously this is still ongoing.

    But also, just don't write unsafe? It's very easy to avoid.

  • by dathinab on 1/30/22, 5:04 PM

    The scary thing is:

    Handling uninitialized memory is hard in C++ (and C), too.

    You just don't notice and accidentally do it slightly wrong (mainly in C++, in C it's harder to mess up).

  • by jcranmer on 1/30/22, 3:10 PM

    In C, when you declare 'struct role r' (not as a static variable), it is not zeroed. The immediate Rust equivalent would be to use std::mem::uninitialized(), not std::mem::zeroed.
  • by andreareina on 1/30/22, 2:06 PM

    I don't know rust, but why isn't the answer, don't try to do what you'd do in C like construct uninitialized structs?
  • by Ericson2314 on 1/30/22, 5:58 PM

    The write_unaligned is pure FUD. Regular unpacked structs don't violate alignment on fields!
  • by sharikous on 1/30/22, 3:16 PM

    What is the reason for the rule that objects have to always be in a good state, even inside unsafe?
  • by LinAGKar on 1/30/22, 11:21 PM

    But the code isn't equivalent. The C code just has a pointer to a manually allocated buffer, while the Rust does the equivalent of zeroing out (or leaving uninitialized) a C++ std::string. Akin to:

        auto name = reinterpret_cast<std::string*>(malloc(sizeof(std::string)));
        memset(name, 0, sizeof(std::string));
        *name = "basic";

    But on the stack.

  • by eddyb on 1/30/22, 6:13 PM

    > For instance `(*role).name` creates a `&mut &'static str` behind the scenes which is illegal, even if we can't observe it because the memory where it points to is not initialized.

    Where is this coming from? It's literally not true. The MIR for this has:

            ((*_3).0: &str) = const "basic";
            ((*_3).2: u32) = const 1_u32;
            ((*_3).1: bool) = const false;
    
    So it's only going to do a raw offset and then assign to it, which is identical to `*ptr::addr_of_mut!((*role).field) = value`.

    Sadly there's no way to tell miri to consider `&mut T` valid only if `T` is valid (that choice is not settled yet, AFAIK, at the language design level), in order to demonstrate the difference (https://github.com/rust-lang/miri/issues/1638).

    The other claim, "dereferencing is illegal", is more likely, but contrary to popular misconception, "dereference" is a syntactic concept that turns a (pointer/reference) "value" into a "place".

    There's no "operation" of "dereference" to attach dynamic semantics to. After all, `ptr::addr_of_mut!(*p).write(x)` has to remain as valid as `p.write(x)`, and it does literally contain a "dereference" operation (and so do your field projections).

    So it's still inaccurate. I believe what you want is to say that in `place = value` the destination `place` has to hold a valid value, as if we were doing `mem::replace(&mut place, value)`. This is indeed true for types that have destructors in them, since those would need to run (which in itself is why `write` on pointers exists - it long existed before any of the newer ideas about "indirect validity" in recent years).

    However, you have `Copy` types there, and those are definitely not different from `<*mut T>::write` to assign to, today. I don't see us having to change that, but I'm also not seeing any references to where these ideas are coming from.

    > I'm pretty sure we can depend on things being aligned

    What do you mean "pretty sure"? Of course you can, otherwise it would be UB to allow safe references to those fields! Anything else would be unsound. In fact, this goes hand in hand with the main significant omission of this post: this is not how you're supposed to use `MaybeUninit`.

    All of this raw pointer stuff is a distraction from the fact that what you want is `&mut MaybeUninit<FieldType>`. Then all of the things about reference validity are necessarily true, and you can safely initialize the value. The only `unsafe` operation in this entire blog post, that isn't unnecessarily added in, is `assume_init`.

    What the author doesn't mention is that Rust fails to let you convert between `&mut MaybeUninit<Struct>` and some hypothetical `&mut StructBut<replace Field with MaybeUninit<Field>>` because the language isn't powerful enough to do it automatically. This was one of the saddest things about `MaybeUninit` (and we tried to rectify it for at least arrays).

    This is where I was going to link to a custom derive that someone has written to generate that kind of transform manually (with the necessary check for safe field access wrt alignment). To my shock, I can't find one. Did I see one and did it have a funny name? (the one thing I did find was a macro crate but unlike a derive those have a harder time checking everything so I had to report https://github.com/youngspe/project-uninit/issues/1)
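
A hand-written sketch of the kind of per-field transform described above, next to the raw-pointer version for comparison. `RoleUninit` and both function names are hypothetical, and the `Role` field types are assumed from the MIR quoted earlier. Note that this builds a separate mirror struct and assembles a `Role` from it; it does not convert a `&mut MaybeUninit<Role>` in place, which is the part the language cannot do automatically.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

#[derive(Debug, PartialEq)]
struct Role {
    name: &'static str,
    disabled: bool,
    flag: u32,
}

/// Hypothetical mirror of `Role` with every field wrapped in `MaybeUninit`.
struct RoleUninit {
    name: MaybeUninit<&'static str>,
    disabled: MaybeUninit<bool>,
    flag: MaybeUninit<u32>,
}

/// Field-by-field initialization through `MaybeUninit<Field>`:
/// the writes are safe, only the final `assume_init` calls are not.
fn init_via_mirror() -> Role {
    let mut parts = RoleUninit {
        name: MaybeUninit::uninit(),
        disabled: MaybeUninit::uninit(),
        flag: MaybeUninit::uninit(),
    };
    parts.name.write("basic");
    parts.flag.write(1);
    parts.disabled.write(false);
    // SAFETY: all three fields were written just above.
    unsafe {
        Role {
            name: parts.name.assume_init(),
            disabled: parts.disabled.assume_init(),
            flag: parts.flag.assume_init(),
        }
    }
}

/// The raw-pointer route: project to each field with `addr_of_mut!`
/// and `write` through the resulting pointer, never creating a reference.
fn init_via_raw_pointers() -> Role {
    let mut uninit = MaybeUninit::<Role>::uninit();
    let role = uninit.as_mut_ptr();
    // SAFETY: `role` points to properly aligned, writable storage for a
    // `Role`, and every field is written before `assume_init`.
    unsafe {
        addr_of_mut!((*role).name).write("basic");
        addr_of_mut!((*role).flag).write(1);
        addr_of_mut!((*role).disabled).write(false);
        uninit.assume_init()
    }
}
```

Both functions produce the same `Role`; the mirror version keeps the unsafe surface down to the `assume_init` calls at the cost of writing the second struct by hand.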

  • by mlindner on 1/30/22, 8:07 PM

    This author doesn't even seem to know C properly so it's hard to accept their reasoning.

    This in Rust:

      let mut role: Role = mem::zeroed();
    
    Is not the same as this in C:

      struct role r;
    
    C does not zero initialize.
  • by sAbakumoff on 1/30/22, 3:28 PM

    I am under the impression that even _safe_ Rust is really hard to learn. Several years ago I started with GoLang and it was so easy to start programming even advanced things almost instantly. Rust drives me crazy. The syntax seems overcomplicated, the compiler errors are cryptic, the IDE is not helpful.
  • by remram on 1/30/22, 6:05 PM

    > Because that raw pointer does not implement deref and because Rust has no -> operator we now need to dereference the pointer permanently to assign the fields with that awkward syntax.

    Absolutely not, you can still use a mutable reference:

        let role = &mut *uninit.as_mut_ptr();
        role.name = "basic";