from Hacker News

Perhaps Rust Needs "Defer"

by broken_broken_ on 11/6/24, 8:30 AM with 67 comments

by pwdisswordfishz on 11/6/24, 10:07 AM
> So now I am confused, am I allowed to free() the Vec's pointer directly or not?
No, you are not; simple as that. Miri is right. Rust using malloc/free behind the scenes is an internal implementation detail you are not supposed to rely on. Rust used to use a completely different memory allocator, and this code would have crashed at runtime if it were still the case. Since when is undocumented information obtained from strace a stable API?
It's not like you can rely on Rust references and C pointers being identical in the ABI either, but the sample in the post blithely conflates them.
> It might be a bit surprising to a pure Rust developer given the Vec guarantees, but since the C side could pass anything, we must be defensive.
This is just masking bugs that otherwise could have been caught by sanitizers. Better to leave it out.
by haileys on 11/6/24, 10:03 AM
The way to do this in idiomatic Rust is to make a wrapper type that implements drop:
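```
    use std::ffi::c_void;

    extern "C" {
        // Deallocator exported by the C library (hypothetical name).
        fn my_free_func(ptr: *mut c_void);
    }

    struct MyForeignPtr(*mut c_void);

    impl Drop for MyForeignPtr {
        fn drop(&mut self) {
            unsafe { my_free_func(self.0); }
        }
    }
```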
Then wrap the foreign pointer with MyForeignPtr as soon as it crosses the FFI boundary into your Rust code, and only ever access the raw pointer via this wrapper object. Don't pass the raw pointer around.
by klauserc on 11/6/24, 10:23 AM
Would `Vec::into_boxed_slice` [1] be the answer here? It gives you a `Box<[Foo]>`, which doesn't have a capacity (it still knows its own length).
1: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.int...
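A minimal sketch of that idea, assuming C only ever hands back the exact pointer and length it was given. `hand_out_slice` and `take_back_slice` are hypothetical helper names, not std APIs:

```rust
use std::ptr;

/// Hypothetical helper: converts a Vec into a (pointer, length) pair for C.
/// `into_boxed_slice` discards spare capacity, so only the length must be remembered.
pub fn hand_out_slice<T>(v: Vec<T>) -> (*mut T, usize) {
    let boxed: Box<[T]> = v.into_boxed_slice();
    let len = boxed.len();
    // `Box::into_raw` yields a `*mut [T]`; the cast keeps just the data pointer.
    (Box::into_raw(boxed) as *mut T, len)
}

/// Hypothetical inverse: rebuilds the `Box<[T]>` and drops it (elements included).
///
/// # Safety
/// `ptr` and `len` must come from `hand_out_slice::<T>` and be passed back exactly once.
pub unsafe fn take_back_slice<T>(ptr: *mut T, len: usize) {
    let slice: *mut [T] = ptr::slice_from_raw_parts_mut(ptr, len);
    drop(unsafe { Box::from_raw(slice) });
}
```

An empty `Vec` also round-trips through these helpers, but its data pointer is dangling rather than NULL, so the C side must not pass it to `free()`.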
by pornel on 11/6/24, 11:34 AM
Fun fact: Box<T> is allowed in C FFI in arguments and return types. From C's perspective it's a non-NULL pointer to T, but on the Rust side it gets memory management like a native Rust type.
Option<Box<T>> is allowed too, and it's a nullable pointer. Similarly &mut T is supported.
Using C-compatible Rust types in FFI function declarations can remove a lot of boilerplate. Unfortunately, Vec and slices aren't among them.
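A small sketch of those signatures, with hypothetical names (`Counter`, `counter_new`, `counter_increment`, `counter_free`). Actually exporting them to C would also need `#[no_mangle]` (spelled `#[unsafe(no_mangle)]` in the 2024 edition) and a matching C header, both omitted here:

```rust
/// Hypothetical opaque type; C only ever sees a pointer to it.
pub struct Counter {
    hits: u64,
}

/// C sees `Counter *counter_new(void);`, a non-NULL pointer it must hand back later.
pub extern "C" fn counter_new() -> Box<Counter> {
    Box::new(Counter { hits: 0 })
}

/// `&mut Counter` on the Rust side is a non-NULL `Counter *` on the C side.
pub extern "C" fn counter_increment(c: &mut Counter) -> u64 {
    c.hits += 1;
    c.hits
}

/// `Option<Box<Counter>>` is a nullable pointer: a NULL from C arrives as `None`.
/// Taking the Box by value means Rust's allocator frees it when it goes out of scope.
pub extern "C" fn counter_free(c: Option<Box<Counter>>) {
    drop(c);
}
```

Because `Counter` is opaque to C, it needs no `#[repr(C)]`; C never looks inside it.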
by namjh on 11/6/24, 11:37 AM
Anyone who needs to handle FFI in Rust should read the FFI chapter in the Rustonomicon: https://doc.rust-lang.org/nomicon/ffi.html
Unsafe Rust is indeed very hard to write correctly. The Rustonomicon is a good place to start learning unsafe Rust.
by jclulow on 11/6/24, 10:05 AM
> Let's ignore for now that this will surprise every C developer out there that have been doing if (NULL != ptr) free(ptr) for 50 years now.
If you've been doing C for five decades, it's a shame not to have noticed that it's totally fine to pass a NULL pointer to free().
FWIW, I don't otherwise agree with the thesis. I've written probably ten FFI wrappers around C libraries now, and in every case I was able to store C pointers in some Rust struct or other, where I could free them in a Drop implementation.
I also think it's not actually that unusual for C allocators (other than the truly ancient malloc(3C) family) to require you to pass the allocation size back to the free routine. Often this size is static, so you just use sizeof on the thing you're freeing, but when it's not, you keep track of it yourself. This avoids the need for the allocator to do something like intersperse capacity hints before the pointers it returns.
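For comparison, Rust's own allocator interface has the same shape the comment describes: `std::alloc::dealloc` must be given the same `Layout` (size and alignment) that was used to allocate, which is ultimately why a Vec's capacity has to be known to free its buffer. A minimal sketch; `alloc_u64s` and `free_u64s` are illustrative names:

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Allocates room for `n` u64s with Rust's global allocator.
/// The caller must keep the Layout (i.e. the size) around to free it later.
pub fn alloc_u64s(n: usize) -> (*mut u64, Layout) {
    let layout = Layout::array::<u64>(n).expect("size overflow");
    assert!(layout.size() > 0, "zero-sized allocations are not allowed");
    let ptr = unsafe { alloc(layout) }.cast::<u64>();
    assert!(!ptr.is_null(), "allocation failed");
    (ptr, layout)
}

/// # Safety
/// `ptr` must come from `alloc_u64s` together with this exact `layout`;
/// passing a different size here would be undefined behaviour.
pub unsafe fn free_u64s(ptr: *mut u64, layout: Layout) {
    unsafe { dealloc(ptr.cast::<u8>(), layout) };
}
```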
by scotty79 on 11/6/24, 10:12 AM
Why would anyone expect that they can free in one language something that was allocated in another? The allocator might work completely differently and require completely different actions to free.
by aapoalas on 11/6/24, 11:52 AM
The article only shows Rust code calling the FFI API but suggests that C/C++ might also be used to call the API.
In these cases I can imagine the caller passing in a pointer/reference to uninitialised stack memory, which is also UB in the last version of the allocating code! A `&mut T` must always point to a valid `T` and must not point to uninitialised memory.
It seems to me like it'd be best to take a `&mut MaybeUninit<T>` parameter instead, and write through that. A further upside is that now if the caller _is_ Rust code, you can use MaybeUninit to reserve the stack space for the `OwningArrayC<T>` and then after the FFI call you can use `assume_init` to move an owned, initialised `OwningArrayC<T>` out of the `MaybeUninit` and get all of Rust's usual automatic `Drop` guarantees: This is the defer you wanted all along.
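A minimal sketch of that suggestion. `OwningArrayC<T>` here is a hypothetical stand-in for the article's type (raw pointer, length, capacity, freed through `Drop`), and `get_foos`/`sum_foos` are illustrative names:

```rust
use std::mem::{ManuallyDrop, MaybeUninit};

/// Hypothetical stand-in for the article's `OwningArrayC<T>`.
#[repr(C)]
pub struct OwningArrayC<T> {
    pub data: *mut T,
    pub len: usize,
    pub cap: usize,
}

impl<T> Drop for OwningArrayC<T> {
    fn drop(&mut self) {
        // Reassemble the Vec so Rust's allocator frees the buffer and drops the elements.
        unsafe { drop(Vec::from_raw_parts(self.data, self.len, self.cap)) };
    }
}

/// Hypothetical FFI entry point. The out-parameter may point at uninitialised
/// memory, so it is typed `&mut MaybeUninit<_>` rather than `&mut OwningArrayC<_>`.
pub extern "C" fn get_foos(out: &mut MaybeUninit<OwningArrayC<u32>>) {
    let mut v = ManuallyDrop::new(vec![1u32, 2, 3]);
    out.write(OwningArrayC { data: v.as_mut_ptr(), len: v.len(), cap: v.capacity() });
}

/// Rust caller: reserve stack space, let the callee fill it, then take ownership.
pub fn sum_foos() -> u32 {
    let mut slot = MaybeUninit::uninit();
    get_foos(&mut slot);
    // SAFETY: `get_foos` always initialises `slot` before returning.
    let foos = unsafe { slot.assume_init() };
    // SAFETY: `data`/`len` describe the live buffer owned by `foos`.
    let total: u32 = unsafe { std::slice::from_raw_parts(foos.data, foos.len) }.iter().sum();
    // `foos` goes out of scope here and its Drop frees the buffer.
    total
}
```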
by John23832 on 11/6/24, 10:55 AM
Implement Drop on a custom guard type. Boom. There you go.
by jtrueb on 11/6/24, 12:41 PM
I would totally use a Swift-style concise defer. Writing the idiomatic wrapper struct and implementing Drop is obtuse and repetitive. Both wrapper structs and the scopeguard macro approach have to be wasting compilation time.
I have a lot of FFI code paths that end up easier to understand if you use the C-style instead of the Rusty approach.
Readability and comp time should be important even in FFI code paths.
by kelnos on 11/8/24, 2:05 PM
This article makes no sense. I don't know why you'd ever think you can allocate a Rust object, pass it to C code, and assume it's safe for that C code to call free() on it. This would be true for any other pair of languages when crossing an FFI boundary.
A possible "defer" keyword has nothing to do with any of this, and will not help you.
Rust doesn't need "defer".
by marcodiego on 11/6/24, 12:48 PM
Never used it, but I follow WG14, ISO's working group that maintains the C programming language specification. From there, it looks like defer is one of the most important improvements since the Go programming language, and I've seen a good number of examples where it prevents mistakes, makes the code more readable and avoids the use of goto.
I can only hope it 'infects' other languages.
by cross on 11/7/24, 7:37 PM
The first attempt appears to try and transfer ownership of the allocated memory from the Vec to C, so my first question is, why not allocate the returned memory using libc::malloc?
But I do recognize that the code in the post was a simplified example, and it's possible that the flexibility of `Vec` is actually used; perhaps elements are pushed into the `Vec` dynamically or something, and it would be inconvenient to simulate that with `libc::malloc` et al. But even then, in an environment that's not inherently memory starved, a viable approach might be to build up the data in a `Vec`, and then allocate a properly-sized region using `libc::malloc` and copy the data into it.
Another option might be to maintain something like a BTreeMap indexed by pointer on the Rust side, keeping track of the capacity there so it can be recovered on free.
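A minimal sketch of the side-table idea, assuming a single global registry behind a `Mutex` is acceptable. `ALLOCATIONS`, `give_foos`, and `free_foos` are hypothetical names:

```rust
use std::collections::BTreeMap;
use std::mem::ManuallyDrop;
use std::sync::Mutex;

/// Hypothetical side table: buffer address -> (len, capacity) of the Vec it came from.
static ALLOCATIONS: Mutex<BTreeMap<usize, (usize, usize)>> = Mutex::new(BTreeMap::new());

/// Hands a Vec's buffer to C as pointer + length; the capacity stays on the Rust side.
pub extern "C" fn give_foos(out_len: &mut usize) -> *mut u32 {
    let mut v = ManuallyDrop::new(vec![10u32, 20, 30]);
    let ptr = v.as_mut_ptr();
    *out_len = v.len();
    ALLOCATIONS.lock().unwrap().insert(ptr as usize, (v.len(), v.capacity()));
    ptr
}

/// Frees a pointer from `give_foos`. Pointers not in the table (including NULL) are ignored.
pub extern "C" fn free_foos(ptr: *mut u32) {
    // The lock guard is a temporary, released at the end of this statement.
    let entry = ALLOCATIONS.lock().unwrap().remove(&(ptr as usize));
    if let Some((len, cap)) = entry {
        // Rebuild the Vec with its original capacity so it is deallocated correctly.
        unsafe { drop(Vec::from_raw_parts(ptr, len, cap)) };
    }
}
```

Silently ignoring unknown pointers is a design choice; aborting instead would make a bad free loud rather than hiding it.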
by lalaithion on 11/6/24, 2:24 PM
Instead of passing around the capacity in a struct or as an extra out parameter until you call free, could you instead branch on capacity == 0 in get_foos and set *out_foos to a null pointer? Just because the vector struct never has a null pointer doesn’t mean you can’t use the null pointer in your own API.
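A sketch of that branch, assuming the Vec is first turned into a boxed slice so no capacity has to travel with the pointer. `get_foos` and `free_foos` are hypothetical, and the check is on `is_empty()` rather than `capacity() == 0` because the buffer is shrunk to its length before it is handed out:

```rust
use std::ptr;

/// Hypothetical FFI getter: hands out the buffer, or NULL when there is nothing to hand out.
pub extern "C" fn get_foos(n: u32, out_foos: &mut *mut u32, out_len: &mut usize) {
    let v: Vec<u32> = (0..n).collect();
    if v.is_empty() {
        // Never expose an empty Vec's dangling (but non-NULL) pointer.
        *out_foos = ptr::null_mut();
        *out_len = 0;
        return;
    }
    let boxed = v.into_boxed_slice();
    *out_len = boxed.len();
    *out_foos = Box::into_raw(boxed) as *mut u32;
}

/// Hypothetical matching deallocator; NULL is a no-op, just like free(NULL).
pub extern "C" fn free_foos(foos: *mut u32, len: usize) {
    if !foos.is_null() {
        unsafe { drop(Box::from_raw(ptr::slice_from_raw_parts_mut(foos, len))) };
    }
}
```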
by LegionMammal978 on 11/8/24, 2:04 AM
One popular implementation of "defer" in Rust is the scopeguard crate [0]. Behind the scenes, all it does is create an object that will run your function once it's dropped (usually when it falls out of scope). It can be more robust than manually sprinkling library-specific free()s everywhere for FFI, since the function will also be run during an unwinding panic, sort of like a "finally" block. I've certainly found it helpful for one-off uses of certain APIs that I can't be bothered to write a bunch of Drop boilerplate for.
[0] https://docs.rs/scopeguard/latest/scopeguard/
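The mechanism is small enough to write by hand. The following is a minimal illustration of the same idea, not scopeguard's actual code or API; `Defer`, `defer`, and `demo` are hypothetical names:

```rust
use std::cell::RefCell;

/// Runs the stored closure when the guard is dropped, including during unwinding.
pub struct Defer<F: FnOnce()> {
    f: Option<F>,
}

impl<F: FnOnce()> Drop for Defer<F> {
    fn drop(&mut self) {
        if let Some(f) = self.f.take() {
            f();
        }
    }
}

pub fn defer<F: FnOnce()>(f: F) -> Defer<F> {
    Defer { f: Some(f) }
}

/// Example: the deferred cleanup runs after the body, when `_guard` leaves scope.
pub fn demo() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _guard = defer(|| log.borrow_mut().push("cleanup"));
        log.borrow_mut().push("body");
    } // `_guard` is dropped here, running the deferred closure
    log.into_inner()
}
```

Bind the guard to a named variable such as `_guard`: `let _ = defer(...)` drops the guard, and so runs the cleanup, immediately.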
by dathinab on 11/6/24, 12:01 PM
> libc::free > Hmm...ok...Well that's a bit weird,
It _really_ isn't; it's exactly how C (or C++) works. If a library allocates something for you, you also need to use that library to free it, because, especially with linked libraries, you can't be sure how something was allocated: whether it used some arena, whether some of your libraries use jemalloc and others do not, etc. So IMHO it's a very basic, 101-level fact about using external C/C++ libraries, fully unrelated to Rust (though I guess it is commonly not taught well).
Also, this holds even if everything uses the same allocator, because of the following point:
> So now I am confused, am I allowed to free() the Vec<T>'s pointer directly or not?
No, and again this is not Rust-specific. free() always does a flat free, but a `Vec<T>` isn't guaranteed to be flat (depending on T, its elements may own heap memory of their own), and even if it is, some languages have small-vector optimizations (though I think Rust guarantees that `Vec` will never do that, even in the future, for FFI compatibility reasons).
So the go-to solution at most FFI language boundaries (not Rust-specific) is: you create the "C-external" (here Rust) types in their own language, hand out pointers, which are sometimes opaque (C doesn't know the layout), and then have C _hand them back for cleanup_, cleaning them up in their own language.
E.g. you would have a `drop_vec_u8` extern "C" function that just does "create the Vec from the pointer and drop it" (which should compile to just a free for `Vec<u8>`, but will also work properly for `Vec<MyComplexType>`).
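A minimal sketch of such a pair, assuming the Vec travels as a `#[repr(C)]` pointer/length/capacity triple. `RawVecU8` and `make_vec_u8` are hypothetical names; `drop_vec_u8` follows the comment's naming:

```rust
use std::mem::ManuallyDrop;

/// Hypothetical C-visible description of a Vec<u8> owned by Rust.
#[repr(C)]
pub struct RawVecU8 {
    pub ptr: *mut u8,
    pub len: usize,
    pub cap: usize,
}

/// Hands a freshly built Vec<u8> to C without freeing it.
pub extern "C" fn make_vec_u8(n: usize) -> RawVecU8 {
    let mut v = ManuallyDrop::new(vec![0xABu8; n]);
    RawVecU8 { ptr: v.as_mut_ptr(), len: v.len(), cap: v.capacity() }
}

/// The destructor C must call instead of free(): rebuild the Vec and drop it,
/// so Rust's allocator releases the buffer.
pub extern "C" fn drop_vec_u8(raw: RawVecU8) {
    unsafe { drop(Vec::from_raw_parts(raw.ptr, raw.len, raw.cap)) };
}
```

For a `Vec<MyComplexType>`, the same body with the element type changed also runs each element's `Drop`, which a bare `free()` on the buffer would skip.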
> Box::from_raw(foos);
:wut_emoji:??
In many languages, memory objects are tagged, and treating one type as another in the context of allocations is always a hazard; this can even happen to you in C/C++ in some cases (e.g. certain arena allocators).
Again, this is more a gap in knowledge about cross-language C FFI in general than anything Rust-specific (maybe someone should write a "Generic Cross-Language C FFI" web book; while IMHO it is basic, foundational knowledge, it is very often not taught well at all).
> OwningArrayC > defer!{ super::MYLIB_free_foos(&mut foos); }
The issue here isn't Rust missing defer or goto, or the borrow checker; it's trying to write C in Rust. `OwningArrayC` as used in the blog is an overlap of anti-patterns: it wants to both use and not use Rust's memory management at the same time, in an inconsistent way.
If you want to "free something, except if it has been moved", Rust has a mechanism for that: `Drop`, i.e. the most fundamental part of Rust's (memory) resource management.
If you want to attach drop behavior to an existing type, there is a well-known pattern called a drop guard: a wrapper type that implements Drop, e.g. `struct Guard(OwningArrayC); impl Drop for Guard {...}`, maybe also with `impl DerefMut for Guard` (or `Guard(Option<..>)`, `Guard(&mut ...)`, etc., depending on your needs, e.g. wanting to be able to move the value out conveniently).
In Rust it is a huge anti-pattern to have a guard for a resource without accessing the resource through the guard (though you will have to do it sometimes), as it often conflicts with the borrow checker and, in RAII-style languages in general, is more error-prone. That is also why `scopeguard` provides a guard that wraps the data you need to clean up: `scopeguard::guard` and similar, as opposed to the `scopeguard::defer!` macro, which is a convenience for when the cleanup acts on global state. I.e. you can use `guard(foos, |mut foos| super::MYLIB_free_foos(&mut foos))` instead of defer, and it would work just fine.
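A minimal sketch of a guard that owns the resource and exposes it through `Deref`/`DerefMut`, written by hand rather than with the `scopeguard` crate so nothing about that crate's API is assumed. `Guard` and `Guard::new` are illustrative names; in the article's case `T` would be the `OwningArrayC` and `cleanup` the library's free function:

```rust
use std::ops::{Deref, DerefMut};

/// Owns `value`, gives access to it through the guard, and runs `cleanup(value)` on drop.
pub struct Guard<T, F: FnOnce(T)> {
    value: Option<T>,
    cleanup: Option<F>,
}

impl<T, F: FnOnce(T)> Guard<T, F> {
    pub fn new(value: T, cleanup: F) -> Self {
        Guard { value: Some(value), cleanup: Some(cleanup) }
    }
}

impl<T, F: FnOnce(T)> Deref for Guard<T, F> {
    type Target = T;
    fn deref(&self) -> &T {
        self.value.as_ref().expect("value present until drop")
    }
}

impl<T, F: FnOnce(T)> DerefMut for Guard<T, F> {
    fn deref_mut(&mut self) -> &mut T {
        self.value.as_mut().expect("value present until drop")
    }
}

impl<T, F: FnOnce(T)> Drop for Guard<T, F> {
    fn drop(&mut self) {
        // Hand the owned value to the cleanup closure exactly once.
        if let (Some(value), Some(cleanup)) = (self.value.take(), self.cleanup.take()) {
            cleanup(value);
        }
    }
}
```

Because every access goes through the guard, there is no second handle to the resource left around to conflict with the borrow checker or to be used after cleanup.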
There is also a design issue with `super::MYLIB_free_foos(&mut foos)` itself. If you want `OwningArrayC` to actually own the array (in Rust terms), then passing `&mut foos` is a problem, because after the function returns you still have a `foos` holding a dangling pointer. So again, the way `OwningArrayC` is designed amounts to trying to both use and not use Rust's memory management at the same time (also, who owns the allocation of the `OwningArrayC` itself is not clear in this API).
I can give the following recommendations (besides using `guard`):
- If the Vec doesn't get modified, use `Box<[T]>` instead.
- If the Vec is always accessed through Rust, consider passing a `Box<Vec<T>>` around instead and always converting to/from `Box<Vec<T>>`/`&Vec<T>`/`&mut Vec<T>`. `Box`/`&`/`&mut` have pointer-compatible memory representations, so you can place them directly in an FFI boundary (this is guaranteed for `Box<T>`/`&T`/`&mut T` with sized `T`, and also for `Option<Box/&/&mut T>`, where `None` is the null pointer).
- If that is undesirable performance-wise and you can't pass something like `OwningArrayC` by value, specify that the caller should always (stack-)allocate the `OwningArrayC` itself, and then only use `OwningArrayC` at the boundary, i.e. directly convert it to a `Vec<T>` as needed (though this can easily become less clear about whether you are dealing with a `Vec<T>`, a `&mut Vec<T>`, or a dereferenced `&mut [T]`).
- In general, if `OwningArrayC` is just for passing parameter bundles, with a convention that it is always stack-allocated by the caller, then you really should only use it for transferring the parameters and not for automatic resource management, i.e. directly convert it to a `Vec` at the boundary (and maybe in some edge cases use `scopeguard::guard`, but converting it to a Vec is likely faster/easier). Also specify exactly what you do with the callee-owned pointer in `OwningArrayC`: do we always treat it as dangling, even if there are errors? Do we set it to an empty Vec (no capacity) as part of the conversion to Vec, so that it only counts as moved if that was done? Etc. Also write a `From<&mut OwningArrayC> for Vec` impl; I recommend setting cap and len to zero in it.
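A sketch of the last recommendation. `OwningArrayC<T>` is again a hypothetical stand-in for the article's type, here used purely as a parameter bundle with no `Drop` of its own; the null data pointer after conversion is an added assumption on top of zeroing cap and len:

```rust
use std::mem::ManuallyDrop;
use std::ptr;

/// Hypothetical C-facing bundle: plain data, no Drop impl.
#[repr(C)]
pub struct OwningArrayC<T> {
    pub data: *mut T,
    pub len: usize,
    pub cap: usize,
}

impl<T> From<Vec<T>> for OwningArrayC<T> {
    fn from(v: Vec<T>) -> Self {
        let mut v = ManuallyDrop::new(v);
        OwningArrayC { data: v.as_mut_ptr(), len: v.len(), cap: v.capacity() }
    }
}

/// Takes ownership of the buffer and leaves the bundle empty, so converting
/// the same bundle twice yields an empty Vec instead of a double free.
impl<T> From<&mut OwningArrayC<T>> for Vec<T> {
    fn from(arr: &mut OwningArrayC<T>) -> Self {
        let (data, len, cap) = (arr.data, arr.len, arr.cap);
        arr.data = ptr::null_mut();
        arr.len = 0;
        arr.cap = 0;
        if cap == 0 {
            return Vec::new();
        }
        // SAFETY (assumed contract): non-empty bundles are only created from a Vec<T>.
        unsafe { Vec::from_raw_parts(data, len, cap) }
    }
}
```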
And yes, FFI across languages is _always_ hard, good teaching material is often missing, and in Rust it can be even harder, as you have to comply with C soundness on one side and Rust soundness on the other (but that's also true for Python, Java, etc.). Though that isn't a factor in any of the problems in the article, IMHO. And even if we just talk about C-to-C FFI between separately built programs, the number of subtle, potentially silent footguns is pretty high (like all the issues in this article and more, plus a bunch of other ABI incompatibility risks and potentially also link-time-optimization-related risks).
by GrantMoyer on 11/6/24, 12:57 PM
This only addresses a small point in the article, but you can shrink capacity to size before passing a Vec to C, then assume capacity = size when you need to free the Vec.
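A minimal sketch of that approach with hypothetical names (`give_shrunk`, `free_shrunk`). Since `shrink_to_fit` is only documented to bring the capacity as close as possible to the length, the sketch checks the assumption rather than relying on it silently:

```rust
use std::mem::ManuallyDrop;

/// Hands out (ptr, len) after trimming spare capacity.
pub fn give_shrunk(mut v: Vec<u32>) -> (*mut u32, usize) {
    v.shrink_to_fit();
    // The free side depends on capacity == len, so verify it here.
    assert_eq!(v.capacity(), v.len(), "spare capacity survived shrink_to_fit");
    let mut v = ManuallyDrop::new(v);
    (v.as_mut_ptr(), v.len())
}

/// # Safety
/// `ptr`/`len` must come from `give_shrunk` and be freed exactly once.
pub unsafe fn free_shrunk(ptr: *mut u32, len: usize) {
    // Capacity is taken to be `len`, which `give_shrunk` checked.
    unsafe { drop(Vec::from_raw_parts(ptr, len, len)) };
}
```

`Vec::into_boxed_slice` performs the same trim and returns a type with no capacity field at all, which avoids depending on this assumption.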
by throwawa14223 on 11/7/24, 6:03 PM
Defer is awful compared to just implementing drop.
by xanathar on 11/6/24, 10:48 AM
1) Do not free memory allocated in one language (or even in one library, unless explicitly documented as allowed) in another. Rust can use a custom allocator, and what it uses by default is an implementation detail that can change in the blink of an eye.
2) Do not use Vec<T> to pass arrays to C. Use boxed slices. Do not even try to allocate a Vec and free a Box... how can it even work?!
3) free(ptr) can be called even if ptr is NULL
4) ... I frankly stopped reading. I will never know if Rust actually needs Go's defer or not.
by zoezoezoezoe on 11/8/24, 8:22 PM
We have defer, it’s called impl Drop
by seanhunter on 11/8/24, 5:11 AM
If you come (as the author does) to the conclusion
```
   > Rust + FFI is nasty and has a lot of friction.
```
...then I would say you have been living a little bit of a charmed life. FFI in most languages has improved beyond all recognition in the last 15 years. When I look at FFI in rust I can't believe how ergonomic and clean it is compared to say PerlXS (which was the old way of doing FFI in perl).
by neonsunset on 11/6/24, 11:14 AM
No, it doesn’t. There are better ways to abstract away RAII, and it is a strictly bad pattern in Rust to not rely on Drop. The author needs to stop writing C and Go, both being arguably bad languages.