by jicea on 1/9/25, 7:40 PM with 35 comments
by OptionOfT on 1/12/25, 5:14 PM
When you write code you have the choice to structure it per process, per thread, or sequentially.
The problem is that running multiple tests in a shared space doesn't necessarily match the world in which this code is run.
Per process testing allows you to design a test that matches the usage of your codebase. Per thread already constrains you.
For example: we might elect to write a job as a process that runs on demand, and the library we use has a memory leak that can't be fixed in reasonable time. Since we write it as a process that gets restarted, we manage to contain the leak.
Doing multiple tests in multiple threads might not work here as there is a shared space that is retained and isn't representative of real world usage.
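A minimal sketch of that pattern in Rust (the `leaky-job` binary name is made up for illustration): the job runs as a short-lived child process, so the OS reclaims the leaked memory on every exit, and a process-per-test harness exercises exactly that shape:

    use std::process::Command;

    // Hypothetical: `leaky-job` wraps the library with the unfixable leak.
    // Each invocation is a fresh process, so the leak dies with it.
    fn run_job_once(input: &str) -> std::io::Result<bool> {
        let status = Command::new("leaky-job").arg(input).status()?;
        Ok(status.success())
    }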
Concurrency is a feature of your software that you need to code for. So if you have multiple things happening, then that should be part of your test harness.
The test harness forcing you to think of it isn't always a desirable trait.
That said, I have worked on a codebase where we discovered bugs because the tests were run in parallel, in a shared space.
by o11c on 1/13/25, 5:47 AM
* Use multiple processes, but multiple tests per process as well.
* Randomly split and order the tests on every run, to encourage catching flakiness. Print the seed for this as part of the test results for reproducibility (see the seeded-shuffle sketch after this list).
* Tag your tests a lot (this is one place where the "test classes" or other grouping that many languages provide are very useful). Smoke tests should run before all other tests, and all run in one process (though still in random order). Known long-running tests should be tagged to use a dedicated process and should mostly start early (longest first), except that a few cores should be reserved to work through the fast tests so they can fail early.
* If you need to kill a timed-out test even though other tests are still running in the same process - just kill the process anyway, and automatically run the other tests again.
* Have the harness provide fixtures like "this is a temporary directory, you don't have to worry about clearing it on failure", so tests don't have to worry about cleaning up if killed. Actually, why not just randomly kill a few tests regardless?
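A sketch of such a fixture (names are illustrative, not any particular harness's API): each test gets a directory under a per-run root, nothing is deleted in the test itself, and the harness sweeps the whole root later, so even a killed test leaves nothing it was responsible for:

    use std::fs;
    use std::path::{Path, PathBuf};

    // The harness creates `run_root` once per run and deletes it afterwards
    // (or sweeps leftovers at the start of the next run). Tests never clean up.
    fn scratch_dir(run_root: &Path, test_name: &str) -> std::io::Result<PathBuf> {
        let dir = run_root.join(test_name);
        fs::create_dir_all(&dir)?;
        Ok(dir)
    }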
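And the seeded shuffle from the second bullet, sketched with the rand crate (the harness wiring around it is assumed):

    use rand::{rngs::StdRng, seq::SliceRandom, SeedableRng};

    fn shuffled_order(mut tests: Vec<String>) -> Vec<String> {
        // Reuse a seed from the environment to reproduce a flaky order,
        // otherwise pick a fresh one.
        let seed: u64 = std::env::var("TEST_SEED")
            .ok()
            .and_then(|s| s.parse().ok())
            .unwrap_or_else(rand::random);
        // Print the seed with the results so any run can be replayed.
        println!("test order seed: {seed}");
        tests.shuffle(&mut StdRng::seed_from_u64(seed));
        tests
    }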
I wrote some more about tests here [1], but I'm not sure I'll update it any more because of Github's shitty 2FA-but-only-the-inconvenience-not-the-security requirement.
by cortesi on 1/12/25, 9:22 PM
The one thing we've had to be aware of is that the execution model means there can sometimes be differences in behaviour between nextest and cargo test. Very occasionally there are tests that fail in cargo test but succeed in nextest due to better isolation. In practice this just means that we run cargo test in CI.
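For instance (an illustration, not a test from their codebase), a pair like this can pass under nextest, where each test gets its own process, yet flake under cargo test, where tests share one process across threads:

    // Both tests mutate process-wide state (an environment variable).
    // With one process per test they can't interfere; in a shared process
    // running tests on parallel threads, they race.
    // (On the Rust 2024 edition, set_var/remove_var must be in an `unsafe` block.)
    #[test]
    fn reads_mode_when_set() {
        std::env::set_var("APP_MODE", "prod");
        assert_eq!(std::env::var("APP_MODE").unwrap(), "prod");
    }

    #[test]
    fn errors_when_mode_unset() {
        std::env::remove_var("APP_MODE");
        assert!(std::env::var("APP_MODE").is_err());
    }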
by marky1991 on 1/12/25, 4:08 PM
I'm not actually clear what he means by 'test', to be honest, but I assume he means 'a single test function that can either pass or fail'.
E.g. in Python (nose):

    class TestSomething:
        def test_A(self): ...
        def test_B(self): ...
I'm assuming he means test_A. But why not run all of TestSomething in a process?
Honestly, I think the idea of tests having shared state is bad to begin with (for things that truly matter; e.g. if the outcome of your test depends on the state of sys.modules, something else is horribly wrong), so I would never make this tradeoff to benefit a scenario that I don't think should exist in the first place.
Even if we were being absolute purists, this still hasn't solved the problem the second your process communicates with any other process (or server). And that problem seems largely unsolvable, short of mocking.
Basically, I'm not convinced this is a good tradeoff, because the idea of creating thousands and thousands of processes to run a test suite, even on Linux, sounds like a bad idea. (And at work it would definitely be a bad idea, for performance reasons.)
by Ericson2314 on 1/12/25, 7:17 PM
That is especially good for bare metal. If you don't have a global allocator, have limited RAM, etc., you might not be able to write the test harness as part of the guest program at all! So you want to move as much logic to the host program as possible, and then run as little as a few instructions (!) in the guest program.
See https://github.com/gz/rust-x86 for an example of doing some of this.
by pjc50 on 1/12/25, 8:16 PM
by bfrog on 1/13/25, 1:54 AM
by sedatk on 1/12/25, 5:55 PM
Is "memory corruption" an issue with Rust? Also, if one test segfaults, isn't that a reason to halt the run, because something got seriously broken?
by amelius on 1/12/25, 5:51 PM
by zbentley on 1/12/25, 10:17 PM
For tests specifically, some considerations I found to be missing:
- Given speed requirements for tests, and representativeness requirements, it's often beneficial to refrain from too much isolation so that multiple tests can exercise paths that use pre-primed in-memory state (caches, open sockets, etc.). It's odd that the article calls out global-ish state mutation as a specific benefit of process isolation, given that it's often substantially faster and more representative of real production environments to run tests in the presence of already-primed global state (see the primed-cache sketch at the end of this list). Other commenters have pointed this out.
- I wish the article were clearer about threads as an alternative isolation mechanism for sequential tests versus threads as a means of parallelizing tests. If tests really do need to be run in parallel, processes are indeed the way to go in many cases, since thread-parallel tests often test a more stringent requirement than production would. Consider, for example, a global connection pool which is primed sequentially on webserver start, before the webserver begins (maybe parallel) request servicing. That setup code doesn't need to be thread-safe, so using threads to test it in parallel may surface concurrency issues that are not realistic.
- On the other hand, enough benefits are lost when running clean-slate test-per-process that it's sometimes more appropriate to have the test harness orchestrate a series of parallel executors and schedule multiple tests to each one. Many testing frameworks support this on other platforms; I'm not as sure about Rust--my testing needs tend to be very simple (and, shamefully, my coverage of fragile code lower than it should be), so take this with a grain of salt.
- Many testing scenarios want to abort testing on the first failure, in which case processes vs. threads is largely moot. If you run your tests with a thread or otherwise-backgrounded routine that can observe a timeout, it doesn't matter whether your test harness can reliably kill the test and keep going; aborting the entire test harness (including all processes/threads involved) is sufficient in those cases (see the watchdog sketch at the end of this list).
- Debugging tools are often friendlier to in-process test code. It's usually possible to get debuggers to understand process-based test harnesses, but this isn't usually set up by default. If you want to breakpoint/debug during testing, running your tests in-process and on the main thread (with a background thread aborting the harness or auto-starting a debugger on timeout) is generally the most debugger-friendly practice. This is true on most platforms, not just Rust.
- fork() is a middle ground here as well: it can be slow (though mitigations exist), but it can also speed things up considerably by sharing e.g. primed in-memory caches and socket state with tests when they run. Given fork()'s sharp edges re: filehandle sharing, this, too, works best with sequential rather than parallel test execution. Depending on the libraries in use in code-under-test, though, this is often more trouble than it's worth. Dealing with a mixture of fork-aware and fork-unaware code is miserable; better to do as the article suggests if you find yourself in that situation. How to set up library/reusable code to hit the right balance between fork-awareness/fork-safety and environment-agnosticism is a big and complicated question with no easy answers (and also excludes the easy rejoinder of "fork is obsolete/bad/harmful; don't bother supporting it and don't use it, just read Baumann et al.!").
- In many ways, this article makes a good case for something it doesn't explicitly mention: a means of annotating/interrogating in-memory global state, like caches/lazy_static/connections, used by code under test. With such an annotation, it's relatively easy to let invocations of the test harness choose how they want to work: reuse a process for testing and reset global state before each test, have the harness itself (rather than tests by side-effect) set up the global state, run each test with and/or without pre-primed global state and see if behavior differs, etc. Annotating such global state interactions isn't trivial, though, if third-party code is in the mix. A robust combination of annotations in first-party code and a clear place to manually observe/prime/reset-if-possible state that isn't annotated is a good harness feature to strive for. Even if you don't get 100% of the way there, incremental progress in this direction yields considerable rewards (a sketch of the registration idea follows this list).
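The primed-cache point above, as a small Rust sketch (illustrative names only): the first test in a process pays the priming cost, and every later test in that process reuses the warm state:

    use std::collections::HashMap;
    use std::sync::OnceLock;

    // An expensive cache primed once per test process; imagine it loading
    // fixtures or warming connections. Process-per-test pays this cost for
    // every single test; a shared process pays it once.
    static CACHE: OnceLock<HashMap<String, String>> = OnceLock::new();

    fn cache() -> &'static HashMap<String, String> {
        CACHE.get_or_init(|| HashMap::from([("config".into(), "primed".into())]))
    }

    #[test]
    fn first_test_pays_priming_cost() {
        assert_eq!(cache().get("config").map(String::as_str), Some("primed"));
    }

    #[test]
    fn later_tests_reuse_primed_state() {
        assert!(cache().contains_key("config"));
    }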
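The watchdog mentioned above can be this small when the policy is abort-on-timeout anyway; this is a sketch, not any framework's API:

    use std::thread;
    use std::time::Duration;

    // Run the suite sequentially on the main thread (debugger-friendly) with
    // a background deadline. If the budget is exceeded, abort the whole
    // process; no per-test kill-and-continue machinery is needed.
    fn with_deadline(budget: Duration, run_suite: impl FnOnce()) {
        thread::spawn(move || {
            thread::sleep(budget);
            eprintln!("test suite exceeded {budget:?}; aborting");
            std::process::abort();
        });
        // If this returns before the deadline, main exits normally and the
        // watchdog thread dies with the process.
        run_suite();
    }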
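And a minimal sketch of the annotation idea in the last point (all names hypothetical): first-party code registers its resettable globals, and each harness invocation decides whether to reset between tests or run against primed state:

    use std::sync::Mutex;

    // Globals that matter to tests register a reset hook at startup.
    static RESET_HOOKS: Mutex<Vec<fn()>> = Mutex::new(Vec::new());

    pub fn register_resettable(reset: fn()) {
        RESET_HOOKS.lock().unwrap().push(reset);
    }

    // The harness calls this between tests when reusing a process, or skips
    // it to exercise already-primed state; unannotated third-party state
    // still needs a manual observe/prime/reset hook alongside this.
    pub fn reset_all() {
        for reset in RESET_HOOKS.lock().unwrap().iter() {
            reset();
        }
    }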
by grayhatter on 1/12/25, 4:19 PM