from Hacker News

Functional Tests as a Tree of Continuations (2010)

by amenghra on 3/13/25, 8:46 PM with 32 comments

  • by lihaoyi on 3/14/25, 12:37 AM

    This is the approach my uTest testing library (https://github.com/com-lihaoyi/utest) takes. I don't think it's unique to functional tests; even unit tests tend towards this pattern. Tests naturally form a tree structure for multiple reasons:

    - You usually have shared initialization nearer the root and the various cases you want to assert at the leaves.

    - You want to group related tests logically together, so it's not one huge flat namespace which gets messy

    - You want to run groups of tests at the same time, e.g. when testing a related feature

    Typically, these different ways of grouping tests all end up with the same grouping, so it makes a lot of sense to have your tests form a tree rather than a flat list of @Test methods or whatever.

    Naturally you can always emulate this yourself: e.g. having helper setup methods that call each other and form a hierarchy, having a tagging discipline that forms a hierarchy so you can run related tests together, or simply using files as the leaf level of the larger filesystem tree to organize your tests. All that works, but it is nice to be able to simply define a tree of tests in a single file and have all of that taken care of for you.
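
    For illustration, a hand-rolled sketch of the idea in Python (not uTest's actual API; the tree layout and names are made up): inner nodes do shared setup, leaves assert, and the runner replays the setup chain from the root for every leaf.

      def run(tree, env=None, path=()):
          env = dict(env or {})              # fresh copy per subtree
          if "setup" in tree:                # shared initialization
              env = tree["setup"](env)
          for name, node in tree.items():
              if name == "setup":
                  continue
              if callable(node):             # leaf: a single assertion
                  print("/".join(path + (name,)),
                        "ok" if node(env) else "FAIL")
              else:                          # inner node: recurse
                  run(node, env, path + (name,))

      tests = {
          "account": {
              "setup": lambda env: {**env, "user": "alice"},
              "login": lambda env: env["user"] == "alice",
              "admin": {
                  "setup": lambda env: {**env, "role": "admin"},
                  "can_delete": lambda env: env["role"] == "admin",
              },
          },
      }
      run(tests)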

  • by simonw on 3/13/25, 10:48 PM

    "One of the most essential practices for maintaining the long-term quality of computer code is to write automated tests that ensure the program continues to act as expected, even when other people (including your future self) muck with it."

    That's such a great condensation of why automated tests are worthwhile.

    "To write your own testing framework based on continuation trees, all you need is a stack of databases (or rather, a database that supports rolling back to an arbitrary revision)."

    PostgreSQL and SQLite and MySQL all support SAVEPOINT these days, which is a way to have a transaction nested inside a transaction. I could imagine building a testing system on top of this which could support the tree pattern described by Evan here (as long as your tests don't themselves need to test transaction-related behavior).

    Since ChatGPT Code Interpreter works with o3-mini now, I had it knock up a very quick proof of concept using Python and SQLite SAVEPOINT, which appears to work: https://chatgpt.com/share/67d36883-4294-8006-b464-4d6f937d99...
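
    The SAVEPOINT mechanism itself fits in a few lines of Python's sqlite3 (a rough sketch with a made-up schema, not the PoC above):

      import sqlite3

      db = sqlite3.connect(":memory:")
      db.isolation_level = None                 # manage transactions manually
      db.execute("CREATE TABLE users (name TEXT)")

      db.execute("SAVEPOINT root")
      db.execute("INSERT INTO users VALUES ('alice')")  # shared setup

      db.execute("SAVEPOINT child")
      db.execute("INSERT INTO users VALUES ('bob')")    # one branch of the tree
      assert db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 2
      db.execute("ROLLBACK TO child")                   # undo just this branch

      # back at the parent's state: only the shared setup remains
      assert db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1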

  • by turtleyacht on 3/13/25, 9:14 PM

    End-to-end (e2e) tests are slow and flaky. They don't have to be, but the effort to fix breakage starts consuming most of the available time.

    One idea is to separate scraping from verification. The latter would run very fast and be reliable: it only tests against stored state.

    Then scraping is just procedural: clicking things, waiting for page loads, and reading page elements into a database.
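
    Roughly (a made-up sketch: the selector, schema, and Playwright-style page handle are all invented):

      def scrape(page, db):
          # slow, flaky half: drive the browser, persist what we saw
          first_name = page.query_selector("#first-name").inner_text()
          db.execute("INSERT INTO observed (key, value) VALUES (?, ?)",
                     ("first_name", first_name))

      def test_first_name(db):
          # fast, reliable half: assert only against stored state
          row = db.execute("SELECT value FROM observed WHERE key = ?",
                           ("first_name",)).fetchone()
          assert row is not None, "integrity check: field was never scraped"
          assert row[0] == "Alice"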

    Some consequences: you need integrity checks to ensure data was actually read (say the first-name field's selector was updated but never populated), self-healing selectors (AI, et al.), and certifying test results against known versions (fixing the scraper amid a UI redesign).

    A lot of effort is saved by using screenshot diffing of, say, React components, especially for edge cases. It also (hopefully) shifts test responsibility left to the devs.

    Ideally, we only have some e2e tests, mostly happy paths, that also act as integration tests.

    We could combine these ideas with "stacked databases" from the article and save on duplication.

    Finally, the real trick is knowing, in the face of changes, which tests don't have to run, making the whole run take less time.

  • by RossBencina on 3/14/25, 2:54 AM

    Even without the continuation piece, it has always puzzled me why the test frameworks I've used (mostly pytest and Catch) don't explicitly model dependencies between layers, especially in a system where the layers have been carefully levelized. Assuming for the sake of example that there are no mocks involved, if subsystem B depends on subsystem A (say A is some global utility classes), then I would want all of A's unit tests to pass before running any of subsystem B's tests. Not sure why this is absent, or perhaps I'm using the wrong test systems.
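
    A hand-rolled sketch of what I mean in Python (nothing resembling pytest's or Catch2's actual APIs; the layers and tests are made up): tests declare which layer they belong to, layers run in dependency order, and a layer is skipped outright if anything it depends on failed.

      LAYERS = {"A": [], "B": ["A"]}         # B depends on A
      TESTS = {"A": [lambda: 1 + 1 == 2],
               "B": [lambda: "b".upper() == "B"]}

      failed = set()
      for layer in ("A", "B"):               # already topologically ordered
          if any(dep in failed for dep in LAYERS[layer]):
              print(layer, "skipped (dependency failed)")
              continue
          if not all(t() for t in TESTS[layer]):
              failed.add(layer)
          print(layer, "FAILED" if layer in failed else "passed")
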
  • by widdershins on 3/14/25, 7:41 AM

    The C++ testing framework Catch2 enables this kind of testing. The first time I saw it I couldn't figure out how some of the tests would even pass.

    It turns out that using some evil macro magic, each test re-runs from the start for each inner section [1]. It also makes deduplicating setup code completely painless and natural.
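
    The mechanism can be sketched in a few lines of Python (a toy emulation with flat sections only; real Catch2 handles arbitrary nesting):

      def test(section):
          v = [1, 2, 3]              # setup runs fresh on every pass
          if section("push"):
              v.append(4)
              assert len(v) == 4
          if section("pop"):
              v.pop()
              assert len(v) == 2

      def run(test):
          done = set()
          while True:
              entered = []
              def section(name):
                  # enter the first not-yet-run section, skip the rest
                  if name not in done and not entered:
                      entered.append(name)
                      return True
                  return False
              test(section)          # re-run the whole body from the top
              if not entered:
                  break              # every section has now run once
              done.add(entered[0])

      run(test)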

    You just have to get over the completely non-standard control flow. It's a good standard-bearer for why metaprogramming is great, even if you're forced to do it in C/C++'s awful macro system.

    [1] https://github.com/catchorg/Catch2/blob/devel/docs/tutorial....

  • by darioush on 3/14/25, 12:24 AM

    If you specify the operations (API) of your system in a relational algebra, then you can use that algebra to generate valid state transitions. (This can essentially construct the tree of continuations the article is discussing, or enumerate the paths of that tree.)

    If you create a query language, then the state can be verified to match expectations at any point.

    I'm not sure why we don't program like this.
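
    Concretely, something like this toy sketch in Python (not a real relational algebra; the operations and invariant are made up): each operation has a precondition and an effect, and enumerating valid sequences walks the tree of continuations.

      OPS = {
          "open":    (lambda s: not s["open"], lambda s: {**s, "open": True}),
          "deposit": (lambda s: s["open"],     lambda s: {**s, "bal": s["bal"] + 1}),
          "close":   (lambda s: s["open"],     lambda s: {**s, "open": False}),
      }

      def paths(state, depth, trail=()):
          yield trail, state                   # every node is observable
          if depth == 0:
              return
          for name, (pre, effect) in OPS.items():
              if pre(state):                   # only valid transitions
                  yield from paths(effect(state), depth - 1, trail + (name,))

      for trail, s in paths({"open": False, "bal": 0}, 3):
          assert s["bal"] >= 0                 # "query" the state at any node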

  • by zem on 3/13/25, 9:59 PM

    On the one hand, I suspect too much code has explicit and implicit global state for this technique to be useful; on the other hand, using it from the beginning might prevent introducing that sort of global state in the first place.

  • by AdieuToLogic on 3/14/25, 1:51 AM

    FWIW, languages which support Kleisli[0] types can achieve a similar benefit by defining functional tests composed of them. Often, "lower level" Kleisli constructs can be shared across related functional tests to reduce duplication.

    0 - https://en.wikipedia.org/wiki/Kleisli_category
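
    A toy sketch of the idea in Python (hand-rolled Result type; the arrows are made up): test steps are Kleisli arrows a -> Result[b], composed so that shared lower-level arrows are reused across related tests.

      def ok(x):  return ("ok", x)
      def err(e): return ("err", e)

      def kleisli(*steps):
          # compose a -> Result[b] arrows left to right, short-circuiting
          def composed(x):
              r = ok(x)
              for step in steps:
                  if r[0] == "err":
                      return r
                  r = step(r[1])
              return r
          return composed

      # shared lower-level arrows
      create_user = lambda name: ok({"user": name})
      login = lambda s: ok({**s, "token": "t"}) if s["user"] else err("no user")

      # related functional tests reusing the same prefix
      test_login  = kleisli(create_user, login)
      test_logout = kleisli(create_user, login,
                            lambda s: ok({**s, "token": None}))

      assert test_login("alice")[0] == "ok"
      assert test_logout("alice")[1]["token"] is None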

  • by mola on 3/14/25, 6:27 PM

    That's exactly how we e2e test our AI conversational agents. All state is immutable and the agent is basically a reducer, so having these continuation trees is easy.

    But I can't find a nice way to have pytest make a test per node in the tree. We end up with a single test for the whole tree, which is less than ideal for the dev experience.

    Anyone with pytest hacking skills and an idea?
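
    One possible approach (a sketch; the tree shape with "_step"/"_check" keys is made up): flatten the tree into root-to-node paths and parametrize over them, so each node becomes its own pytest test with a readable id.

      import pytest

      TREE = {
          "greet": {
              "_step": lambda s: {**s, "greeted": True},
              "ask_name": {
                  "_step": lambda s: {**s, "name": "alice"},
                  "_check": lambda s: s["name"] == "alice",
              },
          },
      }

      def node_paths(tree, prefix=()):
          for key, node in tree.items():
              if isinstance(node, dict):       # child node, not a lambda
                  yield prefix + (key,)
                  yield from node_paths(node, prefix + (key,))

      PATHS = list(node_paths(TREE))

      @pytest.mark.parametrize("path", PATHS,
                               ids=["/".join(p) for p in PATHS])
      def test_node(path):
          state, node = {}, TREE
          for key in path:                     # replay the steps from the root
              node = node[key]
              state = node["_step"](state)
          if "_check" in node:
              assert node["_check"](state)

    Since the state is immutable and the agent is a reducer, replaying the steps from the root is equivalent to resuming a continuation, just slower.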

  • by aszen on 3/14/25, 4:32 AM

    Traditional advice is to keep tests independent of each other; that's why the setup part gets repeated instead of being inherited from parent tests. Independent tests can be run in parallel; dependent tests cannot.

    But I can see how this approach allows for parallelism as well. I especially like the fact that you only get one failure when one of the steps fails.

  • by djha-skin on 3/13/25, 10:33 PM

    I wouldn't call 5-level-nested code "surprisingly clean", and continuations are cursed. I wouldn't want to have to debug tests that relied on continuations unnecessarily.

  • by alkonaut on 3/13/25, 11:04 PM

    Whenever I see a test suite do 9 steps of setup to assert one thing and then (mostly) the same 9 steps again to assert some other thing, I die a little inside. Especially when the setup takes multiple seconds for each case.

    The lesser evil is to just "do what you need and test everything once you are arranged".

    You won’t get hundreds of neatly separated well-named test cases which fail for a single reason. But for slow tests that isn’t as important as keeping the redundant setup away.

    I like the tree idea, but once we have simple pure/immutable state we don't really have the problem of redundant setup being slow, just ugly.