by amenghra on 3/13/25, 8:46 PM with 32 comments
by lihaoyi on 3/14/25, 12:37 AM
- You usually have shared initialization nearer the root and the various cases you want to assert at the leaves.
- You want to group related tests logically together, so it's not one huge flat namespace which gets messy
- You want to run groups of tests at the same time, e.g. when testing a related feature
Typically, these different ways of grouping tests all end up with the same grouping, so it makes a lot of sense to have your tests form a tree rather than a flat list of @Test methods or whatever
Naturally you can always emulate this yourself, e.g. by having helper setup methods that call each other and form a hierarchy, or a tagging discipline that forms a hierarchy to let you call tests that are related, or by simply using files as the leaf level of the larger filesystem tree to organize your tests. All that works, but it is nice to be able to simply define a tree of tests in a single file and have all that taken care of for you.
by simonw on 3/13/25, 10:48 PM
That's such a great condensation of why automated tests are worthwhile.
"To write your own testing framework based on continuation trees, all you need is a stack of databases (or rather, a database that supports rolling back to an arbitrary revision)."
PostgreSQL and SQLite and MySQL all support SAVEPOINT these days, which is a way to have a transaction nested inside a transaction. I could imagine building a testing system on top of this which could support the tree pattern described by Evan here (as long as your tests don't themselves need to test transaction-related behavior).
Since ChatGPT Code Interpreter works with o3-mini now I had that knock up a very quick proof of concept using Python and SQLite SAVEPOINT, which appears to work: https://chatgpt.com/share/67d36883-4294-8006-b464-4d6f937d99...
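A minimal sketch of that SAVEPOINT idea, written from scratch rather than taken from the linked proof of concept. The tree shape `(name, step, children)` and the names `run_tree`, `create_schema`, `add_alice`, `add_bob` and `add_bob_after_alice` are made up for illustration. The connection is opened with `isolation_level=None` so Python's sqlite3 module does not start transactions behind our back:

```python
import sqlite3


def count_users(conn):
    # fetchall() exhausts the cursor so no statement is left pending.
    return conn.execute("SELECT COUNT(*) FROM users").fetchall()[0][0]


def create_schema(conn):
    conn.execute("CREATE TABLE users (name TEXT)")


def add_alice(conn):
    conn.execute("INSERT INTO users VALUES ('alice')")
    assert count_users(conn) == 1


def add_bob_after_alice(conn):
    conn.execute("INSERT INTO users VALUES ('bob')")
    assert count_users(conn) == 2  # alice is still there: same branch


def add_bob(conn):
    conn.execute("INSERT INTO users VALUES ('bob')")
    assert count_users(conn) == 1  # alice's branch was rolled back


# Hypothetical test tree: (name, step, children).
TREE = ("schema", create_schema, [
    ("alice", add_alice, [("then_bob", add_bob_after_alice, [])]),
    ("bob", add_bob, []),
])


def run_tree(conn, node, path=(), results=None):
    """Run a node's step inside a SAVEPOINT, recurse into its children,
    then roll back so sibling branches start from the parent's state."""
    if results is None:
        results = []
    name, step, children = node
    path = path + (name,)
    savepoint = f"sp_{len(path)}"  # one savepoint name per depth level
    conn.execute(f"SAVEPOINT {savepoint}")
    try:
        step(conn)
        results.append(("/".join(path), "ok"))
        for child in children:
            run_tree(conn, child, path, results)
    except AssertionError as exc:
        # A failing step is reported once; its children are skipped.
        results.append(("/".join(path), f"FAIL: {exc}"))
    finally:
        conn.execute(f"ROLLBACK TO {savepoint}")
        conn.execute(f"RELEASE {savepoint}")
    return results


conn = sqlite3.connect(":memory:", isolation_level=None)
results = run_tree(conn, TREE)
for test_name, status in results:
    print(test_name, status)
```

Each step runs exactly once, and the rollback at the end of every node is what lets the `bob` branch see an empty `users` table even though `alice` ran before it. As noted above, this stops working if the tests themselves need to exercise transaction behavior.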
by turtleyacht on 3/13/25, 9:14 PM
One idea is to separate scraping from verification. The latter would run very fast and be reliable: it only tests against stored state.
Then scraping is just procedural, clicking things, waiting for page loads, and reading page elements into a database.
Some consequences: we would need integrity checks to ensure data has actually been read (e.g. the first name field selector was updated but not populated), self-healing selectors (AI et al.), and a way to certify test results against known versions (fixing the scraper amid a UI redesign).
A lot of effort is saved by using screenshot diffing of, say, React components, especially for edge cases. It also (hopefully) shifts test responsibility left, to the devs.
Ideally, we only have some e2e tests, mostly happy paths, that also act as integration tests.
We could combine these ideas with "stacked databases" from the article and save on duplication.
Finally, the real trick is knowing, in the face of changes, which tests don't have to run, making the whole run take less time.
by RossBencina on 3/14/25, 2:54 AM
by widdershins on 3/14/25, 7:41 AM
It turns out that, through some evil macro magic, each test re-runs from the start for each inner section [1]. It also makes deduplicating setup code completely painless and natural.
You just have to get over the completely non-standard control flow. It's a good standard bearer for why metaprogramming is great, even if you're forced to do it in C/C++'s awful macro system.
[1] https://github.com/catchorg/Catch2/blob/devel/docs/tutorial....
by darioush on 3/14/25, 12:24 AM
If you create a query language, then the state can be verified to match expectations at any point.
I'm not sure why we don't program like this.
by zem on 3/13/25, 9:59 PM
by AdieuToLogic on 3/14/25, 1:51 AM
by mola on 3/14/25, 6:27 PM
But I can't find a nice way to have pytest make a test per node in the tree. We end up with a single test for the whole tree, which is less than ideal for dev experience.
Anyone with pytest hacking skills and an idea?
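One possible approach, as a sketch only: flatten the tree into root-to-node paths and hand them to pytest through the `pytest_generate_tests` hook, so each node shows up as its own test with an id like `cart/add_item/checkout`. The tree shape `(name, step, children)`, the `state` dict and all step names here are invented for illustration. The catch is that every node re-runs its ancestors' steps from scratch, so this only suits trees with cheap setup:

```python
def setup_cart(state):
    state["cart"] = []


def add_item(state):
    state["cart"].append("book")
    assert state["cart"] == ["book"]


def checkout(state):
    assert len(state["cart"]) == 1
    state["paid"] = True


def empty_cart(state):
    assert state["cart"] == []


# Hypothetical test tree: (name, step, children).
TREE = ("cart", setup_cart, [
    ("add_item", add_item, [("checkout", checkout, [])]),
    ("empty", empty_cart, []),
])


def iter_paths(node, prefix=()):
    """Yield the root-to-node path for every node in the tree."""
    name, step, children = node
    path = prefix + ((name, step),)
    yield path
    for child in children:
        yield from iter_paths(child, path)


def path_id(path):
    return "/".join(name for name, _ in path)


def pytest_generate_tests(metafunc):
    # pytest calls this hook for each test function at collection time.
    if "tree_path" in metafunc.fixturenames:
        paths = list(iter_paths(TREE))
        metafunc.parametrize("tree_path", paths, ids=[path_id(p) for p in paths])


def test_tree_node(tree_path):
    # Replay every step from the root down to this node on fresh state.
    state = {}
    for _, step in tree_path:
        step(state)
```

Saved as a test module, this should let `pytest -k checkout` select a single node. Running each step only once while still reporting per node would need something heavier, such as a custom collector.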
by aszen on 3/14/25, 4:32 AM
But I can see how this approach allows for parallelism as well. I especially like the fact that you only get one failure in case one of the steps fails.
by djha-skin on 3/13/25, 10:33 PM
by alkonaut on 3/13/25, 11:04 PM
The lesser evil is to just “do what you need and test everything once everything is arranged”.
You won’t get hundreds of neatly separated well-named test cases which fail for a single reason. But for slow tests that isn’t as important as keeping the redundant setup away.
I like the tree idea, but once everything is simple and pure/immutable we don’t really have the problem of redundant setup being slow, just ugly.