by WolfOliver on 2/15/25, 6:08 PM with 84 comments
by int_19h on 2/15/25, 10:50 PM
Always write functional tests first. Doesn't matter if they are slow - you still want something that faithfully captures the specified behavior and allows you to detect regressions automatically.
Then, if your resulting test suite is too slow, add finer-grained tests in areas where the perf benefits of doing so dwarf the cost of necessary black-boxing.
Getting down to the level of individual classes, never mind functions - i.e. the traditional "unit tests" - should be fairly rare in non-library code.
by recursivedoubts on 2/15/25, 9:29 PM
You should be striving to balance the long-term usefulness of your tests with the debuggability of those tests. In my experience, those tests are what most people would call "integration tests" (although that name, like so much terminology in the testing world, is confusing and poorly defined.)
You want to get the tests up at as high a level of abstraction as possible where the API and correctness assertions are likely to survive implementation detail changes (unlike many unit tests) while at the same time avoiding the opaque and difficult to debug errors that come with end-to-end testing (again, the language here is confusing, I assume you know what I mean.)
by atum47 on 2/16/25, 12:18 AM
by codr7 on 2/15/25, 11:39 PM
Most of the systems I build use a database on which all logic depends, and often a network connection.
I've worked on systems where these aspects were mocked, and they eventually ground to a halt because of the effort required to make the tiniest change.
First of all, you need a way to create a pristine database from code, preferably in memory. Second, nested transactions are nice, since you can simply roll back the outer transaction per test case; otherwise you need to drop and recreate the database, which is slower.
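A minimal sketch of that pattern in Python, assuming pytest and the standard library's sqlite3; the schema, fixture name, and test are made up for illustration. The database is created fresh in memory, and everything a test writes is rolled back afterwards:

```python
import sqlite3
import pytest

@pytest.fixture()
def db():
    conn = sqlite3.connect(":memory:")          # pristine database, no files on disk
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.commit()                               # the empty schema is the baseline
    yield conn                                  # the test runs against this connection
    conn.rollback()                             # discard whatever the test wrote
    conn.close()

def test_insert_user(db):
    db.execute("INSERT INTO users (name) VALUES ('alice')")
    assert db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1
```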
For networked servers, an easy way to start/stop servers in code and send requests to them is all you need.
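In the same spirit, a hedged sketch of starting and stopping a server in code, using only the standard library; the PingHandler class and the /ping URL are invented for the example:

```python
import http.server
import threading
import urllib.request

class PingHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "4")
        self.end_headers()
        self.wfile.write(b"pong")

    def log_message(self, *args):               # keep test output quiet
        pass

def test_ping_roundtrip():
    server = http.server.HTTPServer(("127.0.0.1", 0), PingHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        body = urllib.request.urlopen(f"http://127.0.0.1:{port}/ping").read()
        assert body == b"pong"
    finally:
        server.shutdown()                        # stops serve_forever()
        server.server_close()
```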
Given these pieces, it's easy to write integration tests that run fast enough and give a lot of bang for the buck.
TDD is even rarer for me; I typically only do it when designing APIs I'm unsure about, which makes imagining user code difficult. And when fixing bugs, because it makes total sense to have a failing test to verify that you fixed the bug, and that it remains fixed.
by simonw on 2/16/25, 12:56 AM
I dislike that term because the most valuable tests I write are inevitably more in the shape of integration tests - tests that exercise just one function/class are probably less than 10% of the tests that I write.
So I call my tests "tests", but I get frustrated that this could be confused with manual tests, so then I call them "automated tests" but that's a bit of a mouthful and not a term many other people use.
I'd love to go back to calling them "unit tests", but I worry that most people who hear me say that will still think I'm talking about the test-a-single-unit-of-code version.
by Supermancho on 2/16/25, 2:19 AM
That's not the only argument. The important result of this is ensuring that the "unit" of code is written to be testable. This happens to require that it be simple and extensible. It does not enforce making the code or the tests comprehensible.
When you don't trust someone's code, have them write detailed unit tests. They will find most of their problems on their own and learn better practices, along the way.
I am, in no way, implying that unit tests are a replacement for integration or behavioral or E2E testing et al...depending on how you want to define those.
by brumar on 2/15/25, 9:25 PM
> Only isolate your code from truly external services
That makes tests more trustworthy, but I think it also sometimes makes them harder to maintain. I have seen cases where small changes to the code base created strong ripple effects, with many tests to update. Arguably, the tests were not very well written or organized, and there were too many high-level tests. Still, this, together with the very long execution time of the test collection, made me realize that for medium to large projects I will be much more careful in the future before going all in with the no-mock approach.
by deterministic on 2/19/25, 9:45 AM
I auto test the API of the server/system/library/module I am responsible for. Nothing else. No auto testing of internal details.
It lets me completely rewrite internals without breaking the tests.
The API tests need to be so good that another developer could implement the same server/system/library/module using the tests alone.
And the API tests need to try as hard as possible to break the code being tested.
Using this method I have had zero bugs in production for the last 5+ years.
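As a hedged illustration of what an API-only test can look like: a hypothetical public slugify() function has its contract pinned down entirely through its public interface, so the internals can be rewritten freely without touching the suite (the stand-in implementation below exists only so the sketch runs).

```python
import re

def slugify(title):                      # stand-in; the real implementation is free to change
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

# The tests below exercise only the public API, never the internals.
def test_lowercases_and_joins_words():
    assert slugify("Hello World") == "hello-world"

def test_strips_punctuation():
    assert slugify("Rock & Roll!") == "rock-roll"

def test_empty_input_gives_empty_slug():
    assert slugify("") == ""
```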
by motorest on 2/16/25, 12:21 AM
> The argument for isolating the units from each other is that it is easier to spot a potential bug. (...) In my opinion, this does not pay out because of the huge amount of false positive test cases you get and the time you need to fix them. Also, if you know the code base a little you should have an idea where the problem is. If not, this is your chance to get to know the code base a little better.
This is at best specious reasoning, and to me reflects that the blogger completely misses the point of having tests.
To start off, there is no such thing as a false positive test. Your tests track invariants, especially those which other components depend on. The whole point of having these tests is to have a way to check them automatically every single time we touch the code, so that the tests warn us when a change we are making would cause the application to fail.
If you decide to change your code in a way that breaks a few invariants, these are not "false positives". This is your tests working as expected, warning you that you must pay attention to what you are doing so that you do not introduce regressions.
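To make "invariant" concrete, a hypothetical example: downstream code relies on normalized e-mail addresses always being trimmed and lowercase, and the test exists purely to flag any later change that quietly breaks that assumption.

```python
def normalize_email(raw: str) -> str:
    # hypothetical helper; other components assume its output is canonical
    return raw.strip().lower()

def test_normalized_emails_are_trimmed_and_lowercase():
    assert normalize_email("  Alice@Example.COM ") == "alice@example.com"
```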
It's also completely mind-boggling and absurd to argue that "knowing the code" is any argument to avoid tracking invariants. The whole point of automated test suites is that you do not want the app to fail because you missed any detail or corner case or failure mode. Knowing the code does not prevent bugs or errors or regressions.
I'm perplexed by the way we have people write long articles on unit tests when they don't really seem to understand what they are supposed to achieve.
by arialdomartini on 2/15/25, 11:32 PM
I believe links are significantly more useful when they include descriptive text like the title or author, rather than just 'here'.
by nobleach on 2/15/25, 9:38 PM
The worst part about it is that he called himself a thought leader, called his approach a "best practice" and had nothing really to back that up. Now people go around repeating it all the time. It's frustrating.
by hansvm on 2/16/25, 12:19 AM
The author has a lot of opinions about testing, though, which conflict with what I've found to work even in that sort of dynamic environment. Their rationale makes sense on the surface (e.g., I've never seen a "mock"-heavy [0] codebase reap positive net value from its tests), but the prescription for those observed problems seems sub-optimal.
I'll pick on one of those complaints to start with, IMO the most egregious:
> Now, you change a little thing in your code base, and the only thing the testing suite tells you is that you will be busy the rest of the day rewriting false positive test cases.
If changing one little thing results in a day of rewriting tests, then either (a) the repo is structured such that small functional changes affect lots of code (which is bad, but it's correct that you'd therefore have to inspect all the tests/code to see if it actually works correctly afterward), or (b) the tests add coupling that doesn't exist otherwise in the code itself.
I'll ignore (a), since I think we can all agree that's bad (or at least orthogonal to testing concerns). For (b) though, that's definitely a consequence of "mock"-heavy frameworks.
Why?
The author's proposal is to just test observable behavior of the system. That's an easy way to isolate yourself from implementation details. I don't disagree with it, and I think the industry (as I've seen it) discounts a robust integration test suite.
What is it about "unit" tests that causes problems, though? It's that the things you're testing aren't very well thought through or very well abstracted in the middle layers. Hear me out. TFA argues for integration tests at a high level, but if you (e.g.) actually had to implement a custom sorting function at your job, would you leave it untested? Absolutely not. It'd be crammed to the gills with empty sets, brute-force checks of every permutation of length <20, a smattering of large inputs, something involving MaxInt, random fuzzing against known-working sorting algorithms, and who knows what else the kids are cooking up these days.
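Something like the following sketch is the kind of battery being described, with a stand-in custom_sort() (here just delegating to the built-in so the example runs) and far smaller bounds than the hyperbole above; the fuzz cases are checked against the known-good sorted().

```python
import itertools
import random

def custom_sort(xs):                      # stand-in for the real implementation
    return sorted(xs)

def test_empty_and_singleton():
    assert custom_sort([]) == []
    assert custom_sort([7]) == [7]

def test_every_small_permutation():
    for n in range(1, 7):                 # brute force all permutations up to length 6
        for perm in itertools.permutations(range(n)):
            assert custom_sort(list(perm)) == list(range(n))

def test_fuzz_against_builtin():
    rng = random.Random(0)
    for _ in range(200):
        xs = [rng.randint(-2**31, 2**31 - 1) for _ in range(rng.randint(0, 500))]
        assert custom_sort(xs) == sorted(xs)
```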
Moreover, almost no conceivable change to the program would invalidate those tests incorrectly. The point of a sorting algorithm is to sort, and it should have certain performance characteristics (the reason you choose one sort over another). Your tests capture that behavior. As your program changes, either you decide you don't need that sort any more (in which case you just delete the tests, which is O(other_code_deleted)), or you might need a new performance profile. In the latter case, the only tests that break are the ones associated with that one sorting function, and they break _because_ the requirements actually changed. You still satisfy O(test_changes) <= O(code_changes), the very property the author argues is destroyed by mocks.
Let's go back to the heavily mocked monstrosities TFA references. The problem isn't "unit" testing. Integration tests (the top of the code DAG) and unit tests (like our sorting example, the bottom of the DAG) are easy. It's the code in between that gets complicated, and there might be a lot of it.
What do we do then?
At a minimum, I'd personally consider testing the top and bottom of your DAG of code. Even without any thought leadership or whatever garbage we're currently selling, it's easy to argue that tests at those levels are both O(other_code_written) in cost and very valuable. At a high level (TFA's recommendation), the tests are much cheaper than the composite product, and you'd be silly not to include them. At a low level (truly independent units, like the "sorting" case study), you'd also be silly not to include them: your developers are already writing those tests to check that the feature works as they implement it, the maintenance cost of the tests is proportional to the maintenance cost of the code being tested, and they are extremely valuable for detecting defects in that code (recall that bugs are exponentially more expensive to fix the further down the pipeline they propagate before being triaged).
Addressing the bottom of your DAG is something the article, in some sense, explicitly argues against. They're arguing against the inverted pyramid model you've seen for testing. That seems short-sighted. Your developers are already paying approximately the cost of writing a good test when they personally test a sorting function they're writing, and that test is likely to be long-lived and useful; why throw that away? More importantly, building on shaky foundations is much more expensive than most people give it credit for. If your IDE auto-complete suggests a function name that says it does the right thing and accepts the arguments you're giving it, you get an immediate 10x in productivity if that autocomplete is always right. Wizards in a particular codebase (I've been that wizard in a few, my current role as well; that isn't a derogatory assessment of "other" people) can always internalize the whole thing and immediately know the right patterns, but for everyone else with <2yrs of experience in your company in particular (keep in mind that average silicon valley attrition is 2-3yrs), a function doing what it says it's going to do is a godsend to productivity.
Back to the problem at hand though. TFA says to integration test, and so do I. I also say to test your "leaf" code in your code DAG, since it's about the same cost and benefit. What about the shit in between?
In a lot of codebases I've seen, I'd say to chalk it up as a lost cause and test both the integration stuff (which TFA suggests) and also any low-level details (the extra thing I'm saying is important). Early in my career, I was implementing some CRUD feature or another and was explicitly coached (on finding that the reason the implementation was hard was a broken function deep in the call stack) to do the one-liner fix to make my use case work instead of the ten-liner to make the function actually correct and the 1000-liner to then correct every caller. I don't think they were wrong to give that advice. I'm sad that the code was in a state where it was reasonable advice.
If you're working on newer projects, though (or plan to be at a place for a while and have the liberty to do some cleanup with every new feature, a pattern I wholly endorse and which has served me very well personally), it's worth looking at that middling code and figuring out why it's so hard to work with. 99% of the time, the reason mocks look attractive isn't because they're the only solution; it's because they're the only solution that makes sense once you've already tied your hands. You don't need something to "unit" test the shutdown handler; you need something to test the total function which processes inputs and outputs and is called by the shutdown handler. You don't need to "unit" test a UI page that requires 3 different databases to produce any output; you need to unit test the functions which turn that output into that UI page (ideally without mocks, since although those ostensibly do the same thing, they usually add an extra layer of complexity and somehow break all your tests), and for something that messy you might even just want an "integration" test around that UI page asserting that it renders approximately correctly.
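A hedged sketch of that last point: if the page-building logic is a plain function over already-fetched data, it needs neither the three databases nor any mocks (render_dashboard and its inputs are invented for illustration).

```python
def render_dashboard(user_name, open_orders):
    # pure function: turns already-fetched data into a page, no I/O involved
    lines = [f"Dashboard for {user_name}"]
    lines += [f"- order #{order_id}: {status}" for order_id, status in open_orders]
    return "\n".join(lines)

def test_dashboard_lists_open_orders():
    page = render_dashboard("alice", [(17, "shipped"), (18, "pending")])
    assert "Dashboard for alice" in page
    assert "- order #17: shipped" in page
    assert "- order #18: pending" in page
```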
What else? People sell all kinds of solutions. "Functional Programming" or "OOP" or whatever. Programming is imperative when you execute it, and the right representation for the human reader varies from problem to problem. I don't have any classes to sell or methodologies to recommend. I do strongly recommend taking a very close look at the abstractions you've chosen though. I've had no problem deleting 90% of them at new jobs, making the code faster, more correct, and easier to modify (I usually do so as part of a "coup," fixing things slowly with each new feature). When every new feature deletes code, the benefits tend to snowball. I see my colleagues doing that now to code I recently wrote, and I'd personally do it again.
[0] People typically mean one of two things when they say they're "mocking" a dependency. The first is that they want a function to be "total" and have reasonable outputs for all possible inputs. They'll mock out many different interface implementations (or equivalent blah blah blah in your favorite language) to probe that behavior and ensure that your exponential backoff routine behaves reasonably when the clock runs backward, when 1000 of them are executed simultaneously, and whatnot. That tends to make for expensive tests, so I tend to see it reserved for risky code in teams which are risk-averse, but it's otherwise very good at its job. The other case is using some sort of "mock" library which lets you treat hard dependencies as soft dependencies and modify class instantiation, method return values, and all sorts of things to fit the test you're trying to write. This latter case is much more common, so it's what I'm referring to in a "heavily mocked" codebase. It's a powerful tool which could be used for good, but IME it's always overused enough that it would be better if it didn't exist.
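The two senses, sketched side by side with a hypothetical backoff helper: the first test hands in a hand-written fake dependency, the second uses a mock library to rewrite a hard-wired one in place.

```python
import time
from unittest import mock

# Sense 1: the clock is an explicit parameter, so the test supplies a fake.
def seconds_until_retry(attempt, now):
    deadline = 1_000_000.0                       # hypothetical absolute cutoff
    return min(2 ** attempt, max(0.0, deadline - now()))

def test_no_retry_after_deadline():
    # a clock that is already past the deadline
    assert seconds_until_retry(3, now=lambda: 2_000_000.0) == 0.0

# Sense 2: the clock is hard-wired, so a mock library patches it in place.
def seconds_until_retry_hardwired(attempt):
    deadline = 1_000_000.0
    return min(2 ** attempt, max(0.0, deadline - time.time()))

def test_retry_with_patched_clock():
    with mock.patch("time.time", return_value=0.0):
        assert seconds_until_retry_hardwired(3) == 8
```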