from Hacker News

“Expect tests” make test-writing feel like a REPL session

by jsomers on 1/11/23, 11:40 PM with 92 comments

  • by mabbo on 1/14/23, 11:30 AM

    > But think: everything in those describe blocks had to be written by hand.

    It also had to be thought about by the developer. Someone had to say "I want the code to do this under these conditions".

    If your tests can be autogenerated then they aren't verifying expected behaviour, they're just locking in your implementation such that it can't change later. They are saying "hey look everyone, I got my coverage metric to 100% (despite any bugs I may have)."
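
    A sketch of that failure mode (hypothetical `median` function; the auto-recorded expectation blesses the bug):

       def median(xs):
           xs = sorted(xs)
           return xs[len(xs) // 2]  # bug: ignores the even-length case

       def test_median_snapshot():
           # Auto-generated expectation: it records whatever the code did,
           # so the wrong answer (should be 2.5) is locked in as "correct".
           assert median([1, 2, 3, 4]) == 3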

  • by avgcorrection on 1/14/23, 10:02 AM

    > I think you’re supposed to write some nonsense, like assert fibonacci(15) == 8, then when the test says “WRONG! Expected 8, got 610”, you’re supposed to copy and paste the 610 from your terminal buffer into your editor.

    > This is insane!

    The sane approach is presumably either to expand the call tree and verify all the unique subsolutions, or, if you can’t expand the call tree, to do every step with a calculator.

    > The %expect block starts out blank precisely because you don’t know what to expect. You let the computer figure it out for you. In our setup, you don’t just get a build failure telling you that you want 610 instead of a blank string. You get a diff showing you the exact change you’d need to make to your file to make this test pass; and with a keybinding you can “accept” that diff. The Emacs buffer you’re in will literally be overwritten in place with the new contents [1]:

    Oh okay. The non-insane approach is to do the first thing but Emacs copies the result on your behalf.
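
    In Python terms, a minimal sketch of that loop (a toy `expect` helper; the real tooling rewrites the source for you rather than just telling you what to paste):

       def fibonacci(n):
           a, b = 0, 1
           for _ in range(n):
               a, b = b, a + b
           return a

       def expect(actual, expected):
           # Toy stand-in for an expect-test runner: on mismatch, report the
           # exact value to paste so the test passes on the next run.
           if str(actual) != expected:
               raise AssertionError(f"expected {expected!r}, got {actual!r}")

       # First run: the expectation starts blank, and the failure tells you
       # what to paste:
       #   expect(fibonacci(15), "")  -> AssertionError: expected '', got 610
       expect(fibonacci(15), "610")   # after "accepting" the diff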

  • by CGamesPlay on 1/14/23, 9:08 AM

    Snapshot testing is great, and I wish more test frameworks included first-class support for it: snapshots that auto-update with a flag and can be stored either inline in the source or in an external file (the two modes have different use cases). Note that doc tests can also be a form of this, e.g. Python's doctest.
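
    For instance, a minimal doctest sketch (the expected output recorded inline is effectively a snapshot, and doctest diffs the actual output against it):

       def fibonacci(n):
           """Return the n-th Fibonacci number.

           >>> fibonacci(15)
           610
           """
           a, b = 0, 1
           for _ in range(n):
               a, b = b, a + b
           return a

       if __name__ == "__main__":
           import doctest
           doctest.testmod()  # re-runs the inline examples, reports mismatches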

    "Expect tests" seems like a bad name, since that covers all tests.

  • by ElliotH on 1/14/23, 9:25 AM

    I wonder if this has the same downsides as golden and screenshot-type tests, where you end up over-asserting, resulting in tests that break on unrelated changes?

    Obviously that’s a risk for hand-written tests too, but it’s easier (today… who knows what Copilot-like systems will offer soon!) for a human to reason about what’s relevant.

  • by scotty79 on 1/14/23, 10:24 AM

    Doesn't this approach make you accept updated results of failing tests wholesale, and possibly miss cases where a test's new result is actually wrong?

    https://docs.rs/expect-test/latest/expect_test/

  • by vdm on 1/14/23, 9:30 AM

    A similar approach with pytest and pdb: https://simonwillison.net/2020/Feb/11/cheating-at-unit-tests...

    This does get me writing tests sooner.
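
    Roughly the trick from that post, as I understand it (hypothetical `build_user` under test; the empty dict is a deliberate placeholder):

       def build_user(name):  # stand-in for the real code under test
           return {"name": name, "karma": 0}

       def test_build_user():
           actual = build_user("jsomers")
           assert actual == {}  # deliberately wrong; `pytest --pdb` lands here

       # At the (Pdb) prompt, evaluate `actual`, copy the printed value over
       # the `{}` placeholder, and re-run until green.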

  • by BoppreH on 1/14/23, 8:19 PM

    Some years ago I wrote a Python function, "replace_me"[1], that edits the caller's source code. You can use it for code generation, inserting comments, generating fixed random seeds, etc.

    And one more use case I found was exactly what TFA describes, but even easier:

       import replace_me
       replace_me.test(1+1)
    
    Once executed, it evaluates the argument and becomes an assertion:

       import replace_me
       replace_me.test(1+1, 2)
    
    I never actually used it for anything important, but it comes back to my mind once in a while.

    [1]: https://github.com/boppreh/replace_me

  • by theptip on 1/14/23, 5:18 PM

    I tend to think that tests should be carefully crafted for readability just like normal code. The “content of a REPL” is unlikely to be well-thought out enough to preserve meaningful invariants while remaining supple in the direction of likely changes. Perhaps in the hands of very good engineers this tool is net positive, but I shudder at giving junior engineers a tool that encourages less structure in tests.

    A good set of fixture/helper functions should let you write really short and expressive tests (or tabular parametrized tests, if you prefer) which seems to me to resolve most of the pain points the author is complaining about.
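
    A sketch of the tabular style (pytest parametrization, assuming a fibonacci under test):

       import pytest

       def fibonacci(n):  # stand-in for the code under test
           a, b = 0, 1
           for _ in range(n):
               a, b = b, a + b
           return a

       @pytest.mark.parametrize("n, expected", [
           (0, 0),     # base case
           (1, 1),     # base case
           (10, 55),
           (15, 610),  # the article's example
       ])
       def test_fibonacci(n, expected):
           assert fibonacci(n) == expected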

    One big advantage I do see with this approach is that it’s a very compact rendering of a table of outputs; in Python+pytest+PyCharm, if I run a 10-example parametrized test, I have to click through to see each failure individually. Perhaps there’s a UX lesson here: rendering the raw errors into the code beside the test matrix could help visualize results faster.

    As an aside, I have recently been enjoying the “write an ascii representation as your test assert” mode of testing, it can give a different way of intuiting what is going on.

  • by gleb on 1/14/23, 4:46 PM

    Similar idea in Elixir, where the library itself handles the interactive bits: https://github.com/assert-value/assert_value_elixir

  • by adrianmonk on 1/14/23, 5:20 PM

    I think this would suffer from the same problem as partial self-driving cars: it's human nature for vigilance to falter if it doesn't feel like you're the sole/primary one in control.

    Of course, you can say "I won't let myself do that", but working against human nature is not a formula for success. If my back hurts, I can tell myself I'm just going to go lie down on the bed for 10 minutes but not take a nap, but then 30 minutes later I wake up feeling groggy.

  • by evrimoztamur on 1/14/23, 10:10 AM

    Here's an older post from 2015 (also from Jane Street) explaining the same process, https://blog.janestreet.com/testing-with-expectations/, but from the method's infancy. It looks like they've polished it heavily since!

    I like the approach, and I was indeed copy-pasting the result from my console...

  • by bmitc on 1/14/23, 3:45 PM

    I don’t really understand this. How is this different from writing the code, assuming you got it correct, and then locking in a potentially wrong implementation?

    > What does fibonacci(15) equal? If you already know, terrific—but what are you meant to do if you don’t?

    > I think you’re supposed to write some nonsense, like assert fibonacci(15) == 8, then when the test says “WRONG! Expected 8, got 610”, you’re supposed to copy and paste the 610 from your terminal buffer into your editor.

    Who does that? How do you know 610 is correct? That’s just assuming your implementation is right from the get-go. For such a function, I’d independently calculate it using some method I trust (maybe Wolfram Alpha). I’d do this for a handful of examples, trying to cover base and extreme cases. And then I’d do property testing if I really wanted good coverage. Further, this expect-test library seems to just smooth the experience of copying what the function returns into a test.
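
    For example, a property-test sketch with Hypothesis (assuming the usual fibonacci; the recurrence is the specification, rather than any single output):

       from hypothesis import given, strategies as st

       def fibonacci(n):
           a, b = 0, 1
           for _ in range(n):
               a, b = b, a + b
           return a

       @given(st.integers(min_value=2, max_value=300))
       def test_recurrence(n):
           # The defining property: each term is the sum of the previous two.
           assert fibonacci(n) == fibonacci(n - 1) + fibonacci(n - 2)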

    This whole “expect test” business seems to rely on the developer looking at what the function returns for a given input, evaluating if it’s correct or not and then locking that in as “this is what this function is supposed to do”. That seems backwards and no different from how one implements functions in the first place, so I don’t know what is actually being tested.

    The entire point of testing is saying “this is what this function should do” and not “this is what the function did and thus that’s what it should always do”.

  • by CJefferson on 1/14/23, 5:35 PM

    I work with a language where all tests are expect tests (GAP). The biggest problem is that you can basically never change how built-in types are printed, as you'll break all tests in every program. For example, someone wanted to improve how plurals are printed, but that would break every test.

  • by arcturus17 on 1/14/23, 5:09 PM

    Is there anything like this in Python or C#? I have worked with OCaml extensively in coursework, but there’s no chance I’ll be using it in prod any time soon, and I’d love to toy with this approach in my working languages.