from Hacker News

Ask HN: Measuring the long-term benefit of interview code tests?

by traviskuhl on 11/5/21, 3:08 PM with 71 comments

If your company does coding tests during the engineering interview process, (how) do you measure the long-term effectiveness of the tests? Do you keep internal metrics comparing candidates' scores to their long-term impact/success at the company? If so, what have you learned from the results, and how have those learnings shaped your hiring process?
  • by kasey_junk on 11/5/21, 3:41 PM

    I don't know if my current company does, but when I first implemented them for a company I worked for ~15 years ago, we definitely did.

    At that company (a ~200-engineer, privately held software company) we found a few things:

    - In-person tests were less predictive than take-home tests.
    - Tests that did not provide automated test cases as examples were less predictive than those that did.
    - There was virtually no predictive power to 'secret test cases' that we ran without providing to the candidate.
    - No other part of the interview pipeline was predictive at all. Not whiteboarding, not presenting, not personality interviews, not culture-fit testing, not credentials, not where experience came from, nothing. That was across all interviewers and candidates.

    A few caveats about this:

    - This was before take-home testing had become widespread and many companies screwed it up. At the time, it was seen by candidates as novel and interesting, not as just one more painful hoop to jump through.
    - We never interviewed enough candidates to reach true statistical significance.
    - False negatives were our biggest concern; they are extremely hard to measure (and potentially open you up to lawsuits). The best we ended up doing was making our pipeline less selective to account for them, which did not seem to reduce employee quality.

    In a more meta sense, that experience led me to believe that strict hiring pipelines are largely not useful. Bad candidates still get through and good candidates don't. Also, many other things have a far bigger impact on productivity than whether a candidate was 'good'. It turns out humans do not produce at consistent levels all the time, and things outside what you can interview for (company process, employee health, life events, etc.) have far more effect on employee productivity than their 'score' at interview time.
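
    For concreteness, "predictive" here comes down to correlating test scores with some later measure of performance. A minimal sketch of that comparison, where the data and the use of a plain Pearson correlation are illustrative assumptions rather than the analysis the company actually ran:

      import statistics

      def pearson_r(xs, ys):
          """Pearson correlation between interview scores and later ratings."""
          mx, my = statistics.mean(xs), statistics.mean(ys)
          cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          sx = sum((x - mx) ** 2 for x in xs) ** 0.5
          sy = sum((y - my) ** 2 for y in ys) ** 0.5
          return cov / (sx * sy)

      # Hypothetical data: take-home score vs. performance rating a year in.
      take_home = [62, 81, 74, 90, 55, 88]
      rating = [2.9, 3.8, 3.1, 4.2, 2.5, 4.0]
      print(pearson_r(take_home, rating))  # nearer 0 = less predictive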

  • by lbriner on 11/5/21, 3:42 PM

    We don't use coding tests in this way. We use coding tests as a screening process to ensure the candidate is in the correct ballpark.

    If we are recruiting a senior, we would expect them to easily complete basic technical tests. If they are more junior, we might use the tests only as an indicator of ability.

    I don't particularly expect a strong correlation between how well they did in the tests and their long-term ability since their value is made up of many things, only one of which is their ability in the tests.

  • by cap10morgan on 11/5/21, 3:51 PM

    In my experience "Does your company measure the long term benefit of X?" is 99.99999% "no" for any X.
  • by psadri on 11/5/21, 5:45 PM

    I'd like to point out that success in a role depends on more factors than the technical interview.

    I have found that investing the time to onboard new team members correctly makes a huge difference. Onboard an average or good hire properly and they go on to produce solid output and often thrive. On the other hand, you could have a great new hire who, given poor or no onboarding, "sinks" instead of swims.

  • by vannevar on 11/5/21, 4:01 PM

    The company would have to be fairly large (>100 employees) and long-lived (>10 years) to generate enough data for any hope of statistical significance. Employee "success" depends on many factors, and an employee who seems to be a failure in the short term may end up becoming very successful (or vice versa) simply because of external circumstances: the nature of the projects, the clients, colleagues, etc.
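
    As a rough check on that scale claim, the standard Fisher z approximation gives the sample size needed to detect a given correlation between interview score and later success. The target correlation and power below are illustrative assumptions:

      import math

      def n_for_correlation(r, z_alpha=1.96, z_beta=0.84):
          """Approximate n to detect correlation r at two-sided alpha=0.05
          with 80% power, via the Fisher z transform."""
          c = 0.5 * math.log((1 + r) / (1 - r))
          return math.ceil(((z_alpha + z_beta) / c) ** 2 + 3)

      print(n_for_correlation(0.3))  # ~85 hires to detect even a moderate signal
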
  • by boldslogan on 11/5/21, 3:24 PM

    And maybe a follow-up question (to measure the false negatives):

    Do you check on the applicants who were rejected based on their test and see where they ended up working? E.g., you are a mid-tier startup that rejects someone who goes on to work at Amazon as a high-level engineer: do you mark that as a failure?

  • by daviddever23box on 11/5/21, 3:26 PM

    For developers, coding tests that include deployment/infrastructure components (e.g., deploy your solution to a cloud container, or build and compile your solution for desktop platform testing) are uniformly consistent with long-term impact/success. Problem solving at the algorithmic layer may even be inversely correlated with success if a candidate lacks a production skill set.

    Unless one's focus is research and development, there is a non-zero cost to training for production skills, so it's best to start with someone who understands the delivery process.

    Linear metrics are probably less useful, inasmuch as it quickly becomes obvious which employees are self-starting and work well with others, versus those who require external motivation or are staunch individualists.

  • by kqr on 11/5/21, 5:05 PM

    The more fundamental question: is your company meaningfully able to measure the long term impact/success of its employees? If so, how?

    The submitted question seems to brush over this aspect, but whenever I've tried to evaluate interviewing techniques, that has been the primary obstacle: people just can't agree on what success means once someone is employed, so anything that tries to correlate interviewing with it will be just as much junk.

  • by poulsbohemian on 11/5/21, 5:11 PM

    I think my favorite story about code tests was the one where one interviewer presented the test and gave me 24 hours to complete it, and I was then supposed to be "graded" by a second team member. The second guy obviously didn't understand the requirements of the coding test (despite presumably receiving the same written instructions I did), so he rejected me outright. Which I guess gets to my thinking on coding tests: you often learn a lot about companies from the crappy "tests" they think have merit.

    I interviewed hundreds of technical people in my career, across dev, test, and ops skill sets. I saw limited correlation between tests and aptitude. If you talk to someone about a project they've done, you know pretty quickly:

    1) Can they communicate technical ideas?

    2) Can I develop a rapport with this person and work together?

    3) Do they understand what they built? Can they talk about tradeoffs they made? Did they learn anything from the experience?

    A fizz buzz test isn't a terrible idea, but you also need an interviewer who understands how to administer it within the wider context of the interview. If the interviewers themselves don't understand it, they aren't qualified to administer it.
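
    For reference, FizzBuzz itself is tiny: print 1 through 100, substituting "Fizz" for multiples of 3, "Buzz" for multiples of 5, and "FizzBuzz" for both. A minimal version:

      # Classic FizzBuzz screening exercise.
      for i in range(1, 101):
          out = ("Fizz" if i % 3 == 0 else "") + ("Buzz" if i % 5 == 0 else "")
          print(out or i)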

  • by dreen on 11/5/21, 4:45 PM

    There is no score or measurement. The task is to write a stopwatch in any technology the candidate wants and to explain it along the way. Then we introduce some bugs and ask them to troubleshoot. It's all about their approach to the problem.
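
    The kind of thing a candidate might produce, as a purely illustrative sketch (a minimal Python stopwatch; the comment says any technology is fine):

      import time

      class Stopwatch:
          """Minimal stopwatch: start, stop, accumulated elapsed time."""

          def __init__(self):
              self._started = None  # monotonic timestamp while running
              self._elapsed = 0.0   # seconds accumulated across runs

          def start(self):
              if self._started is None:
                  self._started = time.monotonic()

          def stop(self):
              if self._started is not None:
                  self._elapsed += time.monotonic() - self._started
                  self._started = None

          def elapsed(self):
              running = (time.monotonic() - self._started
                         if self._started is not None else 0.0)
              return self._elapsed + running

      sw = Stopwatch()
      sw.start()
      time.sleep(0.1)
      sw.stop()
      print(f"{sw.elapsed():.2f}s")  # ~0.10s
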
  • by andrew_ on 11/5/21, 3:57 PM

    I've never worked for a company that did (18 years in the industry this year). Of the 8 companies I've worked for, only one had interviewing figured out, and they didn't track or measure metrics on coding tests, challenges, etc. They did allow the challenges to evolve, and the challenges were tailored to the position being hired for.
  • by AnotherGoodName on 11/5/21, 4:22 PM

    The big FAANG companies do, for what it's worth. They have entire ML pipelines looking at hiring. The following isn't about interview effectiveness, but it is one example of the kind of analysis done:

    https://catonmat.net/programming-competitions-work-performan...

  • by xeromal on 11/5/21, 4:24 PM

    As long as we keep finding good people and are not understaffed, it's working for us. No more metrics are needed than that.
  • by ipnon on 11/5/21, 4:23 PM

    Not empirically, but my manager focuses primarily on the engineering experience for our team and potential hires. This results in explicit feedback gathering and modifying our process accordingly.

    We have short, standardized, broad interviews. We look for what can be added to the team rather than poking holes, and we're still trying to improve.

  • by Aeolun on 11/5/21, 4:42 PM

    We don’t do coding tests at all. We do one 30-60m interview that covers some general tech questions and motivation.

    So far we’ve hired 7 decent and 3 great people. No truly bad people have made it through that pipeline yet.

    I can't say anything about why, and I'd be biased in any case.

  • by nonameiguess on 11/5/21, 4:12 PM

    The only way this could even conceivably be done in a scientifically valid way is a randomized controlled trial, which would mean not giving the same interview to every candidate. That is only possible if you hire at a large enough scale to sample meaningfully from multiple "interview type" groups, and it would of course require it to be legal to give different interviews at random, which I'm not sure it is. I guess as long as the assignment is actually random you're not discriminating against any specific group, but it isn't exactly fair, and you risk killing your employees' goodwill when they realize you're running experiments on them.

    Of course, it's really not possible to do this at the level of rigor expected of, say, clinical trials. Each new hire will know what type of interview you put them through, and there is no reliable way to prevent them from telling others.
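
    The assignment step of such a trial is mechanically trivial; all the hard parts are the legal and blinding issues above. A sketch with hypothetical interview-type arms:

      import random

      # Hypothetical interview-type arms for a randomized trial.
      ARMS = ["take_home", "whiteboard", "pair_programming"]

      rng = random.Random(42)  # seeded only to make the sketch reproducible
      candidates = ["cand_1", "cand_2", "cand_3", "cand_4", "cand_5", "cand_6"]

      # Randomly assign each candidate to one arm; outcomes would later be
      # compared across arms once each group has enough hires.
      assignments = {c: rng.choice(ARMS) for c in candidates}
      print(assignments)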

  • by a_c on 11/5/21, 4:15 PM

    I would say anything with only an indirect correlation to outcomes has no easy way to be measured. Ultimately a company is looking for product-market fit, customer growth, or revenue/profit/cash flow, depending on which stage the company is in.

    On top of being hard to measure, the data points generated through hiring are just too few, and the data collection process is too long and too subjective.

    Just ask your team whether they like the new hire and whether they can make progress together. Things like: Do you like working with the new hire? Are they bringing new insights to the team? Are they easy to work with? Are they learning new things?

    And most importantly: can the team let go of a mismatch fast enough? Overall I would say measuring hiring is just not worth it.

  • by nitwit005 on 11/5/21, 3:49 PM

    Of course not.

    However, we do hire some contractors essentially without an interview, and it is fairly apparent that's a bad idea.

  • by maxgfaraday on 11/6/21, 10:14 AM

    The main thing that matters is training managers properly: training management to be clearer about how they communicate goals and more transparent. The fault is not with candidates. Making sure a candidate can communicate clearly and effectively and has some passion for the position is all you can really do at the interview level. The rest is, quite frankly, having better management and a culture of being helpful. Metrics on your org should be about how clear the processes and planning toward goals are, and how well they get communicated and executed.

    I worked for a MAAAN company, and this one didn't get it right. I figured they had simply decided it was better to crank through people than to actually grow them, since they were never short of candidates. This was pretty clear from their promotion culture and from assessments that rewarded selfishness.

    Bottom line: train managers. Build the scaffolding to grow competent, empathetic managers. Communication, clarity, and empathy win over everything else. F** programming test hazing. Commit to the people in the organization. Done.