
Show HN: CLI for testing and evaluating LLM prompts and outputs

by typpo on 7/19/23, 5:27 PM with 0 comments

Hi HN,

This project has grown a lot recently, and I figure it's worth another submission. I use this tool for several LLM-based use cases with over 100k DAU. It works pretty simply:

1) Create a list of test cases

2) Set up assertions for the metrics/guardrails you care about, such as requiring valid JSON output or forbidding the phrase "As an AI language model"

3) Run the tests as you make changes, and integrate them with CI if desired (a rough sketch of the loop follows this list)
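To make the workflow concrete, here's a minimal Python sketch of that loop. This is not the tool's actual API or config format; call_llm, TEST_CASES, and the assertion names are hypothetical stand-ins, assuming a prompt template with named variables:

    import json
    import sys

    def call_llm(prompt: str) -> str:
        # Hypothetical stand-in: replace with a real provider call.
        return '{"summary": "placeholder"}'

    # 1) Test cases: input variables plus the assertions that must hold.
    TEST_CASES = [
        {"vars": {"topic": "weather"}, "asserts": ["is_json", "no_ai_disclaimer"]},
    ]

    # 2) Assertions: metrics/guardrails checked against each output.
    def is_json(output: str) -> bool:
        try:
            json.loads(output)
            return True
        except ValueError:
            return False

    def no_ai_disclaimer(output: str) -> bool:
        return "As an AI language model" not in output

    ASSERTIONS = {"is_json": is_json, "no_ai_disclaimer": no_ai_disclaimer}

    # 3) Run every test case; a nonzero exit code lets CI fail the build.
    def run_tests(prompt_template: str) -> bool:
        ok = True
        for case in TEST_CASES:
            output = call_llm(prompt_template.format(**case["vars"]))
            for name in case["asserts"]:
                if not ASSERTIONS[name](output):
                    print(f"FAIL {case['vars']} -> {name}")
                    ok = False
        return ok

    if __name__ == "__main__":
        sys.exit(0 if run_tests("Return a JSON summary about {topic}.") else 1)

The exit code is the whole point of step 3: because the run fails loudly on any broken assertion, the same command works locally during prompt iteration and as a CI gate.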

This makes choosing a model and iterating on prompts easier, because it reduces the process to something we're all familiar with: developing against test cases. You can iterate with confidence and avoid regressions.

There are a bunch of startups popping up in this space, but I think it's important to have something that is local (private), on the command line (easy to use in the development loop), and open-source.