from Hacker News

Ask HN: What are your go-to "test" questions when evaluating a new LLM?

by johntiger1 on 12/22/23, 3:36 PM with 3 comments

Do you have a go-to question (or several) to check whether an LLM knows its stuff? Mine is a simple one:

"What is Operation Konrad III"

Most LLMs fail it because the event is relatively obscure.

  • by philippta on 12/23/23, 11:42 AM

    Not really scientific or anything, but I tend to give it this task: "Write a simple http server in Go that saves all requests into a SQLite database."

    What I am looking for is:

    - did it forget to import the SQLite driver?

    - is it doing weird SQL shenanigans like selecting MAX(id) to obtain the next potential id?

    - is the code rather simple or over-engineered?

    Update: Most LLMs produce a decent answer. However, if you increase the difficulty a little by asking "Write a simple and CGo free http server in Go ...", most LLMs get the SQL driver wrong (except for gpt-4-1106-preview).
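A minimal sketch of an answer that would pass the checklist above, assuming Go's standard database/sql package. The SQLite driver appears only as a commented-out blank import: the pure-Go modernc.org/sqlite (registered as "sqlite") is one CGo-free option, whereas the widely used github.com/mattn/go-sqlite3 needs CGo. Ids come from INTEGER PRIMARY KEY plus Result.LastInsertId instead of SELECT MAX(id). The names requestStore, sqlStore, memStore and handler are hypothetical, and memStore is an in-memory stand-in so the handler can be exercised without a database. This is an illustration, not the output of any particular model.

```go
package main

import (
	"database/sql"
	"fmt"
	"io"
	"log"
	"net/http"
	"sync"
	// database/sql needs a driver registered via a blank import. One
	// CGo-free option is the pure-Go modernc.org/sqlite ("sqlite"):
	//   _ "modernc.org/sqlite"
	// It is commented out so this sketch builds with the stdlib alone.
)

// requestStore is a hypothetical seam between the handler and storage.
type requestStore interface {
	Save(method, path string, body []byte) (int64, error)
}

// sqlStore persists requests to a SQLite table through database/sql.
type sqlStore struct{ db *sql.DB }

func newSQLStore(db *sql.DB) (*sqlStore, error) {
	// INTEGER PRIMARY KEY aliases SQLite's rowid, so the database
	// assigns ids itself; nothing has to compute MAX(id)+1.
	_, err := db.Exec(`CREATE TABLE IF NOT EXISTS requests (
		id INTEGER PRIMARY KEY, method TEXT, path TEXT, body BLOB)`)
	if err != nil {
		return nil, err
	}
	return &sqlStore{db: db}, nil
}

func (s *sqlStore) Save(method, path string, body []byte) (int64, error) {
	res, err := s.db.Exec(
		"INSERT INTO requests (method, path, body) VALUES (?, ?, ?)",
		method, path, body)
	if err != nil {
		return 0, err
	}
	// The id of the row just inserted, without a racy SELECT MAX(id).
	return res.LastInsertId()
}

// memStore (hypothetical) keeps rows in memory, e.g. for tests.
type memRow struct {
	method, path string
	body         []byte
}
type memStore struct {
	mu   sync.Mutex
	rows []memRow
}

func (m *memStore) Save(method, path string, body []byte) (int64, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.rows = append(m.rows, memRow{method, path, body})
	return int64(len(m.rows)), nil
}

// handler stores every request and replies with the id it was saved under.
func handler(store requestStore) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		id, err := store.Save(r.Method, r.URL.Path, body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		fmt.Fprintf(w, "stored request %d\n", id)
	}
}

func main() {
	// With no "sqlite" driver registered, sql.Open fails with an
	// unknown-driver error: the "forgot to import the driver" bug.
	db, err := sql.Open("sqlite", "requests.db")
	if err != nil {
		log.Fatal(err)
	}
	store, err := newSQLStore(db)
	if err != nil {
		log.Fatal(err)
	}
	http.Handle("/", handler(store))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

To run it against a real database, add the driver with go get modernc.org/sqlite and uncomment the blank import; go run . then listens on :8080 and answers each request with a line such as "stored request 1". Keeping storage behind a small interface is a choice made here for testability, not something the prompt asks for; a "simple" answer could just as well call db.Exec directly in the handler.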

  • by muzani on 12/24/23, 1:23 AM

    I give it a large block of code and see if it can find the bug. Amusingly, GPT sometimes passes with flying colors (finding bugs I had missed and pointing out unused imports), but at other times it flat out fails to see anything.

  • by mejutoco on 12/23/23, 12:50 PM

    I ask it to create a conversation in Polish, with English translations, about an encounter between two neighbours walking their dogs.