from Hacker News

LLM Hallucination Benchmark: R1, o1, o3-mini, Gemini 2.0 Flash Think Exp 01-21

by zone411 on 2/10/25, 7:09 PM with 3 comments

  • by jszymborski on 2/10/25, 8:08 PM

    Some very odd choices in that first plot. Lower is better, but the x-axis is also inverted, so higher scores run toward the left.