from Hacker News
LLM Hallucination Benchmark: R1, o1, o3-mini, Gemini 2.0 Flash Think Exp 01-21
by zone411 on 2/10/25, 7:09 PM with 3 comments
by jszymborski on 2/10/25, 8:08 PM
Some very odd choices in that first plot. Lower is better, but also the x-axis is inverted such that higher scores go towards the left.