from Hacker News

Arc AGI 2025

by artninja1988 on 3/24/25, 8:26 PM with 2 comments

  • by aaronvg on 3/24/25, 8:28 PM

    It's kind of insane going from 76% to 3% on the new version of a benchmark. We clearly need more rapid progress on the creation of benchmarks.

    Then again, I wonder -- if a benchmark is way too hard from the start, does that make it much harder for people to evaluate new approaches that actually have real-world impact, when those approaches might only raise the score on the hard benchmark by 1%?