by pja on 1/21/25, 8:26 AM with 5 comments
by Havoc on 1/21/25, 1:53 PM
They gained literally nothing and took another knock on rep. It’s not like it was a mystery that they’re working on reasoning models.
by th123128 on 1/21/25, 9:56 AM
It is their fault, though, for participating in writing a test set that would obviously be used in a non-scientific manner. "Trust-me-bro" science on other people's closed servers.
But AI ethics is a dangerous field that can get you suicided.
by aithrowawaycomm on 1/21/25, 12:01 PM
This is odd. The issue isn't o3's "capabilities" or AINotKillEveryoneism, it's the spreading corrosion of OpenAI's dishonest marketing. Presumably those contributors thought they were making a good benchmark. Instead they got misled into making an infomercial.
This specifically hurts Terence Tao, because it raises the question of whether he knew that OpenAI had privileged access. Epoch and OpenAI tarnished his reputation in order to improve o3's reputation. Truly despicable.
by rurban on 1/21/25, 10:30 AM
Because they could have solved 2% without cheating.