from Hacker News

Deep Learning, Deep Scandal

by imichael on 4/8/25, 3:45 AM with 22 comments

  • by nmca on 4/8/25, 4:13 AM

    The linked USAMO math results are from an exam that requires proofs. The same authors, on the same website, ran AIME 2025 shortly after it happened and found the results totally consistent with the o1 announcement numbers; the difference is that the AIME requires only short answers, not proofs.

    If you are a skilled mathematician, it is quite easy to verify that (as of 7 April) models excel at novel calculations on held-out problems and mostly shit the bed when asked for proofs.

    Gary cites these USAMO results as evidence of contamination influencing benchmark results, but that view is not consistent with the models' strong performance on clearly held-out tasks (ARC, AIME 25, HMMT 25, etc.).

    If you really care, you can test this by inventing problems! It is a very very verifiable claim about the world.

    In any case, this is not the pundit you want. There are many ways to make a bear case that are much saner than this.

  • by tptacek on 4/8/25, 4:08 AM

    Does any of this matter if you're a person who thinks "AGI" is a silly concept, and just uses these tools for what they're currently good at?

    I'm not trying to be snarky, I'm just wondering why I would care that a tech giant has failed to cross the "GPT-5" threshold. What's the significance of that to an ordinary user?

  • by clauderoux on 4/8/25, 7:01 AM

    As I've said many times, I have been in this game for 30 years. I started doing AI with rules at the beginning of the '90s, and I never... never expected to see anything like LLMs in my lifetime. When I read Marcus once again saying that, yes, this time LLMs have reached their limit (as he has been saying for two years running), I really feel tired of his tune.

    The idea that LLMs are a dead end, a failing technology, is pretty weird. Compared to what??? I use LLMs every day in my work: to write summaries, to make translations, to generate some code, or to get explanations about a given piece of code... And I even use them as research sparring partners to see how I could improve my work...

    Gary Marcus has been involved in the domain for 30 years as well... Where is his technology that would match or surpass LLMs???

  • by mdonaj on 4/8/25, 4:21 AM

    One of the comments in the article says: "I don't see how it's not a net negative tech," to which Marcus replies: "That’s my current tentative conclusion, yes."

    What is the negative effect I'm not seeing? Bad code? Economic waste in datacenter investment? Wasted effort of researchers who could be solving other problems?

    I've been writing software for over a decade, and I’ve never been as productive as I am now. Jumping into a new codebase - even in unfamiliar areas like a React frontend - is so much easier. I’m routinely contributing to frontend projects, which I never did before.

    There is some discipline required to avoid the temptation to just push AI-generated code, but otherwise, it works like magic.

  • by jmweast on 4/8/25, 4:05 AM

    Really just a matter of when the bubble pops now, isn't it? There's too much evidence at this point that AI is simply not going to be the product the big players say it will be.

  • by fouc on 4/8/25, 4:23 AM

    GPT-1 to GPT-2: June 2018 to February 2019 = 8 months.

    GPT-2 to GPT-3: February 2019 to June 2020 = 16 months.

    GPT-3 to GPT-4: June 2020 to March 2023 = 33 months.

    Looks like the time to get to the next level is doubling, so the next gap should be about 66 months, which puts GPT-5 around September 2028.
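
    A quick sanity check of that arithmetic, as a minimal Python sketch (release months are the ones listed above):

      # Months between releases, then project the next gap as double the last.
      releases = [("GPT-1", 2018, 6), ("GPT-2", 2019, 2),
                  ("GPT-3", 2020, 6), ("GPT-4", 2023, 3)]

      def month_index(y, m):
          # Count months on a single timeline so gaps are simple differences.
          return y * 12 + (m - 1)

      gaps = [month_index(b[1], b[2]) - month_index(a[1], a[2])
              for a, b in zip(releases, releases[1:])]
      print(gaps)  # [8, 16, 33] -- each gap roughly doubles

      # GPT-5 at ~2x the last gap: 66 months after March 2023.
      nxt = month_index(2023, 3) + 2 * gaps[-1]
      print(nxt // 12, nxt % 12 + 1)  # 2028 9 -> September 2028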

    Feels like people are being premature in declaring an AI winter, or in calling it a scandal that we don't already have GPT-5.

    It's going to take time. We need some more patience in this industry.

  • by ninetyninenine on 4/8/25, 4:16 AM

    The technology is just a couple of years old, and this article is derived from just a couple of months of evidence.

    We can't yet say what the future holds. The naysayers who were so confident that LLMs were stochastic parrots are now embarrassingly wrong, and this article sounds like more of the same. Whether we are actually at a dead end or not is unknown. Why are people speaking with such utter conviction when nobody truly understands what's going on internally in LLMs?

  • by coolThingsFirst on 4/8/25, 4:18 AM

    I know it; it was a scary period for programmers. The tide is turning. Meatsuits are back in the game.

  • by bigyabai on 4/8/25, 4:00 AM

    > The reality, reported or otherwise, is that large language models are no longer living up to expectations, and its purveyors appear to be making dodgy choices to keep that fact from becoming obvious.

    What else is new?