from Hacker News

GPT 4.5 level for 1% of the price

by decide1000 on 3/16/25, 10:23 AM with 257 comments

  • by GavCo on 3/16/25, 1:07 PM

    Surprised nobody has pointed this out yet — this is not a GPT 4.5 level model.

    The source for this claim is apparently a chart in the second tweet in the thread, which compares ERNIE-4.5 to GPT-4.5 across 15 benchmarks and shows that ERNIE-4.5 scores an average of 79.6 vs 79.14 for GPT-4.5.

    The problem is that the benchmarks they included in the average are cherry-picked.

    They included benchmarks on 6 Chinese language datasets (C-Eval, CMMLU, Chinese SimpleQA, CNMO2024, CMath, and CLUEWSC) along with many of the standard datasets that all of the labs report results for. On 4 of these Chinese benchmarks, ERNIE-4.5 outperforms GPT-4.5 by a big margin, which skews the whole average.

    This is not how results are normally reported and (together with the name) seems like a deliberate attempt to misrepresent how strong the model is.

    Bottom line, ERNIE-4.5 is substantially worse than GPT-4.5 on most of the difficult benchmarks, matches GPT-4.5 and other top models on saturated benchmarks, and is better only on (some) Chinese datasets.
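
The skew described above is easy to reproduce: folding a few home-turf benchmarks with large margins into an otherwise close comparison can flip the headline average. A minimal sketch with made-up scores (not the actual chart values):

```python
# Hypothetical scores to illustrate how cherry-picked benchmarks skew a mean.
shared = {"model_a": [60, 70, 80], "model_b": [70, 80, 85]}          # standard benchmarks
extra  = {"model_a": [95, 96, 94, 93], "model_b": [70, 72, 68, 71]}  # home-turf benchmarks

def mean(xs):
    return sum(xs) / len(xs)

# On the shared benchmarks alone, model_b is clearly ahead...
print(mean(shared["model_a"]), mean(shared["model_b"]))
# ...but folding in the home-turf datasets flips the headline average.
print(mean(shared["model_a"] + extra["model_a"]),
      mean(shared["model_b"] + extra["model_b"]))
```

A plain macro-average weights every benchmark equally, so four datasets with a ~25-point margin can outweigh a consistent deficit everywhere else.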

  • by ksec on 3/16/25, 11:43 AM

    I guess this is the end of OpenAI? No more dreams of Universal Basic Compute for AI, or multi-trillion dollar fabs and semiconductors?

    This is just like everything in China. They will find ways to drive costs down below anything previously imagined, subsidised or not. And even just competing among themselves, DeepSeek vs ERNIE, and open-sourcing their models means there is very little to no space left for most others.

    Both the DRAM and NAND businesses of Samsung / Micron may soon be gone. I thought this was going to happen sooner, but it is finally happening. GPU and CPU designs are already in the pipeline with RISC-V, IMG and ARM-China. OLED is catching up, LCD is already taken over. Batteries we know. The only thing left is foundries.

    Huawei may release its own open-source PC OS soon. We are slowly but surely witnessing the collapse of the Western tech scene.

  • by patrickhogan1 on 3/16/25, 11:39 AM

    What's interesting about Baidu's AI model Ernie is that Baidu and its founder, Robin Li, have been working on AI for a long time. Robin Li has a strong background in AI research going back many years. Also notable is that some of the key early research on scaling laws—important for understanding how AI models improve as they get bigger—was done by Baidu's AI lab. This shows Baidu's significant role in the ongoing development of AI.

    https://research.baidu.com/Blog/index-view?id=89

    I am excited to see Baidu catch up. It feels like they have earned it, having been very early.

  • by jampekka on 3/16/25, 11:08 AM

    And open weights promised for June. China is really taking over in the ML game.

    https://x.com/Baidu_Inc/status/1890292032318652719

  • by pacifika on 3/16/25, 11:26 AM

    Is the title claim correct? It is not mentioned as such in the tweet.

  • by decide1000 on 3/16/25, 10:26 AM

    ERNIE 4.5: Input and output prices start as low as $0.55 per 1M tokens and $2.2 per 1M tokens, respectively.

    Comparison models: https://x.com/Baidu_Inc/status/1901094083508220035/photo/1
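
A back-of-envelope check of the "1% of the price" framing. The ERNIE 4.5 prices are from the comment above; the GPT-4.5 figures ($75 input / $150 output per 1M tokens) are OpenAI's list prices as I recall them, so treat them as an assumption:

```python
# Price ratio sketch; GPT-4.5 list prices are an assumption, not from the thread.
ernie_in, ernie_out = 0.55, 2.2     # $ per 1M tokens (from the comment)
gpt45_in, gpt45_out = 75.0, 150.0   # $ per 1M tokens (assumed list prices)

print(f"input:  {ernie_in / gpt45_in:.2%} of GPT-4.5")   # roughly 0.7%
print(f"output: {ernie_out / gpt45_out:.2%} of GPT-4.5")  # roughly 1.5%
```

Under those assumptions the "1%" headline is in the right ballpark, slightly below 1% on input and slightly above on output.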

  • by simonw on 3/16/25, 11:26 AM

    Anyone managed to try this yet? https://yiyan.baidu.com/ appears to require a Chinese phone number.

  • by Logge on 3/16/25, 12:20 PM

    GPT 4.5 is not a reasoning model. Reasoning models clearly outperform it. Even OpenAI's o3-mini is smarter while being orders of magnitude cheaper. Those two should be compared, in my opinion. GPT 4.5 feels like a failed experiment to see how far you can push non-thinking models.

  • by colesantiago on 3/16/25, 12:21 PM

    Good.

    OpenAI, Anthropic, et al, are getting sucked into a vortex of competition with China that is ultimately going to zero.

    AI is the ultimate race to zero.

    There is no moat. AI and intelligence are becoming a commodity, with nobody (except Nvidia) making money. This has been known for a while now.

    The acceleration and adoption will only leave those in the middle, who aren't aware of the change happening, without a job and unable to get one.

    The US-China competition in addition to Jevons Paradox will be so viciously fierce that jobs will be removed as soon as they are created.

  • by jamesblonde on 3/16/25, 1:21 PM

    Baidu have a long history in the scalable distributed deep learning space. PaddlePaddle (so good they named it twice) predates Ray and supports both data parallel and model-parallel training. It is still being developed.

    https://github.com/PaddlePaddle/Paddle

    They have pedigree.

  • by kleiba on 3/16/25, 12:02 PM

    US: Could I interest you in my lunch?

    China: Thanks, already on it.

  • by curl-up on 3/16/25, 12:46 PM

    Cheap means small, small means low Q&A scores. I know that this isn't that important for the majority of applications, but I feel that over-reliance on RAG whenever Q&A performance is discussed is quite misleading.

    Being able to clearly and correctly discuss science topics, to write about art, to understand nuances in (previously unseen) literature, etc. is impossible simply through powerful-reasoning + RAG, and so many advanced use cases would be enabled by this. Sonnet 3.5+ and GPT 4.5 are still unparalleled here, and it's not even close.

  • by pera on 3/16/25, 12:34 PM

  • by cubefox on 3/16/25, 1:39 PM

    The title is editorialized in a misleading manner.

  • by ohso4 on 3/16/25, 2:15 PM

    Lmarena.ai is a very accurate eval (with style control). Other benchmarks like AIME can be trained on and optimized for, and therefore should not be trusted. Most AI companies do something fishy to boost their benchmark scores.

  • by gitfan86 on 3/16/25, 12:25 PM

    There is an interesting dynamic of supply and demand here. 1% is basically free for all existing use cases today.

    BUT new use cases are now realistic. The question is how long until demand for those new use cases shows up.

  • by logicchains on 3/16/25, 11:34 AM

    Quite impressive if true because historically Baidu's models have tended to under-perform.

  • by unhappy_meaning on 3/16/25, 2:39 PM

    Man the AI race is just launching at all fronts.

  • by infrawhispers on 3/16/25, 11:31 AM

    NICE. This is the capitalism I signed up for…not OpenAI and Anthropic charging $200/mo for an LLM while trying to do regulatory capture.

  • by itsTyrion on 3/24/25, 4:40 PM

    Wake up honey, another company burned a few dozen gigawatt-hours on a shitty LLM

  • by hjgjhyuhy on 3/16/25, 11:28 AM

    [flagged]

  • by camillomiller on 3/16/25, 11:17 AM

    I hear the rumbling coming in Altmanland

  • by buyucu on 3/16/25, 12:35 PM

    I got flagged the last time I said this, but let's try again:

    OpenAI is increasingly irrelevant. They no longer push the boundaries of technology.

  • by folli on 3/16/25, 1:36 PM

    Hijacking this thread: what's currently the cheapest way to get structured data out of a PDF?

    I assume there's some reasonable tool out there to convert PDFs to Markdown and then feed it to some LLM API at okay costs (Gemini? DeepSeek?). Any suggestions?
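
The pipeline the commenter describes can be sketched in two stages: local text extraction, then a single cheap LLM call asking for JSON. The sketch below keeps the extraction step as a comment (it would use the `pypdf` library); the schema, field names, and sample text are placeholders, not a recommendation of any particular API:

```python
# Sketch of "extract text locally, then ask an LLM for structured JSON".
# The schema and sample text are hypothetical placeholders.
import json

def build_extraction_prompt(pages: list[str], schema: dict) -> str:
    """Combine extracted page text with a JSON schema for the model to fill."""
    text = "\n\n".join(pages)
    return (
        "Extract the following fields as JSON matching this schema:\n"
        + json.dumps(schema)
        + "\n\nDocument text:\n"
        + text
    )

# Reading the pages would look roughly like (requires `pip install pypdf`):
#   from pypdf import PdfReader
#   pages = [p.extract_text() for p in PdfReader("invoice.pdf").pages]
pages = ["Invoice #123\nTotal: $45.00"]  # stand-in for extracted text
prompt = build_extraction_prompt(
    pages, {"invoice_number": "string", "total": "number"}
)
print(prompt)
```

Sending `prompt` to any inexpensive chat-completion endpoint and parsing the JSON reply covers most simple cases; scanned PDFs would additionally need OCR before this step.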