by reqo on 12/1/24, 4:54 PM with 101 comments
by rors on 12/1/24, 9:14 PM
It's interesting that this paper doesn't contradict the conclusions of the Apple LLM paper[0], where prompts were corrupted to force the LLM into making errors. I can also believe that LLMs can only make small deviations from existing example solutions when creating these novel solutions.
I hate that we're using the term "reasoning" for this solution generation process. It's a term coined by LLM companies to evoke an almost emotional response in how we talk about this technology. However, it does appear that we are capable of instructing machines to follow a series of steps using natural language, with some degree of ambiguity. That in and of itself is a huge stride forward.
by jpcom on 12/1/24, 7:19 PM
by ijk on 12/1/24, 7:12 PM
by ninetyninenine on 12/1/24, 9:03 PM
Surprised this gets voted up, given the number of users on HN who think LLMs can't reason at all and that the only way to characterize an LLM is as a next token predictor. Last time I was talking about LLM intelligence, someone rudely told me to read up on how LLMs work, that we already know exactly how they work, and that they're just token predictors.
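For what it's worth, the "next token predictor" framing is easy to show mechanically. Below is a toy sketch only: the vocabulary and logits are made up, and `fake_model` stands in for a real transformer, but the decoding loop is the same shape as what an LLM does at inference time.

    # Toy "next-token prediction" loop: turn logits into a distribution,
    # pick the next token, append it, repeat. Everything here is made up
    # for illustration; fake_model stands in for a real transformer.
    import numpy as np

    vocab = ["the", "cat", "sat", "on", "mat", "."]

    def softmax(x):
        e = np.exp(x - np.max(x))
        return e / e.sum()

    def fake_model(context):
        # Stand-in for a real model: arbitrary logits over the vocabulary.
        rng = np.random.default_rng(len(context))
        return rng.normal(size=len(vocab))

    context = ["the", "cat"]
    for _ in range(4):
        probs = softmax(fake_model(context))
        next_token = vocab[int(np.argmax(probs))]  # greedy decoding
        context.append(next_token)

    print(" ".join(context))

Whether that loop counts as "reasoning" is exactly the question people are arguing about.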
by btilly on 12/1/24, 9:47 PM
Google claims that their use of pretraining is a key requirement for being able to deliver a (slightly) better chip design. And they claim that a responding paper that did not attempt pretraining should have been expected to fall well below the state of the art in chip design.
Given how important reasoning is for chip design, and how important pretraining is for driving reasoning in large language models, Google's argument is reasonable. If Google barely beats the state of the art while using pretraining, an attempt that doesn't pretrain should be expected to land well below the current state of the art. That second attempt's poor performance therefore says nothing about whether Google's results are plausible.
by andai on 12/2/24, 5:28 AM
> Conversely, at the other end of the spectrum, the model may draw from a broad range of documents that are more abstractly related to the question, with each document influencing many different questions similarly, but contributing a relatively small amount to the final output. We propose generalisable reasoning should look like the latter strategy.
Isn't it much more impressive if a model can generalize from a single example?
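The "influence" in that quote can be made concrete with a toy gradient-based score. Below is a minimal sketch in the spirit of TracIn (influence of a document on a query is the dot product of their loss gradients) for a tiny linear model; this is not the influence-function machinery the paper itself applies to an LLM, just an illustration of the idea.

    # Toy influence scores: how much does each "pretraining document" push the
    # model's loss on a query? Here influence(doc, query) is the dot product of
    # their loss gradients for a tiny linear model (TracIn-style sketch).
    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=3)                    # toy model parameters

    def grad_loss(w, x, y):
        # Gradient of squared error 0.5 * (w.x - y)^2 with respect to w.
        return (w @ x - y) * x

    docs = [(rng.normal(size=3), rng.normal()) for _ in range(5)]  # "documents"
    query = (rng.normal(size=3), rng.normal())                     # the question

    g_query = grad_loss(w, *query)
    scores = [float(grad_loss(w, *d) @ g_query) for d in docs]

    for i, s in enumerate(scores):
        print(f"doc {i}: influence {s:+.3f}")

    # A "retrieving" model concentrates influence on a few near-duplicate docs;
    # the paper's "generalising" strategy spreads modest influence over many.

Under that framing, generalizing from a single example would show up as one document with outsized influence, which is closer to retrieval than to the broad, diffuse pattern the authors call generalisable reasoning.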
by semessier on 12/1/24, 7:35 PM
by largbae on 12/1/24, 6:38 PM
by ricardobeat on 12/1/24, 9:32 PM
by samirillian on 12/2/24, 5:11 PM
by sgt101 on 12/1/24, 6:52 PM
I mean - like for arithmetic?
by shermantanktop on 12/1/24, 8:28 PM
We hold AI to a pretty high standard of correctness, as we should, but humans are not that reliable on matters of fact, let alone on rigor of reasoning.
by ScottPowers on 12/2/24, 12:08 PM