from Hacker News

Procedural knowledge in pretraining drives reasoning in large language models

by reqo on 12/1/24, 4:54 PM with 101 comments

by rors on 12/1/24, 9:14 PM
It seems obvious to me that LLMs wouldn't be able to find examples of every single problem posed to them in training data. There wouldn't be enough examples for the factual look up needed in an information retrieval style search. I can believe that they're doing some form of extrapolation to create novel solutions to posed problems.
It's interesting that this paper doesn't contradict the conclusions of the Apple LLM paper[0], where prompts were corrupted to force the LLM into making errors. I can also believe that LLMs can only make small deviations from existing example solutions in creation of these novel solutions.
I hate that we're using the term "reasoning" for this solution generation process. It's a term coined by LLM companies to evoke an almost emotional response on how we talk about this technology. However, it does appear that we are capable of instructing machines to follow a series of steps using natural language, with some degree of ambiguity. That in of itself is a huge stride forward.
[0] https://machinelearning.apple.com/research/gsm-symbolic
by jpcom on 12/1/24, 7:19 PM
You mean you need humans to step-by-step solve a problem so a neural net can mimic it? It sounds kinda obvious now that I write it out.
by ijk on 12/1/24, 7:12 PM
This would explain the unexpected benefits of training on code.
by ninetyninenine on 12/1/24, 9:03 PM
>On the one hand, LLMs demonstrate a general ability to solve problems. On the other hand, they show surprising reasoning gaps when compared to humans, casting doubt on the robustness of their generalisation strategies
surprised this gets voted up given the surprising amount of users on HN who think LLMs can't reason at all and that the only way to characterize an LLM is through the lens of a next token predictor. Last time I was talking about LLM intelligence someone rudely told me to read up on how LLMs work and that we already know exactly how they work and they're just token predictors.
by btilly on 12/1/24, 9:47 PM
This is highly relevant to the recent discussion at https://news.ycombinator.com/item?id=42285128.
Google claims that their use of pretraining is a key requirement for being able to deliver a (slightly) better chip design. And they claim that a responding paper that did not attempt to do pretraining, should have been expected to be well below the state of the art in chip design.
Given how important reasoning is for chip design, and given how important pretraining is for driving reasoning in large language models, it is obvious that Google's reasoning is very reasonable. If Google barely beats the state of the art while using pretraining, an attempt that doesn't pretrain should be expected to be well below the current state of the art. And therefore that second attempt's poor performance says nothing about whether Google's results are plausible.
by andai on 12/2/24, 5:28 AM
> In the extreme case, a language model answering reasoning questions may rely heavily on retrieval from parametric knowledge influenced by a limited set of documents within its pretraining data. In this scenario, specific documents containing the information to be retrieved (i.e. the reasoning traces) contribute significantly to the model’s output, while many other documents play a minimal role.
> Conversely, at the other end of the spectrum, the model may draw from a broad range of documents that are more abstractly related to the question, with each document influencing many different questions similarly, but contributing a relatively small amount to the final output. We propose generalisable reasoning should look like the latter strategy.
Isn't it much more impressive if a model can generalize from a single example?
by semessier on 12/1/24, 7:35 PM
that resonates - less facts and more reasoning training data. The most low hanging in terms of non synthetic data probably being mathematical proofs. With prolog and the like many alternate reasoning paths could be generated. It's hard to say if these many-path would help in llm training without access to the gigantic machines (it's so unfair) to try it on.
by largbae on 12/1/24, 6:38 PM
Is this conclusion similar to my layman's understanding of AlphaGo vs AlphaZero? That human procedural knowledge helps ML training to a point, and from there on becomes a limitation?
by ricardobeat on 12/1/24, 9:32 PM
Does this mean LLMs might do better if trained on large amounts of student notes, exams, book reviews and such? That would be incredibly interesting.
by samirillian on 12/2/24, 5:11 PM
Okay dumb question, why are the images they generate nightmarish nonsense. Why can’t they procedurally construct a diagram
by sgt101 on 12/1/24, 6:52 PM
drives retrieval of patterns of procedure?
I mean - like for arithmetic?
by shermantanktop on 12/1/24, 8:28 PM
Going meta a bit: comments so far on this post show diametrically opposing understandings of the paper, which demonstrates just how varied the interpretation of complex text can be.
We hold AI to a pretty high standard of correctness, as we should, but humans are not that reliable on matters of fact, let alone on rigor of reasoning.
by ScottPowers on 12/2/24, 12:08 PM
thanks