by glymor on 6/22/24, 1:57 AM with 38 comments
by derefr on 6/22/24, 4:36 AM
Why is it compelled to provide one, anyway?
Which is to say, why is the output of each model layer a raw softmax, which throws away any knowledge of how confident that layer was in its output?
Why not instead have each layer output, say, the softmax rescaled by min(max(pre-softmax vector), 1.0)? Layers whose peak pre-softmax score is above 1.0 would get softmax'ed normally, but layers whose scores are all "low-confidence" (every entry below 1.0) would keep that low confidence visible in their output, letting later decoder layers use it to build I-refuse-to-answer-because-I-don't-know text.
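Concretely, something like this rough sketch in NumPy (the clamp at zero is my own addition, so that a layer whose scores are all negative doesn't produce negative mass):

    import numpy as np

    def rescaled_softmax(logits):
        # Standard numerically-stable softmax.
        exp = np.exp(logits - logits.max())
        probs = exp / exp.sum()
        # Proposed confidence scale: min(max(pre-softmax vector), 1.0),
        # clamped at 0 from below (my assumption) so it stays non-negative.
        scale = min(max(logits.max(), 0.0), 1.0)
        return probs * scale  # sums to < 1 when the layer was unsure

    print(rescaled_softmax(np.array([5.0, 1.0, 0.2])))   # confident: sums to ~1.0
    print(rescaled_softmax(np.array([0.3, 0.2, 0.1])))   # unsure: sums to ~0.3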
by dawatchusay on 6/22/24, 3:21 AM
by glymor on 6/22/24, 3:40 AM
A figure from the paper shows this better than my TL;DR: https://www.nature.com/articles/s41586-024-07421-0/figures/1
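For anyone not clicking through: as I read the figure, the method is roughly the sketch below, where sample_answers and entails are hypothetical stand-ins for the paper's actual LLM sampling and entailment model:

    from math import log

    def semantic_entropy(question, sample_answers, entails, n=10):
        # sample_answers and entails are stand-ins supplied by the caller.
        # Sample several answers to the same question at non-zero temperature.
        answers = sample_answers(question, n)
        # Greedily cluster answers that mutually entail each other,
        # i.e. that "mean the same thing".
        clusters = []
        for a in answers:
            for c in clusters:
                if entails(a, c[0]) and entails(c[0], a):
                    c.append(a)
                    break
            else:
                clusters.append([a])
        # Entropy over the cluster frequencies: high when the model keeps
        # saying semantically different things, i.e. likely confabulating.
        probs = [len(c) / len(answers) for c in clusters]
        return -sum(p * log(p) for p in probs)

The paper also weights clusters by sequence probability rather than raw counts, but the counting version above captures the idea.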
by lokimedes on 6/22/24, 6:40 AM
We have focused on the inherent lack of input context leading to wrong conclusions, but what about that 90B+ parameter universe inside the model? There is plenty of room for multiple internal contexts to route any input down surprising pathways.
In the olden days of MLPs we had the same problem: softmax basically squeezes N output scores into a normalized "probability", where each output neuron is really the sum of multiple weighted paths. Whichever neuron wins the softmax becomes the "true" answer, even though there may just as well have been two equally likely outcomes, with only the internal "context" as the difference. In physics we have the path-integral interpretation, and I dare say we humans, too, may produce outputs shaped by our inner context.
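A toy example of that squeeze:

    import numpy as np

    def softmax(scores):
        exp = np.exp(scores - scores.max())
        return exp / exp.sum()

    # Two output neurons reached by different internal paths end up nearly tied...
    scores = np.array([2.31, 2.29, -1.0])
    probs = softmax(scores)
    print(probs)           # ~[0.50, 0.49, 0.02]: two almost equally likely outcomes
    print(probs.argmax())  # ...yet the argmax reports a single "true" answer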
by zmmmmm on 6/22/24, 3:49 AM
This article seems rather contrived. They present a totally broken idea of how LLMs work (that they are trained from the outset for accuracy on facts) and then present this research as if it were a discovery that LLMs don't work like that.
by ajuc on 6/22/24, 4:23 AM
If it's sure, it won't confirm it both ways.
by gmerc on 6/22/24, 8:42 AM
by doe_eyes on 6/22/24, 3:11 AM
This assertion in the article doesn't seem right at all. When LLMs weren't trained for accuracy, we had "random story generators" like GPT-2 or GPT-3. The whole breakthrough with RLHF was that we started training them for accuracy - or the appearance of it, as rated by human reviewers.
This step made the models a lot more useful and more willing to stick to instructions, and also a lot better at... well, sounding authoritative when they shouldn't.
by techostritch on 6/22/24, 1:53 PM
Is it plausible that LLMs get so smart that we can't understand them? Do we spend years trying to validate scientific theories confabulated by AI?
In the run-up to super-intelligence, it seems like we'll have to turn the creativity knobs up, since the whole goal will be to find novel patterns humans don't. Is there a way to tweak those knobs that gets us a super genius and not a super conspiracy theorist? Is there even a difference? Part of this might depend on whether we think we can feed LLMs "all" the information.
In fact, assuming that Silicon Valley CEOs are some of the smartest people in the world, I might argue that confabulating a possible future is their primary value. Not being allowed to confabulate is incredibly limiting.