by tompark on 10/8/23, 12:46 AM with 62 comments
by kalkin on 10/8/23, 2:19 AM
by moralestapia on 10/8/23, 3:20 AM
Well, sort of... I'm refining an algorithm that takes several (carefully calibrated) outputs from a given LLM and infers the most plausible set of parameters behind it. I was expecting to find clusters of parameters very much like what they observe.
I informally call this problem inverting an LLM, and, obviously, it turns out to be non-trivial to solve. Not completely impossible, though: so far I've found some good approximations to it.
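The comment doesn't spell out the algorithm, so as one deliberately narrow toy reading, "inverting" could mean recovering a sampling parameter such as the softmax temperature by maximum likelihood from observed outputs. A minimal sketch under that assumption (the setup and names are hypothetical):

    import numpy as np
    from scipy.optimize import minimize_scalar

    # Toy illustration: given a model's raw logits for one prompt and a sample of
    # tokens it actually emitted, find the softmax temperature that makes those
    # samples most likely.
    def infer_temperature(logits, observed_token_ids):
        def neg_log_likelihood(temp):
            scaled = logits / temp
            probs = np.exp(scaled - np.max(scaled))
            probs /= probs.sum()
            return -np.sum(np.log(probs[observed_token_ids] + 1e-12))
        result = minimize_scalar(neg_log_likelihood, bounds=(0.05, 5.0), method="bounded")
        return result.x

    # Logits over a 5-token vocabulary, and tokens sampled at an unknown temperature.
    logits = np.array([2.0, 1.0, 0.5, -1.0, -2.0])
    samples = np.array([0, 0, 1, 0, 2, 0, 1, 0])
    print(infer_temperature(logits, samples))

Recovering full model weights would obviously be far harder; this only shows the flavor of fitting hidden parameters to calibrated outputs.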
Anyway, quite an interesting read, def. will keep an eye on what they publish in the future.
Also, from the linked manuscript at the end,
>Another hypothesis is that some features are actually higher-dimensional feature manifolds which dictionary learning is approximating.
Well, you have something that behaves like a continuous, smooth space, so you could define as many manifolds as you need, so yes :^). But, pedantry aside, I get the idea, and IMO that's definitely what's going on and the right framework from which to approach this problem.
One amazing realization you can take from this: what is the conceptual equivalent of the transition functions that connect all the different manifolds in this LLM space? When you see it, your mind will be blown, not because of its complexity, but because of its exceptional simplicity.
by DennisP on 10/8/23, 1:42 AM
But if this technique scales up, then Anthropic has fixed that. They can figure out what different groups of neurons are actually doing, and use that to control the LLM's behavior. That could help with preventing accidentally misaligned AIs.
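One hedged sketch of what that kind of control could look like in practice: once a feature's direction in activation space has been identified, nudge the model along (or away from) it by adding the scaled direction to a layer's output. The names below (layer, feature_direction, model.layers[...]) are placeholders, not anything from the paper:

    import torch

    def add_steering_hook(layer, feature_direction, strength=5.0):
        # `layer` is an nn.Module whose output is a tensor of shape (batch, seq, d_model);
        # `feature_direction` is a length-d_model vector for the feature to amplify.
        direction = feature_direction / feature_direction.norm()
        def hook(module, inputs, output):
            # Returning a value from a forward hook replaces the module's output.
            return output + strength * direction
        return layer.register_forward_hook(hook)

    # Usage: handle = add_steering_hook(model.layers[10].mlp, direction_vector)
    # ... generate text ...; handle.remove() restores the original behavior.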
by zyxin on 10/8/23, 3:17 AM
by ilaksh on 10/8/23, 1:55 AM
Because if you can see what each part is doing, then theoretically you can find ways to create just the set of features you want. Or maybe tune features that have redundant capacity or something.
Maybe by studying the features they will get to the point where the knowledge can be distilled into something more like a very rich and finely defined knowledge graph.
by r3trohack3r on 10/8/23, 1:32 AM
That LLMs are as capable as they are, at the compute density they run at, strongly signals to me that the task of making a productive knowledge worker is in overhang territory.
The missing piece isn’t LLM advancement, it’s LLM management.
Building trust in an inwardly-adversarial LLM org chart that reports to you.
by ffwd on 10/8/23, 3:52 PM
by dartos on 10/8/23, 1:21 AM
All these LLMs appear to be converging on these features.
by gorgoiler on 10/8/23, 7:12 AM
This research (and its parent and sibling papers, from the LW article) seems to be about picking out those colored graph components from the floating point soup?
by startupsfail on 10/8/23, 2:46 PM
edit: ah, looked at the paper, they did it unsupervised, with a sparse autoencoder.
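For reference, the general shape of that technique: train an overcomplete autoencoder with an L1 sparsity penalty to reconstruct a layer's activations, so each learned dictionary direction becomes a candidate interpretable feature. A minimal sketch of the idea (not Anthropic's exact architecture or hyperparameters):

    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        def __init__(self, d_act, n_features):
            super().__init__()
            self.encoder = nn.Linear(d_act, n_features)
            self.decoder = nn.Linear(n_features, d_act)

        def forward(self, acts):
            codes = torch.relu(self.encoder(acts))  # sparse feature activations
            recon = self.decoder(codes)             # reconstruction of the input activations
            return recon, codes

    def sae_loss(recon, acts, codes, l1_coeff=1e-3):
        # Reconstruction error plus an L1 penalty that pushes the codes toward sparsity.
        return ((recon - acts) ** 2).mean() + l1_coeff * codes.abs().mean()

    # `acts` stands in for a batch of MLP activations collected from the model.
    sae = SparseAutoencoder(d_act=512, n_features=4096)
    acts = torch.randn(64, 512)
    recon, codes = sae(acts)
    loss = sae_loss(recon, acts, codes)
    loss.backward()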
by rewmie on 10/8/23, 10:47 AM
This is even less surprising given LLMs are applied to models with a known hierarchical structure and symmetry.
Can anyone say exactly what's novel in these findings? From a layman's point of view, this sounds like announcing the invention of gunpowder.
by adamnemecek on 10/8/23, 4:17 AM
"In physics, wherever there is a linear system with a "superposition principle", a convolution operation makes an appearance."
I'm working this out in more detail, but it's uncanny how well it works out.
I have a Discord if you want to discuss this further.
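The quoted physics fact has a standard illustration: for a linear time-invariant system, superposition means the response to any input is a sum of scaled, shifted impulse responses, which is exactly a convolution. A small numerical check of that statement (this illustrates the quote only, not the commenter's LLM claim):

    import numpy as np

    h = np.array([1.0, 0.5, 0.25])        # impulse response of the system
    x = np.array([2.0, 0.0, -1.0, 3.0])   # arbitrary input signal

    # Superposition: each input sample contributes a scaled, shifted impulse response.
    y_superposition = np.zeros(len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        y_superposition[k:k + len(h)] += xk * h

    # The same output falls out of a convolution.
    y_convolution = np.convolve(x, h)
    assert np.allclose(y_superposition, y_convolution)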
by jll29 on 10/8/23, 12:58 PM
by zb3 on 10/8/23, 11:05 AM
by noduerme on 10/8/23, 4:15 AM
I don't see anything wildly different now, other than scale and youth and the hubris that accompanies those things.