by whoami_nr on 3/6/25, 10:35 PM with 86 comments
by mountainriver on 3/7/25, 1:57 AM
We know that the early tokens in an autoregressive sequence disproportionately bias the outcome. I would go as far as to say that part of the magic of reasoning models is that they generate so much text they can partly get around this.
However, diffusion seems like a much better way to solve this problem.
by vinkelhake on 3/7/25, 12:22 AM
> dLLMs can generate certain important portions first, validate it, and then continue the rest of the generation.
If you pause the animation in the linked tweet (not the one on the page), you can see that the intermediate versions are full of, well, baloney.
(and anyone who has messed around with diffusion based image generation knows the models are perfectly happy to hallucinate).
by kelseyfrog on 3/7/25, 1:35 AM
It brings up interesting questions, like what the equivalence is between smaller diffusion models, which consume more compute because they take many diffusion steps, and larger traditional LLMs, which make essentially a single pass per generated token. How effective is decoupling the context window size from the diffusion window size? Is there an optimum ratio?
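The compute tradeoff in that question can be sketched with a back-of-envelope estimate. Everything here is an assumption for illustration: the parameter counts, the step count, and the common rule of thumb that a transformer forward pass costs roughly 2 × parameters FLOPs per token.

```python
# Rough FLOPs comparison (all numbers hypothetical, rule of thumb:
# ~2 * params FLOPs per token per forward pass).

def ar_flops(params, n_tokens):
    # Autoregressive: one forward pass per generated token,
    # each pass processing the prefix generated so far.
    return sum(2 * params * t for t in range(1, n_tokens + 1))

def diffusion_flops(params, window, n_steps):
    # Diffusion LM: each denoising step processes the full window.
    return n_steps * 2 * params * window

ar = ar_flops(7e9, 1024)               # hypothetical 7B AR model, 1024 tokens
diff = diffusion_flops(1e9, 1024, 64)  # hypothetical 1B diffusion model, 64 steps
print(f"autoregressive: {ar:.2e} FLOPs")
print(f"diffusion:      {diff:.2e} FLOPs")
```

Under these made-up numbers the smaller diffusion model comes out far cheaper, but the crossover point depends entirely on how many denoising steps quality actually requires.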
by prometheus76 on 3/7/25, 2:59 PM
by antirez on 3/7/25, 10:26 AM
by kazinator on 3/7/25, 3:16 AM
So I followed the link, and gave the model this bit of conversation starter:
> You still go mostly left to right.
The denoising animation it generated went like this:
> [Yes] [.] [MASK] [MASK] [MASK] ... [MASK]
and then proceeded by deleting the mask elements on the right one by one, leaving just the "Yes.".
:)
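The kind of unmasking animation described above can be sketched as a toy loop: start from all-[MASK] and commit one prediction per step. The `fake_model` stub and the one-token-per-step, highest-confidence schedule are purely illustrative assumptions, not the linked model's actual denoiser or scheduler.

```python
import random

MASK = "[MASK]"

def fake_model(tokens):
    """Stand-in for a masked-diffusion LM (hypothetical): returns a
    (token, confidence) guess for every masked position."""
    vocab = ["Yes", ".", "mostly", "left", "to", "right"]
    return {i: (random.choice(vocab), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def decode(length=8, steps=8, seed=0):
    random.seed(seed)
    tokens = [MASK] * length
    for _ in range(steps):
        guesses = fake_model(tokens)
        if not guesses:
            break
        # Commit only the single most confident prediction this step.
        i = max(guesses, key=lambda j: guesses[j][1])
        tokens[i] = guesses[i][0]
        print(" ".join(tokens))  # one frame of the "animation"
    return tokens

decode()
```

Nothing here forces left-to-right order; if a real model still unmasks mostly left to right, that's a learned behavior, not a property of the decoding loop.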
by gdiamos on 3/7/25, 12:31 AM
At some point in the future, you will be able to autogen a 10M line codebase in a few seconds on a giant GPU cluster.
by jacobn on 3/7/25, 12:08 AM
by DeathArrow on 3/7/25, 5:43 AM
Something akin to ComfyUI but for LLMs would open up a world of possibilities.
by mistrial9 on 3/6/25, 11:51 PM
by chw9e on 3/7/25, 7:39 AM
Just looking at all of the amazing tools and workflows that people have built with ComfyUI makes me wonder what we could do with diffusion LMs. Diffusion models seem much more easily hackable than autoregressive LLMs.
by inverted_flag on 3/7/25, 4:52 PM
by alexmolas on 3/7/25, 9:39 AM
by flippyhead on 3/7/25, 2:09 PM
by bilsbie on 3/7/25, 5:19 PM
by beeforpork on 3/7/25, 2:13 PM
by FailMore on 3/7/25, 8:01 AM
by monroewalker on 3/7/25, 8:05 AM
by Philpax on 3/7/25, 12:44 AM
Diffusion LMs are interesting and I'm looking forward to seeing how they develop, but from playing around with that model, it's GPT-2 level. I suspect it will need to be significantly scaled up before we can meaningfully compare it to the autoregressive paradigm.
by billab995 on 3/7/25, 1:52 AM