by whoami_nr on 3/6/25, 10:35 PM with 86 comments
by mountainriver on 3/7/25, 1:57 AM
We know that the early tokens in an autoregressive sequence disproportionately bias the outcome. I would go as far as to say that part of the magic of reasoning models is that they generate so much text they can partly get around this.
However, diffusion seems like a much better way to solve this problem.
by vinkelhake on 3/7/25, 12:22 AM
> dLLMs can generate certain important portions first, validate it, and then continue the rest of the generation.
If you pause the animation in the linked tweet (not the one on the page), you can see that the intermediate versions are full of, well, baloney.
(and anyone who has messed around with diffusion based image generation knows the models are perfectly happy to hallucinate).
by kelseyfrog on 3/7/25, 1:35 AM
It brings up interesting questions, like what the equivalence is between smaller diffusion models, which consume more compute because they take many diffusion steps, and larger traditional LLMs, which make essentially a single pass per generated token. How effective is decoupling the context window size from the diffusion window size? Is there an optimum ratio?
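The compute tradeoff in that question can be sketched with a back-of-envelope estimate. Everything here is an assumption for illustration: the parameter counts, the step count, and the common rule of thumb that a transformer forward pass costs roughly 2 × parameters FLOPs per token.

```python
# Rough FLOPs comparison (all numbers hypothetical, rule of thumb:
# ~2 * params FLOPs per token per forward pass).

def ar_flops(params, n_tokens):
    # Autoregressive: one forward pass per generated token,
    # each pass processing the prefix generated so far.
    return sum(2 * params * t for t in range(1, n_tokens + 1))

def diffusion_flops(params, window, n_steps):
    # Diffusion LM: each denoising step processes the full window.
    return n_steps * 2 * params * window

ar = ar_flops(7e9, 1024)               # hypothetical 7B AR model, 1024 tokens
diff = diffusion_flops(1e9, 1024, 64)  # hypothetical 1B diffusion model, 64 steps
print(f"autoregressive: {ar:.2e} FLOPs")
print(f"diffusion:      {diff:.2e} FLOPs")
```

Under these made-up numbers the smaller diffusion model comes out far cheaper, but the crossover point depends entirely on how many denoising steps quality actually requires.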
by prometheus76 on 3/7/25, 2:59 PM
by antirez on 3/7/25, 10:26 AM
by kazinator on 3/7/25, 3:16 AM
So I followed the link, and gave the model this bit of conversation starter:
> You still go mostly left to right.
The denoising animation it generated went like this:
> [Yes] [.] [MASK] [MASK] [MASK] ... [MASK]
and then proceeded by deleting the mask elements on the right one by one, leaving just the "Yes.".
:)
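The kind of unmasking animation described above can be sketched as a toy loop: start from all-[MASK] and commit one prediction per step. The `fake_model` stub and the one-token-per-step, highest-confidence schedule are purely illustrative assumptions, not the linked model's actual denoiser or scheduler.

```python
import random

MASK = "[MASK]"

def fake_model(tokens):
    """Stand-in for a masked-diffusion LM (hypothetical): returns a
    (token, confidence) guess for every masked position."""
    vocab = ["Yes", ".", "mostly", "left", "to", "right"]
    return {i: (random.choice(vocab), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def decode(length=8, steps=8, seed=0):
    random.seed(seed)
    tokens = [MASK] * length
    for _ in range(steps):
        guesses = fake_model(tokens)
        if not guesses:
            break
        # Commit only the single most confident prediction this step.
        i = max(guesses, key=lambda j: guesses[j][1])
        tokens[i] = guesses[i][0]
        print(" ".join(tokens))  # one frame of the "animation"
    return tokens

decode()
```

Nothing here forces left-to-right order; if a real model still unmasks mostly left to right, that's a learned behavior, not a property of the decoding loop.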
by gdiamos on 3/7/25, 12:31 AM
At some point in the future, you will be able to autogen a 10M line codebase in a few seconds on a giant GPU cluster.
by jacobn on 3/7/25, 12:08 AM
by DeathArrow on 3/7/25, 5:43 AM
Something akin to ComfyUI but for LLMs would open up a world of possibilities.
by mistrial9 on 3/6/25, 11:51 PM
by chw9e on 3/7/25, 7:39 AM
Just looking at all of the amazing tools and workflows that people have built with ComfyUI makes me wonder what we could do with diffusion LMs. Diffusion models seem much more easily hackable than autoregressive LLMs.
by inverted_flag on 3/7/25, 4:52 PM
by alexmolas on 3/7/25, 9:39 AM
by flippyhead on 3/7/25, 2:09 PM
by bilsbie on 3/7/25, 5:19 PM
by beeforpork on 3/7/25, 2:13 PM
by FailMore on 3/7/25, 8:01 AM
by monroewalker on 3/7/25, 8:05 AM
by Philpax on 3/7/25, 12:44 AM
Diffusion LMs are interesting and I'm looking forward to seeing how they develop, but from playing around with that model, it's GPT-2 level. I suspect it will need to be significantly scaled up before we can meaningfully compare it to the autoregressive paradigm.
by billab995 on 3/7/25, 1:52 AM