from Hacker News

Gemini Diffusion

by og_kalu on 5/20/25, 5:50 PM with 7 comments

  • by heliophobicdude on 5/20/25, 8:19 PM

    I've been let off the waitlist. So far, I'm impressed with the Instant Edits. It's crazy fast. I can provide a big HTML file and prompt it to change a color theme and it makes careful edits to just the relevant parts. It seems to be able to parallelize the same instruction to multiple parts of the input. This is incredible for refactoring.

    I copied a Shadertoy example, asked it to rename all the variables to be more descriptive, and it edited just the variable names. I was able to compile and run the result in Shadertoy.

  • by adt on 5/21/25, 7:46 AM

    Good to see some more diffusion models:

    https://lifearchitect.ai/models-table/

  • by gs17 on 5/21/25, 6:27 AM

    It's ludicrously fast, but it's not ludicrously intelligent, so trying their examples simply led to it failing 100x faster than normal Gemini. Still impressed though. It made a nice tic-tac-toe-ish game, except the computer player became a human player after a few moves and it couldn't fix that.

  • by minimaxir on 5/20/25, 5:55 PM

    > Diffusion models work differently. Instead of predicting text directly, they learn to generate outputs by refining noise, step-by-step. This means they can iterate on a solution very quickly and error correct during the generation process. This helps them excel at tasks like editing, including in the context of math and code.

    This is deliberately unhelpful as it begs the question "why hasn't anyone else made a good text diffusion model in the years since the technology has been available?"

    The answer to that question is that, unlike latent diffusion for images, which can be fuzzy and imprecise at intermediate steps before producing the final image, text has discrete outputs and therefore must be more precise. So Google is using some secret sauce to work around that limitation and is keeping it annoyingly close to the chest.
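
    For intuition only, here is a toy sketch of one publicly known way to handle discrete text: masked-style iterative decoding. The sequence starts fully masked, and each step commits the most confident proposals at several positions at once instead of one token at a time. This is not Gemini Diffusion's method, which Google has not disclosed. `toy_denoiser` is a hypothetical stand-in that simply knows the target and returns random confidences, and the sketch leaves out the re-masking that would let a real model revise earlier choices.

```python
import random

MASK = "_"  # placeholder for a position that has not been decided yet


def toy_denoiser(tokens, target, rng):
    """Hypothetical stand-in for a trained model (assumption, not a real API).

    For every masked position it proposes a token and a confidence score.
    Here the proposal is just the target character and the score is random.
    """
    return {
        i: (target[i], rng.random())
        for i, tok in enumerate(tokens)
        if tok == MASK
    }


def generate(target, steps=4, seed=0):
    """Fill in a fully masked sequence over a fixed number of parallel steps."""
    rng = random.Random(seed)
    tokens = [MASK] * len(target)
    history = ["".join(tokens)]
    for step in range(steps):
        proposals = toy_denoiser(tokens, target, rng)
        if not proposals:
            break
        # Commit roughly an equal share of the remaining positions per step
        # (ceiling division), so the final step fills whatever is left.
        k = -(-len(proposals) // (steps - step))
        most_confident = sorted(
            proposals, key=lambda i: proposals[i][1], reverse=True
        )[:k]
        for i in most_confident:
            tokens[i] = proposals[i][0]
        history.append("".join(tokens))
    return tokens, history


if __name__ == "__main__":
    _, trace = generate("hello world", steps=4)
    for line in trace:
        print(line)
```

    With 4 steps the 11 characters are filled in four batches, whereas an autoregressive decoder would need 11 sequential steps. That is the rough source of the speedup in this toy picture.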