by Kerrick on 4/5/25, 3:51 AM with 364 comments
by x187463 on 4/8/25, 10:07 AM
On another note (and perhaps others are feeling similarly), I'm finding myself surprised at how little use I have for this stuff, LLMs included. If, ten years ago, you had told me I would have access to tools like this, I'm sure I would have responded with a never-ending stream of ideas and excitement. But now that they're here, I just sort of poke at it for a minute and carry on with my day.
Maybe it's the unreliability on all fronts, I don't know. I ask a lot of programming questions and appreciate some of the autocomplete in vscode, but I know I'm not anywhere close to taking full advantage of what these systems can do.
by card_zero on 4/8/25, 10:22 AM
* The weird-ass basket decoration on the table originally has some big chain links (maybe anchor chain, to keep the theme with the beach painting). By the third version, they're leathery and are merging with the basket.
* The candelabra light on the wall, with branch decorations, turns into a sort of skinny minimalist gold stag head, and then just a branch.
* The small table in the background gradually loses one of its three legs, and ends up defying gravity.
* The freaky green lamps in the window become at first more regular, then turn into topiary.
* Making the carpet less faded turns up the saturation on everything else, too, including the wood the table is made from.
by nowittyusername on 4/8/25, 4:04 PM
by probably_wrong on 4/8/25, 10:39 AM
I have to disagree with the conclusion. This was an important discussion to have two or three years ago; we had it online back then, and we more or less agreed that it's unfair for artists to have their works sucked up with no recourse.
What the post should say is "we know that this is unfair to artists, but the tech companies are making too much money from them and we have no way to force them to change".
by shubhamjain on 4/8/25, 11:32 AM
4o is the first image-generation model that feels genuinely useful, not just for pretty things. It can produce comics, app designs, UI mockups, storyboards, marketing assets, and so on. I saw someone make a multi-panel comic with it with consistent characters. Obviously, it's not perfect. But just getting 90% of the way there is a game changer.
by gcanyon on 4/8/25, 11:43 AM
As I've argued in the past, I think copyright should last maybe five years: in this modern era, monetizing your work doesn't (usually) have to take more than a short time. I'd happily concede to some sort of renewal process to extend that period, especially if some monetization method is in process. Or some sort of mechanical rights process to replace the "public domain" phase early on. Or something -- I haven't thought about it that deeply.
So thinking about that in this process: everyone is "ghiblifying" things. Studio Ghibli has been around for very nearly 40 years, and their "style" was well established over 35 years ago. To me, that (should) make(s) it fair game.
The underlying assumption, I think, is that all the "starving" artists are being ripped off, but are they? Let's consider the numbers -- there are a handful of large-scale artists whose work is obviously replicable: Ghibli, the Simpsons, Pixar, etc. None of them is going hungry because a machine model can render a prom pic in their style. Then you get the other 99.999% of artists, all of whose work went into the model. They will be hurt, but not specifically because their style has been ingested and people want to replicate their style.
Rather, they will be hurt because no one knows their style, nor cares about it; people just want to be able to say e.g. "Make a charcoal illustration of me in this photo, but make me sitting on a horse in the mountains."
It's very much like the arguments about piracy in the past: 99.99% of people were never going to pay an artist to create that charcoal sketch. The 0.01% who might are arguably causing harm to the artist(s) by not using them to create that thing, but the rest were never going to pay for it in the first place.
All to say it's complicated, and obviously things are changing dramatically, but it's difficult to make the argument that "artists need to be compensated for their work being used to train the model" without both a reasonable plan for how that might be done, and a better-supported argument for why.
by haswell on 4/8/25, 12:04 PM
Unfortunately I think the answer to this question is a resounding “no”.
The time for thoughtful shaping was a few years ago. It feels like we’re hurtling toward a future where instead we’ll be left picking up the pieces and assessing the damage.
These tools are impressive and will undoubtedly unlock new possibilities for existing artists and for people who are otherwise unable to create art.
But I think it’s going to be a rough ride, and whatever new equilibrium we reach will be the result of much turmoil.
Employment for artists won’t disappear, but certain segments of the market will just use AI because it’s faster, cheaper, and doesn’t require time-consuming iterations and communication of vision. The results will be “good enough” for many.
I say this as someone who has found these tools incredibly helpful for thinking. I have aphantasia, and my ability to visualize via AI is pretty remarkable. But I can’t bring myself to actually publish these visualizations. A growing number of blogs and YouTube channels don’t share these qualms and every time I encounter them in the wild I feel an “ick”. It’ll be interesting to see if more people develop this feeling.
by justinator on 4/8/25, 4:11 PM
https://substackcdn.com/image/fetch/w_1456,c_limit,f_webp,q_...
(nice URL btw)
The room, the door, the ceiling are all of a scale to fit many sizes of elephants.
by m4thfr34k on 4/8/25, 8:18 PM
by Retr0id on 4/8/25, 10:14 AM
"in multimodal image generation, images are created in the same way that LLMs create text, a token at a time"
Is there some way to visualise these "image tokens", in the same way I can view tokenized text?
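For a rough intuition, here's a toy sketch of how a VQ-style tokenizer turns an image into a grid of discrete IDs: each patch is mapped to its nearest entry in a codebook, and that grid of integers is the "text" the model actually sees. The codebook here is random, purely for illustration; OpenAI hasn't published 4o's actual tokenizer, and real systems (VQ-VAE, VQGAN) learn theirs from data.

    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.random((32, 32, 3))       # stand-in for a real image
    patch = 8                             # 8x8 patches -> a 4x4 token grid
    codebook = rng.random((256, patch * patch * 3))  # 256 made-up "visual words"

    tokens = np.empty((32 // patch, 32 // patch), dtype=int)
    for i in range(tokens.shape[0]):
        for j in range(tokens.shape[1]):
            # flatten the patch and pick the nearest codebook entry
            p = image[i*patch:(i+1)*patch, j*patch:(j+1)*patch].ravel()
            tokens[i, j] = np.argmin(((codebook - p) ** 2).sum(axis=1))

    print(tokens)  # one integer ID per patch, emitted left-to-right like text

One way to visualise them would be to colour each patch of the source image by its token ID, which is roughly the decoder's job run in reverse.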
by NitpickLawyer on 4/8/25, 10:24 AM
I like to look at how far we've come since the early days of Stable Diffusion. It was fascinating to play with it back then, but it quickly became apparent that it was "generic" and not suited for "real work" because it lacked consistency, text capabilities, fingers! and so on... Looking at these results now, I'm amazed at the quality, consistency and ease of use. Gone are the days of doing alchemy on words and adding a bunch of "in the style of Rutkowski, golden hour, hd, 4k, pretty please ..." at the end of prompts.
by smusamashah on 4/8/25, 3:43 PM
I like the book, but there are quite a few scenes that are hard to visualize and make sense of. An image generator that can follow that language and detail will be amazing. Even more awesome would be if it stays consistent across follow-ups.
by orbital-decay on 4/8/25, 11:27 AM
It's "just" a much bigger and much better trained model. Which is a quality on its own, absolutely no doubt about that. Fundamentally the issue is still there though, just less prominent. Which kind of makes sense - imagine the prompt "not green", what even is that? It's likely slightly out of distribution and requires representing a more complex abstraction, so the accuracy will necessarily be worse than stating the range of colors directly. The result might be accurate, until the model is confused/misdirected by something else, and suddenly it's not.
I think in the end none of the architectural differences will matter beyond the scaling. What will matter a lot more is data diversity and training quality.
by ziofill on 4/8/25, 4:36 PM
by hansmayer on 4/8/25, 3:20 PM
by xnorswap on 4/8/25, 3:11 PM
Feed is in quotes because my feed seems to be 90% suggested posts.
by morkalork on 4/8/25, 1:29 PM
by Zr01 on 4/9/25, 11:17 AM
by lou1306 on 4/8/25, 4:52 PM
by eapriv on 4/8/25, 12:19 PM
by klik99 on 4/8/25, 5:07 PM
My understanding is it’s a meta-LLM approach, using multiple models and having them interact. I feel like it’s also evidence that OpenAI is not seriously pursuing AGI (just my opinion, I know there are some on here who would aggressively disagree), but rather market use cases. It feels like an acceptance that any given model, at least for now, has its own limitations but can become more useful in combination.
by qiqitori on 4/8/25, 11:45 AM
Gave it another chance now, explicitly calling out the numbers. Well, they are improved, but I'm not sure how useful this result is (the spacing between the numbers is a little off and there's still some curious counting going on). It kind of looks like the numbers were pasted in after the fact?
https://chatgpt.com/share/67f4fa33-70dc-8012-8e1e-2dea563d3d...
by cadamsdotcom on 4/8/25, 4:56 PM
Wonderful to be alive for these step changes in human capability.
by vunderba on 4/8/25, 2:54 PM
by rkharsan64 on 4/8/25, 11:10 AM
by roenxi on 4/8/25, 9:55 AM
Which isn't a small thing; humour is an advanced soft skill.
by swframe2 on 4/9/25, 5:12 AM
Basically, the user's image prompt is converted into several prompts that generate parts of the final image as layers, which are then combined. The layers remain available, so edits can cleanly update one section without affecting the others.
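A minimal sketch of that compositing step, assuming each sub-prompt yields an RGBA layer; the filenames are hypothetical, and this is not OpenAI's published pipeline, just the general idea:

    from PIL import Image

    # Hypothetical outputs of the per-layer prompts, back to front.
    # alpha_composite requires all layers to share one size.
    layers = ["background.png", "subject.png", "text_overlay.png"]

    canvas = Image.open(layers[0]).convert("RGBA")
    for name in layers[1:]:
        canvas = Image.alpha_composite(canvas, Image.open(name).convert("RGBA"))
    canvas.save("final.png")

To edit just the subject, you'd regenerate subject.png and re-composite; the background and text layers are reused untouched, which is what makes clean localised edits possible.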
by lupusreal on 4/8/25, 4:31 PM
To me, this kind of image generation isn't very interesting for creating final products, but is extremely useful for communicating design intent to other people when collaborating on large creative projects. Previously I used crude "ms paint" sketches for this, which was much more tedious and less effective.
by thrance on 4/8/25, 11:40 AM
by DonHopkins on 4/8/25, 4:51 PM
A: Your face is pressed up against the ceiling!
by mrconter11 on 4/9/25, 10:03 AM
by freeamz on 4/8/25, 11:33 AM
by Der_Einzige on 4/8/25, 3:34 PM
We get Stable Diffusion v1.5 and SDXL, and what does the community go do with it? Lmao, see civit.ai and its literal hundreds of thousands of NSFW LoRAs. The most popular model today on that website is the NSFW anime version of SDXL, called "Pony Diffusion" (I'm literally not making this up. A bunch of Bronies made this model!)
Imagine an open-source image generator that generates tokens autoregressively like this, at this quality, being released.
The world is simply not ready for the amount of horny stuff that is going to be produced (especially without consent). It appears that the male libido really is the reason for most bad things in the world. We are truly the "villains of history".
by NiloCK on 4/8/25, 12:27 PM
by d4rkp4ttern on 4/8/25, 11:30 AM
by globnomulous on 4/9/25, 5:00 PM
In other words, people who care about money and only money are pushing for these tools because they're convinced they'll reduce labor costs and somehow also improve the resulting product. Meanwhile, the engineers and creative professionals who have these tools foisted upon them by unimaginative business people continue to insist that the tools are a solution in search of a problem; that they're stochastic parrots and plagiarism automata that bypass all of the important parts of engineering and creativity; and that they make the absolutely, breathtakingly idiotic mistake of supposing it's possible to leap to a finished product without all the work and problem solving involved in getting there.
> The line between human and AI creation will continue to blur
This is utter nonsense, and hype-man prognosticators in the tech world like the author of the article turn out pretty much 100% of the time to be either grifters or saps who have fallen for the grifters' nonsense.
by 1970-01-01 on 4/8/25, 6:03 PM
by ge96 on 4/8/25, 2:57 PM