from Hacker News

Ask HN: Why are AI generated images so shiny/glossy?

by arduinomancer on 8/16/24, 2:20 AM with 61 comments

I’ve noticed a lot of the time you can tell an image is AI generated because it has a shiny/glossy lighting look to it.

Has anyone figured out why this is the case?

by keiferski on 8/16/24, 3:35 AM
It’s just the typical aesthetic model used and isn’t inherent to the tech itself. It’s very easy to make AI images in specific art styles, with the result that you can’t tell they’re not real.
This is actually something of a pet peeve of mine - people sharing AI images never use styles other than the generic shiny one, and so places like Reddit.com/r/midjourney are filled with the same exact style of images.
Edit: if you’re looking for other style inspiration ideas, this website is a great resource for Midjourney keywords: https://midlibrary.io/styles
by vipshek on 8/16/24, 3:47 AM
Many AI-generated images you encounter are low-effort creations without much prompt tuning, created using something like DALL-E or Stable Diffusion. For whatever reason, the default style of DALL-E and base Stable Diffusion seems to lean towards a glossy "photorealism" that people can instantly tell isn't real. By contrast, Midjourney's style is a bit more painted, like the cover of a fantasy novel.
All that being said, it's very possible to prompt these generators to create images in a particular style. I usually include "flat vector art" in image generation prompts to get something less photorealistic that I've found is closer to the style I want when generating images.
If you really want to go down the rabbit hole, click through the styles on this Stable Diffusion model to see the range that's possible with finetuning (the tags like "Watercolor Anime" above the images): https://civitai.com/models/264290/styles-for-pony-diffusion-...
by feverzsj on 8/16/24, 5:18 AM
Maybe because diffusion models generate the image from Gaussian noise, while pixel entropy in real photos isn't distributed like that.
by sidkshatriya on 8/16/24, 3:38 AM
A lot of (non-AI) photos of humans tend to be airbrushed by (human) photo editors -- this removes natural imperfections -- like patchy skin, acne, discolouration etc.
In AI models, I think the kind of picture the AI is biased to generate is also a form of "airbrush", except the model makes the reflectivity of the images high -- simply to hide the fact that there _aren't_ any imperfections that would make the photo more realistic.
In other words, gloss is just a form of airbrushing -- AI does it to hide the fact that there are no more details available.
I would guess that AI models could make the airbrush more like the airbrush human photo editors do by changing some hyper-parameters of their models.
by spaceman_2020 on 8/16/24, 5:28 AM
Dall-E at least seems to have adopted the cartoonish style just to avoid lawsuits
You can get realistic images with Midjourney and Flux with minimal prompt tuning. Adding “posted on snapchat” or “security camera footage” to the prompt will often produce mostly realistic looking images
by txnf on 8/16/24, 3:25 AM
there is an "aesthetics" model
https://github.com/LAION-AI/laion-datasets/blob/main/laion-a...
obviously, it reflects the mass preference for glosslop
secondarily it is likely due to a desire to ensure that ai images have a distinct look
by ClassyJacket on 8/16/24, 4:22 AM
I don't know, but I've noticed another pattern: They don't like leaving any empty space. Every area has to be busy, filled with objects. They can never leave any empty grass, or walls, or anything. Everything is full of objects.
by blululu on 8/16/24, 5:43 AM
This is an interesting question, though I think it needs to be qualified a bit since there are many AI images and AI image generators that don't match this pattern.
First, AI Images != OpenAI/ChatGPT Images. OpenAI has done a great job making a product that is accessible and thus their product decisions get a lot more exposure than other options. A few people have commented how there are several Stable Diffusion fine tunings that produce very different styles.
Second, AI images in general and AI images of people are different. I think that the high gloss style is most pronounced in people. Partly this is because it is more notable and out of place.
If you take the previous two points as being true, the question becomes why ChatGPT's image model skews toward generating shiny people. I would venture that it is a conscious product decision that has something to do with what someone thought looked the most reliably good given their model's capabilities.
Some wild speculation as to why this would be the case:
* Might have to do with fashion photos having unusually bright lights and various cosmetics to give a sheen.
* It might have something to do with training the model on synthetic data (i.e. 3d models) which will have trouble producing the complicated subsurface scattering of human skin.
* Might have something to do with image statistics and glossy finishes creeping in where they don't belong.
* Might have to do with the efficiency of representing white spots.
by DaoVeles on 8/16/24, 5:41 AM
I suppose because a large part of these models is recognition-probability, the shine is sort of an approximation of what is likely lighting. It isn't just the lighting that you expect but the culmination of thousands of similar yet slightly different lighting setups. If you were to take a thousand photos of someone with all manner of light angles, maybe it would look like this. Just a wild guess though.
by latentsea on 8/16/24, 4:44 AM
People have started training LoRAs for Flux that look pretty real. This was a good recent example: https://www.reddit.com/r/StableDiffusion/comments/1ero4ts/fi...
by simonw on 8/16/24, 4:03 AM
One of the most interesting things about Midjourney is that it always returns multiple images, and asks the user to select which of those they would like to view at full resolution.
This is pretty clearly training for a preference model - so they now have MILLIONS of votes showing which images are more "pleasing" to their users.
by BobbyTables2 on 8/16/24, 3:27 AM
I naively assumed the “airbrushed” effect AI photos have was just a way of blending components of the training data to make it look normal — opposite the way a collage of magazine clippings would appear.
by disconcision on 8/16/24, 5:48 AM
Intentional choices during data set collation (to some degree 'emergent intention' due to aggregate preference). Search for 'boring realism' to find people working in other regions of latent space, e.g. this LORA: https://civitai.com/models/310571/boring-reality . Most of the example pictures there don't have the shiny/glossy look you're talking about.
by anileated on 8/16/24, 4:39 AM
ML-generated pseudo-photos look 3D-rendered because noise is information, and more information is both more expensive (a noisy photo can be 3x the size at the same resolution) and creates more opportunities for self-inconsistencies (e.g., with real camera sensor noise) that make fakes easier to identify automatically.
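A minimal sketch of the "noise is information" point, not the commenter's own method: it builds a clean grayscale gradient and a copy with synthetic Gaussian "sensor noise", then compares compressed sizes. The image size, the noise level, and the use of lossless `zlib` as a stand-in for a real photo codec are all assumptions, so the exact ratio will not match the 3x figure quoted above.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 256  # assumed toy image size


def gradient_image(noise_sigma, seed=0):
    """Return 8-bit grayscale pixels: a left-to-right ramp plus Gaussian noise."""
    rng = random.Random(seed)
    pixels = bytearray()
    for _y in range(HEIGHT):
        for x in range(WIDTH):
            value = x + rng.gauss(0.0, noise_sigma)
            # Clamp to the valid 8-bit range.
            pixels.append(max(0, min(255, int(round(value)))))
    return bytes(pixels)


clean = gradient_image(noise_sigma=0.0)  # perfectly smooth, "rendered" look
noisy = gradient_image(noise_sigma=4.0)  # same ramp with fake sensor noise

clean_size = len(zlib.compress(clean, 9))
noisy_size = len(zlib.compress(noisy, 9))

print(f"raw size:         {len(clean)} bytes each")
print(f"clean compressed: {clean_size} bytes")
print(f"noisy compressed: {noisy_size} bytes")
```

The noisy version compresses far worse because the noise is unpredictable, so every pixel carries bits the compressor cannot throw away.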
by ffhhj on 8/16/24, 3:10 AM
Because that's the kind of image that AI trainers like the most? Would they rather train them on old newspapers?
That would be the "oiled bodybuilder" effect applied to image training. Maybe similar and clearly defined lighting also allows AIs to match features much better, especially volumes.
by osigurdson on 8/16/24, 4:40 AM
I tend to agree. However, I kept prompting ChatGPT to make the picture less "AI like" and it actually ended up doing a really good job after 5 or 6 attempts. I'm not sure why it took so much prompting. Further prompting just made it worse.
by t0bia_s on 8/16/24, 12:11 PM
Because models are trained on images that are usually edited in post-production with this aesthetic.
Mostly highlights down, shadows and clarity up. I often need to edit it back to get realistic-looking lighting in the scene.
Also, "--s 0" helps with generating more realistic images.
by LeoPanthera on 8/16/24, 4:51 AM
AI is compression and compression, of any lossy kind, usually works by removing the high-frequency information first. That applies to both audio and imagery. It's obviously not the only factor, but I bet it's an important one.
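A rough illustration of that idea, assuming a simple box blur as a stand-in for whatever smoothing a lossy model effectively applies (it is not how any particular image generator works). One synthetic row of pixels combines a slow lighting gradient with fine texture, and the "roughness" measure is a crude proxy for high-frequency energy.

```python
import math
import random


def box_blur(signal, radius):
    """Low-pass filter: replace each sample with the mean of its neighbours."""
    n = len(signal)
    out = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def roughness(signal):
    """Sum of squared differences between adjacent samples."""
    return sum((b - a) ** 2 for a, b in zip(signal, signal[1:]))


random.seed(0)
# Slow sine wave = broad lighting; Gaussian jitter = pores, grain, fine texture.
row = [math.sin(i / 40.0) + random.gauss(0.0, 0.1) for i in range(256)]
smoothed = box_blur(row, radius=4)

print(f"roughness before: {roughness(row):.3f}")
print(f"roughness after:  {roughness(smoothed):.3f}")
```

The broad gradient survives the filter almost unchanged while the fine texture is mostly wiped out, which is roughly what a smooth, glossy surface looks like.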
by Scrapemist on 8/16/24, 5:51 AM
The noisy pattern of the skin, combined with noise from light hitting the camera sensor, plus the noise created by image compression, creates an effect that is too subtle and random to “learn”.
by Blackthorn on 8/16/24, 3:37 AM
Models have Goodharted themselves into oblivion. That's the result of endless cycles of aesthetic preference optimization, training on synthetic data, repeat ad nauseam.
by bronya19c on 8/16/24, 5:02 PM
I believe it's due to AI's limited ability to generate localized texture details effectively, often resorting to the use of highlights as a concealment strategy.
by jerpint on 8/16/24, 4:24 AM
There’s an added objective in some of these models to make them more aesthetically pleasing based on subjective crowdsourced data that very likely contributes to this
by bronya19c on 8/16/24, 5:06 PM
I believe it's due to AI's limited ability to generate localized texture details effectively, often using highlights as a concealment strategy.
by RicoElectrico on 8/16/24, 3:59 PM
I think the generators were trained mostly on ArtStation and this style is quite common in concept art.
by tivert on 8/16/24, 4:11 AM
I don't know what you mean by "shiny/glossy lighting look to it." Could you give some examples?
I have noticed that a lot of AI images are generated with a "realistic cartoon" style, and I assume that's to smooth over some uncanniness.
by bni on 8/16/24, 10:53 AM
Also people in them look like caucasian anime characters. Why is that?
by allanren on 8/16/24, 4:09 AM
It has to do with the training data. Most of it is in a photorealistic style. I'm sure there will be more real-world styles coming out soon.