by cruisestacy on 11/22/16, 12:13 PM with 56 comments
by jawns on 11/22/16, 1:09 PM
Even though the sketches are fairly crude, with no shading and a low level of detail, many of the generated images look like they could, in fact, be real handbags. They still have the marks of a generated image (e.g. weird mottling), but they're totally recognizable as the thing they're meant to be.
The "sketches to shoes" example, on the other hand, reveals some of the limitations. Most of the sketches use poor perspective, so they wouldn't match up well with edges detected from an actual image of a shoe. Our brains can "get the gist" of the sketches and perform some perspective translation, but the algorithm doesn't appear to perform any translation of the input (e.g. "here's a sketch that appears to represent a shoe, here's what a shoe is actually shaped like, let's fit to that shape before going any further"), so you end up with images where a shoe-like texture is applied to something that doesn't look convincingly like a real shoe.
by aexaey on 11/22/16, 8:44 PM
https://phillipi.github.io/pix2pix/images/index_facades2_los...
Notice the white triangles (image-crop artifacts) present in the original image, yet completely absent from the net's input image. They reappear in the output of 3 (4, even?) out of 5 nets despite the lack of any corresponding cue in the input image. It looks like the network cheated a bit here, i.e. took advantage of the small training-set size and memorized the input image as a whole, then recognized and recalled this very image (already seen during training) rather than actually reconstructing it purely from the input.
The same happens (though less prominently) for other images where the "ground truth" image was cropped.
by mshenfield on 11/22/16, 3:34 PM
by ragebol on 11/22/16, 1:24 PM
What I like about the "Day to Night" example is that it clearly demonstrates that these sorts of networks lack common sense. It puts lights where there are clearly (to humans with common sense, at least) no things that could produce light, e.g. in the middle of a roof or in a tree. Of course, there can be, but it's fairly uncommon.
And the opposite as well: no lights where a human would totally expect a light, e.g. on the front of buildings or on top of, well, lighting poles.
by sebleon on 11/22/16, 1:04 PM
Makes me wonder how this could apply to image and video compression. You could send over the semantic-segmentation version of an image or video, and a system on the other end would use these techniques to reconstruct the original.
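As a rough sketch of why that might pay off (the resolution, class count, and bit-packing here are my own assumptions, not anything from the paper), a per-pixel label map needs far fewer bits than raw RGB, before any entropy coding:

```python
# Back-of-envelope: raw bytes for an RGB frame vs. a packed
# semantic label map at the same resolution.
w, h = 1920, 1080
rgb_bytes = w * h * 3  # 24-bit color, uncompressed

num_classes = 32  # assumed segmentation label vocabulary
bits_per_label = (num_classes - 1).bit_length()  # 5 bits for 32 classes
label_bytes = w * h * bits_per_label // 8

print(rgb_bytes)                 # 6220800
print(label_bytes)               # 1296000
print(rgb_bytes / label_bytes)   # 4.8x smaller before entropy coding
```

The catch, of course, is that the receiver's pix2pix-style generator would hallucinate textures, so it's lossy in a semantic sense: you'd get *a* plausible street scene back, not *the* original one.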
by verytrivial on 11/22/16, 3:23 PM
by willcodeforfoo on 11/22/16, 2:55 PM
Does anyone have any experience in this area?
by bflesch on 11/22/16, 1:09 PM
You could pipe these product sketches directly into focus groups, who would tell you which product is most likely to sell. You'd no longer need a massive staff to come up with product variants.
by iraphael on 11/22/16, 1:57 PM
by amelius on 11/22/16, 12:48 PM
by romaniv on 11/22/16, 6:00 PM
...
I get a feeling this could be used in game design to do some really cool stuff with map and texture generation.
by rosstex on 11/23/16, 1:03 AM
by mmastrac on 11/22/16, 4:13 PM
We've got the pieces of visual processing and imagination here, and the pieces of language input/output as part of Google's work. It feels like we just need to make some progress on an "AI executive" before we can get a real, interactive, human-like machine.
by hanoz on 11/22/16, 3:43 PM
by oluckyman on 11/22/16, 9:45 PM