from Hacker News

Image-to-Image Translation with Conditional Adversarial Nets

by cruisestacy on 11/22/16, 12:13 PM with 56 comments

  • by jawns on 11/22/16, 1:09 PM

    The "sketches to handbags" example, which is buried toward the bottom, is really cool. It's basically an extension of the "edges to handbags," but with hand-drawn sketches.

    Even though the sketches are fairly crude, with no shading and a low level of detail, many of the generated images look like they could, in fact, be real handbags. They still have the mark of a generated image (e.g. weird mottling) but they're totally recognizable as the thing they're meant to be.

    The "sketches to shoes" example, on the other hand, reveals some of the limitations. Most of the sketches use poor perspective, so they wouldn't match up well with edges detected from an actual image of a shoe. Our brains can "get the gist" of the sketches and perform some perspective translation, but the algorithm doesn't appear to perform any translation of the input (e.g. "here's a sketch that appears to represent a shoe, here's what a shoe is actually shaped like, let's fit to that shape before going any further"), so you end up with images where a shoe-like texture is applied to something that doesn't look convincingly like a real shoe.

  • by aexaey on 11/22/16, 8:44 PM

    Truly impressive overall. Unfortunately, it looks like the training set was way too small. Look, for example, at the reconstruction of #13 here:

    https://phillipi.github.io/pix2pix/images/index_facades2_los...

    Notice the white triangles (image crop artifacts) present in the original image, yet completely absent from the net's input image. They reappear in the output of 3 (4, even?) out of 5 nets despite the lack of any corresponding cue in the input image. It looks like the network cheated a bit here, i.e. it took advantage of the small set size and memorized the input image as a whole, then recognized and recalled this very image (already seen during training) rather than actually reconstructing it purely from the input.

    The same happens (though less prominently) for other images where the "ground truth" image was cropped.

  • by mshenfield on 11/22/16, 3:34 PM

    Just want to throw out that none of these applications are new. What is novel about their approach is that, instead of learning a mapping function against a hand-picked function that quantifies accuracy for each problem, they also have a mechanism for learning that accuracy-quantifying function itself. Haven't grokked the paper to see how they do it, but that is pretty neat IMO.
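
    Purely as an illustration (this is a minimal sketch, not the paper's code; the tiny stand-in networks and the lambda weight are placeholder assumptions), the "learned loss" idea in a conditional GAN looks roughly like this: a discriminator is trained to tell real (input, target) pairs from (input, generated) pairs, and the generator is trained to fool it, typically alongside a simple pixel loss such as L1.

      # Rough, hypothetical sketch of "learning the loss" with a conditional GAN.
      # The networks are trivial stand-ins and lam=100.0 is just a typical choice.
      import torch
      import torch.nn as nn

      G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())  # stand-in generator: image -> image
      D = nn.Sequential(nn.Conv2d(6, 1, 3, padding=1))             # stand-in discriminator on (input, image) pairs
      bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
      opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
      opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

      def train_step(x, y, lam=100.0):
          # x: input batch, y: target batch, both N x 3 x H x W in [-1, 1]
          fake = G(x)

          # Discriminator: push real (x, y) pairs toward 1 and fake (x, G(x)) pairs toward 0.
          d_real = D(torch.cat([x, y], dim=1))
          d_fake = D(torch.cat([x, fake.detach()], dim=1))
          loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
          opt_d.zero_grad()
          loss_d.backward()
          opt_d.step()

          # Generator: fool the discriminator (the learned part of the loss) plus a plain L1 term.
          d_fake = D(torch.cat([x, fake], dim=1))
          loss_g = bce(d_fake, torch.ones_like(d_fake)) + lam * l1(fake, y)
          opt_g.zero_grad()
          loss_g.backward()
          opt_g.step()
          return loss_d.item(), loss_g.item()

      x, y = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)  # dummy data just to exercise the step
      print(train_step(x, y))
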
  • by ragebol on 11/22/16, 1:24 PM

    Interesting.

    What I like about the "Day to Night" example is that it clearly demonstrates that this sort of network lacks common sense. It expects light to be where there are clearly (to humans with common sense, at least) no things that can produce light, e.g. in the middle of a roof or in a tree. Of course, there can be, but it's fairly uncommon.

    And the opposite as well: no lights where a human would totally expect one, e.g. on the front of buildings or on top of, well, light poles.

  • by sebleon on 11/22/16, 1:04 PM

    This is awesome!

    Makes me wonder how this could apply to image and video compression. You could send over the semantic segmentation version of an image or video, and a system on the other end would use this technique to reconstruct the original.
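
    As a toy sketch of that idea (the names here are hypothetical, and the generator is assumed to be a labels-to-photo network, like the paper's facades example, trained ahead of time on the receiving end): the sender transmits only a compressed label map, which is tiny next to a raw frame, and the receiver hallucinates a plausible image from it; anything the labels don't encode is of course lost.

      # Toy sketch: ship a compressed segmentation map instead of pixels and
      # reconstruct on the other side with a (hypothetical) pretrained generator.
      import zlib
      import numpy as np

      def encode(label_map: np.ndarray) -> bytes:
          # label_map: H x W array of small integer class ids (the "semantic segmentation")
          return zlib.compress(label_map.astype(np.uint8).tobytes())

      def decode(payload: bytes, shape, generator) -> np.ndarray:
          labels = np.frombuffer(zlib.decompress(payload), dtype=np.uint8).reshape(shape)
          return generator(labels)  # hypothetical labels -> photo network

      labels = np.zeros((256, 256), dtype=np.uint8)  # toy map: everything class 0 ("sky")
      labels[128:, :] = 1                            # bottom half class 1 ("road")
      payload = encode(labels)
      print(len(payload), "bytes to send vs", 256 * 256 * 3, "bytes for a raw RGB frame")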

  • by verytrivial on 11/22/16, 3:23 PM

    Does anyone else have the feeling that, on the current trajectory, thought will just emerge from something exactly like this, but with perhaps a million times the amount of feedback and data? Yes, this is all 2D and abstract/selective training sets, etc., but what if AI is the ultimate fake-it-until-you-make-it?
  • by willcodeforfoo on 11/22/16, 2:55 PM

    The Aerial-to-Map example makes it look like this could be useful for automatic map/satellite rectification/georeferencing, but I'm not sure how efficient it would be if it has to compare against a large area.

    Does anyone have any experience in this area?

  • by bflesch on 11/22/16, 1:09 PM

    I feel this can potentially revolutionize creative processes, for example in the clothing industry. You just draw up a purse or a shoe, let the machines generate dozens of variants (with pictures), and then you only have to filter and rank them.

    You can pipe these product sketches directly into focus groups who tell you which product is most likely to sell. You don't need a massive staff to come up with product variants any more.

  • by iraphael on 11/22/16, 1:57 PM

    Besides being a cool new application of GANs, I don't see how this architecture is much different from normal GANs. Anyone else have thoughts?
  • by amelius on 11/22/16, 12:48 PM

    I wonder how well this scales to a larger domain of interest. So, e.g., if the neural net needs to know not only about cars and nature, but about more topics such as people, faces, computers, gastronomy, Santa Claus, Halloween, etcetera, how does the neural net scale? And how should its topology be extended under such scaling?
  • by romaniv on 11/22/16, 6:00 PM

    Kudos for providing proper examples of the network doing its thing, both good and bad. This is what all researchers ought to do. Too many papers these days handpick a couple of the coolest-looking results and stop at that.

    ...

    I get a feeling this could be used in game design to do some really cool stuff with map and texture generation.

  • by rosstex on 11/23/16, 1:03 AM

    I'm enrolled in Efros' computational photography course this semester, and Tinghui and Jun-Yan are the GSIs. It's fantastic to experience the bridge between teaching and cutting-edge research!
  • by mmastrac on 11/22/16, 4:13 PM

    This is an absolutely incredible result. All of this stuff would have been considered insanely advanced AI ten years ago, but now we look at it and say "this is just stuff computers can do".

    We've got the pieces of visual processing and imagination here and the pieces of language input/output as part of Google's work. It feels like we just need to make some progress on an "AI executive" before we can get a real, interactive, human-like machine.

  • by hanoz on 11/22/16, 3:43 PM

    I'm interested in having a play. As an out-and-out ML newbie, is there such a thing as an AWS image I could run on a GPU instance and then just git clone and go?
  • by oluckyman on 11/22/16, 9:45 PM

    Neural nets! Is there anything they can't do?