from Hacker News

Image Compression with Neural Networks

by hurrycane on 9/29/16, 5:05 PM with 63 comments

  • by emcq on 9/29/16, 5:50 PM

    This is pretty neat. But is it just me or does the dog picture look better in JPEG?

    When zoomed in, the JPEG artifacts are quite apparent and the RNN produces a much smoother image. However, to my eye when zoomed out the high frequency "noise", particularly in the snout area, looks better in JPEG. The RNN produces a somewhat blurrier image that reminds me of the soft focus effect.

  • by richard_todd on 9/30/16, 1:44 AM

    JPEG 2000 had about a 20% reduction in size over typical JPEG, while producing virtually no blocking artifacts, 16 years ago [1]. Almost no one uses it, though. Now in 2016 we are using neural networks to get a similar reduction, except the dog's snout looks blurry, and with a process that I assume is much more resource intensive. It's interesting for sure, but if people didn't care about JP2, they would have to be drinking some serious AI Kool-Aid to want something like this.

    [1]: https://en.m.wikipedia.org/wiki/JPEG_2000

  • by starmole on 9/29/16, 6:32 PM

    Important quote from the paper:

    "The next challenge will be besting compression methods derived from video compression codecs, such as WebP (which was derived from VP8 video codec), on large images since they employ tricks such as reusing patches that were already decoded."

    Beating block based JPEG with a global algorithm doesn't seem that exciting.

  • by the8472 on 9/29/16, 8:33 PM

    Why does a blog page showing static content do madness like this? I'd think Google engineers of all people would know better. The site doesn't even work without JavaScript from a third-party domain.

    https://my.mixtape.moe/klvzip.png

    Static mirror: https://archive.fo/yyozl

  • by wyldfire on 9/29/16, 7:13 PM

    > Instead of using a DCT to generate a new bit representation like many compression schemes in use today, we train two sets of neural networks - one to create the codes from the image (encoder) and another to create the image from the codes (decoder).

    So instead of implementing a DCT on my client I need to implement a neural network? Or are these encoder/decoder steps merely used for the iterative "encoding" process? It seems like the representation of a "GRU" file is different from any other.
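
    A rough sketch of the split the quoted text describes, just to make the data flow concrete (the shapes and the random linear maps below are stand-ins, not the paper's GRU architecture):

        import numpy as np

        rng = np.random.default_rng(0)

        # Stand-in "networks": in the real system these are trained RNNs;
        # here they are fixed random linear maps, purely to show the roles.
        W_enc = rng.standard_normal((512, 32 * 32 * 3))   # image patch -> code
        W_dec = rng.standard_normal((32 * 32 * 3, 512))   # code -> image patch

        def encoder(patch):
            # Turn a 32x32 RGB patch into a compact binary code.
            code = W_enc @ patch.ravel()
            return (code > 0).astype(np.uint8)

        def decoder(bits):
            # Reconstruct the patch from the code alone; a client would ship
            # the decoder weights instead of a DCT routine.
            recon = W_dec @ (bits * 2.0 - 1.0)
            return recon.reshape(32, 32, 3)

        patch = rng.random((32, 32, 3))
        recon = decoder(encoder(patch))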

  • by jpambrun on 9/29/16, 7:11 PM

    It's fun and scientifically interesting, but the decoder model is 87MB by itself.

  • by ilaksh on 9/29/16, 7:49 PM

    I asked about the possibility of doing this type of thing on CS Stack Exchange two years ago.

    http://cs.stackexchange.com/questions/22317/does-there-exist...

    They basically ripped me a new one, said it was a stupid idea, and that I shouldn't make suggestions in a question. Then I took the suggestions and details out (but left the basic concept in there) and they gave me a lecture on the basics of image compression.

    Made me really not want to try to discuss anything with anyone after that.

  • by ChrisFoster on 9/30/16, 12:37 PM

    It's quite exciting to see progress on a data driven approach to compression. Any compression program encodes a certain amount of information about the correlations of the input data in the program itself. It's a big engineering task to determine a simple and computationally efficient scheme which models a given type of correlation.

    It seems to me like the data driven approach could greatly outperform hand tuned codecs in terms of compression ratio by using a far more expressive model of the input data. Computational cost and model size are likely to be a lot higher though, unless that's also factored into the optimization problem as a regularization term: if you don't ask for simplicity, you're unlikely to get it!

    Lossy codecs like JPEG are optimized to permit the kinds of errors that humans don't find objectionable. However, it's easy to imagine that this is not the right kind of lossiness for some use cases. With a data driven approach, one could imagine optimizing for compression which only loses information irrelevant to a (potentially nonhuman) process consuming the data.
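
    A sketch of what such a regularized objective might look like (the term names and weights here are made up for illustration, not taken from the paper):

        # Hypothetical training loss: trade off reconstruction error, bits
        # spent per image, and decoder size. lambda_rate / lambda_model are
        # invented knobs.
        def compression_loss(original, reconstruction, code_bits, model_params,
                             lambda_rate=0.01, lambda_model=1e-6):
            distortion = ((original - reconstruction) ** 2).mean()
            complexity = sum(p.size for p in model_params)   # proxy for model size
            return distortion + lambda_rate * code_bits + lambda_model * complexity

    Swapping the squared-error term for a task-specific metric is where the "nonhuman consumer" idea would plug in.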

  • by Houshalter on 9/30/16, 2:06 AM

    This seems so overly complicated, with the RNN learning to do arithmetic coding and image compression all at once. Why not do something like autoencoders to compress the image? Then you need only send a small hidden state. You can compress an image to many fewer bits like that. Then you can clean up the remaining error by sending the smaller delta, which itself can be compressed, either by the same neural net or with standard image compression.
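
    A minimal sketch of that pipeline, with trivial downsample/upsample functions standing in for a trained autoencoder (nothing here is from the paper):

        import numpy as np
        import zlib

        # Stand-ins for a trained encoder/decoder pair.
        def encode(img):
            return img[::4, ::4]                    # small "hidden state"

        def decode(code, shape):
            up = np.repeat(np.repeat(code, 4, axis=0), 4, axis=1)
            return up[:shape[0], :shape[1]]

        img = np.random.default_rng(1).random((64, 64))
        code = encode(img)
        recon = decode(code, img.shape)

        # The residual is small and structured, so it compresses well with an
        # ordinary method (zlib here as a placeholder for a standard codec).
        delta = np.round((img - recon) * 255).astype(np.int16)
        payload = zlib.compress(code.tobytes()) + zlib.compress(delta.tobytes())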

    The idea of using NNs for compression has been around for at least 2 decades. The real issue is that it's ridiculously slow. Performance is a big deal for most applications.

    It's also not clear how to handle different resolutions or ratios.

  • by Lerc on 9/30/16, 12:47 AM

    I see there being a number of paths for neural network compression.

    The simplest is a network with inputs of [X, Y] and outputs of [R, G, B], where the image is encoded into the network weights. You have to train the network per image. My guess is it would need large, complex images before you could get compression rates comparable to simpler techniques. An example of this can be seen at http://cs.stanford.edu/people/karpathy/convnetjs/demo/image_...
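
    A minimal sketch of that idea, using scikit-learn's MLPRegressor as the per-image network (the linked ConvNetJS demo uses its own library; this is only to show the encode-into-weights step):

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        img = np.random.default_rng(0).random((32, 32, 3))      # stand-in image
        ys, xs = np.mgrid[0:32, 0:32]
        coords = np.stack([xs.ravel() / 31, ys.ravel() / 31], axis=1)
        colors = img.reshape(-1, 3)

        # "Compression" is the training run; the weights are the code.
        net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
        net.fit(coords, colors)

        # "Decompression" is evaluating the network at every pixel coordinate.
        recon = net.predict(coords).reshape(32, 32, 3)
        n_params = sum(w.size for w in net.coefs_) + sum(b.size for b in net.intercepts_)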

    In the same vein, you could encode video as a network of [X,Y,T] --> [R, G, B]. I suspect that would be getting into lifetime of the universe scales of training time to get high quality.

    The other way to go is a neural net decoder. The network is trained to generate images from input data. You could theoretically train a network to do an IDCT, so it is also within the bounds of possibility that you could train a better transform that has better quality/compressibility characteristics. This is one network for all possible images.

    You can also do hybrids of the above techniques, where you train a decoder to handle a class of images and then provide an input bundle.

    I think the place where Neural Networks would excel would be as a predictive+delta compression method. Neural networks should be able to predict based upon the context of the parts of the image that have already been decoded.

    Imagine a neural network image upscaler that doubled the size of a lower resolution image. If you store a delta map to correct any areas where the upscaler guesses excessively wrong, then you have a method to store arbitrary images. Ideally you can roll the delta encoding into the network as well. Rather than just correcting poor guesses, the network could rank possible outputs by likelihood. The delta map then just picks the correct guess, which, if the predictor is good, should result in an extremely compressible delta map.
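
    A rough sketch of the upscale-and-correct scheme, with nearest-neighbour upscaling standing in for the neural network (purely illustrative):

        import numpy as np

        def upscale(small):
            # A trained upscaler would go here; nearest-neighbour for the sketch.
            return np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)

        img = np.random.default_rng(2).random((64, 64))
        small = img[::2, ::2]
        guess = upscale(small)

        # Store corrections only where the guess is "excessively wrong";
        # a good predictor makes this delta map sparse and very compressible.
        err = img - guess
        mask = np.abs(err) > 0.1
        delta = err[mask]
        # Decoder side: recon = upscale(small); recon[mask] += delta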

    The principle is broadly similar to wavelet compression, except that a neural network can potentially go "That's an eye/frog/egg/box, I know how this is going to look scaled up".

  • by concerneduser on 9/29/16, 10:18 PM

    That neural network technology is all fine and good for compressing images of lighthouses and dogs - but what about other things?

  • by rdtsc on 9/30/16, 2:59 AM

    Now that Google is fully on the neural network deep learning train with their Tensor Processing Units, we'll be seeing NNs applied to everything. There was an article about translation; now imagine compression. It is a bit amusing, but there's nothing wrong with it: this is great stuff, and I am glad they are sharing all this work.

  • by sevenless on 9/29/16, 9:39 PM

    I've been wondering when neural networks might be able to compress a movie back down to the screenplay.
  • by acd on 9/30/16, 9:47 AM

    Is there any image compression that uses Eigenfaces? Using the fact that your face may look similar to someone else's face.

    What if you used uniqueness and an eigenface lookup table for compression?
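
    A minimal sketch of the eigenface idea: learn a shared basis from many faces once, then store only a handful of coefficients per face (the dataset and sizes below are placeholders):

        import numpy as np

        rng = np.random.default_rng(3)
        faces = rng.random((500, 64 * 64))      # stand-in for 500 aligned 64x64 faces
        mean = faces.mean(axis=0)

        # Top principal components ("eigenfaces") via SVD.
        _, _, Vt = np.linalg.svd(faces - mean, full_matrices=False)
        basis = Vt[:50]

        new_face = rng.random(64 * 64)
        coeffs = basis @ (new_face - mean)      # 50 numbers instead of 4096 pixels
        recon = mean + basis.T @ coeffs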

  • by zump on 9/30/16, 6:39 AM

    Compression engineers shaking in their boots.

  • by aligajani on 9/29/16, 7:35 PM

    I knew this was coming. Great stuff.

  • by rasz_pl on 9/30/16, 2:01 PM

    You could probably reach a 20% reduction by building a custom quantization table (DQT) per image alone.
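
    For anyone who wants to experiment with that, Pillow exposes custom JPEG quantization tables through its (advanced) qtables option; the flat tables and file names below are just placeholders, not a tuned per-image result:

        from PIL import Image

        # Two 64-entry tables: luma and chroma. A real per-image optimizer
        # would search for better values than these flat ones.
        custom_dqt = [[16] * 64, [17] * 64]

        img = Image.open("input.png").convert("RGB")
        img.save("out.jpg", qtables=custom_dqt, subsampling=0)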

  • by samfisher83 on 9/29/16, 6:46 PM

    Was this inspired by Silicon Valley?

  • by joantune on 9/29/16, 8:36 PM

    Nice!! They should call it PiedPiper :D (I can't believe I was the 1st one with this comment)