by hurrycane on 9/29/16, 5:05 PM with 63 comments
by emcq on 9/29/16, 5:50 PM
When zoomed in, the JPEG artifacts are quite apparent and the RNN produces a much smoother image. However, to my eye, when zoomed out, the high-frequency "noise", particularly in the snout area, looks better in JPEG. The RNN produces a somewhat blurrier image that reminds me of the soft-focus effect.
by richard_todd on 9/30/16, 1:44 AM
by starmole on 9/29/16, 6:32 PM
"The next challenge will be besting compression methods derived from video compression codecs, such as WebP (which was derived from VP8 video codec), on large images since they employ tricks such as reusing patches that were already decoded."
Beating block based JPEG with a global algorithm doesn't seem that exciting.
by the8472 on 9/29/16, 8:33 PM
https://my.mixtape.moe/klvzip.png
Static mirror: https://archive.fo/yyozl
by wyldfire on 9/29/16, 7:13 PM
So instead of implementing a DCT on my client I need to implement a neural network? Or are these encoder/decoder steps merely used for the iterative "encoding" process? It seems like the representation of a "GRU" file is different from any other.
by jpambrun on 9/29/16, 7:11 PM
by ilaksh on 9/29/16, 7:49 PM
http://cs.stackexchange.com/questions/22317/does-there-exist...
They basically ripped me a new one, said it was a stupid idea, and that I shouldn't make suggestions in a question. Then I took the suggestions and details out (but left the basic concept in there) and they gave me a lecture on the basics of image compression.
Made me really not want to try to discuss anything with anyone after that.
by ChrisFoster on 9/30/16, 12:37 PM
It seems to me like the data-driven approach could greatly outperform hand-tuned codecs in terms of compression ratio by using a far more expressive model of the input data. Computational cost and model size are likely to be a lot higher though, unless that's also factored into the optimization problem as a regularization term: if you don't ask for simplicity, you're unlikely to get it!
Lossy codecs like JPEG are optimized to permit the kinds of errors that humans don't find objectionable. However, it's easy to imagine that this is not the right kind of lossiness for some use cases. With a data-driven approach, one could imagine optimizing for compression which only loses information irrelevant to a (potentially nonhuman) process consuming the data.
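As a rough sketch of what that combined objective might look like (the weighting terms, the downstream distortion function, and all names here are assumptions for illustration, not anything from the paper):

```python
def codec_loss(x, x_hat, code_bits, model_params,
               distortion_fn, lam_rate=0.01, lam_model=1e-6):
    """Distortion + rate + a crude model-size penalty.

    x, x_hat: original and reconstructed data (numpy arrays)
    code_bits: estimated bits needed to store the compressed code
    model_params: list of numpy weight arrays of the decoder
    distortion_fn: task-specific error, e.g. the loss of a downstream classifier
    """
    distortion = distortion_fn(x, x_hat)
    complexity = sum(p.size for p in model_params)  # number of decoder weights
    return distortion + lam_rate * code_bits + lam_model * complexity
```

Swapping distortion_fn from a perceptual metric to a downstream model's loss is what would make the codec lossy in a way tuned for a nonhuman consumer.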
by Houshalter on 9/30/16, 2:06 AM
The idea of using NNs for compression has been around for at least 2 decades. The real issue is that it's ridiculously slow. Performance is a big deal for most applications.
It's also not clear how to handle different resolutions or ratios.
by Lerc on 9/30/16, 12:47 AM
The simplest is a network with inputs of [X, Y] and outputs of [R, G, B], where the image is encoded into the network weights. You have to train the network per image. My guess is it would need large, complex images before you could get compression rates comparable to simpler techniques. An example of this can be seen at http://cs.stanford.edu/people/karpathy/convnetjs/demo/image_...
In the same vein, you could encode video as a network of [X,Y,T] --> [R, G, B]. I suspect that would be getting into lifetime of the universe scales of training time to get high quality.
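A minimal sketch of the [X, Y] -> [R, G, B] idea, assuming PyTorch and a toy 64x64 image; layer sizes and training length are arbitrary:

```python
import torch
import torch.nn as nn

H, W = 64, 64
image = torch.rand(H, W, 3)  # stand-in for the image to "compress"

# Normalized pixel coordinates as inputs, RGB values as targets.
ys, xs = torch.meshgrid(torch.linspace(0, 1, H), torch.linspace(0, 1, W))
coords = torch.stack([xs.flatten(), ys.flatten()], dim=1)  # (H*W, 2)
targets = image.reshape(-1, 3)                             # (H*W, 3)

net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 3), nn.Sigmoid())
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):  # per-image training, as described above
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(coords), targets)
    loss.backward()
    opt.step()

# "Decoding" is just evaluating the network at every pixel coordinate.
reconstruction = net(coords).reshape(H, W, 3)
```

The "compressed file" is the trained weights, so this only wins if the network ends up much smaller than the raw pixels.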
The other way to go is a neural net decoder. The network is trained to generate images from input data. You could theoretically train a network to do an IDCT, so it is also within the bounds of possibility that you could train a better transform that has better quality/compressibility characteristics. This is one network for all possible images.
You can also do hybrids of the above techniques, where you train a decoder to handle a class of images and then provide an input bundle.
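A sketch of that learned-IDCT-style decoder, trained once over many images rather than per image; the 8x8 block size, layer widths, and training data are assumptions:

```python
import torch
import torch.nn as nn

# Map an 8x8 block of transform coefficients back to an 8x8 block of pixels.
decoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(),
                        nn.Linear(128, 64))
opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)

def train_step(coeff_blocks, pixel_blocks):
    """coeff_blocks, pixel_blocks: (N, 64) tensors sampled from a training corpus."""
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(coeff_blocks), pixel_blocks)
    loss.backward()
    opt.step()
    return loss.item()
```

The same decoder would then ship with the codec and be reused for every image, like a learned inverse transform.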
I think the place where Neural Networks would excel would be as a predictive+delta compression method. Neural networks should be able to predict based upon the context of the parts of the image that have already been decoded.
Imagine a neural network image upscaler that doubled the size of a lower-resolution image. If you store a delta map to correct any areas where the upscaler guesses excessively wrong, then you have a method to store arbitrary images. Ideally you can roll the delta encoding into the network as well. Rather than just correcting poor guesses, the network could rank possible outputs by likelihood. The delta map then just picks the correct guess, which, if the predictor is good, should result in an extremely compressible delta map.
The principle is broadly similar to the approach of wavelet compression, only with a neural network the decoder can potentially go "That's an eye/frog/egg/box, I know how this is going to look scaled up".
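A rough sketch of the predict-then-delta idea, with a trivial nearest-neighbour upscaler standing in for the learned predictor (all names and shapes here are illustrative assumptions):

```python
import numpy as np

def nearest_upscale(low_res):
    """Trivial stand-in predictor: nearest-neighbour 2x upscale."""
    return np.repeat(np.repeat(low_res, 2, axis=0), 2, axis=1)

def encode(image, upscale=nearest_upscale):
    # image: (H, W, 3) array with even H and W for this toy predictor
    low_res = image[::2, ::2]          # cheap 2x downsample
    prediction = upscale(low_res)      # predictor guesses the full image
    residual = image - prediction      # small/compressible where the guess is good
    return low_res, residual           # a real codec would entropy-code both,
                                       # quantizing or thresholding the residual

def decode(low_res, residual, upscale=nearest_upscale):
    return upscale(low_res) + residual
```

The better the predictor (a trained network instead of nearest-neighbour), the closer the residual gets to zero and the cheaper it is to store.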
by concerneduser on 9/29/16, 10:18 PM
by rdtsc on 9/30/16, 2:59 AM
by sevenless on 9/29/16, 9:39 PM
by acd on 9/30/16, 9:47 AM
What if you use uniqueness and an eigenface look-up table for compression?
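One possible reading of the eigenface idea, sketched with plain PCA; the corpus, dimensions, and number of components are assumptions:

```python
import numpy as np

def build_basis(corpus, k=50):
    """corpus: (N, D) matrix of flattened face images."""
    mean = corpus.mean(axis=0)
    _, _, vt = np.linalg.svd(corpus - mean, full_matrices=False)
    return mean, vt[:k]                 # mean face plus the top-k eigenfaces

def compress(image_vec, mean, basis):
    return basis @ (image_vec - mean)   # k coefficients instead of D pixels

def decompress(coeffs, mean, basis):
    return mean + basis.T @ coeffs
```

This only works well for images that resemble the corpus the basis was built from, which is the usual limitation of eigenface-style approaches.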
by zump on 9/30/16, 6:39 AM
by aligajani on 9/29/16, 7:35 PM
by rasz_pl on 9/30/16, 2:01 PM
by samfisher83 on 9/29/16, 6:46 PM
by joantune on 9/29/16, 8:36 PM