from Hacker News

“Should you encrypt or compress first?”

by phillmv on 6/28/16, 3:00 PM with 235 comments

by orlp on 6/28/16, 3:30 PM
There's no compress or encrypt _first_.
It's just compress or not, before encrypting. If security is important, the answer to that is no, unless you're an expert and familiar with CRIME and related attacks.
Compression after encryption is useless, as there should be NO recognizable patterns to exploit after the encryption.
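A rough way to check this claim yourself, as a sketch only: the snippet below uses `os.urandom` as a stand-in for ciphertext (an assumption, on the basis that good ciphertext should be indistinguishable from random bytes) and compares how well zlib does on it versus on repetitive plaintext. The sample request text is made up for illustration.

```python
import os
import zlib


def compressed_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Repetitive plaintext, typical of protocol traffic (made-up sample).
plaintext = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 200

# Assumption: random bytes stand in for the output of a sound cipher.
ciphertext_like = os.urandom(len(plaintext))

text_ratio = compressed_ratio(plaintext)
random_ratio = compressed_ratio(ciphertext_like)

print(f"plaintext ratio:       {text_ratio:.3f}")
print(f"ciphertext-like ratio: {random_ratio:.3f}")
```

The plaintext shrinks to a small fraction of its size, while the random input stays at roughly its original size or grows slightly because of framing overhead.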
by vog on 6/28/16, 3:45 PM
A more interesting question is whether to compress or sign first.
There's an interesting article on that topic by Ted Unangst:
"preauthenticated decryption considered harmful"
http://www.tedunangst.com/flak/post/preauthenticated-decrypt...
EDIT: Although the article talks about encrypt+sign versus sign+encrypt, the same argument goes for compress+sign versus sign+compress. You shouldn't do anything with untrusted data before having checked the signature - neither uncompress nor decrypt nor anything else.
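To make the ordering concrete, here is a minimal sketch of "check first, then decompress". It uses an HMAC from the Python standard library as a stand-in for a real signature (an assumption made to keep the example self-contained; a MAC is not a public-key signature), and the `pack`/`unpack` names are hypothetical.

```python
import hashlib
import hmac
import zlib

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def pack(key: bytes, message: bytes) -> bytes:
    """Compress, then authenticate the compressed bytes."""
    body = zlib.compress(message)
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def unpack(key: bytes, blob: bytes) -> bytes:
    """Verify the tag before touching the untrusted compressed data."""
    tag, body = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed; refusing to decompress")
    # Only reached for data that passed the check.
    return zlib.decompress(body)
```

Because the tag covers the compressed bytes, a tampered blob is rejected before the decompressor ever parses it.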
by mjevans on 6/28/16, 4:33 PM
Where everyone seems to be getting confused is handling a live flow versus handling a finalized flow (a file).
* Always pad to combat plain-text attacks. Padding in theory shouldn't compress well, so there's no point making the compression less effective by processing it.
* Always compress a 'file' first to reduce redundancy.
* Always pad-up a live stream, maybe this data is useful in some other way, but you want interactive messages to be of similar size.
* At some place in the above also include a recipient identifier; this should be counted as part of the overhead not part of the padding.
* The signature should be on everything above here (recipients, pad, compressed message, extra pad).
. It might be useful to include the recipients in the un-encrypted portion of the message, but there are also contexts where someone might choose otherwise; an interactive flow would assume both parties knew a key to communicate with each other on and is one such case.
* The pad, message, extra-pad, and signature /must/ be encrypted. The recipients /may/ be encrypted.
I did have to look up the sign / encrypt first question as I didn't have reason to think about it before. In general I've looked to experts in this field for existing solutions, such as OpenPGP (GnuPG being the main implementation). Getting this stuff right is DIFFICULT.
by Animats on 6/28/16, 10:44 PM
This is why military voice encryption sends at a constant bitrate even when you're not talking. For serious security applications where fixed links are used, data is transmitted at a constant rate 24/7, even if the link is mostly idle.
by dietrichepp on 6/29/16, 3:28 AM
Wow, what a trainwreck. So many comments in here talking about whether it would be possible to compress data which looks like uniformly random data, for all the tests you would throw at it. Spoiler alert, you can't compress encrypted data. This isn't a question of whether we know it's possible, rather, it's a fact that we know it's impossible.
In fact, if you successfully compress data after encryption, then the only logical conclusion is that you've found a flaw in the encryption algorithm.
by kinofcain on 6/28/16, 3:52 PM
Also interesting is which compression algorithm you're using. HPACK header compression in HTTP/2 is an attempt to mitigate this problem:
https://http2.github.io/http2-spec/compression.html#Security
by js2 on 6/28/16, 5:00 PM
The paper cited in this article (Phonotactic Reconstruction of Encrypted VoIP Conversations) really deserves to be highlighted, so I submitted it separately:
https://news.ycombinator.com/item?id=11995298
http://www.cs.unc.edu/~fabian/papers/foniks-oak11.pdf
by tomp on 6/28/16, 3:21 PM
I don't understand... Why couldn't you do CRIME with no compression as well? Assuming you can control (parts of) the plaintext, surely plaintext+encrypt gives you more information than plaintext+compress+encrypt?
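One way to explore this question is a toy length oracle. The sketch below assumes the cipher preserves length (as a stream cipher would), so the eavesdropper sees the size of whatever goes into the encryption step. The secret header, the guesses, and the `observed_length` helper are all made up for illustration; a real CRIME attack proceeds one byte at a time rather than with one long guess.

```python
import zlib

# Hypothetical secret that shares a message with attacker-chosen text.
SECRET = b"Cookie: session=7f3a9c1e5b2d"


def observed_length(attacker_input: bytes, compress: bool) -> int:
    """Length an eavesdropper sees, assuming a length-preserving cipher."""
    plaintext = attacker_input + b"\r\n" + SECRET
    if compress:
        plaintext = zlib.compress(plaintext, 9)
    return len(plaintext)


right = b"Cookie: session=7f3a9c1e"  # matches the start of the secret
wrong = b"Cookie: session=XQZJWKVY"  # same length, wrong guess

for compress in (False, True):
    print(compress, observed_length(right, compress), observed_length(wrong, compress))
```

Without compression both guesses produce the same length, so the length says nothing about the secret. With compression the correct guess is shorter, because the compressor replaces more of the secret with a back-reference to the attacker's text.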
by arknave on 6/28/16, 4:36 PM
I picked up on the reference to Stockfighter, but does anyone know if the walking machine learning game mentioned at the end of the article exists? Sounds like a fun game.
by jakozaur on 6/28/16, 3:59 PM
Would adding a small random amount of padding help? Based on my poor understanding, if after compressing but before encrypting we add a random 0 to 16 bytes, or 1% of the size, that could defeat quite a lot of attacks (like CRIME).
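A small experiment suggests why random padding tends to slow this kind of attack rather than stop it. The sketch assumes the attacker can trigger the same message many times (as in CRIME) and observe each padded length; the function names and the 0 to 16 byte range are taken from the suggestion above or invented for illustration.

```python
import random
import zlib


def padded_length(payload: bytes, rng: random.Random) -> int:
    """Compressed length plus 0 to 16 bytes of random padding."""
    return len(zlib.compress(payload, 9)) + rng.randint(0, 16)


def estimate_true_length(payload: bytes, rng: random.Random, samples: int = 500) -> int:
    """Attacker's estimate: the smallest length seen over repeated requests."""
    return min(padded_length(payload, rng) for _ in range(samples))


rng = random.Random(1234)  # seeded only to make the demo repeatable
payload = b"Cookie: session=7f3a9c1e5b2d" * 3
print(len(zlib.compress(payload, 9)), estimate_true_length(payload, rng))
```

With enough samples the minimum observed length converges on the unpadded length, so the padding costs the attacker more requests but does not remove the signal.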
by IncRnd on 6/28/16, 6:11 PM
The question is flawed, but the correct answer is a series of questions: Who is the attacker? What are you guarding? What assumptions are there about the operating environment? What invariants (regulations, compliance, etc.) exist?
There may be compensating controls that invalidate the perceived need for encryption or compression, for example. In other words, don't design in the dark.
Of course, the interviewer may just want a canned scripted answer - but the interview is your chance to shine, showing how you can discuss all the angles.
by spatulon on 6/28/16, 6:38 PM
That was a fun read. Do I detect a nod to tptacek's "If You’re Typing the Letters A-E-S Into Your Code You’re Doing It Wrong"?
https://www.nccgroup.trust/us/about-us/newsroom-and-events/b...
by biokoda on 6/28/16, 3:24 PM
If you're compressing audio, the simple solution is to compress using constant bitrate.
by jayd16 on 6/28/16, 5:49 PM
Would be great if Apple understood this and compressed IPA contents before encrypting.
Instead, when you submit something to the AppStore, you end up with a much bigger app than the one you uploaded.
To add insult to injury, if you ask Apple about this fuck up you get an esoteric support email about removing "contiguous zeros." As in, "make your app less compressible so it won't be obvious we're doing this wrong."
by poelzi on 6/28/16, 6:07 PM
if your compression can compress your encrypted data, you should change your encryption mechanism to something that actually works...
by em3rgent0rdr on 6/28/16, 6:37 PM
What if you compress and then only send data at regular periods and regular packet sizes? That way no information can be gleaned. E.g. after compressing you pad the data if it is unusually short, or you include other compressed data too, or you only use constant bit-rate compression algorithm.
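The padding half of this idea can be sketched as follows. The frame size, the 4-byte length prefix, and the function names are assumptions for illustration; sending frames on a fixed schedule (the timing half) is not shown, and each frame would still need to be encrypted afterwards.

```python
import struct
import zlib
from typing import List

FRAME_SIZE = 1024  # hypothetical fixed packet size


def to_frames(message: bytes) -> List[bytes]:
    """Compress, prefix with the compressed length, pad to whole frames."""
    body = zlib.compress(message)
    payload = struct.pack(">I", len(body)) + body
    padded_len = -(-len(payload) // FRAME_SIZE) * FRAME_SIZE  # round up
    payload = payload.ljust(padded_len, b"\x00")
    return [payload[i:i + FRAME_SIZE] for i in range(0, padded_len, FRAME_SIZE)]


def from_frames(frames: List[bytes]) -> bytes:
    """Reassemble frames, drop the padding, and decompress."""
    payload = b"".join(frames)
    (body_len,) = struct.unpack(">I", payload[:4])
    return zlib.decompress(payload[4:4 + body_len])
```

Short messages of different lengths all become a single frame of the same size. Note that the number of frames still reveals a coarse length unless the sender also transmits at a constant rate.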
by hueving on 6/28/16, 4:50 PM
That quoted VoIP paper isn't actually as damaging as it sounds. IIRC that 0.6 rating was for less than half of the words, so if you're trying to listen to a conversation to get something meaningful, it's probably not going to happen.
by panic on 6/28/16, 3:55 PM
Has there been any research into compression that's generally safe to use before encryption? E.g., matching only common substrings longer than the key length would (I think?) defeat CRIME at the cost of compression ratio.
by Qantourisc on 6/28/16, 9:40 PM
Maybe we need encryption that also plays with the length of the message, or we could randomly pad our data before encryption? I am no expert, however, so I have no clue how feasible or how full of holes this method would be.
by itsnotvalid on 6/29/16, 5:47 AM
I always think that if the compression scheme is known, you need a good nonce to avoid known-plaintext attacks (for example, the compression format's header is always the same), and also to avoid CRIME, whose goal is to recover the compression dictionary.
I think it is best to use the encryption built into the compression program, as those schemes often take these issues into account (and the header is not leaked, since only the content is encrypted).
by cm2187 on 6/28/16, 7:06 PM
Can't you just add some random-length data at the end? You are defeating compression a little bit, but you are also making the length non-deterministic. I thought PGP did that.
by arielweisberg on 6/28/16, 3:32 PM
So what does this mean if I am using an encrypted SSL connection that is correctly configured?
Is this kind of problem not already dealt with for me by the secure transport layer? It would be a shame if the abstraction were leaky. My understanding of the contract is that whatever bits I supply will be securely transported within the limits of the configuration I have selected.
If I pick a bad configuration then yes shame on me, but a good configuration won't care if I compress right?
by gravypod on 6/28/16, 3:24 PM
Logically speaking, an encrypted file should have a high entropy set of bits within it. Compressing it would be low return, but higher security since the input file contained more "random" bits.
Compressing the source material will yield smaller results but will be more predictable as the file will always contain ZIP headers and other metadata that would possibly make decryption of your file much easier.
by jtolmar on 6/28/16, 5:34 PM
If I compress each component (ie: attacker-influenced vs secret) separately, concatenate the results (with message lengths of course), then encrypt the whole message, is that secure?
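Here is a rough sketch of that scheme, useful mainly for checking what the total length depends on. The secret, the 4-byte length prefixes, and the `separate_length` helper are assumptions for illustration, and the encryption step is left out on the assumption that it preserves length.

```python
import struct
import zlib

# Hypothetical secret component.
SECRET = b"Cookie: session=7f3a9c1e5b2d"


def separate_length(attacker_input: bytes, secret: bytes = SECRET) -> int:
    """Total size when each component is compressed in its own context."""
    parts = [zlib.compress(attacker_input, 9), zlib.compress(secret, 9)]
    # Length-prefix each compressed part, then concatenate.
    framed = b"".join(struct.pack(">I", len(p)) + p for p in parts)
    return len(framed)


right = b"Cookie: session=7f3a9c1e"  # matches the start of the secret
wrong = b"Cookie: session=XQZJWKVY"  # same length, wrong guess
print(separate_length(right), separate_length(wrong))
```

Because the secret is compressed without access to the attacker's bytes, its contribution to the total is the same for every guess. The length of the secret itself is still visible, so this addresses the cross-component leak only.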
It seems like it should be, but I'm not an encryption expert. The compression should be pretty good, though.
by khc on 6/29/16, 1:41 AM
> The paper Phonotactic Reconstruction of Encrypted VoIP Conversations gives a technique for reconstructing speech from an encrypted VoIP call.
The technique for reconstructing speech clearly has its limitations.
by draugadrotten on 6/28/16, 3:32 PM
This blog is an interesting way to advertise to their target market: us.
by gameofdrones on 6/28/16, 8:26 PM
The OP should take https://www.coursera.org/learn/crypto
by kstenerud on 6/28/16, 3:36 PM
So if the length of the resulting message is leaking information, salt it by adding some extra random bits to the end to increase the length by a random amount.
by arjie on 6/28/16, 5:52 PM
None of this seems to apply to documents you generate to supply to someone else you trust. Compress and encrypt seems perfectly fine.
by FuturePromise on 6/28/16, 5:30 PM
Given the real risk of CRIME attacks, are there "compression aware" encryption algorithms?
by justinzollars on 6/28/16, 4:05 PM
tl;dr
by vox_mollis on 6/28/16, 4:06 PM
A lot of comments here suggesting that encryption increases entropy. While true, it only adds the key's entropy to the plaintext's entropy. In most real-world cases, len(m) >> len(k), so this is usually an insignificant increase of entropy. Compression also adds a trivial amount of entropy (specifically, the information encoding the algorithm used to compress, even if that information is out of band).
by usloth_wandows on 6/28/16, 3:23 PM
I thought this was common sense. Compress then encrypt. Encryption leads to higher entropy, therefore less effective compression.