Compression before encryption

If you’re going to be compressing and encrypting some data, you should do the compression first. Why? There are several reasons:

  • Compressing last won’t reduce the file size much. Good encryption should make any input data (especially redundant data) appear random. But compression works by removing redundancy, so it doesn’t work well on random-looking data. In one example, encrypting a file and then compressing it actually made it larger than the original!
  • Compressing first should decrease the effectiveness of some attacks. Compression works by reducing the redundancy in the data. A common cryptanalysis method is frequency analysis, which relies on finding repeated patterns in the data. Compressing the plaintext first should make that analysis less effective!
  • Brute force attacks will take longer. A brute force attack works by trying various keys, decrypting the data with each one, and checking whether the output makes any sense. If the data was compressed first, an attacker must decrypt it and then decompress it before seeing whether the output makes any sense. This takes much longer, and an attacker who doesn’t know you’re compressing the data at all might never break the encryption.
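To make the first point concrete, here is a minimal sketch using only Python’s standard library. It is not the example mentioned above. As an assumption, bytes from `os.urandom` stand in for ciphertext (good cipher output should look random), so no real encryption happens here, and the sample plaintext is made up for illustration.

```python
import os
import zlib

# Highly redundant plaintext, standing in for a typical compressible file.
plaintext = b"the quick brown fox jumps over the lazy dog. " * 200

# Assumption: random bytes stand in for ciphertext, because the output of a
# good cipher should be indistinguishable from random data.
fake_ciphertext = os.urandom(len(plaintext))

compressed_plaintext = zlib.compress(plaintext)
compressed_ciphertext = zlib.compress(fake_ciphertext)

print("original size:          ", len(plaintext))
print("compressed plaintext:   ", len(compressed_plaintext))
print("compressed 'ciphertext':", len(compressed_ciphertext))
```

The redundant plaintext should shrink dramatically, while the random stand-in should come out slightly larger than it went in, because the compressor finds nothing to remove and still adds its own header and checksum.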

I wanted to see how effective the third point was, so I wrote a Python script that encrypted a short message and used a brute force attack to break it. Then I repeated the experiment, but compressed the message with gzip before encrypting it. Here’s how long it took on average, in seconds, to guess a single password:

Password length    Zipped (s)    Not zipped (s)
2 letters          0.021         0.002
3 letters          0.546         0.061
4 letters          13.612        1.551

As you can see, the message that was compressed before being encrypted took about 9 times as long to break.

Details: I chose a short message to encrypt, specifically “a message”. Because gzip compresses data in blocks and can be decompressed incrementally, an attacker only needs to decompress the first bytes rather than the whole file, so I kept the message short to simulate this. I used 128-bit AES with a password made up of only lower case letters. In each iteration, a random password was chosen and both the zipped and unzipped versions were tested. The test was run 1000 times for each password length.
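The shape of this experiment can be sketched roughly as follows. This is not the original script. To keep it runnable on the standard library alone, it swaps AES for a toy XOR cipher keyed by SHA-256 (an assumption, and not secure), and the function names, the 2-letter password “qz”, and the “looks like text” check are all hypothetical choices for illustration.

```python
import gzip
import hashlib
import itertools
import string
import time
import zlib


def toy_cipher(data: bytes, password: str) -> bytes:
    """XOR data with a SHA-256-derived keystream. NOT AES: a toy stand-in.
    Applying it twice with the same password returns the original data."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        block = password.encode() + counter.to_bytes(4, "big")
        stream += hashlib.sha256(block).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def looks_like_text(candidate: bytes) -> bool:
    """Plausibility check for the uncompressed case: printable ASCII only."""
    return all(32 <= byte < 127 for byte in candidate)


def looks_like_gzip(candidate: bytes) -> bool:
    """Plausibility check for the compressed case: must gunzip to text."""
    try:
        return looks_like_text(gzip.decompress(candidate))
    except (OSError, EOFError, zlib.error):
        return False


def brute_force(ciphertext: bytes, length: int, check) -> list[str]:
    """Try every lower-case password of this length; return all that pass."""
    hits = []
    for letters in itertools.product(string.ascii_lowercase, repeat=length):
        guess = "".join(letters)
        if check(toy_cipher(ciphertext, guess)):
            hits.append(guess)
    return hits


message = b"a message"
password = "qz"  # hypothetical password, lower-case letters only

plain_ct = toy_cipher(message, password)
zipped_ct = toy_cipher(gzip.compress(message), password)

for label, ct, check in [("not zipped", plain_ct, looks_like_text),
                         ("zipped", zipped_ct, looks_like_gzip)]:
    start = time.perf_counter()
    hits = brute_force(ct, len(password), check)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s, candidates: {hits}")
```

Because the cipher and the validity checks here differ from the real experiment, the timings this prints should not be expected to match the table above. Note also that with a message this short, the printable-text check may accept a few wrong passwords, which is why the sketch returns every candidate rather than stopping at the first.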


2 Responses to Compression before encryption

  1. pajuad says:

    Great Topic ^0^

  2. kenmacdKenny says:

    There is only one reason to compress the data, and that is to have a smaller result. With modern ciphers it won’t help against frequency attacks or brute force attacks. It can also result in an information leak if the attacker knows or can guess the original message size. For example, if the plaintext data is 5 characters, then ‘aaaaa’ is going to produce a smaller ciphertext than ‘abcde’.
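The size leak described in this comment can be illustrated with a small sketch. As an assumption, it uses 64-byte inputs rather than 5 characters so the difference is unambiguous, and it only measures compressed sizes: a cipher that preserves length, such as a stream cipher, would carry the same difference through to the ciphertext.

```python
import zlib

# Two plaintexts of identical length but very different redundancy.
repetitive = b"a" * 64        # one byte value repeated
varied = bytes(range(64))     # 64 distinct byte values, no repeats

size_repetitive = len(zlib.compress(repetitive))
size_varied = len(zlib.compress(varied))

# An observer who sees only the sizes can tell which kind of input was used.
print("repetitive input compresses to", size_repetitive, "bytes")
print("varied input compresses to    ", size_varied, "bytes")
```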
