In the first article in this series on the basics of crypto, “Ease Me Into Cryptography Part 1: Buzzwords and Hash Function”, we learned some lingo and talked about the different aspects of hash functions. Remember that hash functions are one-way: we cannot reverse them algorithmically. We talked about why this is useful; now let’s get to something that we can encrypt AND decrypt. In cryptography, we call these ciphers. Just like in the last article, and in true “Explain Like I’m Five” fashion, let’s break this down. What is a cipher? What are symmetric ciphers? How are they useful? Are there any weaknesses?
First, let’s introduce a few more terms that will come into play in this article:
- A key is a piece of information used as input into an encryption or decryption algorithm, in addition to the data that needs to be transformed. The data cannot be transformed correctly without the right key.
- Symmetric Ciphers are a family of ciphers that use the same key to encrypt as to decrypt; these are sometimes referred to as secret key algorithms because, if the key is the same on both sides, it needs to be kept secret so that not just anyone can decrypt the data.
- Asymmetric Ciphers are a family of ciphers that use a different key to encrypt than to decrypt; these are sometimes referred to as public key algorithms because, when the encrypting and decrypting keys are different, one of them can be made public without compromising the correctness or privacy of decryption.
- Block Size refers to the size of the chunks that an algorithm splits data into in order to process it (for AES, 16 bytes).
- Padding is a process that adds extra bytes to data of an improper size (i.e. not a multiple of the block size) to make it the proper size for the algorithm (some ciphers require inputs to be a multiple of a certain size in order to function properly).
- A leak is a residual clue left in the output of an algorithm. We say a cryptographic algorithm leaks data if you can make a determination about the input or the key by observing the output under certain conditions.
We will come back to all of these terms as we continue diving into cryptography, so don’t worry too much about learning and memorizing! They will stick more as we become comfortable with them.
What is a cipher?
A cipher is a cryptographic algorithm that provides a way to encrypt as well as decrypt data. Remember that hashing does not provide a way to decrypt. Now we are talking! This allows someone to jumble up some info, send it safely without revealing the content, and have someone at the other end un-jumble it to see the original info.
This sounds very useful for preserving privacy! But it’s useless if we can’t control who is able to decrypt the message. Fortunately, there is another input to a cipher – the key! This is the small piece of information that is necessary in order to encrypt and decrypt information with a cipher. Some ciphers require other small details as well such as an initialization vector, but we are not going to focus on those for now.
As we saw in the list of terms above, there are two main categories of encryption, and each handles keys a little differently. These categories are called symmetric encryption and asymmetric encryption. In symmetric encryption, the key used to encrypt is the SAME as the key used to decrypt. In asymmetric encryption, the key used to encrypt is DIFFERENT from the key used to decrypt. This article is going to focus on symmetric encryption only, so that we can fully understand the foundations before making things more complex. Can you guess what’s in store for the next article?
How are ciphers useful?
Because we can encrypt data, share a key, and have another party decrypt it, we have just found ourselves a reliable way to ensure privacy. One commonplace application of symmetric ciphers is password-protected files (think tax returns, documents with sensitive or personal information at work, etc.).
Note: It is important to remember that when leveraging symmetric ciphers, we always want to make sure that our key is not easy to guess.
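A common way to get a key that is hard to guess is to let a cryptographically secure random number generator choose it instead of a person. Here is a minimal sketch, assuming Python 3.6+ for the standard library’s `secrets` module (the original examples don’t use it):

```python
import secrets

# Easy to guess: a readable phrase, like the demo key used later on.
guessable_key = b"testkey,testkey!"

# Much harder to guess: 16 bytes (128 bits) from a cryptographically
# secure random number generator. It changes every time you run this.
random_key = secrets.token_bytes(16)

print(len(guessable_key), len(random_key))  # 16 16
print(random_key.hex())
```

A random key like this still has to be stored and shared safely, which leads directly to the key-sharing problem discussed later.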
Great. So how do we use these things? With symmetric ciphers, to encrypt data we put the plaintext and the key into the algorithm, tell it which direction we want to go (in this case encrypt), and it gives us the ciphertext. To decrypt and get the original data back, we put our ciphertext and the key into the algorithm, tell it which direction we want to go (in this case decrypt), and it gives us the original plaintext. Unlike a hash, the output does not have a fixed length: the length of the ciphertext depends on the length of the input (and on the algorithm and any padding it uses).
Note: Remember one of the weaknesses of hash functions from the last article… collisions? With ciphers, we don’t have to worry about collisions the way we do with hash functions. A cipher has to be reversible, so encrypting two different plaintexts with the same key (and IV) can never produce the same ciphertext. If it could, decryption would have no way of knowing which plaintext to return.
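To see the “same key, two directions” idea on its own, here is a deliberately simple toy cipher. It is our own illustration, not AES or any real algorithm, and it offers no real security. The function name `toy_cipher` is hypothetical.

```python
# A TOY symmetric cipher for illustration only: it is NOT secure.
# Each byte is shifted by the matching byte of the (repeating) key when
# encrypting, and shifted back by the same amount when decrypting.

def toy_cipher(data: bytes, key: bytes, direction: str) -> bytes:
    if direction not in ("encrypt", "decrypt"):
        raise ValueError("direction must be 'encrypt' or 'decrypt'")
    sign = 1 if direction == "encrypt" else -1
    return bytes(
        (b + sign * key[i % len(key)]) % 256
        for i, b in enumerate(data)
    )

key = b"secret"
plaintext = b"hello, cipher!"
ciphertext = toy_cipher(plaintext, key, "encrypt")
recovered = toy_cipher(ciphertext, key, "decrypt")

print(ciphertext)                         # scrambled bytes
print(recovered)                          # b'hello, cipher!'
print(len(ciphertext) == len(plaintext))  # True for this toy cipher
```

The same key goes in both times; only the direction changes. Real ciphers such as AES are far more complex inside, but they are used in the same way.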
A couple of well-known symmetric ciphers are AES and 3DES. AES stands for Advanced Encryption Standard, and 3DES is also known as TDEA, Triple Data Encryption Algorithm. We will use AES in our Python examples below, since it is widely used.
Note: There is a bit of interesting history here. DES on its own is too weak because its key is short (56 effective bits). Simply encrypting twice with two different keys (“double DES”) does not help as much as you might expect: a trick called a meet-in-the-middle attack can recover the keys with not much more effort than attacking single DES. So instead, 3DES applies DES three times, with two or three keys, which provides sufficient security. AES is the algorithm Rijndael, standardized by NIST in 2001. NIST is the National Institute of Standards and Technology, and they have run a few “contests” where they called for submissions of cryptographic algorithms to evaluate and standardize. That is an interesting topic for further study, if you are interested.
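The arithmetic behind that history can be sketched roughly as follows. It assumes a 56-bit DES key and counts work in DES operations, so the figures are only orders of magnitude. The meet-in-the-middle attack encrypts forward from the plaintext under every first key, decrypts backward from the ciphertext under every second key, and looks for a match in the middle. For double DES, this also needs storage for about 2^56 intermediate values.

```latex
\begin{aligned}
\text{DES:}\quad & C = E_{K}(P), & \text{brute force} &\approx 2^{56} \\
\text{Double DES:}\quad & C = E_{K_2}\big(E_{K_1}(P)\big), & \text{meet-in-the-middle} &\approx 2^{56} + 2^{56} = 2^{57} \\
\text{3DES (three keys):}\quad & C = E_{K_3}\big(D_{K_2}(E_{K_1}(P))\big), & \text{meet-in-the-middle} &\approx 2^{112}
\end{aligned}
```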
Before we give this a shot using Python and PyCrypto, there is one more detail that PyCrypto will ask us to provide: an initialization vector. The initialization vector (IV) is another piece of input that goes into the encryption algorithm. Its purpose is to add variability to the algorithm, so that, if an attacker were eavesdropping and trying to analyze the encrypted data, it would be harder for the attacker to glean any information about the key or input data. In other words, it prevents leakage. For instance, without an IV the only inputs are the key and the data. If the key doesn’t change and the data happens to have a few blocks that don’t change either, we can imagine that those blocks would encrypt to the same thing each time. If so, the attacker might be able to figure out some of that consistent data. However, with an IV that changes for every message, even chunks of data that don’t change will look different in the ciphertext. No need to worry too much about these details right now; the inputs we are focusing on are the plaintext and the secret key.
Okay, let’s give this a shot. If you followed along in the previous article about hash functions, you already installed pycrypto. If you didn’t, you’ll need to install that library first (pycrypto is no longer maintained, but pycryptodome is a drop-in replacement that provides the same Crypto package and works with the code below). Remember, you’ll soak this in a little better if you type it in yourself instead of copying and pasting.
from Crypto.Cipher import AES
# set up variables
# the key and IV both need to be 16 bytes long for now...
# (so we don't have to worry about padding)
key = b"testkey,testkey!"
iv = b"randomIVrandomIV"
# initialize a new AES object in CBC mode
aes_encrypt = AES.new(key, AES.MODE_CBC, iv)
# something to encrypt... 16 bytes for now (the length must be a multiple of 16)
plaintext = b"cryptography yay"
ciphertext = aes_encrypt.encrypt(plaintext)
print("AES encrypted data: ")
print(ciphertext)
From the top, we import AES from the Crypto.Cipher package. Next, we set up a key and an initialization vector. To keep this example simple(ish), let’s make sure the key and the IV are each 16 bytes long, so that we don’t have to worry about padding. Then, we need to create a new instance of AES. To do this, we need to tell it what to use as the secret key, what mode it should operate in (we will talk a little more about modes later in this article, so hold tight!), and the initialization vector it should use. Now our AES object is ready to be used! The next thing we have to do is decide on some plaintext data that we want to encrypt. Again, to keep things simple and avoid having to add padding, let’s make sure this is exactly 16 bytes (16 ASCII characters) long. Now we are ready to encrypt! The next line encrypts our plaintext with all the information we provided and saves the encrypted data to a variable we conveniently named ‘ciphertext’. When we print this, it shows up as unreadable bytes. Good ciphertext looks like random data, so it is very unlikely to happen to form readable text. You can convert these bytes to another format if you want, but I am going to leave mine as bytes for now.
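If you do want something printable, hex and Base64 are two common choices. Here is a small sketch using only the standard library. Random bytes stand in for the real ciphertext so the snippet runs on its own.

```python
import base64
import os

# Stand-in for the ciphertext from the script above: 16 random bytes.
ciphertext = os.urandom(16)

as_hex = ciphertext.hex()                              # 32 hex characters
as_b64 = base64.b64encode(ciphertext).decode("ascii")  # 24 Base64 characters

print(as_hex)
print(as_b64)

# Both encodings are reversible, so nothing is lost.
assert bytes.fromhex(as_hex) == ciphertext
assert base64.b64decode(as_b64) == ciphertext
```

Keep in mind that these are encodings, not encryption: anyone can reverse them without a key.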
My output looks like this:
AES encrypted data:
Note: We made sure that our variables were all of a certain length. The plaintext had to be a multiple of AES’s 16-byte block size, the IV had to be exactly one block (16 bytes), and the key had to be one of the lengths AES accepts (16, 24, or 32 bytes). But why did we have to do this ourselves? This seems kind of limiting and tedious, doesn’t it? Some cryptographic libraries will accept input of any length and automatically add padding when necessary to get it to the proper length. The library functions we used above don’t add padding automatically, so we needed to check our sizes ourselves. Adding that check would make our code example longer, and there’s no need to make things more difficult at this point.
Encrypting is done. Great! Now let’s add a little to our script to decrypt our data. We can do this simply by creating a new AES object like before, with the same key, mode, and IV, and then calling decrypt() instead of encrypt(). Simple enough. Let’s add this to the bottom of our Python script:
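For the curious, here is roughly what such automatic padding does. This is a sketch of PKCS#7, one common padding scheme (the original doesn’t name a specific one). The helper names `pkcs7_pad` and `pkcs7_unpad` are hypothetical; if you are using pycryptodome, it ships similar helpers, `pad` and `unpad`, in `Crypto.Util.Padding`.

```python
BLOCK_SIZE = 16  # AES block size in bytes

def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append 1..block_size bytes, each equal to the number of bytes added."""
    pad_len = block_size - (len(data) % block_size)
    return data + bytes([pad_len]) * pad_len

def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Check and strip PKCS#7 padding; raise ValueError if it looks wrong."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("padded data must be a non-empty multiple of the block size")
    pad_len = padded[-1]
    if not 1 <= pad_len <= block_size or padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding")
    return padded[:-pad_len]

message = b"hello"          # 5 bytes: not a multiple of 16
padded = pkcs7_pad(message)
print(len(padded))          # 16 (five message bytes plus eleven 0x0b bytes)
print(pkcs7_unpad(padded))  # b'hello'

# A message that already fills a block still gets a full block of padding,
# so the unpadding step is never ambiguous.
print(len(pkcs7_pad(b"cryptography yay")))  # 32
```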
# set up an AES object with the same key, mode, and IV
# (a new object is needed because CBC cipher objects keep internal state)
aes_decrypt = AES.new(key, AES.MODE_CBC, iv)
decrypted = aes_decrypt.decrypt(ciphertext)
print("AES decrypted data: ")
print(decrypted)
Awesome! My output looks like this:
AES encrypted data:
AES decrypted data:
Feel free to play. Try a different key, IV, or plaintext, and notice that the output changes. Are you able to decrypt properly if you change the decryption key? You can alter the length of the plaintext as long as it stays a multiple of 16 bytes, and the key can be 16, 24, or 32 bytes long (the IV must stay 16 bytes). Or you can take on the challenge of adding the length check to your own script (see Extra Credit below). Also see if you can spot any patterns in the outputs! (Spoiler: You shouldn’t be able to see anything noticeable here, since we chose a good algorithm and mode and gave it the proper inputs.)
Speaking of Modes…
Now to talk about modes! I promised we would come back to that. There are a bunch of different symmetric ciphers, and block ciphers like AES can usually operate in several different modes. One important reason for the different modes is to get rid of patterns in the ciphertext, which can be a form of leakage if the patterns give away discernible details about the plaintext.
My favorite example of this is the difference between AES-ECB and AES-CBC (or other AES modes). AES-ECB (Electronic CodeBook) is a very simple mode: each block of plaintext is encrypted on its own, so identical plaintext blocks always map to identical ciphertext blocks. This makes it simple to implement, but it leaks patterns. AES-CBC (Cipher Block Chaining) takes care of this particular problem by chaining blocks together: each plaintext block is mixed with the previous ciphertext block (and the first block with the IV) before it is encrypted. Look at the images below (courtesy of Wikipedia on the “Block cipher mode of operation” page) to see a visualization of the leakage in ECB.
(Images: the original image; the image encrypted using ECB mode; the image encrypted using a mode other than ECB, which results in pseudo-randomness.)
Weaknesses or Disadvantages of Symmetric Ciphers
The issue of key size came up a couple of times above: first in the note about why Triple-DES exists instead of single-DES or double-DES, and then again when we noted that our keys shouldn’t be easy to guess. Why are these properties (key length and key complexity) important to us? First of all, keys should be long enough that guessing them is not trivial! Trying every possible key is called a “brute force” attack, and it takes longer the longer your key is: each extra bit doubles the number of possible keys. Additionally, the more random our key is, the less likely someone is to guess it by chance. If our key is easily guessed, then we really aren’t achieving the privacy we were hoping for.
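To put numbers on that, here is a rough back-of-the-envelope calculation. It assumes a hypothetical attacker who can test 10^12 keys per second (a made-up figure chosen only to show scale) and counts the time needed to try every possible key.

```python
# ASSUMPTION: a hypothetical attacker testing 10**12 keys per second.
# Real-world speeds vary widely; this only illustrates scale.
KEYS_PER_SECOND = 10**12
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def years_to_try_all(key_bits: int) -> float:
    """Time (in years) to try every possible key of the given length."""
    return 2**key_bits / KEYS_PER_SECOND / SECONDS_PER_YEAR

for name, bits in [("DES", 56), ("AES-128", 128), ("AES-256", 256)]:
    print(f"{name:8} {bits:3}-bit key: about {years_to_try_all(bits):.3g} years")
```

Under that assumption, a 56-bit DES key falls in under a day (roughly 20 hours), while a 128-bit key would take on the order of 10^19 years. The difference comes entirely from the doubling per extra bit.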
Key size and complexity are always top of mind, so that our symmetric algorithm is effective. But there is another challenge we need to consider. When we talked about symmetric encryption and used a key to encrypt and decrypt some messages, we used AES. Similar things can be done with other algorithms such as 3DES, Blowfish, and Camellia. All of these algorithms require that the two involved parties share the same key. Can you think of any problems with this?
Maybe this isn’t such an issue if you are sending messages to someone you see regularly and can easily pass along a key. But what if you have never met this person, and they live somewhere else? How do you securely share the key, so that you can be sure that only the right person has it? Key sharing is one of the complicated aspects of symmetric-key algorithms. There are ways to do it (some of which leverage asymmetric cryptography), but let’s save that topic for later.
You did it!
Wonderful! We made it to the bottom of our ciphers section. This time around we introduced a lot more complexity than we saw when we walked through hash functions. Now we understand cryptographic algorithms that can both encrypt AND decrypt, and we know what the inputs and outputs are. We also know what considerations are important when choosing a key as well as the big challenge of key-sharing. Throughout the series, we are breaking things into small chunks like “Explain Like I’m Five” taught us to do. One step at a time, we keep adding more information and complexity, and, before we know it, we will be feeling comfortable!
Extra Credit
Of course, there are no grades or prizes to give away. Let’s just repeat the old saying that knowledge is its own reward. Since we can’t go into everything there is to know about symmetric ciphers, and we want to move on to asymmetric ciphers next time, here are a few challenges for you. Try these only after you have a pretty firm grasp of the examples above.
- We used SHA256 for our hash function and AES for our encryption algorithm. What other options are available from Crypto.Hash and Crypto.Cipher?
- Create at least one additional hash script and one additional cipher script.
- Create a more fully functional “program” that accepts input from the user for key, IV and plaintext.
- Add a length check for the IV.
Good luck! If you have any questions or need some help, use the “Comments Section” at the bottom of this article.
Until next time…
Credit – “Symmetric Cyphers” from The Matrix (Warner Bros.)
Ellie Daw is passionate about diversity in tech and giving back, innovation and experimentation, and bringing security and good user experience together. She loves cryptography, hence her handle @cryptoreo, and has been heavily involved in teaching cryptography to professionals as well as developing cryptography workshops for youth cybersecurity education initiatives. On the technical side, Ellie has spent the last couple of years working as a software engineer on crypto and network protocol libraries. She believes there is always more for her to learn, but understands the importance of letting loose, too! In her free time you can find her exploring new places and trying new foods, or dabbling in any active or creative pastime!