CS 513 System Security -- Introduction to Cryptography

Introduction to Cryptography

Lecturer: Professor Fred B. Schneider

Lecture notes by Lynette I. Millett

Introduction to Cryptography

The term cryptography comes from the Greek, and means hidden or secret writing. Cryptography includes:

techniques for implementing secrecy -- ways to obscure the content of a message from eavesdroppers,
integrity checking -- ways to reassure the recipient that a message was not altered since it was generated by the sender, and
authentication -- ways to verify the identity of the source of the message. There are two classes of applications: verifying the original sender of the message and making sure that the source of a given message in a protocol is the same as the source of a previous message.

Cryptography involves converting plaintext into ciphertext through a process known as encryption. Ciphertext is converted back to plaintext by decryption. Usually the algorithms are public, but an input, called the key, is secret. Note that the key for encryption does not necessarily have to be the same as the key for decryption.

Some terminology:

Cryptographer -- invents codes.
Cryptoanalyst -- breaks codes.
Cryptology -- the study of cryptography (encryption/decryption).

Today we will discuss a few primitive cryptographic systems in order to build intuition about cryptoanalysis.

Monoalphabetic Substitution Cipher

The first scheme is called a monoalphabetic substitution cipher. An example is the Caeser cipher. Here, for a given letter in the message, shift to the right (in the alphabet) by three. That is, an 'a' becomes 'd', 'b' becomes 'e' and so on. This can be generalized to work for any n not greater than 25 (assuming a 26 letter alphabet). In this cipher, the number n is the 'key.'

Another--somewhat stronger, cryptographically--example of a monoalphabetic substitution cipher is to use an arbitrary permutation of the alphabet, rather than shifting by a certain number. Rather than only 25 possible keys, we have 26! (26 factorial, the number of permutations of the alphabet, assuming a 26 letter alphabet.) A brute force approach to cracking this cipher, even if one spends only 1 millisecond per permutation, would take roughly 10 trillion years. Note: two successive encryptions do not increase the strength of this cryptosystem because the product of two permutations (keys) is another permutation (key).

In fact, 10 trillion years are not needed to crack a monoalphabetic code. We can take advantage of the fact that sentences in natural language (for the sake of argument, assume English) do not have uniform frequency distributions over the entire alphabet. Thus, we can calculate the relative frequency per character in the cipher text and compare it to what we know about the English language. (For example, 'e' is the most frequent letter, occurring 14% of the time, 't' occurs 9.85%, and so on.) Monoalphabetic substitution ciphers preserve this distribution. Therefore, if we know the original language, and we know that a monoalphabetic substitution was used, then we have a good chance of cracking the code. This kind of attack is referred to as a ciphertext only attack.

Cryptoanalytic Attacks

There are several types of cryptoanalytic attacks:

Ciphertext only -- attacker has only the ciphertext.
Known plaintext -- attacker has full or partial (e.g. message headers) plaintext, or probable plaintext.
Chosen plaintext -- attacker can get an arbitrary message encoded. Attacker then cracks an encrypted message by guessing the contents and submitting the guess for encryption and comparing. This is especially useful if there are only a small number of possibilities for what the plaintext might be.
Algorithm and ciphertext -- also known as a 'dictionary attack.' Run the algorithm on massive amounts of plaintext and find the one plaintext message that encrypts to the ciphertext you are analyzing. This is the reason behind the 'salt' in UNIX password representations. The password contaminated with 'salt' (a random bit string) and then encrypted before being stored in /etc/passwd. Cracking passwords is now much more difficult than simply encrypting a dictionary once and comparing. It is now necessary to separately encrypt each salt value with the entire dictionary.

Polyalphabetic Substitution Cipher

The problem with monoalphabetic substitution ciphers is that the preservation of alphabet distributions makes them vulnerable to frequency-based attacks. We would like a scheme that encrypts plaintext (manifesting a particular distribution) into ciphertext that has a smooth distribution. Therefore, instead of mapping a letter to a fixed symbol, we will map both high and low frequency symbols to the same symbol by using a different permutation of the alphabet for different character positions in the message. This is accomplished through a Vigenere Tableaux: a list of possible permutations of the alphabet. The key is a sequence of lines in the tableaux. If there are four permutations in the tableaux, then the first character in the message is substituted based on the first line in the table, the second character by the second line, the third by the third line, the fourth by the fourth line, the fifth character by the first line and so on in a cyclical fashion.

How can such a substitution be cracked? Note that if we know the key length (how many different permutations are used), then, in the above example we would know that the first, fifth, ninth, etc. characters were encrypted under the same permutation. Thus, knowing that the key length is n reduces the problem to cracking n monoalphabetic substitution ciphers. Cryptoanalysis of a polyalphabetic substitution cipher is therefore accomplished in three steps:

Determine the key length.
Break ciphertext into separate pieces; one piece per permutation.
Solve each piece as a separate monoalphabetic substitution using frequency distributions.

We present Kasiski's method for determining the key length. In a substitution cipher, patterns in the plaintext will manifest themselves in ciphertext. For instance, the digram 'th' occurs frequently in English. In a polyalphabetic substitution, it is likely that 'th' will be permuted using permutations 1 and 2 at multiple points. If the message is long enough, this will happen repeatedly and we can look for these repeated patterns. Kasiski's method works on trigrams (or more) as follows:

Identify repeated patterns of 3 or more letters.
For each pattern, write down the starting positions for all the instances of the pattern.
Compute the differences between the starting positions of successive instances.
Determine all the factors of these differences.
The key length will be one of the factors that appears often.

Once a probable key length has been found, rather than attempting to decode the entire message, one can quickly check and see whether the result of partitioning the message according to that key length has the same kind of frequency distribution that English has.

Transposition Ciphers

The problem the Kasiski method exposes is that with substitution ciphers the information in the message does not get 'spread out' enough. That is, the trigram 'the' is still a trigram in the ciphertext (albeit encoded.) One easy scheme to accomplish this 'spreading' is by using transposition. In this scheme, the message is written out in rows, but the columns are written down. For example

THIS IS AN
EXAMPLE OF
A MESSAGE

becomes

TEAHX IAMSME PSILSSEA  GAOENF

This distributes the n-grams, and once the transposition is done a substitution can also be performed. To attack such a transposition, one needs to determine the dimension of the transposition matrix (in the above example: 10x3). This adds one more level of complexity, but is not fundamentally different than the other schemes we have examined. Spreading information through a message like this is called diffusion.

Perfect Substitution Cipher

What are the characteristics of a perfect substitution cipher? We desire perfect secrecy which we define as: The probability that a message can be decrypted is unaltered by knowledge of the ciphertext for that message. A perfect substitution cipher would therefore destroy all n-gram frequencies. This can be accomplished with a one-time pad, an infinite sequence of random bits. This sequence is XORed with the message. If it is a random sequence, then knowing any one bit will tell you nothing about what the next bit will be. Moreover, decoding is simple because (key XOR message) XOR key = message. The problem is that unlimited key material is needed, and absolute synchrony is necessary between the sender and the receiver.

There are two properties of an encryption scheme that are desirable:

confusion -- the interceptor should not be able to predict the effect of changing one character in the plaintext on the ciphertext.
diffusion -- changes in the plaintext should affect many parts of the ciphertext. (Substitution and permutation do not exhibit diffusion.)