Introduction to Cryptography

Lecturer: Professor Fred B. Schneider

Lecture notes by Lynette I. Millett


Introduction to Cryptography

The term cryptography comes from the Greek, and means hidden or secret writing. Cryptography includes: Cryptography involves converting plaintext into ciphertext through a process known as encryption. Ciphertext is converted back to plaintext by decryption. Usually the algorithms are public, but an input, called the key, is secret. Note that the key for encryption does not necessarily have to be the same as the key for decryption.

Some terminology:

Today we will discuss a few primitive cryptographic systems in order to build intuition about cryptoanalysis.

Monoalphabetic Substitution Cipher

The first scheme is called a monoalphabetic substitution cipher. An example is the Caeser cipher. Here, for a given letter in the message, shift to the right (in the alphabet) by three. That is, an 'a' becomes 'd', 'b' becomes 'e' and so on. This can be generalized to work for any n not greater than 25 (assuming a 26 letter alphabet). In this cipher, the number n is the 'key.'

Another--somewhat stronger, cryptographically--example of a monoalphabetic substitution cipher is to use an arbitrary permutation of the alphabet, rather than shifting by a certain number. Rather than only 25 possible keys, we have 26! (26 factorial, the number of permutations of the alphabet, assuming a 26 letter alphabet.) A brute force approach to cracking this cipher, even if one spends only 1 millisecond per permutation, would take roughly 10 trillion years. Note: two successive encryptions do not increase the strength of this cryptosystem because the product of two permutations (keys) is another permutation (key).

In fact, 10 trillion years are not needed to crack a monoalphabetic code. We can take advantage of the fact that sentences in natural language (for the sake of argument, assume English) do not have uniform frequency distributions over the entire alphabet. Thus, we can calculate the relative frequency per character in the cipher text and compare it to what we know about the English language. (For example, 'e' is the most frequent letter, occurring 14% of the time, 't' occurs 9.85%, and so on.) Monoalphabetic substitution ciphers preserve this distribution. Therefore, if we know the original language, and we know that a monoalphabetic substitution was used, then we have a good chance of cracking the code. This kind of attack is referred to as a ciphertext only attack.

Cryptoanalytic Attacks

There are several types of cryptoanalytic attacks:

Polyalphabetic Substitution Cipher

The problem with monoalphabetic substitution ciphers is that the preservation of alphabet distributions makes them vulnerable to frequency-based attacks. We would like a scheme that encrypts plaintext (manifesting a particular distribution) into ciphertext that has a smooth distribution. Therefore, instead of mapping a letter to a fixed symbol, we will map both high and low frequency symbols to the same symbol by using a different permutation of the alphabet for different character positions in the message. This is accomplished through a Vigenere Tableaux: a list of possible permutations of the alphabet. The key is a sequence of lines in the tableaux. If there are four permutations in the tableaux, then the first character in the message is substituted based on the first line in the table, the second character by the second line, the third by the third line, the fourth by the fourth line, the fifth character by the first line and so on in a cyclical fashion.

How can such a substitution be cracked? Note that if we know the key length (how many different permutations are used), then, in the above example we would know that the first, fifth, ninth, etc. characters were encrypted under the same permutation. Thus, knowing that the key length is n reduces the problem to cracking n monoalphabetic substitution ciphers. Cryptoanalysis of a polyalphabetic substitution cipher is therefore accomplished in three steps:

We present Kasiski's method for determining the key length. In a substitution cipher, patterns in the plaintext will manifest themselves in ciphertext. For instance, the digram 'th' occurs frequently in English. In a polyalphabetic substitution, it is likely that 'th' will be permuted using permutations 1 and 2 at multiple points. If the message is long enough, this will happen repeatedly and we can look for these repeated patterns. Kasiski's method works on trigrams (or more) as follows: Once a probable key length has been found, rather than attempting to decode the entire message, one can quickly check and see whether the result of partitioning the message according to that key length has the same kind of frequency distribution that English has.

Transposition Ciphers

The problem the Kasiski method exposes is that with substitution ciphers the information in the message does not get 'spread out' enough. That is, the trigram 'the' is still a trigram in the ciphertext (albeit encoded.) One easy scheme to accomplish this 'spreading' is by using transposition. In this scheme, the message is written out in rows, but the columns are written down. For example becomes This distributes the n-grams, and once the transposition is done a substitution can also be performed. To attack such a transposition, one needs to determine the dimension of the transposition matrix (in the above example: 10x3). This adds one more level of complexity, but is not fundamentally different than the other schemes we have examined. Spreading information through a message like this is called diffusion.

Perfect Substitution Cipher

What are the characteristics of a perfect substitution cipher? We desire perfect secrecy which we define as: The probability that a message can be decrypted is unaltered by knowledge of the ciphertext for that message. A perfect substitution cipher would therefore destroy all n-gram frequencies. This can be accomplished with a one-time pad, an infinite sequence of random bits. This sequence is XORed with the message. If it is a random sequence, then knowing any one bit will tell you nothing about what the next bit will be. Moreover, decoding is simple because (key XOR message) XOR key = message. The problem is that unlimited key material is needed, and absolute synchrony is necessary between the sender and the receiver.

There are two properties of an encryption scheme that are desirable: