Introduction to Cryptography
Lecturer: Professor Fred B. Schneider
Lecture notes by
Lynette I. Millett
Introduction to Cryptography
The term cryptography comes from the Greek, and means hidden or secret
writing. Cryptography includes:
- techniques for implementing secrecy -- ways to obscure the content of
a message from eavesdroppers,
- integrity checking -- ways to reassure the recipient that a message
was not altered since it was generated by the sender, and
- authentication -- ways to verify the identity of the source of the
message. There are two classes of applications: verifying the original
sender of the message and making sure that the source of a given
message in a protocol is the same as the source of a previous message.
Cryptography involves converting plaintext into ciphertext through a
process known as encryption. Ciphertext is converted back to plaintext
by decryption. Usually the algorithms are public, but an input, called
the key, is secret. Note that the key for encryption does not
necessarily have to be the same as the key for decryption.
Some terminology:
- Cryptographer -- invents codes.
- Cryptoanalyst -- breaks codes.
- Cryptology -- the study of cryptography (encryption/decryption).
Today we will discuss a few primitive cryptographic systems in order to
build intuition about cryptoanalysis.
Monoalphabetic Substitution Cipher
The first scheme is called a monoalphabetic substitution
cipher. An example is the Caeser cipher. Here, for a given
letter in the message, shift to the right (in the alphabet) by
three. That is, an 'a' becomes 'd', 'b' becomes 'e' and so on. This
can be generalized to work for any n not greater than 25 (assuming a
26 letter alphabet). In this cipher, the number n is the 'key.'
Another--somewhat stronger, cryptographically--example of a
monoalphabetic substitution cipher is to use an arbitrary permutation
of the alphabet, rather than shifting by a certain number. Rather
than only 25 possible keys, we have 26! (26 factorial, the number of
permutations of the alphabet, assuming a 26 letter alphabet.) A brute
force approach to cracking this cipher, even if one spends only 1
millisecond per permutation, would take roughly 10 trillion
years. Note: two successive encryptions do not increase the strength
of this cryptosystem because the product of two permutations (keys) is
another permutation (key).
In fact, 10 trillion years are not needed to crack a
monoalphabetic code. We can take advantage of the fact that sentences
in natural language (for the sake of argument, assume English) do not
have uniform frequency distributions over the entire alphabet. Thus,
we can calculate the relative frequency per character in the cipher
text and compare it to what we know about the English language. (For
example, 'e' is the most frequent letter, occurring 14% of the time,
't' occurs 9.85%, and so on.) Monoalphabetic substitution ciphers
preserve this distribution. Therefore, if we know the original
language, and we know that a monoalphabetic substitution was used,
then we have a good chance of cracking the code. This kind of attack
is referred to as a ciphertext only attack.
Cryptoanalytic Attacks
There are several types of cryptoanalytic attacks:
- Ciphertext only -- attacker has only the ciphertext.
- Known plaintext -- attacker has full or partial (e.g. message headers)
plaintext, or probable plaintext.
- Chosen plaintext -- attacker can get an arbitrary message encoded.
Attacker
then cracks an encrypted message by guessing the contents and submitting
the guess for encryption and comparing. This is especially useful
if there are only a small number of possibilities for what the plaintext
might be.
- Algorithm and ciphertext -- also known as a 'dictionary attack.' Run
the algorithm on massive amounts of plaintext and find the one plaintext
message that encrypts to the ciphertext you are analyzing. This is the
reason behind the 'salt' in UNIX password representations. The password
contaminated with 'salt' (a random bit string) and then encrypted before
being stored in /etc/passwd. Cracking passwords is now much more difficult
than simply encrypting a dictionary once and comparing. It is now necessary
to separately encrypt each salt value with the entire dictionary.
Polyalphabetic Substitution Cipher
The problem with monoalphabetic substitution ciphers is that the
preservation of alphabet distributions makes them vulnerable to
frequency-based attacks. We would like a scheme that encrypts
plaintext (manifesting a particular distribution) into ciphertext that
has a smooth distribution. Therefore, instead of mapping a letter to
a fixed symbol, we will map both high and low frequency symbols to the
same symbol by using a different permutation of the alphabet for
different character positions in the message. This is accomplished
through a Vigenere Tableaux: a list of possible permutations of the
alphabet. The key is a sequence of lines in the tableaux. If there are
four permutations in the tableaux, then the first character in the
message is substituted based on the first line in the table, the
second character by the second line, the third by the third line, the
fourth by the fourth line, the fifth character by the first line and
so on in a cyclical fashion.
How can such a substitution be cracked? Note that if we know the key
length (how many different permutations are used), then, in the
above example we would know that the first, fifth, ninth, etc. characters
were encrypted under the same permutation. Thus, knowing that the key length
is n reduces the problem to cracking n monoalphabetic substitution ciphers.
Cryptoanalysis of a polyalphabetic substitution cipher is therefore
accomplished in three steps:
- Determine the key length.
- Break ciphertext into separate pieces; one piece per permutation.
- Solve each piece as a separate monoalphabetic substitution using
frequency distributions.
We present Kasiski's method for determining the key length. In a
substitution cipher, patterns in the plaintext will manifest
themselves in ciphertext. For instance, the digram 'th' occurs
frequently in English. In a polyalphabetic substitution, it is likely
that 'th' will be permuted using permutations 1 and 2 at multiple
points. If the message is long enough, this will happen repeatedly and
we can look for these repeated patterns. Kasiski's method works on
trigrams (or more) as follows:
- Identify repeated patterns of 3 or more letters.
- For each pattern, write down the starting positions for all the instances
of the pattern.
- Compute the differences between the starting positions of successive
instances.
- Determine all the factors of these differences.
- The key length will be one of the factors that appears often.
Once a probable key length has been found, rather than attempting to
decode the entire message, one can quickly check and see whether the
result of partitioning the message according to that key length has
the same kind of frequency distribution that English has.
Transposition Ciphers
The problem the Kasiski method exposes is that with substitution
ciphers the information in the message does not get 'spread out'
enough. That is, the trigram 'the' is still a trigram in the
ciphertext (albeit encoded.) One easy scheme to accomplish this
'spreading' is by using transposition. In this scheme, the message is
written out in rows, but the columns are written down. For example
THIS IS AN
EXAMPLE OF
A MESSAGE
becomes
TEAHX IAMSME PSILSSEA GAOENF
This distributes the n-grams, and once the transposition is done a
substitution can also be performed. To attack such a transposition,
one needs to determine the dimension of the transposition matrix (in
the above example: 10x3). This adds one more level of complexity, but
is not fundamentally different than the other schemes we have
examined. Spreading information through a message like this is called
diffusion.
Perfect Substitution Cipher
What are the characteristics of a perfect substitution cipher? We
desire perfect secrecy which we define as: The probability
that a message can be decrypted is unaltered by knowledge of the
ciphertext for that message. A perfect substitution cipher would
therefore destroy all n-gram frequencies. This can be accomplished
with a one-time pad, an infinite sequence of random
bits. This sequence is XORed with the message. If it is a random
sequence, then knowing any one bit will tell you nothing about what
the next bit will be. Moreover, decoding is simple because (key XOR
message) XOR key = message. The problem is that unlimited key material
is needed, and absolute synchrony is necessary between the sender and
the receiver.
There are two
properties of an encryption scheme that are desirable:
- confusion -- the interceptor should not be able to predict the
effect of changing one character in the plaintext on the ciphertext.
- diffusion -- changes in the plaintext should affect many parts of
the ciphertext. (Substitution and permutation do not exhibit diffusion.)