5.2 Sort Bigram Frequencies?

Next: 5.3 Exhaustive Search Up: 5. Attempts at Decryption Previous: 5.1.1 Sorting implies Swapping

5.2 Sort Bigram Frequencies?

Sorting unigram frequencies almost worked, and we have reason to believe using bigram frequencies would work better. Therefore, a natural first reaction is to consider the following:

1.: Sort bigrams in the top line by their frequency in training text.
2.: Sort bigrams in the bottom line by their frequency in ciphertext.
3.: Somehow read off the encryption/decryption key.

Unfortunately, ``sort bigrams by frequency and read off the key'' is problematic. For example, what should we do in the situation pictured in Figure 26?

**Figure 26:** Attempt to Sort Bigrams by Frequency and Read Off the Key
$\begin{figure} \begin{center} \begin{tabular}{l\vert c\vert c\vert} \multicolumn... ...t{''ab''} & \texttt{''cd''} \\ \cline{2-3} \end{tabular}\end{center}\end{figure}$

Matching "th" with "ab" suggests 'h' $\rightarrow$ 'b' (and 't' $\rightarrow$ 'a') is part of the encryption key, whereas matching "he" with "cd" suggests 'h' $\rightarrow$ c is part of the encryption key, which is a contradiction: A key must map each letter to exactly one letter. A key is not allowed to map 'h' to both 'b' and 'c'.

One intuition as to why this approach is problematic is that with sorting unigram frequencies, swapping two frequencies corresponds to swapping two letters. However, with sorting bigram frequencies, swapping two frequencies does not correspond to swapping two letters: Swapping two letters affects a whole set of associated frequencies.

In Section 6.3 we will carefully investigate how swapping two letters affects the table of bigram frequencies. However, for now, we will suppress the details, except to point out that that there are details that must be considered, and rule out ``sort bigram frequencies'' as a simple approach.

Roadmap
Section 3.7	Encipher plaintext $\Rightarrow$ *scramble* frequencies.
Section 3.7	Decipher ciphertext $\Rightarrow$ *unscramble* frequencies.
Section 4.2	(Hope) Unscramble frequencies $\Rightarrow$ decipher ciphertext
Section 4.3	Unscramble = Bring ``close'' to *intrinsic* frequencies
	Approximate *intrinsic* frequencies with *training text*
	Assume ciphertext is medium to large so that unscrambled frequencies resemble intrinsic frequencies
Section 4.4	Use the L¹ *distance* to measure ``closeness''; ignore labels.
Section 5	Q: What are legal and effective ways to rearrange frequencies?
Section 5.1	Sorting unigram frequencies does not work, but ``almost'' does.
	(Hope) When the frequency table for ciphertext matches the table for training text, read the encryption key off of the labels
Section 5.2	``Sort bigram frequencies'' is problematic.
$(\rightarrow)$ Section 5.3	Q: Why not try all decryption keys?

Next: 5.3 Exhaustive Search Up: 5. Attempts at Decryption Previous: 5.1.1 Sorting implies Swapping

Thomas Yan
2000-05-01