5.3 Exhaustive Search

Our goal can be formulated as follows: Find a decryption key that when applied to ciphertext unscrambles bigram frequencies. This is a search, and one fundamental search algorithm is to try all possibilities:

**Figure 27:** Algorithm to Enumerate All Keys via Swaps
$\fbox{\begin{minipage}{5in}\begin{itemize}\setlength{\itemsep}{0in} \item Start ... ... line. \item Repeat until the list stops changing. \end{itemize}\end{minipage}}$

For instance, suppose you have a three element character set 'a', 'l', 'n'. Figure 28 shows all possible decryption keys for this character set -- only the bottom lines are shown and a $\leftrightarrow$ n means ``swap 'a' with 'n'''.

**Figure 28:** Using the Algorithm in Figure 27 to List of All Keys of `aln`
$\begin{figure} \begin{center} \setlength{\unitlength}{1.2pt} \fbox{\begin{pictu... ...ttt{aln}}} \put(170,30){\line(-1,-1){20}} \end{picture}}\end{center}\end{figure}$

Suppose you need to decrypt a ciphertext that uses this three-letter character set and a '-' for spaces. Apply all candidate decryption keys from Figure 28 to the ciphertext.⁸ Each time you apply a different key to the ciphertext, you generate something you hope is plaintext. Collect the frequencies from the hoped-for-plaintext. Choose as the decryption key and the plaintext the key and text with frequencies closest to intrinsic frequencies.⁹ Figure 29 shows this exhaustive search, and note that indeed the minimum distance 15 corresponds to the decrypted ciphertext.

**Figure 29:** Distances for Each Candidate Decryption Key: Minimum 15 is Marked with ``*''
$\begin{figure} \begin{center} \fbox{\emph{\begin{tabular}{\vert@{ }c@{ }\vert} \... ...cline{1-4} \cline{6-9} \cline{11-14} \end{tabular}}\par\end{center}\end{figure}$

For ``small'' alphabets, we can reasonably perform an exhaustive search. However, what if the language has a relatively ``large'' character set? Consider English, which has 26 letters, not counting punctuation. There are $26!\mathrel{\dot=}4\times10^{26}$ decryption keys for a 26-letter alphabet! Machines these days cannot perform more than about a billion (10⁹) operations per second, so even if they could look at a billion ciphers per second, it would still take $4\times10^{17}$ seconds, or over 10¹⁰ years! So, you will have to find a better approach to crack EGOP's message since you have until May 4, 2000!

Therefore, we must use an incomplete search that looks at only a (small) subset of the keys. We answer the questions ``Which keys?'' and ``How do you pick them?'' in the next section.

Roadmap
Section 3.7	Encipher plaintext $\Rightarrow$ *scramble* frequencies.
Section 3.7	Decipher ciphertext $\Rightarrow$ *unscramble* frequencies.
Section 4.2	(Hope) Unscramble frequencies $\Rightarrow$ decipher ciphertext
Section 4.3	Unscramble = Bring ``close'' to *intrinsic* frequencies
	Approximate *intrinsic* frequencies with *training text*
	Assume ciphertext is medium to large so that unscrambled frequencies resemble intrinsic frequencies
Section 4.4	Use the L¹ *distance* to measure ``closeness''; ignore labels.
Section 5.1	Sorting unigram frequencies does not work, but ``almost'' does.
	(Hope) When the frequency table for ciphertext matches the table for training text, read the encryption key off of the labels
Section 5.2	``Sort bigram frequencies'' is problematic.
Section 5.3	*Exhaustive* search is too slow; therefore, need an incomplete search.
$(\rightarrow)$ Section 6.2	Q: How do we perform an effective, incomplete search?