next up previous
Next: 6. Improving Decryption Methods Up: 5. Attempts at Decryption Previous: 5.2 Sort Bigram Frequencies?

   
5.3 Exhaustive Search

Our goal can be formulated as follows: Find a decryption key that when applied to ciphertext unscrambles bigram frequencies. This is a search, and one fundamental search algorithm is to try all possibilities:

We can enumerate all keys by progressively listing more and more keys. This algorithm is shown in Figure 27.
  
Figure 27: Algorithm to Enumerate All Keys via Swaps
\fbox{\begin{minipage}{5in}\begin{itemize}\setlength{\itemsep}{0in}
\item Start ...
... line.
\item Repeat until the list stops changing.
\end{itemize}\end{minipage}}

(If you wish to see a justification for this algorithm to enumerate all keys, see Section 6.2.)

For instance, suppose you have a three element character set 'a', 'l', 'n'. Figure 28 shows all possible decryption keys for this character set -- only the bottom lines are shown and a $\leftrightarrow$n means ``swap 'a' with 'n'''.

  
Figure 28: Using the Algorithm in Figure 27 to List of All Keys of aln
\begin{figure}
\begin{center}
\setlength{\unitlength}{1.2pt}
\fbox{\begin{pictu...
...ttt{aln}}} \put(170,30){\line(-1,-1){20}}
\end{picture}}\end{center}\end{figure}

Suppose you need to decrypt a ciphertext that uses this three-letter character set and a '-' for spaces. Apply all candidate decryption keys from Figure 28 to the ciphertext.8 Each time you apply a different key to the ciphertext, you generate something you hope is plaintext. Collect the frequencies from the hoped-for-plaintext. Choose as the decryption key and the plaintext the key and text with frequencies closest to intrinsic frequencies.9 Figure 29 shows this exhaustive search, and note that indeed the minimum distance 15 corresponds to the decrypted ciphertext.


  
Figure 29: Distances for Each Candidate Decryption Key: Minimum 15 is Marked with ``*''
\begin{figure}
\begin{center}
\fbox{\emph{\begin{tabular}{\vert@{ }c@{ }\vert} \...
...cline{1-4} \cline{6-9} \cline{11-14}
\end{tabular}}\par\end{center}\end{figure}

For ``small'' alphabets, we can reasonably perform an exhaustive search. However, what if the language has a relatively ``large'' character set? Consider English, which has 26 letters, not counting punctuation. There are $26!\mathrel{\dot=}4\times10^{26}$ decryption keys for a 26-letter alphabet! Machines these days cannot perform more than about a billion (109) operations per second, so even if they could look at a billion ciphers per second, it would still take $4\times10^{17}$ seconds, or over 1010 years! So, you will have to find a better approach to crack EGOP's message since you have until May 4, 2000!

Therefore, we must use an incomplete search that looks at only a (small) subset of the keys. We answer the questions ``Which keys?'' and ``How do you pick them?'' in the next section.

Roadmap
Section 3.7 Encipher plaintext $\Rightarrow$ scramble frequencies.
Section 3.7 Decipher ciphertext $\Rightarrow$ unscramble frequencies.
Section 4.2 (Hope) Unscramble frequencies $\Rightarrow$ decipher ciphertext
Section 4.3 Unscramble = Bring ``close'' to intrinsic frequencies
  Approximate intrinsic frequencies with training text
  Assume ciphertext is medium to large so that unscrambled frequencies resemble intrinsic frequencies
Section 4.4 Use the L1 distance to measure ``closeness''; ignore labels.
Section 5.1 Sorting unigram frequencies does not work, but ``almost'' does.
  (Hope) When the frequency table for ciphertext matches the table for training text, read the encryption key off of the labels
Section 5.2 ``Sort bigram frequencies'' is problematic.
Section 5.3 Exhaustive search is too slow; therefore, need an incomplete search.
$(\rightarrow)$     Section 6.2 Q: How do we perform an effective, incomplete search?


next up previous
Next: 6. Improving Decryption Methods Up: 5. Attempts at Decryption Previous: 5.2 Sort Bigram Frequencies?
Thomas Yan
2000-05-01