next up previous
Next: 7. Decryption by Optimization Up: 6. Improving Decryption Methods Previous: 6.3 The Effects of

   
6.4 Continuing to Investigate Figure 33, e.g. The Mysterious Right Column

Here are some further observations about Figure 33.

Thus, using bigram frequencies still looks hopeful, i.e. it looks like maybe ciphertext frequencies appear to (usually) be far from intrinsic frequencies.

Let us elaborate a bit on ``read key off of the labels''. Suppose we go back to Stage 2 and copy the middle table into the right column. What happens if, when we perform the swaps shown in the middle column, we modify both the labels and the contents? This is shown in the right column. Observe that again, no matter whether we take the path Stage 0  $\leftarrow$ Stage 1  $\leftarrow$ Stage 2 or Stage 2  $\rightarrow$ Stage 3  $\rightarrow$ Stage 4 we end up with a table that is (close to) the original, i.e. we have unscrambled frequencies. Furthermore, in both paths, we can indeed read off the encryption key!

Thus, we use swaps to generate and search for the best table, i.e. the table closest to the table of intrinsic frequencies (as approximated from training text). When we do a swap, we modify both the labels and the corresponding rows and columns of frequencies. (Conceptually, this means we do not rename letters during the search, merely reorganize the frequency information.) When we get a close match to training text --which we assume is close to plaintext-- we read off the encryption/decryption key from the labels, which allows us to decipher the ciphertext.

It remains now only to clarify how to perform the search.

Roadmap
Section 3.7 Encipher plaintext $\Rightarrow$ scramble frequencies.
Section 3.7 Decipher ciphertext $\Rightarrow$ unscramble frequencies.
Section 4.2 (Hope) Unscramble frequencies $\Rightarrow$ decipher ciphertext
Section 4.3 Unscramble = Bring ``close'' to intrinsic frequencies
  Approximate intrinsic frequencies with training text
  Assume ciphertext is medium to large so that unscrambled frequencies resemble intrinsic frequencies
Section 4.4 Use the L1 distance to measure ``closeness''; ignore labels.
Section 5.1 Sorting unigram frequencies does not work, but ``almost'' does.
  (Hope) When the frequency table for ciphertext matches the table for training text, read the encryption key off of the labels
Section 5.2 ``Sort bigram frequencies'' is problematic.
Section 5.3 Exhaustive search is too slow; therefore, need an incomplete search.
Section 6.1 (Idea) Hill-Climbing: pick the best ``neighbor''
Section 6.2 Why swaps are so important, e.g. used to generate ``neighbors''
Section 6.3 The effect of swaps on bigram tables
$(\rightarrow)$     Section 7.1 Clarifying Hill-Climbing


next up previous
Next: 7. Decryption by Optimization Up: 6. Improving Decryption Methods Previous: 6.3 The Effects of
Thomas Yan
2000-05-01