Here are some further observations about Figure 33.
Remember, the frequencies at each Stage are those of the ciphertext obtained by applying the encryption key to the plaintext. Note that there is nothing in that description about the path taken, i.e. which swaps were used and what order they were performed. Thus, the frequencies should depend on the key but not the path taken to reach the key. We see that at least in Figure 33 that this is indeed the case.
Consider the ciphertext at Stage 2:
Let us elaborate a bit on ``read key off of the labels''. Suppose we
go back to Stage 2 and copy the middle table into the right column.
What happens if, when we perform the swaps shown in the middle column,
we modify both the labels and the contents? This is shown in
the right column. Observe that again, no matter whether we
take the path Stage 0
Stage 1
Stage 2 or
Stage 2
Stage 3
Stage 4 we end up with a
table that is (close to) the original, i.e. we have unscrambled
frequencies. Furthermore, in both paths, we can indeed read off the
encryption key!
Thus, we use swaps to generate and search for the best table, i.e. the table closest to the table of intrinsic frequencies (as approximated from training text). When we do a swap, we modify both the labels and the corresponding rows and columns of frequencies. (Conceptually, this means we do not rename letters during the search, merely reorganize the frequency information.) When we get a close match to training text --which we assume is close to plaintext-- we read off the encryption/decryption key from the labels, which allows us to decipher the ciphertext.
It remains now only to clarify how to perform the search.
Roadmap | |
Section 3.7 | Encipher plaintext
![]() |
Section 3.7 | Decipher ciphertext
![]() |
Section 4.2 | (Hope)
Unscramble frequencies
![]() |
Section 4.3 | Unscramble = Bring ``close'' to intrinsic frequencies |
Approximate intrinsic frequencies with training text | |
Assume ciphertext is medium to large so that unscrambled frequencies resemble intrinsic frequencies | |
Section 4.4 | Use the L1 distance to measure ``closeness''; ignore labels. |
Section 5.1 | Sorting unigram frequencies does not work, but ``almost'' does. |
(Hope) When the frequency table for ciphertext matches the table for training text, read the encryption key off of the labels | |
Section 5.2 | ``Sort bigram frequencies'' is problematic. |
Section 5.3 | Exhaustive search is too slow; therefore, need an incomplete search. |
Section 6.1 | (Idea) Hill-Climbing: pick the best ``neighbor'' |
Section 6.2 | Why swaps are so important, e.g. used to generate ``neighbors'' |
Section 6.3 | The effect of swaps on bigram tables |
![]() |
Clarifying Hill-Climbing |