Sorting unigram frequencies almost worked, and we have reason to believe that using bigram frequencies would work better. A natural first reaction, therefore, is to try the same approach one level up: sort the bigram frequencies.
One intuition for why this approach is problematic: when sorting unigram frequencies, swapping two frequencies corresponds to swapping two letters. When sorting bigram frequencies, however, swapping two frequencies does not correspond to swapping two letters, because swapping two letters affects a whole set of associated frequencies, namely those of every bigram that contains either letter.
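A small sketch can make this contrast concrete. It assumes a plaintext already reduced to lowercase letters with no spaces, and it counts overlapping bigrams; the helper names (`unigram_counts`, `bigram_counts`, `swap_letters`, `changed_entries`) and the toy text are hypothetical, chosen only for illustration. It swaps the letters e and t and reports which table entries change.

```python
from collections import Counter

# Assumption: text is lowercase a-z only (no spaces or punctuation).

def unigram_counts(text: str) -> Counter:
    """Count how often each letter occurs."""
    return Counter(text)

def bigram_counts(text: str) -> Counter:
    """Count overlapping pairs of adjacent letters."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def swap_letters(text: str, a: str, b: str) -> str:
    """Exchange every occurrence of letter a with letter b, and vice versa."""
    return text.translate(str.maketrans({a: b, b: a}))

def changed_entries(before: Counter, after: Counter) -> list:
    """Labels whose counts differ between two frequency tables."""
    labels = set(before) | set(after)
    # Counter returns 0 for a missing label, so absent entries compare cleanly.
    return sorted(k for k in labels if before[k] != after[k])

text = "thetreethat"  # hypothetical toy plaintext
swapped = swap_letters(text, "e", "t")

uni = changed_entries(unigram_counts(text), unigram_counts(swapped))
bi = changed_entries(bigram_counts(text), bigram_counts(swapped))
print("unigram entries changed:", uni)
print("bigram entries changed:", len(bi), bi)
```

On this toy text, the unigram table changes in exactly two entries (e and t simply trade counts), while 14 bigram entries change. More generally, for a 26-letter alphabet, swapping two letters relabels every bigram that contains either of them: 676 - 576 = 100 of the 676 possible bigram labels, since only the 24 x 24 = 576 bigrams avoiding both letters stay in place. For unigrams, the same swap moves only 2 of the 26 labels.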
In Section 6.3 we will carefully investigate how swapping two letters affects the table of bigram frequencies. For now, we suppress the details, except to point out that there are details that must be considered, and we rule out ``sort bigram frequencies'' as a simple approach.
Roadmap

| Section | Roadmap step |
| --- | --- |
| Section 3.7 | Encipher plaintext |
| Section 3.7 | Decipher ciphertext |
| Section 4.2 | (Hope) Unscramble frequencies |
| Section 4.3 | Unscramble = Bring ``close'' to intrinsic frequencies |
| | Approximate intrinsic frequencies with training text |
| | Assume ciphertext is medium to large so that unscrambled frequencies resemble intrinsic frequencies |
| Section 4.4 | Use the L1 distance to measure ``closeness''; ignore labels. |
| Section 5 | Q: What are legal and effective ways to rearrange frequencies? |
| Section 5.1 | Sorting unigram frequencies does not work, but ``almost'' does. |
| | (Hope) When the frequency table for ciphertext matches the table for training text, read the encryption key off of the labels |
| Section 5.2 | ``Sort bigram frequencies'' is problematic. |
| | Q: Why not try all decryption keys? |