
   
4.5 Closeness

To calibrate which distances count as close, we can compute the distances between the large plaintexts we used to discover intrinsic frequencies.

  
Figure 22: Unigram and Bigram Distances for Large Plaintexts
[Figure 22 diagram not recovered. The one label that survives extraction reads ``...gram = 19, bigram = 47''.]

Figure 22 shows the unigram and bigram distances between the same three corpora used for Figure 20. It suggests that unigram distances of ``around'' 20% or less and bigram distances of ``around'' 50% or less count as ``close'', but ``around'' is still too vague: for example, should 70% count as close for bigram distances? Therefore, we replace the imprecise goal of bringing frequencies ``close'' to intrinsic frequencies with the more specific goal of bringing them as close as possible to intrinsic frequencies.
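The calibration described above could be reproduced along the following lines. This is only a sketch under stated assumptions, not the procedure actually used for Figure 22: it assumes the distance is the sum of absolute differences between relative frequencies, expressed as a percentage (so 0% means identical tables and 200% means no n-gram in common), it compares frequencies label by label, and it drops non-letters before forming n-grams. Section 4.4 gives the actual definition, which may differ in normalization and in how labels are treated. The names `ngram_frequencies`, `l1_distance_percent`, and `pairwise_distances` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ngram_frequencies(text, n):
    """Relative frequencies of letter n-grams in text.

    Assumption: case is folded and everything except a-z is dropped,
    so bigrams may span what were originally word boundaries.
    """
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    grams = ["".join(letters[i:i + n]) for i in range(len(letters) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {gram: count / total for gram, count in Counter(grams).items()}


def l1_distance_percent(p, q):
    """Sum of absolute frequency differences, as a percentage.

    Assumption: n-grams missing from one table count as frequency 0 there.
    """
    keys = set(p) | set(q)
    return 100.0 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def pairwise_distances(corpora, n):
    """Map each unordered pair of corpus names to its n-gram distance."""
    freqs = {name: ngram_frequencies(text, n) for name, text in corpora.items()}
    return {
        (a, b): l1_distance_percent(freqs[a], freqs[b])
        for a, b in combinations(sorted(freqs), 2)
    }
```

Calling `pairwise_distances(corpora, 1)` and `pairwise_distances(corpora, 2)` on a dictionary that maps corpus names to their text would give the unigram and bigram distances for every pair. The largest value in each result is one candidate for the threshold that ``around'' leaves vague.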

At this point, we should also perform a sanity check. We should check at least one example to see that frequencies for ciphertext are not ``close'' to intrinsic frequencies: If ciphertext frequencies are also ``close'' to intrinsic frequencies, then ``closeness'' is not a good criterion for recognizing unscrambled frequencies. We use the following as our sample ciphertext:

Figure 23 shows that the distances between our large plaintexts and Enciphered Announcements are indeed much larger than the distances between the plaintexts themselves.
  
Figure 23: Unigram and Bigram Distances Between Ciphertext and Large Plaintexts
[Figure 23 table not recovered.]
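The sanity check can be stated as a small test on the two sets of distances. The sketch below is illustrative only: the function name `ciphertext_is_farther` is hypothetical, and the rule it applies (every ciphertext-to-plaintext distance must exceed every plaintext-to-plaintext distance) is one possible reading of ``much larger'', not a criterion stated in the text.

```python
def ciphertext_is_farther(plain_vs_plain, cipher_vs_plain):
    """Compare ciphertext distances against plaintext-to-plaintext distances.

    plain_vs_plain:  distances between pairs of large plaintexts
    cipher_vs_plain: distances between the ciphertext and each plaintext

    Returns (passes, margin), where margin is the smallest ciphertext
    distance minus the largest plaintext distance. Assumption: the check
    passes only if the margin is positive.
    """
    margin = min(cipher_vs_plain) - max(plain_vs_plain)
    return margin > 0, margin
```

A large positive margin supports using ``closeness'' to recognize unscrambled frequencies. A margin near zero or below it would mean that ciphertext can look as close to intrinsic frequencies as plaintext does.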

Roadmap
Section 3.7 Encipher plaintext $\Rightarrow$ scramble frequencies.
Section 3.7 Decipher ciphertext $\Rightarrow$ unscramble frequencies.
Section 4.2 (Hope) Unscramble frequencies $\Rightarrow$ decipher ciphertext
Section 4.3 Unscramble = Bring ``close'' to intrinsic frequencies
  Approximate intrinsic frequencies with training text
  Assume ciphertext is medium to large so that unscrambled frequencies resemble intrinsic frequencies
Section 4.4 Use the L1 distance to measure ``closeness''; ignore labels.
$(\rightarrow)$     Section 5 Q: What are legal and effective ways to rearrange frequencies?


Thomas Yan
2000-05-01