4.5 Closeness



To calibrate which distances count as ``close'', we can compute the distances between the large plaintexts that we used to discover intrinsic frequencies.
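The computation can be sketched in Python as follows. This is a minimal sketch, not the code used to produce the figures: it assumes that a distance is the sum of absolute differences between two frequency tables expressed in percent (so it ranges from 0 to 200), that only the letters a-z are counted, and that bigrams may span word boundaries. The function names and the two short sample strings are hypothetical stand-ins for the large plaintexts.

```python
import string
from collections import Counter

def ngram_percentages(text, n):
    """Map each n-gram of letters in `text` to its share of all n-grams, in percent."""
    # Assumption: keep only a-z (case-folded), so n-grams may span word boundaries.
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    grams = ["".join(letters[i:i + n]) for i in range(len(letters) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: 100.0 * k / total for g, k in counts.items()}

def l1_distance(p, q):
    """Sum of absolute differences between two percentage tables (0 to 200)."""
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))

# Hypothetical stand-ins for two large plaintexts.
text_a = "the quick brown fox jumps over the lazy dog"
text_b = "pack my box with five dozen liquor jugs"

for n, name in ((1, "unigram"), (2, "bigram")):
    d = l1_distance(ngram_percentages(text_a, n), ngram_percentages(text_b, n))
    print(f"{name} distance: {d:.1f}%")
```

Texts this short give much larger distances than real corpora would, because their frequencies have not yet settled near the intrinsic ones; the sketch only shows the mechanics.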

**Figure 22:** Unigram and Bigram Distances for Large Plaintexts
(Diagram not recoverable. The one label that survives reads: unigram = 19, bigram = 47.)

Figure 22 shows the unigram and bigram distances between the same three corpora used for Figure 20. It suggests that unigram distances of ``around'' 20% or less and bigram distances of ``around'' 50% or less count as ``close'', but ``around'' is still too vague. For example, should 70% count as close for bigram distances?⁶ Therefore, we replace the imprecise goal of bringing frequencies ``close'' to intrinsic frequencies with the more specific goal of bringing them *as close as possible* to intrinsic frequencies.
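The difference between the two goals can be illustrated with a toy example. The sketch below assumes three-symbol frequency tables in percent, an arbitrary threshold of 70, and a hypothetical set of candidate tables; where real candidates come from is not addressed here. The names `closest`, `candidates`, and `THRESHOLD` are illustrative only.

```python
def l1_distance(p, q):
    """Sum of absolute differences between two equal-length frequency tables."""
    return sum(abs(a - b) for a, b in zip(p, q))

def closest(candidates, intrinsic):
    """Return the (name, table) pair whose table is nearest to `intrinsic`."""
    return min(candidates.items(), key=lambda item: l1_distance(item[1], intrinsic))

# Toy data (hypothetical): percentages over a three-symbol alphabet.
intrinsic = [50, 30, 20]
candidates = {"A": [20, 50, 30], "B": [45, 35, 20], "C": [30, 20, 50]}
THRESHOLD = 70  # an arbitrary cutoff for "close"

# Goal 1: "close" means within the threshold; several candidates may qualify.
within = [name for name, t in candidates.items()
          if l1_distance(t, intrinsic) <= THRESHOLD]

# Goal 2: "as close as possible" needs no threshold at all.
best_name, best_table = closest(candidates, intrinsic)

print("within threshold:", within)
print("closest:", best_name, l1_distance(best_table, intrinsic))
```

Here all three candidates fall within the threshold (their distances are 60, 10, and 60), so the threshold alone does not single one out, whereas taking the minimum selects B without any cutoff having to be chosen.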

At this point, we should also perform a sanity check: for at least one example, we should confirm that ciphertext frequencies are not ``close'' to intrinsic frequencies. If ciphertext frequencies were also ``close'' to intrinsic frequencies, then ``closeness'' would not be a good criterion for recognizing unscrambled frequencies. We use the following as our sample ciphertext:

Enciphered Announcements, the encryption of Announcements using the Caesar Cipher. Enciphered Announcements is available at
http://courses.cs.cornell.edu/cs100/2000sp/cryptannounce.txt

Figure 23 shows that the distances between our large plaintexts and Enciphered Announcements are indeed much larger than the distances among the plaintexts themselves.
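A sanity check of this kind can be sketched in Python as follows. The sketch makes several assumptions: the Caesar Cipher is taken to shift each letter forward by 3 (the shift is a parameter, since the value used for Enciphered Announcements is not stated here), frequencies are compared letter by letter in alphabetical order, and the two short strings are hypothetical stand-ins for a training text and for Announcements.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

def caesar_encipher(text, shift=3):
    """Shift each letter `shift` places forward, wrapping z around to a."""
    # Assumption: shift of 3 by default; non-letters pass through unchanged.
    table = str.maketrans(ALPHABET, ALPHABET[shift:] + ALPHABET[:shift])
    return text.lower().translate(table)

def unigram_percentages(text):
    """Percentages for a..z, listed in alphabetical order."""
    counts = Counter(c for c in text.lower() if c in ALPHABET)
    total = sum(counts.values()) or 1  # avoid dividing by zero on letterless input
    return [100.0 * counts[c] / total for c in ALPHABET]

def l1_distance(p, q):
    """Sum of absolute differences between two equal-length frequency tables."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical stand-ins for a training text and a plaintext to encipher.
training = "it was the best of times it was the worst of times"
plain = "we hold these truths to be self evident"
cipher = caesar_encipher(plain)

reference = unigram_percentages(training)
print(f"plaintext  vs training: {l1_distance(unigram_percentages(plain), reference):.1f}%")
print(f"ciphertext vs training: {l1_distance(unigram_percentages(cipher), reference):.1f}%")
```

Because enciphering moves each letter's frequency to a different position in the table, the ciphertext table generally lines up poorly with the training table. With texts this small the contrast is less reliable than with the large corpora used for the figures.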

**Figure 23:** Unigram and Bigram Distances Between Ciphertext and Large Plaintexts

Roadmap
Section 3.7	Encipher plaintext $\Rightarrow$ *scramble* frequencies.
Section 3.7	Decipher ciphertext $\Rightarrow$ *unscramble* frequencies.
Section 4.2	(Hope) Unscramble frequencies $\Rightarrow$ decipher ciphertext
Section 4.3	Unscramble = Bring ``close'' to *intrinsic* frequencies
	Approximate *intrinsic* frequencies with *training text*
	Assume ciphertext is medium to large so that unscrambled frequencies resemble intrinsic frequencies
Section 4.4	Use the L¹ *distance* to measure ``closeness''; ignore labels.
$(\rightarrow)$ Section 5	Q: What are legal and effective ways to rearrange frequencies?


Thomas Yan
2000-05-01