3.6 Example of Bigrams

The string "abccc" has bigrams "-a", "ab", "bc", "cc", "cc", and "c-", where "-" stands for a space. Where did the "-a" and "c-" come from? We insert a space at the start and at the end of each line of text to mark the start of the first word and the end of the last word in that line. Figure 15 collects and organizes the tallies of unigrams and bigrams for "abccc".
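The padding step described above can be sketched in a few lines of Python. This is only an illustration, not the code used in this project: the function name `bigrams` and the `pad` parameter are assumptions made for the example, and "-" is used in place of a real space so the output matches the notation in the text.

```python
def bigrams(line, pad="-"):
    """Return the bigrams of one line of text, in order.

    A boundary marker (pad) is added at the start and the end of the
    line, so the first and last characters each get a bigram too.
    The name `bigrams` and the `pad` argument are hypothetical.
    """
    padded = pad + line + pad
    # Slide a window of width 2 across the padded line.
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


result = bigrams("abccc")
print(result)
```

A line of n characters yields n + 1 bigrams, which is why the five-character string "abccc" produces six.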

  
Figure 15: Tallies for Example String "abccc"
[Table: unigram and bigram tallies for "abccc"; Total: 6]

Observe that the unigram tallies are equal to the row and column sums of the bigram tallies. Further observe that the total number of unigrams is equal to the total number of bigrams. You might wonder why the number of spaces is 1 when we inserted two. This is because we treat the inserted spaces at the front and end of each line of text as half-spaces or shared spaces: each counts as one half, so together they contribute 1. This also makes the bigram and unigram tables match.
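The two observations above can be checked mechanically. The sketch below is an illustration under stated assumptions rather than this project's actual code: the function name `tally` is hypothetical, "-" stands in for the space, and the two inserted half-spaces are counted together as a single space.

```python
from collections import Counter


def tally(line, pad="-"):
    """Tally unigrams and bigrams for one line of text.

    Hypothetical helper: the two boundary markers are treated as
    half-spaces, so together they add exactly 1 to the pad count.
    """
    padded = pad + line + pad
    pairs = Counter(padded[i:i + 2] for i in range(len(padded) - 1))

    unigrams = Counter(line)
    unigrams[pad] += 1  # two half-spaces count as one space

    # Row sums group bigrams by first character,
    # column sums group them by second character.
    rows, cols = Counter(), Counter()
    for pair, count in pairs.items():
        rows[pair[0]] += count
        cols[pair[1]] += count
    return unigrams, pairs, rows, cols


unigrams, pairs, rows, cols = tally("abccc")
print(unigrams == rows == cols)                     # tables match
print(sum(unigrams.values()), sum(pairs.values()))  # totals match
```

If each inserted space were instead counted in full, the unigram total would be one larger than the bigram total and the tables would no longer agree.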



Thomas Yan
2000-05-01