next up previous
Next: 3.6 Example of Bigrams Up: 3. Foundation for Cracking: Previous: 3.4 Unigram Frequencies

   
3.5 Bigram Frequencies

A bigram frequency measures how often a particular pair of letters occurs in a text. For instance, take the ratio of the number of times 'c' comes immediately before 'd' (1 time) to the total number of pairs (64). The pair "cd" therefore appears about 2% of the time (1/64, rounded to the nearest percent) in the text shown in Figure 10. To collect all bigram frequencies, use a two-dimensional table called a bigram table, as shown in Figures 13 and 14.


  
Figure 13: Bigram Frequencies (Tallies)
\begin{figure}
\begin{center}
\begin{tabular}{\vert r\vert*{5}{r}\vert}
\multic...
...
\texttt{a} & 6 & 6 & 4 & 1 & 3 \\ \hline
\end{tabular}\end{center}\end{figure}


  
Figure 14: Bigram Frequencies (Percentages (%))
\begin{figure}
\begin{center}
\begin{tabular}{\vert r\vert*{5}{r}\vert}
\multico...
...& 2\\
\texttt{a} & 9& 9& 6& 2& 5\\ \hline
\end{tabular}\end{center}\end{figure}
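
As a check on how the two figures relate, the visible row for \texttt{a} can be converted from tallies to percentages by hand. This is only a sketch: it assumes the total of 64 pairs stated earlier and assumes each percentage is rounded to the nearest whole number, which is inferred from the figures rather than stated by them.

\begin{displaymath}\frac{6}{64} \approx 9.4\% \rightarrow 9\%, \quad \frac{4}{64} = 6.25\% \rightarrow 6\%, \quad \frac{1}{64} \approx 1.6\% \rightarrow 2\%, \quad \frac{3}{64} \approx 4.7\% \rightarrow 5\%.\end{displaymath}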

Using the bigram table, how do you store and access particular frequencies? Let the notation $\mbox{\emph{freq}}_{\mbox{\scriptsize$i$ },\mbox{\scriptsize$j$ }}$ denote the frequency stored in the bigram table at row i and column j. In Figures 13 and 14, the i labels are to the left of the j labels, just as the two characters appear in words. So, you may read the pair i, j as ``character i before character j''. More formally, determine the frequency of each pair of characters with the following formula:

\begin{displaymath}\mbox{\emph{freq}}_{\mbox{\scriptsize$i$ },\mbox{\scriptsize$j$ }} = \frac{\mbox{number of times the pair $i$, $j$ appears}}{\mbox{total number of pairs}}.\end{displaymath}
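
The formula can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names `bigram_tallies` and `bigram_frequencies` and the sample string are hypothetical, and the sketch assumes a "pair" means two adjacent characters, with character i immediately before character j.

```python
from collections import Counter


def bigram_tallies(text):
    """Count each ordered pair of adjacent characters (assumed meaning of 'pair')."""
    # zip(text, text[1:]) yields (text[0], text[1]), (text[1], text[2]), ...
    return Counter(zip(text, text[1:]))


def bigram_frequencies(text):
    """Return freq[(i, j)] = tally of the pair (i, j) / total number of pairs."""
    tallies = bigram_tallies(text)
    total = sum(tallies.values())
    if total == 0:
        # A text shorter than two characters contains no pairs.
        return {}
    return {pair: count / total for pair, count in tallies.items()}


if __name__ == "__main__":
    sample = "abba-cd"  # hypothetical sample, not the text of Figure 10
    freq = bigram_frequencies(sample)
    # Look up "character i before character j" as freq[(i, j)].
    print(freq[("c", "d")])
```

A dictionary keyed by the pair (i, j) plays the role of the two-dimensional table here: the first element of the key is the row label and the second is the column label. Pairs that never occur are simply absent, so `freq.get((i, j), 0.0)` can be used to read them as zero.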

For instance, $\mbox{\emph{freq}}_{\mbox{\scriptsize\texttt{$'$ c$'$ }},\mbox{\scriptsize\texttt{$'$ d$'$ }}}$ refers to the frequency 2% located at row 'c' and column 'd' in Figure 14. Other examples include $\mbox{\emph{freq}}_{\mbox{\scriptsize\texttt{$'$ a$'$ }},\mbox{\scriptsize\texttt{$'$ -$'$ }}}=9\%$, $\mbox{\emph{freq}}_{\mbox{\scriptsize\texttt{$'$ -$'$ }},\mbox{\scriptsize\texttt{$'$ a$'$ }}}=13\%$, and $\mbox{\emph{freq}}_{\mbox{\scriptsize\texttt{$'$ a$'$ }},\mbox{\scriptsize\texttt{$'$ b$'$ }}}=6\%$. Some observations you should note:


Thomas Yan
2000-05-01