Next: 3.6 Example of Bigrams
Up: 3. Foundation for Cracking:
Previous: 3.4 Unigram Frequencies
3.5 Bigram Frequencies
A bigram frequency measures how often a particular pair of adjacent letters
occurs. For instance, take the ratio of the number of times 'c' comes
immediately before 'd' (1 time) to the total number of pairs (64). You will
find that the pair "cd" appears about 2% (1/64, rounded) of the time in the
text shown in Figure 10. To collect all bigram frequencies, use a 2-D table
called a bigram table, as shown in Figures 13 and 14.
Figure 13: Bigram Frequencies (Tallies)
Figure 14: Bigram Frequencies (Percentages (%))
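The tallying described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample string are hypothetical, and the sample is not the text of Figure 10.

```python
from collections import Counter

def bigram_tallies(text):
    """Count every pair of adjacent characters in text."""
    # zip(text, text[1:]) yields (first, second) for each adjacent pair.
    return Counter(zip(text, text[1:]))

def bigram_frequency(text, first, second):
    """Ratio of the tally for one pair to the total number of pairs."""
    tallies = bigram_tallies(text)
    total = sum(tallies.values())
    return tallies[(first, second)] / total if total else 0.0

# Hypothetical sample text, chosen only to keep the numbers small.
sample = "abcd abcd"
cd_tally = bigram_tallies(sample)[("c", "d")]
cd_frequency = bigram_frequency(sample, "c", "d")
print(cd_tally, cd_frequency)
```

Here the sample has 8 adjacent pairs, 2 of which are "cd", so the ratio is 2/8.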
Using the bigram table, how do you store and access particular frequencies?
Each frequency is stored in the bigram table at a particular
row i and column j:
- Row i indicates the first letter in a pair.
- Column j indicates the second letter in a pair.
In Figures 13 and 14,
the i labels are to the left of the j labels, just as the letters
appear in words. So, you may express the pair i, j as
"character i before character j".
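One way to store such a table is as a list of rows, with a lookup from each label to its position. The sketch below assumes a hypothetical label order and sample text, and it silently skips any pair containing a character that has no label; neither choice comes from the original figures.

```python
def build_bigram_table(text, labels):
    """Return (table, index) where table[row][col] is a pair tally."""
    # The same label order is used for rows and columns.
    index = {ch: k for k, ch in enumerate(labels)}
    table = [[0] * len(labels) for _ in labels]
    for first, second in zip(text, text[1:]):
        if first in index and second in index:
            # Row = first character of the pair, column = second.
            table[index[first]][index[second]] += 1
    return table, index

# Hypothetical labels and sample text, for illustration only.
labels = " abcd"
table, index = build_bigram_table("bad cab", labels)

ca = table[index["c"]][index["a"]]  # 'c' before 'a'
ac = table[index["a"]][index["c"]]  # 'a' before 'c'
print(ca, ac)
```

Swapping the row and column indices looks up the reversed pair, which in general holds a different tally.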
More formally, determine the frequency of a pair of characters with the
following formula:
frequency of pair i, j = (number of times i comes immediately before j) / (total number of pairs)
For instance, the entry at row 'c' and column 'd' in
Figure 14 is the frequency 2%.
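Some observations you should note:
- The frequency of the pair i, j doesn't necessarily equal the frequency of the
pair j, i because the two pairs might occur a different number of times.
- The frequency of a space followed by a space is 0 because the example has no
double spaces.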
- By convention, we require the labels on the top be in the same order as the labels to the left.
- The bigram tables in Figures 13 and 14 are equivalent.
- Allowing for round-off error,
adding up the frequencies in a row or a column of the bigram table yields the
corresponding frequency in the unigram table.
For example, in Figure 13,
the sum of column 'b' produces
2+4+1+1+0=8
and the sum of row 'b' produces
3+3+1+1+0=8,
which matches the unigram tally of 'b' in Figure 11.
Similarly, in Figure 14,
the sum of column 'b' and the sum of row 'b' each match,
allowing for round-off error,
the unigram percentage of 'b' in Figure 12.
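The row and column check can be tried out with a short sketch. It rests on an assumption that the source does not state: that the last character of the text is also paired with the first (wrap-around), so that every character is counted exactly once as a first letter and once as a second letter. The function name, the `wrap` flag, and the sample text are all hypothetical.

```python
from collections import Counter

def row_and_column_sums(text, wrap=True):
    """Return (row sums, column sums) of the bigram tallies for text."""
    pairs = list(zip(text, text[1:]))
    if wrap and text:
        # Assumption: pair the last character with the first one.
        pairs.append((text[-1], text[0]))
    rows = Counter(first for first, _ in pairs)     # sum across each row
    cols = Counter(second for _, second in pairs)   # sum down each column
    return rows, cols

text = "abba cab"            # hypothetical sample text
unigrams = Counter(text)     # unigram tallies

rows, cols = row_and_column_sums(text)
print(rows == unigrams, cols == unigrams)

# Without wrap-around, the last character loses one row count and the
# first character loses one column count.
rows_nw, cols_nw = row_and_column_sums(text, wrap=False)
print(rows_nw["b"], cols_nw["a"], unigrams["b"], unigrams["a"])
```

Without the wrap-around assumption the sums still match the unigram tallies everywhere except at the two ends of the text, where they fall short by one.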
Thomas Yan
2000-05-01