
3.2 Frequencies

How might you crack a cryptosystem that uses character mappings? Consider how frequently certain characters appear in text. If you could spot repeated patterns in encoded text and then match them to known patterns in "regular" text, you might be able to crack the cryptosystem! For example, in English the letter 'u' almost always follows the letter 'q'.[2] Natural languages have other patterns, too. A frequency is a measure of how often a pattern appears in a body of text. You may measure the frequency of a pattern as either:

Recall that $x\% = \displaystyle\frac{x}{100}$ and observe that the fraction is always between $0\%=0$ and $100\%=1.0$.
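
To make the fraction concrete, here is a minimal Python sketch that measures how often each letter appears in a body of text, expressed as a fraction between 0 and 1. The function name `letter_fractions` and the choice to ignore case and non-letter characters are assumptions made for illustration, not part of the original notes.

```python
from collections import Counter

def letter_fractions(text):
    """Return each letter's share of all letters in text, as a fraction in [0, 1]."""
    # Assumption: ignore case and skip anything that is not a letter.
    letters = [ch for ch in text.lower() if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    return {ch: n / total for ch, n in counts.items()}

fractions = letter_fractions("hello")
# 'l' appears 2 times out of 5 letters, so its fraction is 2/5.
print(fractions["l"])
# Multiply by 100 to read the same value as a percentage.
print(fractions["l"] * 100)
```

Because every letter's count is divided by the same total, the fractions for all letters in a non-empty text add up to 1, which matches the observation that each fraction lies between 0 and 1.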

There are published tables of frequencies of single letters and pairs of letters for different languages.[3] We refer to these frequencies as unigram and bigram frequencies:

Note that the order of letters matters, e.g. "qu" and "uq" are different! Higher-order frequencies, like trigrams, are also studied and would help our task, but for simplicity, we do not consider them.
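
The point about letter order can be checked with a short Python sketch that counts bigrams in a sample text. The function name `bigram_counts` is hypothetical, and the sketch assumes that a bigram is a pair of adjacent letters within a single word, so pairs spanning a space or punctuation mark are not counted; other conventions are possible.

```python
from collections import Counter

def bigram_counts(text):
    """Count ordered pairs of adjacent letters within each word of text."""
    # Assumption: treat every non-letter as a word separator and ignore case.
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    counts = Counter()
    for word in cleaned.split():
        # zip pairs each letter with the letter that follows it.
        for first, second in zip(word, word[1:]):
            counts[first + second] += 1
    return counts

counts = bigram_counts("quick quiet queen")
print(counts["qu"])  # pairs where 'q' is followed by 'u'
print(counts["uq"])  # pairs where 'u' is followed by 'q'
```

In this sample every 'q' is followed by 'u', while the reversed pair never occurs, so the two bigrams receive different counts even though they contain the same letters.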


Thomas Yan
2000-05-01