CS 513 System Security -- Authentication

Authentication

Lecturer: Professor Fred B. Schneider

Lecture notes by Borislav Deianov

In lecture 2 we introduced Lampson's "Gold Standard of Security": authorization, authentication, audit. Authorization often requires ascertaining the identity of the subject making a request. This is called authentication, and it will be the subject of the next few lectures.

Authentication is the process by which the identity of someone or something gets established. We distinguish between two cases -- machines and humans -- because the capabilities of these two classes are quite different. Humans are not good at memorizing things or calculating. (Humans are also cranky, have a narrow operating temperature range, and are expensive to maintain). Machines have large memories and greater computational prowess, but have a different set of weaknesses.

All approaches for authenticating humans are based on three general techniques:

something you know (e.g. a password or other secret);
something you have (e.g. smart card, credit card);
something you are (e.g. fingerprints, retinal scan, voice print).

The last two techniques require special hardware, so they do not enjoy widespread use today. The first technique does enjoy widespread use; it forces the user to convey information over some existing interface (e.g. typing a password on a keyboard).

Passwords are a form of secret. The consequence of equating identity with knowledge of a secret is that if someone learns the secret then they can steal the identity. With this in mind, we see that password-based authentication exhibits the following vulnerabilities:

eavesdropping (somebody might see you typing in your password);
written down passwords might be read by anyone;
poorly chosen passwords are easily guessed;
offline analysis might allow passwords to be computed (if password file is readable).

One way to cut down the risk associated with compromised passwords is for them to have limited lifetimes. If a password is used for a long time, then the chance of compromise is greater (the attacker can obtain more ciphertext) and the incentive for compromise is greater.

There are two kinds of password-guessing attacks -- online and offline. We consider them separately.

In an online attack, the system itself is used to determine whether guesses are right. To protect against online attacks, we can slow down the rate of guessing or allow only a limited number of guesses (example: 3 wrong guesses of a PIN and the ATM eats your card). Allowing only a limited number of guesses before an account is locked-out has the disadvantage that it enables a denial of service attack. Specifically, the attacker locks legitimate users out of the system by trying to guess those users' passwords. We need to find a compromise between two possible evils: Evil 1 is when somebody, who should not be able to, accesses the system; evil 2 is when somebody, who should be able to access the system, cannot.

In an offline attack, some other method is used to verify the correctness of a guess (not the system itself) before that guess is actually used in attacking the system. The form of attack will depend on how passwords are stored.

One simple approach is to store username/password pairs in a file. To penetrate such a system, we only need to read the password file. Therefore the password file must not be readable. But now the security of the entire system depends on the protection of this one file, including protection of back-up tapes that might contain the file, etc. This scheme is unsatisfactory.

A better idea is to store only encrypted passwords. We store a file containing username/encrypted password pairs. On login, the typed password is encrypted with a cryptographic hash function and compared with what is stored password in the file. A cryptographic hash function h has the following properties:

h is not invertible: h(m) is easy to compute but h^-1(m) is hard to compute;
it is hard to find m and m', such that h(m) = h(m').

Such hash functions are very hard to design, there are only a few people in the world who seem capable of getting them right. Thus, using one of the standard ones is strongly recommended: SHA, MD5 (older versions MD4, MD2), etc.

Storing encrypted passwords is still vulnerable to what is called a dictionary attack. Suppose that the file with encrypted passwords is readable. If the attacker knows what hash function was used, then he/she can hash every word in a dictionary and compare the result to the file with encrypted passwords, thus finding all passwords that are words from the dictionary.

To defend against dictionary attacks we can use salt. "Salt" in cryptography is random stuff you add to plaintext before encrypting. Now in the password file we store: username, rrrr, h(password + rrrr). Here rrrr is the salt. And the dictionary attack no longer works, since the attacker will need to hash every word with every possible salt. For example, UNIX uses 12 bits of salt, thus making a dictionary attack 2048 times more difficult than without the salt. However, the PC on Boris's desktop can perform approximately 13,000 crypt()'s per second. This translates to hashing a 20,000 word dictionary with every 12-bit salt combination in under an hour. An offline attack is quite plausible now, since people don't change their passwords every hour. The obvious solutions are to use a more (longer) salt or have the system refuse crackable passwords.

How long should a password be? Today, guessing a 64 bit random bit string is considered intractably hard. How many characters of password is equivalent to 64 bits? If you use upper and lower case letters, and 10 digits you get approximately 6 bits per character. That suggests strings of 13 characters should work. Unfortunately, people can't remember arbitrary 13-character sequences. English words -- which are far from arbitrary sequences -- carry 1.2 bits of information per character, on average. This translates into 49 characters or 10 words of English text to get the desired 64 bits.