This problem set has 3 parts. In parts 1 you'll implement a map structure using balanced binary trees. In Part 2 you'll learn about the RSA cryptosystem and implement algorithms in this system. Part 3 is a written problem. As before, you will write your solutions in the appropriate files and submit the files separately on CMS. To get you started, we are providing a zip file containing several files. You can edit the necessary files, and submit your solution using CMS when you have finished. Non-compiles and warnings will receive an automatic zero.
It is recommended (but not required) that you work with one partner for this assignment. Please use CMS to pair up with your partner.
For this assignment you are required to use CVS (or any other versioning system) for the concurrent code development within your team. A quick CVS tutorial can be found on the main page of the course web site.
[Download the zip file ps4.zip]
Recall that an AVL tree is a binary search tree where all nodes have the property that the height difference between their children (of balance factor b) is -1, 0, or 1. Operations that insert and remove elements from such trees must perform re-balancing operations to ensure that each node is properly balanced. It is possible to generalize the notion of balancing by requiring that balance factors (height difference between children) be bounded by a constant k: -k <= balance factor <= k. The parameter k controls how balanced such trees are. We say that a tree is k-AVL if it is a binary search and each node in the tree has a balance factor between -k and k.
For k = 1 we obtain the AVL trees discussed in class. For larger values of k, trees become taller and less balanced; insertions and deletions take longer (still O(log n) if k is a constant with respect to n), but will require fewer rebalancing operations. If k is arbitrarily large, rotations will never be necessary, and the trees will behave as standard binary search trees.
In this part of the assignment you will implement a map abstraction
using k-AVL, i.e., AVL trees with k-bounded balance factors. The AVL tree set is parameterized on a
signature ORD_BIND containing the types of keys and values, a
function that compares two keys, and the constant k representing the upper bound
for balance factors. Each node of the tree contains an element and the height of the current node.
The datatype definition of tree sets is as follows:
datatype bind = key * value datatype avl = Nil | Node of bind * bal * avl * avl type map = avl
Your task is the following:
repOk function that checks
whether a tree is a valid k-AVL tree map. For full score, your implementation
must be O(n) in the number of nodes of the tree.bal
for k-AVL trees. The argument of the function is a tree whose balance factor
is -k-1 or k+1, and all subtrees are k-AVL. It returns a k-AVL tree that
contains the same bindings as the given tree.bal.map.sig, that is functions get,
fold, and size. The fold function will
traverse the nodes of the tree using in-order traversal, and function
size returns the number of keys in the map.To do: turn in the solution in the file avlmap.sml.
In a world that relies increasingly upon digital information, public key
encryption and digital signatures play an important part in achieving private
communication. RSA is a popular public-key encryption system. It is based on the
fact that there are fast algorithms for exponentiation and for testing prime
number—but no known fast algorithms for factoring large numbers. In this
problem set you will implement a version of the RSA system. The algorithms will
you implement have both a deep mathematical foundation and practical importance.
Cryptographic systems typically use keys for encryption and decryption. An
encryption key is used to convert the original message (the plaintext) to coded
form (the ciphertext). A corresponding decryption key is used to convert the
ciphertext back to the original plaintext. In traditional cryptographic systems,
the same key is used for both encryption and decryption. Two parties can
exchange coded messages only if they share a secret key. Since anyone who
learned that key would be able to decode the messages, keys must be carefully
guarded and transmitted only under tight security: for example, couriers
handcuffed to locked, tamper-resistant briefcases!
Diffie and Hellman (1976) discovered a new approach to encryption and
decryption: public key cryptography. In this approach, the encryption and
decryption keys are different. Knowing the encryption key cannot help you find
the decryption key. Thus you can publish your encryption key on the web, and
anyone who wants to send you a secret message can use it to encode a message to
send to you. You do not have to worry about key security at all, for even if
everyone in the world knew your encryption key, no one could decrypt messages
sent to you without knowing your decryption key, which you keep private to
yourself. The RSA system, due to Rivest, Shamir, and Adelman, is the best-known
public-key cryptosystem; the security of your web browsing probably depends on
it.
RSA does not work with characters, but with integers. That is, it encrypts numbers. To encrypt a piece of text, just combine the ASCII codes of several characters into a big number, and then encrypt the resulting number. A big number is an integer of arbitrary magnitude (i.e., is not restricted to 32 bits or 64 bits).
Suppose you want to obtain public and private keys in the RSA system. You select two very large prime numbers p and q. (Recall that a prime number is a positive integer greater than 1 with no divisors other than itself and 1). You then compute numbers n and m:
n = pq
m = (p−1)(q−1)
It turns out that no one knows how to compute m, p, or q efficiently, knowing n.
(Notation: [a=b] mod n means
that (a mod n) = (b mod n),
where a mod n is the remainder obtained
when dividing a by n
using ordinary integer division. Equivalently, [a=b]
mod n if a−b is divisible
by n. For example, [17 = 32]
mod 5.)
Now you pick a number e < m relatively
prime to m; that is, such that
e and m have
no factors in common except 1. The significance of
relative primality is that e is relatively
prime to m iff e
is invertible mod m; that is, iff there exists
a d such that [de =
1] mod m. Moreover, it is possible to compute
d from e and
m using Euclid's algorithm, described below. Then:
Anyone who wants to send you a secret message s
(represented by an integer) encrypts it by computing a number
E(s). The cyphertext E(s)
is obtained by raising s to the power
e, then taking the remainder modulo
n :
E(s) = se mod n
The decryption process is exactly the same, except that d
is used instead of e:
D(s) = sd mod n
The operations E and D
are inverses:
D(E(s)) = (se)d
mod n
= sde mod n
= s1+km mod n
= s(sm)k mod n
= s(1)k mod n
= s mod n
= s
The integer s representing the plaintext must
be less than n. That's why we break the message
up into chunks. In the above reasoning, the equality [sm=1]
mod n holds only if s is
relatively prime to n (using a theorem due to
Euler and Fermat). Otherwise, if s divides
n, that is, s
is either p or q,
then D(E(s)) = sde mod n =
s, trivially. Therefore, it is always the case that
D(E(s)) = s.
In summary, to use the RSA system:
1. pick large primes p and
q;
2. compute n = pq and
m = (p−1)(q−1);
3. choose e relatively prime to
m and use this to compute d
such that [de = 1] mod m;
4. publish the pair (n,e) as your
public key, but keep d, p,
and q secret.
How secure is RSA? At present, the only known way to obtain
d from e and
n is to factor n
into its prime factors p and
q, then compute m
and proceed as above. But despite centuries of effort by number theorists,
factoring large integers efficiently is still an open problem. Until someone
comes up with an efficient way to factor numbers, or discovers some other way to
compute d from e
and n, the cryptosystem appears to be
secure for large numbers n. Although several
"better" factorization algorithms have been proposed, the problem remains
computationally intractable for numbers that are large enough. For example, it
is infeasible with today's best supercomputers and best algorithms known to date
to factor an arbitrary 1024-bit number. In fact, the
RSA Factoring
Challenge offers money prizes for factoring large numbers. To date, the
largest number from this challenge that has been successfully factored had 640
bits (or 193 decimal digits). It took about 5 months for several tens of
machines working in parallel to factor this number.
In the file rsa.sml you have been given supporting code for
implementing the RSA algorithm. The code relies on the implementation of big
integers and random numbers from the
SML/NJ library.
Note that this library is not loaded by default! Using the SML Compilation
Manager, the library is loaded by the line smlnj-lib.cm in the
sources.cm file (check the comments in that file if you have
trouble loading the library).
The given code provides a variety of functions, including functions to generate prime numbers, and to test numbers for primality; code that generates key pairs; to read and write keys to and from files; or to convert between lists of bytes and big numbers. We briefly discuss a few of these functions.
expmod(b, e, m)
computes be mod m.
Function euclid(m,n) yields
numbers (s,t,g) such that sm
+ tn = g, where g is the
greatest common divisor of m and
n. isPrime(n) tests is number is prime. This is a
probabilistic test: it yields true if there is a high probability that
n is prime; and yields false if
n is definitely not prime. This function is
based on Fermat's theorem, which states that if n
is prime, then all numbers a in the range 0 < a
< n satisfy [an-1
= 1] mod n; and if n
is not prime, then at most half do. Therefore, successfully checking if [an-1
= 1] mod n for k numbers
a indicates that n is prime with a probability of error of 1 /
2k.generateKeyPair(r) generates an RSA key pair.
It picks random primes p and
q that are in the range from 2r
to 2*2r so that n =
pq will be in the range 22r to 22r+2.
To find the keys e and
d given m = (p-1)(q-1),
function selectKeys picks a random e,
and computes d using Euclid's algorithm.You must add the encryption and decryption functions specified below. Your
functions encode an input list of bytes into an output list of bytes. Therefore,
these functions operate on blocks of bytes. Bytes are represented as
values of type Word8.word (check the Word8 structure
for operations on values of this type). Each block consists of M
bytes or (M-1) bytes, where M is the number of bytes required to store the
modulus n of the key (n,e)
or (n,d).
(* encrypt k l encrypts byte list l with key k. * If M is the number of bytes in k's modulus, then break l into * (M-1)-byte blocks, and encrypt each block into an M-byte output * block. Pad last input block to form an (M-1)-byte block. * Returns: a list n'::l', where n' is the amount of padding in the * last block, and l' is the concatenation of all encrypted blocks. *) val encrypt : key -> word list -> word list
(* decrypt k l decrypts byte list l with key k. * Requires that l = n'::l', where n' is the amount of padding that * must be removed from the last block, and l' consists of encrypted * M-byte blocks. As above, M is the number of bytes of k's modulus. * The function decrypts each M-byte block into an (M-1)-byte block. * Returns: the decrypted list. *) val decrypt : key -> word list -> word list
Note: You may find it useful to use a single function that performs the RSA-transform, and use this function for both encryption and decryption:
(* transform k n l encodes a list of bytes l into an * n-byte output list l', using key k. * Pads the result with leading zeroes if necessary. *) val transform : key -> int -> word list -> word list
To do: Complete encrypt and decrypt in
rsa.sml.
Public key encryption can be used to solve another problem of secure communication. Suppose you wish to send a message by electronic mail and sign it so that the recipient can be sure that it really came from you. What is required is some scheme for signing a message in a way that cannot be forged. This is called a digital signature.
Suppose you want to sign a digital message M in a way that convinces anyone else that the message came from you. This can be achieved by providing a signature that could only have been sent by someone with your private key. This signature is simply D(M). Anyone who receives M and D(M) can verify that the signature corresponding to the message by applying E to it, obtaining E(D(M)) = M. Only someone in possession of the private key d could have constructed a signature with this property. If someone wanted to forge a signature to a bogus message M, they would have to compute D(M), but assuming that RSA is secure, this cannot be done without the private key d.
We can combine the two techniques as follows. Suppose Amy wants to send Bob a message that only Bob can read, and sign the message so that Bob knows it could only be from her. She encrypts the message using Bob's public key. Then she signs the encrypted result using her own private key using the signature method just described. When Bob receives the message, he first uses Amy's public key to authenticate the signature, then decrypts the message using his own private key. Bob can be sure that only someone with Amy's private key could have sent the message. Amy can be sure that only someone with Bob's private key can read the message. This is accomplished without exchanging any secret information between Bob and Amy. It's this capacity for achieving secure communication without having to worry about exchanging secret keys that makes public key cryptography such an appealing technique.
Note: For efficiency and other reasons, real implementations of digital signatures do not sign the message itself, but just sign a small hashcode of the message. That is, for a message M, the digital signature is not D(M), but D(H(M)), where H is a secure one-way hash function (an example hash function is MD5). The pair (M, D(H(M))) is the signed document.
To Do: Implement the functions encrypt_and_sign and
unsign_and_decrypt in rsa.sml.
datatype BST = Empty | Node of BST * int * BST fun height(Empty) = -1 | height(Node(l,_,r) = 1 + Int.max(height l, height r) fun size(Empty) = 0 | size(Node(l,_,r) = 1 + (size l) + (size r)
Show that size(t) <= 2 height(t) + 1 - 1
equals from Part 1.
To do: turn in the solution in the file written.txt.
cvslog showing your cvs
activity during the assignment. For this, look into the "cvs log" command.
This will count for 5% of the grade for this assignment.