Example: Standard Library Map
The standard library's Map module, which implements a dictionary data structure using balanced binary trees, is based on functors. In this section, we study how to use it. You can see the implementation of that module on GitHub as well as its interface.
The Map module defines a functor Make
that creates a structure implementing
a map over a particular type of keys. That type is the input structure
to Make
. The type of that input structure is Map.OrderedType
, which are types that
support a compare
operation:
module type OrderedType = sig
type t
val compare : t -> t -> int
end
The Map module needs ordering because balanced binary trees need to be able
to compare keys to determine whether one is greater than another.
According to the library's documentation, compare
must satisfy this
specification:
(* This is a two-argument function [f] such that
* [f e1 e2] is zero if the keys [e1] and [e2] are equal,
* [f e1 e2] is strictly negative if [e1] is smaller than [e2],
* and [f e1 e2] is strictly positive if [e1] is greater than [e2].
* Example: a suitable ordering function is the generic structural
* comparison function [Stdlib.compare]. *)
val compare : t -> t -> int
Arguably this specification is a missed opportunity for good design: the library designers could instead have defined a variant:
type order = LT | EQ | GT
and required the output type of compare
to be order
. But historically many
languages have used comparison functions with similar specifications, such
as the C standard library's strcmp
function.
The output of Map.Make
is a structure whose type is (almost) Map.S
and supports
all the usual operations we would expect from a dictionary:
module type S =
sig
type key
type 'a t
val empty: 'a t
val mem: key -> 'a t -> bool
val add: key -> 'a -> 'a t -> 'a t
val find: key -> 'a t -> 'a
...
end
There are two reasons why we say that the output is "almost" that type:
The Map module actually specifies a sharing constraint (which we covered in the previous notes):
type key = Ord.t
. That is, the output ofMap.Make
shares itskey
type with the typeOrd.t
. That enables keys to be compared withOrd.compare
. The way that sharing constraint is specified is in the type ofMake
(which can be found inmap.mli
, the interface file for the map compilation unit):module Make : functor (Ord : OrderedType) -> (S with type key = Ord.t)
The Map module actually specifies something called a variance on the representation type, writing
+'a t
instead of'a t
as we did above. We won't concern ourselves with what this means; it's related to subtyping and polymorphic variants.
The functor Map.Make
itself (which can be found in map.ml
, the implementation
file for the map compilation unit) is currently defined as follows, though of course
the library is free to change its internals in the future:
module Make(Ord: OrderedType) = struct
type key = Ord.t
type 'a t =
| Empty
| Node of 'a t * key * 'a * 'a t * int
(* left subtree * key * value * right subtree * height of node *)
let empty = Empty
let rec mem x = function
| Empty -> false
| Node(l, v, _, r, _) ->
let c = Ord.compare x v in
c = 0 || mem x (if c < 0 then l else r)
...
The key
type is defined to be a synonym for the type t
inside Ord
, so
key
values are comparable using Ord.compare
. The mem
function uses
that to compare keys and decide whether to recurse on the left subtree or right
subtree.
Using the Map Module
A map for integer keys.
To create a map, we have to pass a structure into Map.Make
, and that structure
has to define a type t
and compare
function. The simplest way to do
that is to pass an anonymous structure into the functor:
# module IntMap = Map.Make(struct type t = int let compare = Stdlib.compare end);;
module IntMap :
sig
type key = int
type 'a t
val empty : 'a t
...
end
# open IntMap;;
# let m1 = add 1 "one" empty;;
val m1 : string t = <abstr>
# find 1 m1;;
- : string = "one"
# mem 42 m1;;
- : bool = false
# find 42 m1;;
Exception: Not_found.
# bindings m1;;
- : (int * string) list = [(1, "one")]
# let m2 = add 1 1. empty;;
val m2 : float t = <abstr>
# bindings m2;;
- : (int * float) list = [(1, 1.)]
Here are some things to note about the utop transcript above:
We can write a structure on one line, even though until now we've always used line breaks to keep them readable. When writing a structure on on line (which we'll only do for really short structures) it can be useful to use the double semicolon between definitions to enhance readability:
# module IntMap = Map.Make(struct type t = int;; let compare = Stdlib.compare end);;
This is an exception to the general style rule of avoiding double semicolon inside source code.
If we didn't want to pass an anonymous structure, we could instead define a module and pass it:
module Int = struct type t = int let compare = Stdlib.compare end module IntMap = Map.Make(Int)
The signature of the structure returned by
Map.Make
records the fact that keys are of typeint
. The type'a t
is the name of the representation type of anIntMap
. The'a
type variable in it is the type of values in the map. Although in general the map could have any value type, once we add a single value to a map, that "pins down" the value type of that particular map. When we add the binding from key1
to string"one
above, notice that the map value returned is of typestring t
.The
bindings
function of a map returns an association list of all the bindings in the map. Association lists are, of course, another data structure that implements a dictionary. But they are less efficient than the balanced binary search tree implementation used byMap
.The
mem
function tests whether a key is a member of a map. Thefind
function finds the value associated with a key, and raises theNot_found
exception if the key is not bound in the map. That's the same exception thatList.assoc
raises if a key is not bound in an association list.
A map for string keys.
If a module already provides a type t
that can be compared, we can
immediately use that module as an argument to Map.Make
. Several
standard library modules are designed to be used in that way.
For example, the String
module defines a type t
and a compare
function
that meet the specification of Map.OrderedType
. So we can easily
create maps whose key type is string
:
# module StringMap = Map.Make(String);;
module StringMap :
sig
type key = string
...
end
Now we could use the string map like we used the int map. This time, for sake
of example, let's not open the StringMap
module:
# let m = StringMap.(add "one" 1 empty);;
# let m' = StringMap.(add "two" 2 m);;
# StringMap.bindings m';;
- : (string * int) list = [("one", 1); ("two", 2)]
# StringMap.bindings m;;
- : (string * int) list = [("one", 1)]
#
Note that maps are a functional data structure: adding a mapping to m
did not mutate m
; rather, it produced a new map that we bound
to m'
, and both the new map and old map remain available for use.
A map for record keys.
When the type of a key becomes more complicated than a built-in primitive
type, we might want to write our own custom comparison function. For
example, suppose we want a map in which keys are records representing names,
and in which names are sorted alphabetically by last name then by first name.
In the code below, we provide a module Name
that can compare records that way:
type name = {first:string; last:string}
module Name = struct
type t = name
let compare {first=first1;last=last1}
{first=first2;last=last2} =
match Stdlib.compare last1 last2 with
| 0 -> Stdlib.compare first1 first2
| c -> c
end
The Name
module can be used as input to Map.Make
because it matches
the Map.OrderedType
signature:
module NameMap = Map.Make(Name)
And now we could add some names to a map. Below, for sake of example, we map some names to birth years, and we use the pipeline operator to easily add multiple bindings one after another:
let k1 = {last="Kardashian"; first="Kourtney"}
let k2 = {last="Kardashian"; first="Kimberly"}
let k3 = {last="Kardashian"; first="Khloe"}
let k4 = {last="West"; first="Kanye"}
let nm = NameMap.(
empty |> add k1 1979 |> add k2 1980
|> add k3 1984 |> add k4 1977)
let lst = NameMap.bindings nm
The value of lst
will be
[({first = "Khloe"; last = "Kardashian"}, 1984);
({first = "Kimberly"; last = "Kardashian"}, 1980);
({first = "Kourtney"; last = "Kardashian"}, 1979);
({first = "Kanye"; last = "West"}, 1977)]
Note how the order of keys in that list is not the same as the order in which
we added them. The list is sorted according to the Name.compare
function we wrote.
Several of the other functions in the Map.S
signature will also process map bindings
in that sorted order—for example, map
, fold
, and iter
.
Code Reuse with Map
Stepping back from the mechanics of how to use Map
, let's think about
how it achieves code reuse. The implementor of Map
had a tricky problem
to solve: balanced binary search trees require a way to compare keys, but
the implementor can't know in advance all the different types of keys
that a client of the data structure will want to use. And each type of
key might need its own comparison function. Although the standard library's
Stdlib.compare
can be used to compare any two values of the same
type, the result it returns isn't necessarily what a client will want. For
example, it's not guaranteed to sort names in the way we wanted above.
So the implementor of Map
parameterized it on a structure that bundles
together the type of keys with a function that can be used to compare them.
It's the client's responsibility to implement that structure. Given it,
all the code in Map
can be re-used by the client.
The Java Collections Framework solves a similar problem in the TreeMap class, which has a constructor that takes a Comparator. There, the client has the responsibility of implementing a class for comparisons, rather than a structure. Though the language features are different, the idea is the same.