Binary Search Trees
A binary search tree (BST) is a binary tree with the following representation invariant:
For any node n, every node in the left subtree of n has a value less than n's value, and every node in the right subtree of n has a value greater than n's value.
We call that the BST invariant.
Here is code that implements a couple operations on a BST:
type 'a tree = Node of 'a * 'a tree * 'a tree | Leaf
(** [mem x t] is [true] iff [x] is a member of [t]. *)
let rec mem x = function
| Leaf -> false
| Node (y, l, r) ->
if x < y then mem x l
else if x > y then mem x r
else true
(** [insert x t] is [t] . *)
let rec insert x = function
| Leaf -> Node (x, Leaf, Leaf)
| Node (y, l, r) as t ->
if x < y then Node (y, insert x l, r)
else if x > y then Node (y, l, insert x r)
else t
What is the running time of those operations? Since insert
is just a
mem
with an extra constant-time node creation, we focus on the
mem
operation.
The running time of mem
is , where
is the height of the tree, because every recursive call descends
one level in the tree. What's the worst-case height of a
tree? It occurs with a tree of nodes all in a single long
branch—imagine adding the numbers 1,2,3,4,5,6,7 in order into
the tree. So the worst-case running time of mem
is still
, where is the number of nodes in the tree.
What is a good shape for a tree that would allow for fast lookup? A perfect binary tree has the largest number of nodes for a given height , which is . Therefore , which is .
If a tree with nodes is kept balanced, its height is , which leads to a lookup operation running in time .
How can we keep a tree balanced? It can become unbalanced during element insertion or deletion. Most balanced tree schemes involve adding or deleting an element just like in a normal binary search tree, followed by some kind of tree surgery to rebalance the tree. Some examples of balanced binary search tree data structures include
- AVL trees (1962)
- 2-3 trees (1970's)
- Red-black trees (1970's)
Each of these ensures running time by enforcing a stronger invariant on the data structure than just the binary search tree invariant.