Binary Search Trees

A binary search tree (BST) is a binary tree with the following representation invariant:

For any node n, every node in the left subtree of n has a value less than n's value, and every node in the right subtree of n has a value greater than n's value.

We call that the BST invariant.

Here is code that implements a couple operations on a BST:

type 'a tree = Node of 'a * 'a tree * 'a tree | Leaf

(** [mem x t] is [true] iff [x] is a member of [t]. *)
let rec mem x = function
  | Leaf -> false
  | Node (y, l, r) ->
    if x < y then mem x l
    else if x > y then mem x r
    else true

(** [insert x t] is [t] . *)    
let rec insert x = function
  | Leaf -> Node (x, Leaf, Leaf) 
  | Node (y, l, r) as t -> 
    if x < y then Node (y, insert x l, r)
    else if x > y then Node (y, l, insert x r)
    else t

What is the running time of those operations? Since insert is just a mem with an extra constant-time node creation, we focus on the mem operation.

The running time of mem is O(h)O(h), where hh is the height of the tree, because every recursive call descends one level in the tree. What's the worst-case height of a tree? It occurs with a tree of nn nodes all in a single long branch—imagine adding the numbers 1,2,3,4,5,6,7 in order into the tree. So the worst-case running time of mem is still O(n)O(n), where nn is the number of nodes in the tree.

What is a good shape for a tree that would allow for fast lookup? A perfect binary tree has the largest number of nodes nn for a given height hh, which is n=2h+11n = 2^{h+1} - 1. Therefore h=log(n+1)1h = \log(n+1) - 1, which is O(logn)O(\log n).

If a tree with nn nodes is kept balanced, its height is O(logn)O(\log n), which leads to a lookup operation running in time O(logn)O(\log n).

How can we keep a tree balanced? It can become unbalanced during element insertion or deletion. Most balanced tree schemes involve adding or deleting an element just like in a normal binary search tree, followed by some kind of tree surgery to rebalance the tree. Some examples of balanced binary search tree data structures include

  • AVL trees (1962)
  • 2-3 trees (1970's)
  • Red-black trees (1970's)

Each of these ensures O(logn)O(\log n) running time by enforcing a stronger invariant on the data structure than just the binary search tree invariant.

results matching ""

    No results matching ""