Priority queues are a kind of queue in which the elements are dequeued in priority order.
| <% ShowSMLFile("code/imp_prioq.sml") %> |
There are many ways to implement this signature. For example, we could implement it as a linked list where the cells of the list are connected through refs so it can be updated imperatively:
| <% ShowSMLFile("code/list_prioq.sml") %> |
What is the asymptotic performance of this implementation?
insert: O(n),
because it has to bubble a new element in to its rightful place in the
sorted list.extract_min: O(1),
because it can just remove the first element of the list.Another alternative implementation is to use red-black trees or another of the balanced search trees. For example, in red-black trees we can find the minimum element by simply walking down the left children all the way from the root. Extracting the minimum element requires deleting it from the tree; we haven't seen how to do this, but it's about twice as complicated as the insertion we've already seen. This implementation has better performance for many applications:
insert: O(lg n),
because an element must be inserted into the tree according to its priority,
which serves as the key.extract_min: O(lg n),
because red-black deletion also requires walking up and down the tree.In fact, we can tell that this is the best we do in terms of asymptotic
performance, because we can implement sorting using O(n)
priority queue operations, and we know that sorting takes O(n
lg n) time in general. The idea is simply to insert
all the elements to be sorted into the priority queue, and then use extract_min
to pull them out in the right order:
| <% ShowSMLFile("code/heapsort.sml") %> |
Although they have good asymptotic performance, it turns out that red-black trees are overkill for implementing priority queues: they are more complicated and slower than necessary. There is a simple, fast way to implement priority queues.
A heap is a special kind of balanced binary tree. Sometimes it is called a binary heap to distinguish it from a memory heap. The tree satisfies two invariants:
Suppose the priorities are just numbers. Here is a possible heap:
3
/ \
/ \
5 9
/ \ /
12 6 10
Obviously we can find the minimum element in O(1) time. Extracting it while
maintaining the heap invariant will take O(lg n)
time. Inserting a new element and establishing the heap invariant will
also take O(lg n) time. So
asymptotic performance is the same as for red-black trees but constant factors
are better for heaps.
The key observation is that we can represent a heaps as an array.
The root of the tree is at location 0 in the array and the children of the node
stored at position i are at locations 2i+1 and 2i+2. This means that the array
corresponding to the tree contains all the elements of tree, read across row by
row. The representation of the tree above is:
[3 5 9 12 6 10]
Given an element at index i, we can compute where the children are stored,
and conversely we can go from a child at index j to its parent at index floor((j-1)/2).
The rep invariant for heaps in this representation is actually simpler than when
in tree form:
Rep invariant for heap a (the partial
ordering property):
a[i] ≤ a[2i+1] and a[i] ≤ a[2i+2]
for 1 ≤ i ≤ floor((n-1)/2)
Now let's see how to implement the priority queue operations:
Put the element at first missing leaf. (Extend array by one element.)
Switch it with its parent if its parent is larger: "bubble up"
Repeat #2 as necessary.
Example: inserting 4 into previous tree.
3
/ \
/ \
5 9 [3 5 9 12 6 10 4]
/ \ / \
12 6 10 4
3
/ \
/ \
5 4 [3 5 4 12 6 10 9]
/ \ / \
12 6 10 9
This operation requires only O(lg n)
time -- the tree is depth
ceil(lg n) , and we do a bounded amount of work on each level.
extract_min works by returning the element at the root.
The trick is this:
Original heap to delete top element from (leaves two subheaps)
3
/ \
/ \
5 4 [3 5 4 12 6 10 9]
/ \ / \
12 6 10 9
copy last leaf to root
9
/ \
/ \
5 4 [9 5 4 12 6 10]
/ \ /
12 6 10
"push down"
4
/ \
/ \
5 9 [9 5 4 12 6 10]
/ \ /
12 6 10
Again an O(lg n) operation.
We can sort using this implementation of priority queues.
How expensive is the sorting function built from this?
n insertions, at O(lg n) cost, for O(n
lg n) total
n deletions, at O(lg n) cost, for
O(n lg n) total.
Thus, O(n lg n) total cost.
It's called heapsort and it's a standard, reliable sorting algorithm.
If you have to sort by doing comparisons only, this is as fast as possible (up to a constant factor).
There are plenty of other O(n lg n)
algorithms with better properties in some cases, for example:
One last comment -- you might be worried about the fixed size for the array of values. The
solution is just to use a resizable array abstraction (like Java Vectors), which you
should be able to figure out how to
build.