Heap
A heap can occur in many forms. It can be treated as a linear queue in which each element is associated with a priority value: enqueuing always inserts an element according to its priority value, whereas dequeuing always pops the element with the lowest (or highest) priority value.
However, the potential of a heap is only maximised if it is treated non-linearly, as a binary structure known as the binary heap.
Binary Heap
Normal Binary Tree
If we want to store the priority value in a binary tree, it may look like this:
A binary tree shows both the literal order of the priorities and the order in which elements (in the same branch) are inserted. For instance, in this binary tree it is quite clear that B was added to the tree after A, as it is a direct child of A.
Binary Heap
In a binary heap, however, the key (priority) values are organised in a different manner (for simplicity we assume a max-oriented binary heap, where the element with the maximum priority value is at the root; for now we don’t care about how to maintain such a structure, we only observe the result):
- All layers of the binary heap, except the bottom-most, are filled;
- Each parent node is greater than its child node(s);
- New elements are appended from left to right in the bottom-most layer.
Therefore, the values stored in the binary tree above, if now stored in a binary heap, should look like this:
(Note that as we only care about the priority values, the question of which element was inserted first does not matter to us. So maintaining a structure that omits that feature actually makes more sense in this case.)
Because all the parent layers are filled, and the bottom-most layer is filled from left to right only, we can actually flatten the binary heap into a linear array, as below:
We only need to keep the structure in mind as we access or insert elements, without any explicit “linking” or pointers, because simple integer arithmetic gives access to the nodes in an array:
- The parent of node i is at position i/2;
- The children of node i are at positions 2i and 2i+1.
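The index arithmetic above can be sketched in Python, together with a check of the heap ordering property. This is an illustrative sketch: the values in the example list are made up, not taken from the figure, and index 0 is left unused so the 1-based formulas from the text apply directly.

```python
# A max-heap flattened into a list. Index 0 is a placeholder so that the
# 1-based arithmetic from the text (parent = i // 2, children = 2i and 2i+1)
# works without adjustment. The values below are illustrative only.
heap = [None, 90, 70, 80, 40, 60, 50, 75]

def parent(i):
    return i // 2              # integer division drops the remainder

def children(i):
    return 2 * i, 2 * i + 1

def is_max_heap(a):
    """Check that every parent is >= its children (1-based storage)."""
    n = len(a) - 1
    for i in range(2, n + 1):  # every node except the root has a parent
        if a[parent(i)] < a[i]:
            return False
    return True

print(parent(5), children(3))  # parent of node 5 is 2; children of 3 are 6 and 7
print(is_max_heap(heap))       # True
```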
Enqueue()
The enqueuing of an element requires inserting it into the correct position while maintaining the heap structure. However, the process is not complicated:
- Append the new element at position N+1 (the end of the array);
- Swim the element up the heap by repeatedly comparing the node (i) with its parent (i/2): if the parent is greater, the heap structure is maintained; otherwise swap the node and the parent and repeat the compare & swap.
It is quite intuitive that this operation requires at most log(N) + 1 comparisons (the worst case occurs when adding an element to a heap whose bottom-most layer is already filled).
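The enqueue steps above can be sketched as follows, keeping the 1-based list convention (index 0 unused). The values are illustrative.

```python
# A minimal sketch of enqueue on a 1-based max-heap list:
# append at position N+1, then swim the element up.
def enqueue(heap, value):
    heap.append(value)                       # step 1: append at the end
    i = len(heap) - 1
    while i > 1 and heap[i // 2] < heap[i]:  # step 2: swim while the parent is smaller
        heap[i // 2], heap[i] = heap[i], heap[i // 2]
        i //= 2

heap = [None]                                # placeholder at index 0
for v in [40, 90, 70, 80]:
    enqueue(heap, v)
print(heap[1])                               # the maximum, 90, is now at the root
```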
Dequeue()
The dequeuing of the heap is straightforward as well - the node to be dequeued is naturally the root node.
The issue that comes after the pop is a bit more complicated: we need to maintain the structure of the heap, not only by finding a proper candidate for the root role, but also by making sure that all non-bottom layers remain filled. For the second reason, the usual process of removing a node from an ordinary tree (iteratively promoting the larger child until no children remain) is no longer applicable in this case.
Fortunately, the solution is not complicated either:
- Swap the root node and the last element in the array (positions 1 and N), then remove the last element;
- Sink the new root down the heap by repeatedly comparing the node i with the larger of its children 2i and 2i+1: if that child is smaller or there is no child, the heap structure is maintained; otherwise swap the child and the node and repeat the process.
There are log(N) layers in the heap, and at each iteration there is one comparison between the two children, followed by another comparison between the larger child and the parent, giving a total of 2log(N) comparisons in the worst case (when the node sinks all the way back down to the bottom layer).
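The dequeue steps above can be sketched in the same 1-based convention. Again, the example values are illustrative.

```python
# A minimal sketch of dequeue on a 1-based max-heap list: swap the root
# with the last element, shrink the list, then sink the new root.
def dequeue(heap):
    n = len(heap) - 1
    heap[1], heap[n] = heap[n], heap[1]    # step 1: swap root and last element
    top = heap.pop()                       # remove the old root
    n -= 1
    i = 1
    while 2 * i <= n:                      # step 2: sink while a child exists
        j = 2 * i
        if j + 1 <= n and heap[j + 1] > heap[j]:
            j += 1                         # pick the larger child
        if heap[i] >= heap[j]:
            break                          # heap structure restored
        heap[i], heap[j] = heap[j], heap[i]
        i = j
    return top

heap = [None, 90, 80, 70, 40]              # a valid max-heap
print(dequeue(heap))                       # 90
print(heap)                                # [None, 80, 40, 70]
```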
Heap Sort
Main Idea
The idea behind heap sort is very simple: first transform the non-heap array into a heap, then repeatedly dequeue the root element to construct a sorted array. It can be written as the following sequence of steps:
- (Transform into heap) For each node that has a child, starting from the last one (N/2) and working backwards, sink it down the heap;
- (Decompose the heap) Dequeue the root element; instead of allocating a new array, we can directly use the space freed by the “shrunk” heap to build the sorted array;
- Repeat step 2 until the heap is empty (or one element is left).
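The two phases above can be sketched as an in-place sort, keeping the 1-based convention by leaving index 0 unused. The input values are illustrative.

```python
# A minimal heap sort sketch: phase 1 heapifies the array by sinking every
# node from N/2 down to the root; phase 2 repeatedly moves the max to the
# shrinking tail of the same array. Index 0 is an unused placeholder.
def sink(a, i, n):
    """Sink node i within the max-heap a[1..n]."""
    while 2 * i <= n:
        j = 2 * i
        if j + 1 <= n and a[j + 1] > a[j]:
            j += 1                 # larger child
        if a[i] >= a[j]:
            break
        a[i], a[j] = a[j], a[i]
        i = j

def heap_sort(a):
    """Sort a[1..n] in place in ascending order (a[0] is a placeholder)."""
    n = len(a) - 1
    for i in range(n // 2, 0, -1):  # phase 1: heapify from node N/2 down to 1
        sink(a, i, n)
    while n > 1:                    # phase 2: dequeue into the freed tail
        a[1], a[n] = a[n], a[1]     # move the current max to its final place
        n -= 1
        sink(a, 1, n)

data = [None, 50, 90, 10, 70, 30]
heap_sort(data)
print(data[1:])                     # [10, 30, 50, 70, 90]
```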
Analysis
Complexity
For the heap transformation, each sink operation takes time proportional to the height h' of its sub-heap; summed over all nodes, this gives an overall upper bound of O(N) running time.
For the heap decomposition, the number of dequeue() calls is proportional to the size of the heap N, and each costs O(log(N)), so the upper bound is O(N log(N)).
Overall, the heap sort algorithm is bounded by O(N log(N)). This linearithmic performance holds even in the worst-case scenario, which is comparable to merge sort.
Benefits
- In-place;
- Theoretically fastest even in the worst case.
Drawbacks
- Not stable, due to the large amount of long-distance swapping;
- Practically one of the slowest sorting algorithms, because array access jumps in logarithmic strides (all the 2i and i/2 accesses), which makes poor use of the CPU cache. Especially when the array size reaches the millions or billions, memory pages must be swapped almost as frequently as comparisons are made, which accumulates into an unaffordable cost.