Contents
Structure
Base case 1: Null is a tree.
Base case 2: A single node without any child is a tree.
Step case: A node with one or two children (each itself a tree) is a tree.
Constraint: Every node in a tree must be:
- Strictly larger than every node in its left sub-tree, and
- Strictly smaller than every node in its right sub-tree.
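The structure and constraint above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `Node` and `is_bst` are assumptions made for the example. Note that the check passes bounds down the tree, because comparing a node only with its direct children would miss a misplaced key deeper in a sub-tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical node type for illustration: a key plus two optional children.
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_bst(node, low=None, high=None):
    """Return True if every key in the sub-tree lies strictly between low and high."""
    if node is None:  # base case 1: null is a tree
        return True
    if low is not None and node.key <= low:
        return False
    if high is not None and node.key >= high:
        return False
    # Step case: both children must be trees that respect the tightened bounds.
    return is_bst(node.left, low, node.key) and is_bst(node.right, node.key, high)


valid = Node(5, Node(3, Node(1), Node(4)), Node(8))
# Every node here is fine relative to its direct children,
# but 6 sits inside the left sub-tree of 5.
invalid = Node(5, Node(3, Node(1), Node(6)), Node(8))
print(is_bst(valid), is_bst(invalid))  # prints: True False
```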
Associated Methods
Get(Search)
Pseudocode
The method can be defined by following the recursive definition of the BST.
get(k, node){
    if (node == null){
        return null;
    }
    if (node.key == k){
        return node;
    }
    if (node.key > k){
        return get(k, node.left);
    }
    else{
        return get(k, node.right);
    }
}
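As a runnable counterpart to the pseudocode, here is a minimal Python sketch. The `Node` class is an assumption made for the example; the function mirrors the pseudocode line by line.

```python
class Node:
    # Hypothetical node type used only for this illustration.
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def get(k, node):
    """Return the node holding key k, or None if k is not in the sub-tree."""
    if node is None:          # ran off the tree: k is absent
        return None
    if node.key == k:         # found it
        return node
    if node.key > k:          # k can only be in the left sub-tree
        return get(k, node.left)
    return get(k, node.right)  # otherwise k can only be in the right sub-tree


root = Node(5, Node(3, Node(1), Node(4)), Node(8))
found = get(4, root)
missing = get(7, root)
print(found.key, missing)  # prints: 4 None
```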
Put(Insert)
Main Idea
We recursively (inductively) traverse down to the appropriate leaf node and attach the new node to the empty link below it. So the recursion always stops at the bottom of the tree.
Pseudocode
put(newNode, node){
    if (node == null){
        return newNode;
    }
    if (node.key > newNode.key){
        node.left = put(newNode, node.left);
        return node;
    }
    else{
        node.right = put(newNode, node.right);
        return node;
    }
}
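A minimal Python sketch of the same insertion follows. The `Node` class and the `inorder` helper are assumptions added for the example; `inorder` is only there to show that the keys come back sorted.

```python
class Node:
    # Hypothetical node type used only for this illustration.
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def put(new_node, node):
    """Insert new_node into the sub-tree rooted at node and return the sub-tree."""
    if node is None:                  # empty spot found: the new node is the sub-tree
        return new_node
    if node.key > new_node.key:
        node.left = put(new_node, node.left)
    else:
        node.right = put(new_node, node.right)
    return node                       # hand the (possibly updated) sub-tree back


def inorder(node):
    """Return the keys of the sub-tree in ascending order."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


root = None
for key in [5, 3, 8, 1, 4]:
    root = put(Node(key), root)       # always keep the returned root

print(inorder(root))  # prints: [1, 3, 4, 5, 8]
```

Because `put` returns the sub-tree, the caller handles the empty tree and the non-empty tree with the same assignment `root = put(...)`.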
Implementation Explained
When it comes to operations that manipulate the tree nodes rather than simply retrieving them (as get() does), a common strategy is to follow the recursive nature of the tree by defining these methods recursively as well. Moreover, at each level of the recursion we always return the entire sub-tree and connect it to the intended parent. For example, observe the following two lines:
node.left = put(newNode, node.left);
return node;
The returning and assigning clearly make sense once we have found the place to change the tree (in this case, where to insert newNode): we simply return newNode and it gets assigned to the parent's pointer. But in these two lines, even though we have not yet reached the place to insert, we still return node exactly as it was.
The reason for this seemingly redundant operation gets revealed if you think of the return statement as returning a sub-tree instead of a node, just like how you would construct a tree recursively:
- Think of the base case first: when we have found the correct place to insert newNode, we return newNode as if we returned a one-element tree;
- Then the step case comes: we take the parent and the inserted node together as a sub-tree to return to the next parent, which then forms another, larger sub-tree;
- This then goes on back up to the root node, where the operation was initiated.
For a straightforward operation like Insert, the benefit of the recursive return definition is not very obvious, because you can simply use a procedural loop that leads you directly to your destination, i.e. the final leaf. The real payoff shows up in delMin() and, more importantly, in del().
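For comparison, the procedural loop mentioned above might look like the following Python sketch. The names `Node` and `put_iterative` are assumptions made for the example. It walks down to the destination and attaches the new node there, with no recursive return.

```python
class Node:
    # Hypothetical node type used only for this illustration.
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def put_iterative(new_node, root):
    """Insert new_node with a loop instead of recursion; return the tree's root."""
    if root is None:
        return new_node
    node = root
    while True:
        if node.key > new_node.key:
            if node.left is None:     # destination reached on the left
                node.left = new_node
                return root
            node = node.left
        else:
            if node.right is None:    # destination reached on the right
                node.right = new_node
                return root
            node = node.right


root = None
for key in [5, 3, 8, 1, 4]:
    root = put_iterative(Node(key), root)

print(root.key, root.left.key, root.right.key)  # prints: 5 3 8
```

The loop needs a special case for the empty tree and explicit checks for the empty link, which the recursive version absorbs into its base case.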
Floor
Like searching, floor() and ceiling() only retrieve data rather than manipulate the tree, so these operations exploit only the ordering of the elements rather than the recursive definition, and we won’t go into the details of pseudocode.
floor() is expected to return the largest element strictly smaller than the input key. This means that as we traverse down the tree (turning left whenever the current key is not smaller than the input key), we keep track of the last element from which a right turn occurs.
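The "last right turn" idea can be sketched in Python as below. This follows the strict definition used in this section (largest key strictly smaller than the input); the `Node` class and the choice to return the node rather than its key are assumptions made for the example.

```python
class Node:
    # Hypothetical node type used only for this illustration.
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def floor(k, node):
    """Return the node with the largest key strictly smaller than k, or None."""
    best = None
    while node is not None:
        if node.key < k:
            best = node           # right turn: remember this node as the candidate
            node = node.right
        else:
            node = node.left      # left turn: this node is too large (or equal)
    return best


root = Node(5, Node(3, Node(1), Node(4)), Node(8))
print(floor(5, root).key)  # prints: 4
print(floor(9, root).key)  # prints: 8
print(floor(1, root))      # prints: None
```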
Ceiling
ceiling() is expected to return the smallest element strictly larger than the input key. This means that as we traverse down the tree (turning right whenever the current key is not larger than the input key), we keep track of the last element from which a left turn occurs.
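The mirror image, tracking the last left turn, can be sketched the same way. Again this uses the strict definition from this section, and the `Node` class is an assumption made for the example.

```python
class Node:
    # Hypothetical node type used only for this illustration.
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def ceiling(k, node):
    """Return the node with the smallest key strictly larger than k, or None."""
    best = None
    while node is not None:
        if node.key > k:
            best = node           # left turn: remember this node as the candidate
            node = node.left
        else:
            node = node.right     # right turn: this node is too small (or equal)
    return best


root = Node(5, Node(3, Node(1), Node(4)), Node(8))
print(ceiling(5, root).key)  # prints: 8
print(ceiling(3, root).key)  # prints: 4
print(ceiling(8, root))      # prints: None
```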
Delete Minimum
Main Idea
Starting from the root node, we repeatedly move left until we reach the leftmost node, delete it, and promote its right child as the replacement (null if it doesn’t have one).
Pseudocode
delMin(node){
    if (node.left == null){
        return node.right;
    }
    else{
        node.left = delMin(node.left);
        return node;
    }
}
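A minimal Python sketch of this operation is shown below. The `Node` class and the `inorder` helper are assumptions made for the example, and the function takes only the sub-tree root as its argument. Like the pseudocode, it assumes it is called on a non-empty tree.

```python
class Node:
    # Hypothetical node type used only for this illustration.
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def del_min(node):
    """Remove the smallest key from a non-empty sub-tree and return the sub-tree."""
    if node.left is None:
        return node.right             # leftmost node: its right child replaces it
    node.left = del_min(node.left)    # reattach the post-deletion left sub-tree
    return node


def inorder(node):
    """Return the keys of the sub-tree in ascending order."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


# The minimum (1) has a right child (2), which must be promoted into its place.
root = Node(5, Node(3, Node(1, None, Node(2)), Node(4)), Node(8))
root = del_min(root)
print(inorder(root))  # prints: [2, 3, 4, 5, 8]
```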
Implementation Explained
Notice how similar the implementation of delMin() is to that of put()!
Again, try to think of this in an inductive way:
- Base case: The node with no left child has been found; return its right child (a sub-tree, possibly null) to take its place;
- Step case: The parent, together with the returned child (and the untouched one), forms a sub-tree to be returned to its own parent.
Again notice the fact that we used the term repeatedly in the Main Idea section. This implies that, since we have a fixed destination, this method can also be implemented in a procedural, looping way, but we still chose the recursive definition to prepare you for the coming final method:
Delete
Main Idea
Deleting a node is a little more complicated than the earlier methods. It’s simple if you want to delete a node which has no children: just return null. It is also not very difficult if you delete a node which has one child: just return the child as the replacement. Things become a little trickier when you want to delete a node which has two children: you then need to decide how to find a suitable replacement for the node.
Fortunately, the solution is not that complicated. There is one way to do this, and it exploits the ordering of the elements: if you look at the right-hand sub-tree, every element in it is strictly larger than every element in the left-hand sub-tree. Hence, we only need to find the minimum of the right-hand sub-tree as the replacement, because it is strictly smaller than all the other elements in the right-hand sub-tree (the point of being the minimum) and strictly larger than all the elements in the left-hand sub-tree (it comes from the right-hand sub-tree), which is all we want from an ideal parent node.
Note: Of course this would work just as well if you chose the maximum from the left-hand sub-tree; for simplicity we’ll use the one described here.
Hence, the most complicated case of this algorithm is divided into three steps:
- Find the minimum in the right-hand sub-tree. For simplicity we call it the candidate node.
- Delete the minimum element from the right-hand sub-tree.
- Replace all the links attached to the node being deleted by attaching them to the candidate node instead.
Next we'll see how this algorithm is best performed in a recursive way.
Pseudocode
del(k, node){
    // Step case - searching the node and build up a sub-tree
    if (node == null){ // key is not present
        return null; // nothing to change so return the same null
    }
    if (node.key > k){
        node.left = del(k, node.left); // obtain the post-del subtree of node
        return node; // pack node and its subtrees together as a new subtree to the parent
    }
    if (node.key < k){
        node.right = del(k, node.right);
        return node;
    }
    // Base case - node is found
    else{
        // Base case 1 - no children
        if ((node.left == null) && (node.right == null)){
            return null; // node is directly deleted from tree
        }
        // Base case 2a - no left child
        if (node.left == null){
            return node.right; // delegate the right child as replacement
        }
        // Base case 2b - no right child
        if (node.right == null){
            return node.left;
        }
        // Base case 3 - two children
        else{
            temp = node; // keep a reference to the node being deleted, and thereby to its links (sub-trees)
            node = min(temp.right); // first of all, this is a reuse of the name node
            // secondly, this keeps a reference to the candidate node before deleting it
            node.right = delMin(temp.right); // attach the post-delMin right-hand subtree to the right-hand side of the candidate node
            node.left = temp.left; // attach the untouched left-hand subtree to the left-hand side of the candidate node
            return node; // return the post-del sub-tree to its parent
        }
    }
}
Implementation Explained
By now you should already be familiar with the recursive return and assignment. A slightly less intuitive statement is probably assigning delMin(temp.right) to node.right. To understand this line, first note that at this point node already refers to the candidate, while temp still refers to the node being deleted. Calling delMin() on temp.right returns a whole sub-tree that is the result of deleting the minimum element from the original right-hand sub-tree. Therefore, this assignment links the original right-hand sub-tree, excluding the minimum element, to the candidate node, establishing one of its two links.
Problems
An uncontrolled binary tree usually results in a ragged layout, so the average-case performance of all methods is only O(c log N), where c is some undetermined constant. In the worst case, when elements are inserted in ascending or descending order, the unbalanced tree degrades to a linear structure, and all relevant methods degrade to linear performance. Even when deletion constantly switches between the “min from the right” and “max from the left” approaches, the average case of deletion is only O(sqrt(N)), which is still unaffordable when N gets large. Hence, we need a mechanism to control the organisation of a binary tree.
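The worst case is easy to reproduce. The Python sketch below inserts the same seven keys in two different orders and compares the resulting heights. The names `Node`, `put`, `height` and `build` are assumptions made for the example, and the second insertion order is hand-picked to give a perfectly balanced tree.

```python
class Node:
    # Hypothetical node type used only for this illustration.
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def put(new_node, node):
    """Plain, unbalanced BST insertion."""
    if node is None:
        return new_node
    if node.key > new_node.key:
        node.left = put(new_node, node.left)
    else:
        node.right = put(new_node, node.right)
    return node


def height(node):
    """Number of nodes on the longest path from node down to a leaf."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def build(keys):
    root = None
    for key in keys:
        root = put(Node(key), root)
    return root


ascending_height = height(build([1, 2, 3, 4, 5, 6, 7]))   # every node goes right
balanced_height = height(build([4, 2, 6, 1, 3, 5, 7]))    # hand-picked order
print(ascending_height, balanced_height)  # prints: 7 3
```

With ascending input the tree is a chain of right links, so a search may visit all N nodes, whereas the balanced order needs only about log2(N) steps.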