Main Idea
- Randomly shuffle the input array once
- Choose a pivot from the array and insert it into the “correct” position - where all elements to the left are smaller and all elements to the right are greater. This partitions the array into two sub-arrays.
- Repeat the process on the two sub-arrays.
The pivot's position is found with a two-pointer scan, which takes only O(N) operations per partition:
- In each iteration, we set two pointers `i` and `j`, where `i` travels from the head toward the tail of the array and `j` travels from the tail toward the head.
- Each time, we check whether `a[i]` is smaller than our pivot. If not, `a[i]` belongs in the latter half of the array instead of the former half, so we stop incrementing `i`.
- In the same loop, we decrement `j` for as long as `a[j]` is larger than our pivot. Otherwise this `a[j]` is in the wrong half of the array, so we stop decrementing `j`.
- Because we decrement `j` after we finish incrementing `i`, the two pointers may already have passed each other, in which case no swap is needed, as each has entered the other's territory, and the scan ends.
- If `i` and `j` have not crossed yet, we have found a pair of values that are in the wrong half of the list, so we swap them.
- Carry on the loop.
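The scan described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes the pivot is the first element of the range, and the name `partition` and the parameters `lo` and `hi` are chosen here for illustration only.

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] in place around a[lo]; return the pivot's final index.

    Assumption: the pivot is the first element of the range.
    """
    pivot = a[lo]
    i, j = lo + 1, hi
    while True:
        # Move i toward the tail while a[i] is strictly smaller than the pivot.
        while i <= hi and a[i] < pivot:
            i += 1
        # Move j toward the head while a[j] is strictly larger than the pivot.
        while j > lo and a[j] > pivot:
            j -= 1
        # The pointers have met or crossed: nothing left to swap.
        if i >= j:
            break
        # a[i] and a[j] are both in the wrong half, so exchange them.
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1
    # Put the pivot between the two halves.
    a[lo], a[j] = a[j], a[lo]
    return j


data = [3, 1, 4, 1, 5, 9, 2, 6]
p = partition(data, 0, len(data) - 1)
print(p, data)  # 3 [1, 1, 2, 3, 5, 9, 4, 6]
```

Both scans stop on keys equal to the pivot, so an array of identical keys is still split near the middle.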
This is a smart way of dividing the workload as it uses two pointers in opposite directions.
There are recursive & non-recursive implementations for this algorithm.
- To implement a recursive algorithm, simply call the partitioning function recursively on each partition until a partition contains at most one element. Both in-place and not-in-place solutions are possible. The only difference is that an in-place algorithm returns the head and tail indices of the partitions, while a not-in-place algorithm returns new copies of the partitions, which then only need to be concatenated.
- To implement a non-recursive version, you need a stack that holds the partitions. Initially the stack contains only the original input array. The loop repeatedly pops the top-most array off the stack. If the popped array is empty or contains only one number, it is skipped. Otherwise it is fed to the partition function, and the two smaller partitions that result are pushed onto the stack. This process repeats until the stack becomes empty.
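Both variants can be sketched in a few lines. This is an illustrative in-place sketch under stated assumptions: the function names are invented here, the pivot is the first element of each range, and the stack holds `(lo, hi)` index pairs rather than copies of the sub-arrays.

```python
import random


def partition(a, lo, hi):
    """Two-pointer partition of a[lo..hi] around a[lo] (assumed pivot choice)."""
    pivot = a[lo]
    i, j = lo + 1, hi
    while True:
        while i <= hi and a[i] < pivot:
            i += 1
        while j > lo and a[j] > pivot:
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1
    a[lo], a[j] = a[j], a[lo]
    return j


def quicksort_recursive(a):
    """Sort the list a in place by recursing on the two sides of the pivot."""
    random.shuffle(a)  # shuffle once, up front

    def sort(lo, hi):
        if lo >= hi:  # zero or one element: nothing to do
            return
        p = partition(a, lo, hi)
        sort(lo, p - 1)
        sort(p + 1, hi)

    sort(0, len(a) - 1)


def quicksort_iterative(a):
    """Sort the list a in place using an explicit stack of index ranges."""
    random.shuffle(a)
    stack = [(0, len(a) - 1)]  # start with the whole array
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:  # skip empty and single-element ranges
            continue
        p = partition(a, lo, hi)
        stack.append((lo, p - 1))
        stack.append((p + 1, hi))


xs = [5, 2, 8, 2, 9, 1]
quicksort_iterative(xs)
print(xs)  # [1, 2, 2, 5, 8, 9]
```

Storing index pairs keeps the iterative version in place. A not-in-place variant would push list copies instead and concatenate the results.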
Analysis
Complexity
In the average scenario, since we’re dividing an array into two in every iteration, ideally we have log(N) layers of recursion in total (where we can always divide in the middle or so), and in each layer, we overall visit each element exactly once (in the partition that contains it and in that partition only). Hence in each layer we make exactly N visits. Therefore the overall complexity is O(Nlog(N)).
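The layer-counting argument can also be written as a recurrence. The sketch below assumes the idealized case in which every partition splits exactly in half, and it introduces a constant c for the cost of visiting one element. Neither assumption comes from the text above.

```latex
\[
\begin{aligned}
T(N) &= 2\,T(N/2) + cN \\
     &= 4\,T(N/4) + 2cN \\
     &= \dots \\
     &= 2^{k}\,T(N/2^{k}) + k\,cN .
\end{aligned}
\]
Taking $k = \log_2 N$ gives
\[
T(N) = N\,T(1) + cN\log_2 N = O(N \log N).
\]
```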
From the discussion above, you can also see the situation in which the algorithm degrades rapidly: when the input array is already sorted (and the pivot is always taken from one end of the partition, with no shuffle beforehand). In this case, we are unable to divide the array into two parts of similar size. Since every element is already in its correct position, each partitioning step yields one partition that contains only the pivot (or is empty, depending on implementation) and another that is the rest of the sorted list, exactly one element shorter. The algorithm therefore needs about N partition calls to exhaust the array. This produces N layers of recursion, and in each layer we still make up to N visits, so the overall complexity becomes O(N * N) = O(N^2).
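The same worst case can be expressed as a recurrence. The sketch assumes that each partition call removes exactly one element, and it introduces a constant c for the cost of visiting one element.

```latex
\[
\begin{aligned}
T(N) &= T(N-1) + cN, \qquad T(0) = 0 \\
     &= c\,\bigl(N + (N-1) + \dots + 1\bigr) \\
     &= c\,\frac{N(N+1)}{2} = O(N^2).
\end{aligned}
\]
```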
Benefits
- In-place… No need for a “copy” array, so spatially efficient;
- Among the fastest sorts in most practical cases… O(Nlog(N)) on average, so mostly temporally efficient as well.
Drawbacks
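- Not stable… Swaps move elements over long distances, possibly past equal keys, so the original relative order of equal elements is not maintained;
- Not robust to duplicate keys unless the partition is written carefully: with many equal keys, a scan that does not stop on items equal to the pivot produces very unbalanced partitions;
- Degrades rapidly in some rare circumstances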
Possible fix
- None, as it is how Quick Sort works!
- MUST: during partitioning, stop scan on items equal to pivot. SHOULD: during partitioning, pull all items equal to pivot in-place (Licia Capra, COMP0005, UCL, 2022)
- That’s the reason for the random shuffle at the beginning of the algorithm - provides a statistical insurance (but still…).
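One common way to bring all items equal to the pivot into place during partitioning is three-way partitioning. The sketch below swaps in that technique as an illustration. It is not necessarily the method the cited course uses, and the names `quicksort_3way`, `lt`, and `gt` are chosen here.

```python
import random


def quicksort_3way(a):
    """Sort the list a in place, grouping keys equal to the pivot in one pass."""
    random.shuffle(a)  # the up-front shuffle guards against bad input orderings

    def sort(lo, hi):
        if lo >= hi:
            return
        pivot = a[lo]
        # Invariant: a[lo..lt-1] < pivot, a[lt..i-1] == pivot, a[gt+1..hi] > pivot.
        lt, i, gt = lo, lo + 1, hi
        while i <= gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1  # the element swapped in is unexamined, so i stays put
            else:
                i += 1
        # Keys equal to the pivot now sit in a[lt..gt] and need no further work.
        sort(lo, lt - 1)
        sort(gt + 1, hi)

    sort(0, len(a) - 1)


keys = [2, 1, 2, 2, 0, 2, 1, 2]
quicksort_3way(keys)
print(keys)  # [0, 1, 1, 2, 2, 2, 2, 2]
```

With many duplicate keys, the middle block absorbs every copy of the pivot at once, so the recursion only continues on the strictly smaller and strictly larger parts.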