快速排序详解

快速排序,又称分区交换排序 (partition-exchange ),简称快排

算法
快速排序使用分治法(Divide and conquer)策略来把一个序列(list)分为较小和较大的2个子序列,然后递归地排序两个子序列。
步骤为:

  • 挑选基准值:从数列中挑出一个元素,称为“基准”(pivot)。
  • 分割:重新排序数列,所有比基准值小的元素摆放在基准前面,所有比基准值大的元素摆在基准后面(与基准值相等的数可以到任何一边)。在这个分割结束之后,对基准值的排序就已经完成。
  • 递归排序子序列:递归地将小于基准值元素的子序列和大于基准值元素的子序列排序。

递归到最底部的判断条件是数列的大小是零或一,此时该数列显然已经有序。
选取基准值有数种具体方法,此选取方法对排序的时间性能有决定性影响。

Lomuto partition scheme
这种方案选择数组的最后一个元素作为基准。此算法使用两个索引i和j,用i来遍历数组,使得lo到i-1(包含i-1)的元素小于基准,i到j(包含j)的元素大于等于基准。这种方案紧凑且容易理解,所以经常使用,尽管性能没有Hoare’s original scheme好。伪代码如下:

algorithm quicksort(A, lo, hi) is
    if lo < hi then
        p := partition(A, lo, hi)
        quicksort(A, lo, p - 1)
        quicksort(A, p + 1, hi)

algorithm partition(A, lo, hi) is
    pivot := A[hi]
    i := lo
    for j := lo to hi do
        if A[j] < pivot then
            swap A[i] with A[j]
            i := i + 1
    swap A[i] with A[hi]
    return i

Java实现:

public void quickSort(int[] arr, int lo, int hi) {
    if (lo < hi) {
        int p = partition(arr, lo, hi);
        quickSort(arr, lo, p - 1);
        quickSort(arr, p + 1, hi);
    }
}

private int partition(int[] arr, int lo, int hi) {
    int pivot = arr[hi];
    int i = lo;
 
    for (int j = lo; j <= hi; j++) {
        if (arr[j] < pivot) {
            swap(arr, i, j);
            i++;
        }
    }
    swap(arr, i, hi);
    return i;
}

private void swap(int[] arr, int i, int j) {
    int swapTemp = arr[i];
    arr[i] = arr[j];
    arr[j] = swapTemp;
}

Hoare partition scheme
此方法使用两个指针分别指向要被分区的数组的两端,然后向对方移动,直到发现了一个inversion(颠倒),即存在一组元素,一个大于等于基准,一个小于基准,两个元素的彼此顺序颠倒,然后把这两个元素互换。当两个指针相遇,算法停止并返回最终索引。伪代码如下:

algorithm quicksort(A, lo, hi) is
    if lo < hi then
        p := partition(A, lo, hi)
        quicksort(A, lo, p)
        quicksort(A, p + 1, hi)

algorithm partition(A, lo, hi) is
    pivot := A[(hi + lo) / 2]
    i := lo - 1
    j := hi + 1
    loop forever
        do
            i := i + 1
        while A[i] < pivot
        do
            j := j - 1
        while A[j] > pivot
        if i ≥ j then
            return j
        swap A[i] with A[j]

此算法保证了 l o ≤ p < h i lo\leq p < hi lop<hi,即得到的两个分区都是非空的,所以不存在无限递归的风险。
Java实现:

private void quickSort(int[] arr, int lo, int hi) {
    if (lo < hi) {
        int p = partition(arr, lo, hi);
        quickSort(arr, lo, p);
        quickSort(arr, p + 1, hi);
    }
}

private int partition(int[] arr, int lo, int hi) {
    int pivot = arr[(lo + hi) / 2];
    int i = lo - 1;
    int j = hi + 1;
    while(true) {
        do i++; while(arr[i] < pivot);
        do j--; while(arr[j] > pivot);
        
        if (i >= j) 
            return j;
        swap(arr, i, j);
    }
}

private void swap(int[] arr, int i, int j) {
    int swapTemp = arr[i];
    arr[i] = arr[j];
    arr[j] = swapTemp;
}

实现过程中的注意事项
1. 基准的选择
在早期快排版本中,分区的最左元素经常被用来当作基准,不幸的是,当数组已经排好序时会导致最坏的情况。随机选择,选择中间元素,或者选择第一个元素、中间元素和最后一个元素的中位数(median-of-three)就可以解决这个问题。
median-of-three伪代码:

mid := (lo + hi) / 2
if A[mid] < A[lo]
    swap A[lo] with A[mid]
if A[hi] < A[lo]
    swap A[lo] with A[hi]
if A[mid] < A[hi]
    swap A[mid] with A[hi]
pivot := A[hi]

整数溢出问题:如果数组的边界索引足够大的话,(lo + hi)/2会导致溢出使得返回结果无效。使用lo + (hi−lo)/2可以解决这个问题。

2. 重复元素
With a partitioning algorithm such as the Lomuto partition scheme described above (even one that chooses good pivot values), quicksort exhibits poor performance for inputs that contain many repeated elements. The problem is clearly apparent when all the input elements are equal: at each recursion, the left partition is empty (no input values are less than the pivot), and the right partition has only decreased by one element (the pivot is removed). Consequently, the Lomuto partition scheme takes quadratic time to sort an array of equal values. However, with a partitioning algorithm such as the Hoare partition scheme, repeated elements generally results in better partitioning, and although needless swaps of elements equal to the pivot may occur, the running time generally decreases as the number of repeated elements increases (with memory cache reducing the swap overhead). In the case where all elements are equal, Hoare partition scheme needlessly swaps elements, but the partitioning itself is best case, as noted in the Hoare partition section above.

To solve the Lomuto partition scheme problem (sometimes called the Dutch national flag problem), an alternative linear-time partition routine can be used that separates the values into three groups: values less than the pivot, values equal to the pivot, and values greater than the pivot. (Bentley and McIlroy call this a “fat partition” and it was already implemented in the qsort of Version 7 Unix.) The values equal to the pivot are already sorted, so only the less-than and greater-than partitions need to be recursively sorted. In pseudocode, the quicksort algorithm becomes

algorithm quicksort(A, lo, hi) is
    if lo < hi then
        p := pivot(A, lo, hi)
        left, right := partition(A, p, lo, hi)  // note: multiple return values
        quicksort(A, lo, left - 1)
        quicksort(A, right + 1, hi)

The partition algorithm returns indices to the first (‘leftmost’) and to the last (‘rightmost’) item of the middle partition. Every item of the partition is equal to p and is therefore sorted. Consequently, the items of the partition need not be included in the recursive calls to quicksort.

The best case for the algorithm now occurs when all elements are equal (or are chosen from a small set of k ≪ n elements). In the case of all equal elements, the modified quicksort will perform only two recursive calls on empty subarrays and thus finish in linear time (assuming the partition subroutine takes no longer than linear time).

3. Optimizations
Two other important optimizations, also suggested by Sedgewick and widely used in practice, are:

  • To make sure at most O(log n) space is used, recur first into the smaller side of the partition, then use a tail call to recur into the other, or update the parameters to no longer include the now sorted smaller side, and iterate to sort the larger side.
  • When the number of elements is below some threshold (perhaps ten elements), switch to a non-recursive sorting algorithm such as insertion sort that performs fewer swaps, comparisons or other operations on such small arrays. The ideal ‘threshold’ will vary based on the details of the specific implementation.
  • An older variant of the previous optimization: when the number of elements is less than the threshold k, simply stop; then after the whole array has been processed, perform insertion sort on it. Stopping the recursion early leaves the array k-sorted, meaning that each element is at most k positions away from its final sorted position. In this case, insertion sort takes O(kn) time to finish the sort, which is linear if k is a constant. Compared to the “many small sorts” optimization, this version may execute fewer instructions, but it makes suboptimal use of the cache memories in modern computers.

4. Parallelization
Quicksort’s divide-and-conquer formulation makes it amenable to parallelization using task parallelism. The partitioning step is accomplished through the use of a parallel prefix sum algorithm to compute an index for each array element in its section of the partitioned array. Given an array of size n, the partitioning step performs O(n) work in O(log n) time and requires O(n) additional scratch space. After the array has been partitioned, the two partitions can be sorted recursively in parallel. Assuming an ideal choice of pivots, parallel quicksort sorts an array of size n in O(n log n) work in O(log² n) time using O(n) additional space.

Quicksort has some disadvantages when compared to alternative sorting algorithms, like merge sort, which complicate its efficient parallelization. The depth of quicksort’s divide-and-conquer tree directly impacts the algorithm’s scalability, and this depth is highly dependent on the algorithm’s choice of pivot. Additionally, it is difficult to parallelize the partitioning step efficiently in-place. The use of scratch space simplifies the partitioning step, but increases the algorithm’s memory footprint and constant overheads.

Other more sophisticated parallel sorting algorithms can achieve even better time bounds. For example, in 1991 David Powers described a parallelized quicksort (and a related radix sort) that can operate in O(log n) time on a CRCW (concurrent read and concurrent write) PRAM (parallel random-access machine) with n processors by performing partitioning implicitly.

参考
https://en.wikipedia.org/wiki/Quicksort#Optimizations

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值