DualPivotQuicksort: Dual-Pivot Quicksort


Preface

An analysis of the DualPivotQuicksort source code


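1. Classic Quicksort

Classic quicksort picks a single pivot and partitions the array into two parts, with elements smaller than the pivot on the left and larger ones on the right, and then recurses into both parts.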
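As a point of comparison for the rest of the article, here is a minimal sketch of single-pivot quicksort. The class name `ClassicQuicksortSketch` and the choice of the middle element as pivot are assumptions made for illustration; this is not code from the JDK.

```java
import java.util.Arrays;

final class ClassicQuicksortSketch {
    /** Sorts a[lo..hi] (both bounds inclusive) around a single pivot. */
    static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return; // zero or one element: nothing to do
        }
        int pivot = a[lo + (hi - lo) / 2]; // assumption: middle element as pivot
        int i = lo, j = hi;
        while (i <= j) {
            while (a[i] < pivot) i++; // find an element that belongs on the right
            while (a[j] > pivot) j--; // find an element that belongs on the left
            if (i <= j) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++; j--;
            }
        }
        sort(a, lo, j); // left part: elements <= pivot
        sort(a, i, hi); // right part: elements >= pivot
    }

    public static void main(String[] args) {
        int[] data = {5, 2, 9, 1, 5, 6};
        sort(data, 0, data.length - 1);
        System.out.println(Arrays.toString(data)); // prints [1, 2, 5, 5, 6, 9]
    }
}
```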

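2. Advantages of Dual-Pivot Quicksort

As the name suggests, dual-pivot quicksort uses two pivots, so each partitioning pass splits the range into three parts instead of two. Every pass therefore yields one more region than classic quicksort, and using more than one pivot lowers the chance of hitting the worst case, in which a pivot turns out to be the maximum or minimum element.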

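3. Before We Start

Before reading the source, take the time to understand the diagram below (copied from the JDK 1.8 source; it appears again inside the listing further down).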
/*
 * Partitioning:
 *
 *   left part           center part                   right part
 * +--------------------------------------------------------------+
 * |  < pivot1  |  pivot1 <= && <= pivot2  |    ?    |  > pivot2  |
 * +--------------------------------------------------------------+
 *               ^                          ^       ^
 *               |                          |       |
 *              less                        k     great
 *
 * Invariants:
 *
 *              all in (left, less)   < pivot1
 *    pivot1 <= all in [less, k)     <= pivot2
 *              all in (great, right) > pivot2
 *
 * Pointer k is the first index of ?-part.
 */
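To make the invariants in the diagram concrete, the following is a simplified sketch of dual-pivot partitioning. It keeps the pointer names `less`, `k` and `great` from the diagram, but it takes the two pivots from the ends of the range and uses a hypothetical `swap` helper, so it is a teaching version under those assumptions and not the JDK implementation analysed later.

```java
import java.util.Arrays;

final class DualPivotSketch {
    /** Sorts a[left..right] (both bounds inclusive) with two pivots. */
    static void sort(int[] a, int left, int right) {
        if (left >= right) {
            return;
        }
        if (a[left] > a[right]) {
            swap(a, left, right); // ensure pivot1 <= pivot2
        }
        int pivot1 = a[left], pivot2 = a[right];
        int less = left + 1;   // first index of the center part
        int great = right - 1; // last index of the ?-part
        for (int k = less; k <= great; k++) {
            if (a[k] < pivot1) {
                swap(a, k, less++);            // grow the left part
            } else if (a[k] > pivot2) {
                while (k < great && a[great] > pivot2) {
                    great--;                   // skip elements already in the right part
                }
                swap(a, k, great--);           // grow the right part
                if (a[k] < pivot1) {
                    swap(a, k, less++);        // the element swapped in belongs on the left
                }
            }
        }
        swap(a, left, --less);   // pivot1 into its final position
        swap(a, right, ++great); // pivot2 into its final position
        sort(a, left, less - 1);      // < pivot1
        sort(a, less + 1, great - 1); // pivot1 <= x <= pivot2
        sort(a, great + 1, right);    // > pivot2
    }

    // Hypothetical helper, not part of the JDK class.
    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = {7, 3, 9, 3, 1, 8, 7, 2};
        sort(data, 0, data.length - 1);
        System.out.println(Arrays.toString(data));
    }
}
```

After the loop, everything left of `less` is smaller than `pivot1` and everything right of `great` is larger than `pivot2`, which is exactly the state the diagram describes once the `?` part is empty.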

4. Parameters

    /**
     * Prevents instantiation.
     */
    private DualPivotQuicksort() {}

    /*
     * Tuning parameters.
     */

    /**
     * The maximum number of runs in merge sort.
     */
    private static final int MAX_RUN_COUNT = 67;

    /**
     * The maximum length of a run of equal elements; once a run reaches
     * it, Quicksort is used instead of merge sort.
     */
    private static final int MAX_RUN_LENGTH = 33;

    /**
     * If the length of an array to be sorted is less than this
     * constant, Quicksort is used in preference to merge sort.
     */
    private static final int QUICKSORT_THRESHOLD = 286;

    /**
     * If the length of an array to be sorted is less than this
     * constant, insertion sort is used in preference to Quicksort.
     */
    private static final int INSERTION_SORT_THRESHOLD = 47;

    /**
     * If the length of a byte array to be sorted is greater than this
     * constant, counting sort is used in preference to insertion sort.
     */
    private static final int COUNTING_SORT_THRESHOLD_FOR_BYTE = 29;

    /**
     * If the length of a short or char array to be sorted is greater
     * than this constant, counting sort is used in preference to Quicksort.
     */
    private static final int COUNTING_SORT_THRESHOLD_FOR_SHORT_OR_CHAR = 3200;
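The sketch below shows how the two thresholds relevant to `int[]` interact, based on the checks in the `sort` methods listed later in this article. The class `ThresholdSketch`, the `Strategy` enum and the method `firstChoice` are hypothetical names introduced only for illustration; the JDK has no such helper.

```java
final class ThresholdSketch {
    // Values copied from the listing above.
    static final int QUICKSORT_THRESHOLD = 286;
    static final int INSERTION_SORT_THRESHOLD = 47;

    // Hypothetical labels, not part of the JDK.
    enum Strategy { INSERTION_SORT, DUAL_PIVOT_QUICKSORT, TRY_MERGE_SORT }

    /** Names the strategy tried first for the inclusive range [left, right]. */
    static Strategy firstChoice(int left, int right) {
        if (right - left >= QUICKSORT_THRESHOLD) {
            // Large range: look for runs first; may still fall back to quicksort.
            return Strategy.TRY_MERGE_SORT;
        }
        int length = right - left + 1;
        return length < INSERTION_SORT_THRESHOLD
                ? Strategy.INSERTION_SORT
                : Strategy.DUAL_PIVOT_QUICKSORT;
    }

    public static void main(String[] args) {
        System.out.println(firstChoice(0, 45));  // 46 elements: INSERTION_SORT
        System.out.println(firstChoice(0, 46));  // 47 elements: DUAL_PIVOT_QUICKSORT
        System.out.println(firstChoice(0, 286)); // 287 elements: TRY_MERGE_SORT
    }
}
```

Note that the first check compares `right - left`, not the length, so a range of exactly 286 elements still goes straight to quicksort.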

5. Merge Sort

The JDK code is as follows:

    /**
     * Sorts the specified range of the array using the given
     * workspace array slice if possible for merging
     *
     * @param a the array to be sorted
     * @param left the index of the first element, inclusive, to be sorted
     * @param right the index of the last element, inclusive, to be sorted
     * @param work a workspace array (slice)
     * @param workBase origin of usable space in work array
     * @param workLen usable size of work array
     */
    static void sort(int[] a, int left, int right,
                     int[] work, int workBase, int workLen) {
        // Use Quicksort on small arrays (right - left < 286)
        if (right - left < QUICKSORT_THRESHOLD) {
            sort(a, left, right, true);
            return;
        }

        /*
         * Index run[i] is the start of i-th run
         * (ascending or descending sequence).
         */
        int[] run = new int[MAX_RUN_COUNT + 1];
        int count = 0; run[0] = left;

        // Check if the array is nearly sorted, to choose between Quicksort and merge sort
        for (int k = left; k < right; run[count] = k) {
            if (a[k] < a[k + 1]) { // ascending
                while (++k <= right && a[k - 1] <= a[k]);
            } else if (a[k] > a[k + 1]) { // descending
                while (++k <= right && a[k - 1] >= a[k]);
                // Reverse the descending run in place
                for (int lo = run[count] - 1, hi = k; ++lo < --hi; ) {
                    int t = a[lo]; a[lo] = a[hi]; a[hi] = t;
                }
            } else { // equal
                for (int m = MAX_RUN_LENGTH; ++k <= right && a[k - 1] == a[k]; ) {
                    if (--m == 0) {
                        sort(a, left, right, true);
                        return;
                    }
                }
            }

            /*
             * The array is not highly structured,
             * use Quicksort instead of merge sort.
             */
            if (++count == MAX_RUN_COUNT) {
                sort(a, left, right, true);
                return;
            }
        }

        // Check special cases
        // Implementation note: variable "right" is increased by 1.
        if (run[count] == right++) { // The last run contains one element
            run[++count] = right;
        } else if (count == 1) { // The array is already sorted
            return;
        }

        // From here on it is an ordinary merge of the detected runs
        // Determine alternation base for merge
        byte odd = 0;
        for (int n = 1; (n <<= 1) < count; odd ^= 1);

        // Use or create temporary array b for merging
        int[] b;                 // temp array; alternates with a
        int ao, bo;              // array offsets from 'left'
        int blen = right - left; // space needed for b
        if (work == null || workLen < blen || workBase + blen > work.length) {
            work = new int[blen];
            workBase = 0;
        }
        if (odd == 0) {
            System.arraycopy(a, left, work, workBase, blen);
            b = a;
            bo = 0;
            a = work;
            ao = workBase - left;
        } else {
            b = work;
            ao = 0;
            bo = workBase - left;
        }

        // Merging
        for (int last; count > 1; count = last) {
            for (int k = (last = 0) + 2; k <= count; k += 2) {
                int hi = run[k], mi = run[k - 1];
                for (int i = run[k - 2], p = i, q = mi; i < hi; ++i) {
                    if (q >= hi || p < mi && a[p + ao] <= a[q + ao]) {
                        b[i + bo] = a[p++ + ao];
                    } else {
                        b[i + bo] = a[q++ + ao];
                    }
                }
                run[++last] = hi;
            }
            if ((count & 1) != 0) {
                for (int i = right, lo = run[count - 1]; --i >= lo;
                    b[i + bo] = a[i + ao]
                );
                run[++last] = right;
            }
            int[] t = a; a = b; b = t;
            int o = ao; ao = bo; bo = o;
        }
    }
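The run-detection loop is the part of this method that decides whether merging is worthwhile, so here is a reduced sketch of just that idea. It only counts runs; unlike the JDK loop it does not reverse descending runs and it treats equal neighbours as part of an ascending run (no `MAX_RUN_LENGTH` check). The class and method names are assumptions for illustration.

```java
final class RunCountSketch {
    static final int MAX_RUN_COUNT = 67; // value from the parameters section

    /**
     * Counts maximal non-decreasing or non-increasing runs in a.
     * Returns -1 as soon as MAX_RUN_COUNT runs have been seen, which is
     * the point where the JDK code gives up and falls back to quicksort.
     */
    static int countRuns(int[] a) {
        int count = 0;
        int k = 0;
        int last = a.length - 1;
        while (k < last) {
            if (a[k] <= a[k + 1]) {
                while (++k <= last && a[k - 1] <= a[k]) { } // extend ascending run
            } else {
                while (++k <= last && a[k - 1] >= a[k]) { } // extend descending run
            }
            if (++count == MAX_RUN_COUNT) {
                return -1; // not structured enough for merging
            }
        }
        if (k == last) {
            count++; // a single trailing element forms its own run
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countRuns(new int[] {1, 2, 3, 4}));    // 1
        System.out.println(countRuns(new int[] {1, 2, 3, 2, 1})); // 2
        System.out.println(countRuns(new int[] {3, 2, 1, 5}));    // 2
    }
}
```

An input with one run is already sorted (or reversed), while a zigzag input produces a new run every two elements and quickly hits the limit.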

6. Dual-Pivot Quicksort

The JDK code is as follows:

    /**
     * Sorts the specified range of the array by Dual-Pivot Quicksort.
     *
     * @param a the array to be sorted
     * @param left the index of the first element, inclusive, to be sorted
     * @param right the index of the last element, inclusive, to be sorted
     * @param leftmost indicates if this part is the leftmost in the range
     */
    private static void sort(int[] a, int left, int right, boolean leftmost) {
        int length = right - left + 1;

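        // Use insertion sort on tiny arrays (length < 47)
        if (length < INSERTION_SORT_THRESHOLD) {
            // About leftmost: a part that is not the leftmost always has, immediately to its left,
            // elements no greater than anything inside it (the earlier partitioning guarantees this).
            // Those elements act as a sentinel, so only the leftmost part needs the bound check if (j-- == left).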
            if (leftmost) {
                /*
                 * Traditional insertion sort (without sentinel), used for the leftmost part.
                 */
                for (int i = left, j = i; i < right; j = ++i) {
                    int ai = a[i + 1];
                    while (ai < a[j]) {
                        a[j + 1] = a[j];
                        if (j-- == left) {
                            break;
                        }
                    }
                    a[j + 1] = ai;
                }
            } else {
                /*
                 * Skip the longest ascending sequence at the start.
                 */
                do {
                    if (left >= right) {
                        return;
                    }
                } while (a[++left] >= a[left - 1]);

                /*
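                 * Pair insertion sort: two elements are inserted per iteration, which saves some repeated comparisons.
                 * Unlike the traditional insertion sort above, it needs no left bound check
                 * (the elements to the left of this part act as the sentinel).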
                 */
                for (int k = left; ++left <= right; k = ++left) {
                    int a1 = a[k], a2 = a[left];
                    if (a1 < a2) {
                        a2 = a1; a1 = a[left]; // swap so that a1 >= a2
                    }
                    while (a1 < a[--k]) {
                        a[k + 2] = a[k]; // insert the larger element a1 first
                    }
                    a[++k + 1] = a1;

                    while (a2 < a[--k]) {
                        a[k + 1] = a[k]; // a2 continues from where a1 stopped, skipping the comparisons a1 already made
                    }
                    a[k + 1] = a2;
                }
                int last = a[right];
                // Insert the last element, which is left unpaired when the count is odd
                while (last < a[--right]) {
                    a[right + 1] = a[right];
                }
                a[right + 1] = last;
            }
            return;
        }

        // Inexpensive approximation of length / 7
        int seventh = (length >> 3) + (length >> 6) + 1;

        /*
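         * Pick five evenly spaced sample elements e1..e5 around the midpoint, located at
         * roughly 3/14, 5/14, 7/14, 9/14 and 11/14 of the range to be sorted. The pivots are
         * chosen from these samples; the spacing was determined empirically on large amounts of data.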
         */
        int e3 = (left + right) >>> 1; // The midpoint
        int e2 = e3 - seventh;
        int e1 = e2 - seventh;
        int e4 = e3 + seventh;
        int e5 = e4 + seventh;

        // Sort these five elements with a hand-written insertion sort
        if (a[e2] < a[e1]) { int t = a[e2]; a[e2] = a[e1]; a[e1] = t; }

        if (a[e3] < a[e2]) { int t = a[e3]; a[e3] = a[e2]; a[e2] = t;
            if (t < a[e1]) { a[e2] = a[e1]; a[e1] = t; }
        }
        if (a[e4] < a[e3]) { int t = a[e4]; a[e4] = a[e3]; a[e3] = t;
            if (t < a[e2]) { a[e3] = a[e2]; a[e2] = t;
                if (t < a[e1]) { a[e2] = a[e1]; a[e1] = t; }
            }
        }
        if (a[e5] < a[e4]) { int t = a[e5]; a[e5] = a[e4]; a[e4] = t;
            if (t < a[e3]) { a[e4] = a[e3]; a[e3] = t;
                if (t < a[e2]) { a[e3] = a[e2]; a[e2] = t;
                    if (t < a[e1]) { a[e2] = a[e1]; a[e1] = t; }
                }
            }
        }

        // Pointers
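        int less  = left;  // The index of the first element of center part (see the diagram below)
        int great = right; // The index before the first element of right part (see the diagram below)

        // Dual-pivot partitioning is used only when all five samples are distinct; otherwise a single pivot is used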
        if (a[e1] != a[e2] && a[e2] != a[e3] && a[e3] != a[e4] && a[e4] != a[e5]) {
            /*
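             * Use the second and fourth of the five sorted samples as pivots.
             * They approximate the first and second terciles of the array; note that pivot1 <= pivot2.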
             */
            int pivot1 = a[e2];
            int pivot2 = a[e4];

            /*
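             * Move the first and last elements of the range into the slots formerly occupied by the pivots.
             * The pivots are the reference values and take no part in the partitioning itself.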
             */
            a[e2] = a[left];
            a[e4] = a[right];

            /*
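             * Skip elements that are already in place, i.e. less than pivot1 or greater than pivot2.
             * Pre-increment and pre-decrement (++less, --great) are used so that a[left] and a[right]
             * are stepped over and stay free for pivot1 and pivot2.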
             */
            while (a[++less] < pivot1);
            while (a[--great] > pivot2);

            /*
             * Partitioning:
             *
             *   left part           center part                   right part
             * +--------------------------------------------------------------+
             * |  < pivot1  |  pivot1 <= && <= pivot2  |    ?    |  > pivot2  |
             * +--------------------------------------------------------------+
             *               ^                          ^       ^
             *               |                          |       |
             *              less                        k     great
             *
             * Invariants:
             *
             *              all in (left, less)   < pivot1
             *    pivot1 <= all in [less, k)     <= pivot2
             *              all in (great, right) > pivot2
             *
             * Pointer k is the first index of ?-part.
             */
            outer:
            for (int k = less - 1; ++k <= great; ) {
                int ak = a[k];
                if (ak < pivot1) { // Move a[k] to left part
                    a[k] = a[less];
                    /*
                     * Here and below, "a[i] = b; i++;" is used instead of
                     * "a[i++] = b;" for performance reasons: it runs slightly faster.
                     */
                    a[less] = ak;
                    ++less;
                } else if (ak > pivot2) { // Move a[k] to right part
                    while (a[great] > pivot2) {
                        if (great-- == k) {
                            break outer;
                        }
                    }
                    if (a[great] < pivot1) { // a[great] <= pivot2
                        a[k] = a[less];
                        a[less] = a[great];
                        ++less;
                    } else { // pivot1 <= a[great] <= pivot2
                        a[k] = a[great];
                    }
                    /*
                     * "a[i] = b; i--;" performs better than "a[i--] = b;"
                     */
                    a[great] = ak;
                    --great;
                }
            }

            // Swap the pivots into their final positions (this is where the slots kept free at left and right are used)
            a[left]  = a[less  - 1]; a[less  - 1] = pivot1;
            a[right] = a[great + 1]; a[great + 1] = pivot2;

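            // Sort the left and right parts recursively, excluding the pivots, whose positions are final.
            // Passing leftmost along is what lets the non-leftmost calls skip the left bound check.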
            sort(a, left, less - 2, leftmost);
            sort(a, great + 2, right, false);

            /*
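             * e1..e5 span about 4/7 of the range. If the center part is larger than that, preprocess it first:
             * move the elements equal to the pivots to its left and right ends (center part: pivot1 <= && <= pivot2).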
             */
            if (less < e1 && e5 < great) {
                /*
                 * Skip elements that are equal to the pivot values.
                 */
                while (a[less] == pivot1) {
                    ++less;
                }

                while (a[great] == pivot2) {
                    --great;
                }

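                /*
                 * The pass below pulls elements equal to pivot1 to the left end and elements equal to
                 * pivot2 to the right end, leaving only pivot1 < x < pivot2 in the middle still to be sorted.
                 *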
                 * Partitioning:
                 *
                 *   left part         center part                  right part
                 * +----------------------------------------------------------+
                 * | == pivot1 |  pivot1 < && < pivot2  |    ?    | == pivot2 |
                 * +----------------------------------------------------------+
                 *              ^                        ^       ^
                 *              |                        |       |
                 *             less                      k     great
                 *
                 * Invariants:
                 *
                 *              all in (*,  less) == pivot1
                 *     pivot1 < all in [less,  k)  < pivot2
                 *              all in (great, *) == pivot2
                 *
                 * Pointer k is the first index of ?-part.
                 */
                outer:
                for (int k = less - 1; ++k <= great; ) {
                    int ak = a[k];
                    // Elements equal to pivot1 move to the left
                    if (ak == pivot1) { // Move a[k] to left part
                        a[k] = a[less];
                        a[less] = ak;
                        ++less;
                    } else if (ak == pivot2) { // Move a[k] to right part
                        // Skip elements at the right end that already equal pivot2
                        while (a[great] == pivot2) {
                            if (great-- == k) {
                                break outer;
                            }
                        }
                        // a[k] equals pivot2 and moves right; what replaces it depends on the value of a[great]
                        if (a[great] == pivot1) { // a[great] < pivot2
                            a[k] = a[less];
                            /*
                             * Even though a[great] equals to pivot1, the
                             * assignment a[less] = pivot1 may be incorrect,
                             * if a[great] and pivot1 are floating-point zeros
                             * of different signs. Therefore in float and
                             * double sorting methods we have to use more
                             * accurate assignment a[less] = a[great].
                             */
                            // For int, a[less] = pivot1 is exact; the caveat above applies only to float and double
                            a[less] = pivot1;
                            ++less;
                        } else { // pivot1 < a[great] < pivot2
                            a[k] = a[great];
                        }
                        a[great] = ak;
                        --great;
                    }
                }
            }

            // Sort the remaining center part recursively
            sort(a, less, great, false);

        } else { // Partitioning with one pivot
            /*
             * Use the third of the five sorted samples as the pivot.
             * This value is an inexpensive approximation of the median.
             */
            int pivot = a[e3];

            /*
             * Fall back to single-pivot quicksort with 3-way partitioning
             * (less than, equal to, greater than the pivot):
             *
             *   left part    center part              right part
             * +-------------------------------------------------+
             * |  < pivot  |   == pivot   |     ?    |  > pivot  |
             * +-------------------------------------------------+
             *              ^              ^        ^
             *              |              |        |
             *             less            k      great
             *
             * Invariants:
             *
             *   all in (left, less)   < pivot
             *   all in [less, k)     == pivot
             *   all in (great, right) > pivot
             *
             * Pointer k is the first index of ?-part.
             */
            // Ordinary single-pivot partitioning loop
            for (int k = less; k <= great; ++k) {
                if (a[k] == pivot) {
                    continue;
                }
                int ak = a[k];
                if (ak < pivot) { // Move a[k] to left part
                    a[k] = a[less];
                    a[less] = ak;
                    ++less;
                } else { // a[k] > pivot - Move a[k] to right part
                    while (a[great] > pivot) {
                        --great;
                    }
                    if (a[great] < pivot) { // a[great] <= pivot
                        a[k] = a[less];
                        a[less] = a[great];
                        ++less;
                    } else { // a[great] == pivot
                        /*
                         * Even though a[great] equals to pivot, the
                         * assignment a[k] = pivot may be incorrect,
                         * if a[great] and pivot are floating-point
                         * zeros of different signs. Therefore in float
                         * and double sorting methods we have to use
                         * more accurate assignment a[k] = a[great].
                         */
                        a[k] = pivot;
                    }
                    a[great] = ak;
                    --great;
                }
            }

            /*
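             * Sort the left and right parts recursively. All elements of the
             * center part are equal to the pivot and need no further sorting.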
             */
            sort(a, left, less - 1, leftmost);
            sort(a, great + 1, right, false);
        }
    }
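The sample positions used for pivot selection come from a little shift arithmetic that is easy to miss in the listing: `length / 8 + length / 64` equals `9/64` of the length, which is close to `1/7`. The sketch below isolates that computation; the class `SampleIndexSketch` and the method `sampleIndices` are hypothetical names, while the arithmetic itself is copied from the method above.

```java
import java.util.Arrays;

final class SampleIndexSketch {
    /** Returns {e1, e2, e3, e4, e5} for the inclusive range [left, right]. */
    static int[] sampleIndices(int left, int right) {
        int length = right - left + 1;
        // length/8 + length/64 + 1: a cheap approximation of length/7
        int seventh = (length >> 3) + (length >> 6) + 1;
        int e3 = (left + right) >>> 1; // midpoint; unsigned shift avoids overflow
        int e2 = e3 - seventh;
        int e1 = e2 - seventh;
        int e4 = e3 + seventh;
        int e5 = e4 + seventh;
        return new int[] {e1, e2, e3, e4, e5};
    }

    public static void main(String[] args) {
        // 100 elements: seventh = 12 + 1 + 1 = 14, midpoint = 49
        System.out.println(Arrays.toString(sampleIndices(0, 99))); // prints [21, 35, 49, 63, 77]
    }
}
```

For a range of 100 elements the five samples sit at indices 21, 35, 49, 63 and 77, which matches the 3/14, 5/14, 7/14, 9/14 and 11/14 positions mentioned in the comments.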


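Summary

This is the algorithm behind Arrays.sort() for primitive arrays in the JDK. The complexity of an industrial-grade algorithm, and the number of dimensions it has to take into account, are far beyond a simple bubble sort or insertion sort. Many of its design ideas and threshold values come from the accumulated experience of earlier engineers, and reading it taught me a lot.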

