Studying the Arrays.sort() Source Code

zt928815211

Article link: https://blog.csdn.net/zt928815211/article/details/83661747

Version: java version "10.0.2"

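I. Analysis of Arrays.sort(int[] a)

Summary of the Arrays.sort(int[] a) method:
1. Array length below 47: insertion sort
2. Array length from 47 to 286: dual-pivot quicksort
3. Array length above 286:
   1. Array is nearly sorted: merge sort
   2. Array is unordered: dual-pivot quicksort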
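
The selection rules above can be sketched as a small decision function. The class name SortStrategySketch, the method strategyFor, and the runCount parameter are hypothetical names introduced only for illustration; the three constants are the threshold values quoted in this article.

```java
class SortStrategySketch {
    // Threshold values as quoted in this article for java version "10.0.2".
    static final int INSERTION_SORT_THRESHOLD = 47;
    static final int QUICKSORT_THRESHOLD = 286;
    static final int MAX_RUN_COUNT = 67;

    /**
     * Hypothetical helper: names the algorithm chosen for an int[] of the
     * given length that splits into runCount ascending/descending runs.
     */
    static String strategyFor(int length, int runCount) {
        if (length < INSERTION_SORT_THRESHOLD) {
            return "insertion sort";
        }
        // The JDK tests right - left < QUICKSORT_THRESHOLD, i.e. length <= 286.
        if (length - 1 < QUICKSORT_THRESHOLD) {
            return "dual-pivot quicksort";
        }
        // Large arrays: few runs means nearly sorted, so the runs are merged.
        return runCount < MAX_RUN_COUNT ? "merge sort" : "dual-pivot quicksort";
    }

    public static void main(String[] args) {
        System.out.println(strategyFor(10, 1));    // insertion sort
        System.out.println(strategyFor(200, 1));   // dual-pivot quicksort
        System.out.println(strategyFor(1000, 3));  // merge sort
        System.out.println(strategyFor(1000, 67)); // dual-pivot quicksort
    }
}
```

This only mirrors the decision order; the real method interleaves the run counting with the decision, as the source below shows.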

Arrays.sort has many overloads. We start with Arrays.sort(int[] a); the other overloads are broadly similar.

public static void sort(int[] a) {
        DualPivotQuicksort.sort(a, 0, a.length - 1, null, 0, 0);
}

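As you can see, it calls the sort method of the DualPivotQuicksort class. DualPivotQuicksort is a variant of quicksort; readers who want the details can read these two articles, which explain it very thoroughly:
>Dual-pivot quicksort: how it works
>Why is DualPivotQuicksort better than ordinary quicksort?

Next, let's look at the DualPivotQuicksort.sort method: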

    static void sort(int[] a, int left, int right,
                     int[] work, int workBase, int workLen) {

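        /**
         * When right - left is less than QUICKSORT_THRESHOLD = 286,
         * delegate to sort(a, left, right, true). Briefly, that method works as follows:
         * if the array length is less than INSERTION_SORT_THRESHOLD = 47, it uses insertion sort;
         * for lengths from 47 to 286 it uses dual-pivot quicksort.
         */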
        if (right - left < QUICKSORT_THRESHOLD) {
            sort(a, left, right, true);
            return;
        }
        
        int[] run = new int[MAX_RUN_COUNT + 1];
        int count = 0;
        run[0] = left;

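        /**
         * At this point the array length is greater than 286. If the array is nearly sorted,
         * merge sort is used; otherwise dual-pivot quicksort is used.
         *
         * Whether the array is nearly sorted is decided as follows:
         *
         * The array is split into ascending or descending runs, and the boundary of
         * each run is stored in the run array. For example:
         *
         * the array arr = {9, 6, 6, 4, 8, 20, 18, 17, 16, 15}
         * can be divided into three runs
         * {9, 6, 6, 4}  {8, 20}  {18, 17, 16, 15}
         *
         * The corresponding run array is
         * {0, 4, 6, 10, 0, 0, 0, 0,...},
         * meaning arr[0]..arr[4-1] is the first run,
         * arr[4]..arr[6-1] is the second run,
         * arr[6]..arr[10-1] is the third run.
         *
         * If the number of runs reaches MAX_RUN_COUNT = 67, the array is poorly structured and dual-pivot quicksort is used.
         * Otherwise the array is well structured and merge sort is used.
         */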
        for (int k = left; k < right; run[count] = k) {
            // Equal items in the beginning of the sequence
            while (k < right && a[k] == a[k + 1])
                k++;
            /**
             * The remaining items are all equal, so there is no further run to detect
             */
            if (k == right) break;  // Sequence finishes with equal items

            if (a[k] < a[k + 1]) { // ascending
                /**
                 * Find the end of the ascending run
                 */
                while (++k <= right && a[k - 1] <= a[k]) ;
            } else if (a[k] > a[k + 1]) { // descending
                /**
                 * Find the end of the descending run
                 */
                while (++k <= right && a[k - 1] >= a[k]) ;
                /**
                 * Reverse the descending run so that it becomes ascending
                 */
                for (int lo = run[count] - 1, hi = k; ++lo < --hi; ) {
                    int t = a[lo];
                    a[lo] = a[hi];
                    a[hi] = t;
                }
            }

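            /**
             * After a descending run has been reversed into ascending order, compare it with the preceding run:
             * check whether the last (largest) element of the previous run is no greater than
             * the first (smallest) element of the current run.
             * If before.last <= this.first, the two runs form one ascending run, so count-- drops the boundary between them.
             */
            if (run[count] > left && a[run[count]] >= a[run[count] - 1]) {
                count--;
            }

            /**
             * Too many runs: the array is not highly structured, so use quicksort instead of merge sort
             */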
            if (++count == MAX_RUN_COUNT) {
                sort(a, left, right, true);
                return;
            }
        }

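        /**
         * count == 0 means all elements are equal; count == 1 with run[1] > right means
         * a single ascending (or reversed descending) run. Either way the array is already sorted.
         */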
        if (count == 0) {
            return;
        } else if (count == 1 && run[count] > right) {
            return;
        }
        right++;
        if (run[count] < right) {
            run[++count] = right;
        }

        /**
         * From here on the runs are merged (merge sort)
         */
        byte odd = 0;
        for (int n = 1; (n <<= 1) < count; odd ^= 1) ;

        // Use or create the temporary array b for merging
        int[] b;                 // temp array; alternates with a
        int ao, bo;              // array offsets from 'left'
        int blen = right - left; // space needed for b
        if (work == null || workLen < blen || workBase + blen > work.length) {
            work = new int[blen];
            workBase = 0;
        }
        if (odd == 0) {
            System.arraycopy(a, left, work, workBase, blen);
            b = a;
            bo = 0;
            a = work;
            ao = workBase - left;
        } else {
            b = work;
            ao = 0;
            bo = workBase - left;
        }

        // Merge until only one run remains (count == 1)
        for (int last; count > 1; count = last) {
            for (int k = (last = 0) + 2; k <= count; k += 2) {
                int hi = run[k], mi = run[k - 1];
                for (int i = run[k - 2], p = i, q = mi; i < hi; ++i) {
                    if (q >= hi || p < mi && a[p + ao] <= a[q + ao]) {
                        b[i + bo] = a[p++ + ao];
                    } else {
                        b[i + bo] = a[q++ + ao];
                    }
                }
                run[++last] = hi;
            }
            
            // With an odd number of runs, pairwise merging leaves one run over; copy it into b
            if ((count & 1) != 0) {
                for (int i = right, lo = run[count - 1]; --i >= lo;
                     b[i + bo] = a[i + ao]
                        )
                    ;
                run[++last] = right;
            }

            // Swap a and b so that the next pass merges back into the other buffer
            int[] t = a;
            a = b;
            b = t;
            int o = ao;
            ao = bo;
            bo = o;
        }

    }
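
The run-detection loop above can be tried in isolation. The following is a minimal sketch, assuming a hypothetical class RunDetectionSketch with a method findRuns; it keeps the loop body from the listing but drops the MAX_RUN_COUNT cutoff and the early returns, so it always reports the boundaries.

```java
import java.util.Arrays;

class RunDetectionSketch {
    /**
     * Returns the run boundaries of a, reversing descending runs in place.
     * Simplified from the loop above: no MAX_RUN_COUNT cutoff, and the
     * terminator (a.length) is always appended.
     */
    static int[] findRuns(int[] a) {
        int left = 0, right = a.length - 1;
        int[] run = new int[a.length + 2];
        int count = 0;
        run[0] = left;
        for (int k = left; k < right; run[count] = k) {
            // Skip equal items at the beginning of the run.
            while (k < right && a[k] == a[k + 1]) k++;
            if (k == right) break;
            if (a[k] < a[k + 1]) {          // ascending run
                while (++k <= right && a[k - 1] <= a[k]) ;
            } else {                        // descending run: find its end, then reverse it
                while (++k <= right && a[k - 1] >= a[k]) ;
                for (int lo = run[count] - 1, hi = k; ++lo < --hi; ) {
                    int t = a[lo]; a[lo] = a[hi]; a[hi] = t;
                }
            }
            // Join this run with the previous one if together they are ascending.
            if (run[count] > left && a[run[count]] >= a[run[count] - 1]) count--;
            count++;
        }
        if (run[count] < right + 1) run[++count] = right + 1;
        return Arrays.copyOf(run, count + 1);
    }

    public static void main(String[] args) {
        int[] arr = {9, 6, 6, 4, 8, 20, 18, 17, 16, 15};
        System.out.println(Arrays.toString(findRuns(arr))); // [0, 4, 6, 10]
        System.out.println(Arrays.toString(arr)); // [4, 6, 6, 9, 8, 20, 15, 16, 17, 18]
    }
}
```

For the example array from the comment, the boundaries come out as {0, 4, 6, 10}, and both descending runs have been reversed in place, leaving three ascending runs for the merge phase.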

Next, let's look at the sort(a, left, right, true) method that the code above calls when right - left is less than 286:

    private static void sort(int[] a, int left, int right, boolean leftmost) {
        int length = right - left + 1;

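        /**
         * When the length is less than INSERTION_SORT_THRESHOLD = 47, use insertion sort
         */
        if (length < INSERTION_SORT_THRESHOLD) {
            if (leftmost) {
                /*
                 * This is the leftmost part of the array: use plain insertion sort,
                 * which has to check the left boundary on every step
                 */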
                for (int i = left, j = i; i < right; j = ++i) {
                    int ai = a[i + 1];
                    while (ai < a[j]) {
                        a[j + 1] = a[j];
                        if (j-- == left) {
                            break;
                        }
                    }
                    a[j + 1] = ai;
                }
            } else {
                /*
                 * Skip the longest ascending prefix (at least one element)
                 */
                do {
                    if (left >= right) {
                        return;
                    }
                } while (a[++left] >= a[left - 1]);

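                /*
                 * The range to sort is not the leftmost part of the array, so every element of the
                 * adjoining part on the left acts as a sentinel, which avoids the left range check.
                 * An improved insertion sort is used: pair insertion sort inserts two elements
                 * per iteration, reducing traversal and comparison overhead.
                 */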
                for (int k = left; ++left <= right; k = ++left) {
                    int a1 = a[k], a2 = a[left];
                    
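                    // Take the next two elements: the larger goes into a1, the smaller into a2
                    if (a1 < a2) {
                        a2 = a1; a1 = a[left];
                    }

                    // First insert the larger a1 into the sorted prefix
                    while (a1 < a[--k]) {
                        a[k + 2] = a[k];
                    }
                    a[++k + 1] = a1;

                    // Then insert the smaller a2; one pass thus inserts two elements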
                    while (a2 < a[--k]) {
                        a[k + 1] = a[k];
                    }
                    a[k + 1] = a2;
                }
                int last = a[right];
                
                // Elements are taken two at a time; if a single one is left over at the end, insert it as well.
                while (last < a[--right]) {
                    a[right + 1] = a[right];
                }
                a[right + 1] = last;
            }
            return;
        }

        
        /**
         * For arrays with at least 47 elements (47 to 286, or larger arrays that are not
         * well structured), use dual-pivot quicksort
         */

        // Inexpensive approximation of length / 7
        int seventh = (length >> 3) + (length >> 6) + 1;

        /*
         * Sort five evenly spaced elements around (and including) the
         * center element in the range. These elements will be used for
         * pivot selection as described below. The choice for spacing
         * these elements was empirically determined to work well on
         * a wide variety of inputs.
         */
        int e3 = (left + right) >>> 1; // The midpoint
        int e2 = e3 - seventh;
        int e1 = e2 - seventh;
        int e4 = e3 + seventh;
        int e5 = e4 + seventh;

        // Sort the five sample elements chosen above with an insertion sort
        if (a[e2] < a[e1]) { int t = a[e2]; a[e2] = a[e1]; a[e1] = t; }

        if (a[e3] < a[e2]) { int t = a[e3]; a[e3] = a[e2]; a[e2] = t;
            if (t < a[e1]) { a[e2] = a[e1]; a[e1] = t; }
        }
        if (a[e4] < a[e3]) { int t = a[e4]; a[e4] = a[e3]; a[e3] = t;
            if (t < a[e2]) { a[e3] = a[e2]; a[e2] = t;
                if (t < a[e1]) { a[e2] = a[e1]; a[e1] = t; }
            }
        }
        if (a[e5] < a[e4]) { int t = a[e5]; a[e5] = a[e4]; a[e4] = t;
            if (t < a[e3]) { a[e4] = a[e3]; a[e3] = t;
                if (t < a[e2]) { a[e3] = a[e2]; a[e2] = t;
                    if (t < a[e1]) { a[e2] = a[e1]; a[e1] = t; }
                }
            }
        }

        // Left and right pointers, starting at the two ends of the range
        int less  = left;  //  The index of the first element of center part
        int great = right; //  The index before the first element of right part

        if (a[e1] != a[e2] && a[e2] != a[e3] && a[e3] != a[e4] && a[e4] != a[e5]) {
            /*
             * Use the second and fourth of the five sorted elements as pivots.
             * These values are inexpensive approximations of the first and
             * second terciles of the array. Note that pivot1 <= pivot2.
             */
            int pivot1 = a[e2];
            int pivot2 = a[e4];

            /*
             * The first and the last elements to be sorted are moved to the
             * locations formerly occupied by the pivots. When partitioning
             * is complete, the pivots are swapped back into their final
             * positions, and excluded from subsequent sorting.
             */
            a[e2] = a[left];
            a[e4] = a[right];

            /*
             * From the left, skip elements that are less than pivot1.
             * From the right, skip elements that are greater than pivot2.
             */
            while (a[++less] < pivot1);
            while (a[--great] > pivot2);

            /* After the previous step the array consists of three parts:
             *   left part           center part                   right part
             * +--------------------------------------------------------------+
             * |  < pivot1  |       ??????                       |  > pivot2  |
             * +--------------------------------------------------------------+
             *              ^^                                  ^
             *              ||                                  |
             *             k less                             great
             *
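             * With k as the scanning pointer, walk through the center part, moving elements less than pivot1 to the left and elements greater than pivot2 to the right.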
             *
             *   left part           center part                   right part
             * +--------------------------------------------------------------+
             * |  < pivot1  |  pivot1 <= && <= pivot2  |    ?    |  > pivot2  |
             * +--------------------------------------------------------------+
             *               ^                          ^       ^
             *               |                          |       |
             *              less                        k     great
             *
             * Invariants:
             *
             *              all in (left, less)   < pivot1
             *    pivot1 <= all in [less, k)     <= pivot2
             *              all in (great, right) > pivot2
             *
             * Pointer k is the first index of ?-part.
             *
             * Finally, the array is divided into three parts
             * +--------------------------------------------------------------+
             * |  < pivot1  |  pivot1 <= && <= pivot2            |  > pivot2  |
             * +--------------------------------------------------------------+
             */
            outer:
            for (int k = less - 1; ++k <= great; ) {
                int ak = a[k];
                if (ak < pivot1) { // Move a[k] to left part
                    a[k] = a[less];
                    /*
                     * Here and below we use "a[i] = b; i++;" instead
                     * of "a[i++] = b;" due to performance issue.
                     */
                    a[less] = ak;
                    ++less;
                } else if (ak > pivot2) { // Move a[k] to right part
                    while (a[great] > pivot2) {
                        if (great-- == k) {
                            break outer;
                        }
                    }
                    if (a[great] < pivot1) { // a[great] <= pivot2
                        a[k] = a[less];
                        a[less] = a[great];
                        ++less;
                    } else { // pivot1 <= a[great] <= pivot2
                        a[k] = a[great];
                    }
                    /*
                     * Here and below we use "a[i] = b; i--;" instead
                     * of "a[i--] = b;" due to performance issue.
                     */
                    a[great] = ak;
                    --great;
                }
            }

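            // Swap the pivots into their final positions
            a[left]  = a[less  - 1]; a[less  - 1] = pivot1;
            a[right] = a[great + 1]; a[great + 1] = pivot2;

            // Sort the left and right parts recursively, excluding the known pivots
            sort(a, left, less - 2, leftmost);
            sort(a, great + 2, right, false);

            /*
             * less < e1 && e5 < great means the center part is too large (more than 4/7 of the array).
             * In that case, first move the elements equal to the pivots to the ends of the center part,
             * then sort the center part recursively.
             */
            if (less < e1 && e5 < great) {
                /*
                 * Skip elements equal to pivot1 and elements equal to pivot2
                 */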
                while (a[less] == pivot1) {
                    ++less;
                }

                while (a[great] == pivot2) {
                    --great;
                }

                /* Process the elements of the center part one by one.
                 *
                 * Partitioning:
                 *
                 *   left part         center part                  right part
                 * +----------------------------------------------------------+
                 * | == pivot1 |  pivot1 < && < pivot2  |    ?    | == pivot2 |
                 * +----------------------------------------------------------+
                 *              ^                        ^       ^
                 *              |                        |       |
                 *             less                      k     great
                 *
                 * Invariants:
                 *
                 *              all in (*,  less) == pivot1
                 *     pivot1 < all in [less,  k)  < pivot2
                 *              all in (great, *) == pivot2
                 *
                 * Pointer k is the first index of ?-part.
                 */
                outer:
                for (int k = less - 1; ++k <= great; ) {
                    int ak = a[k];
                    if (ak == pivot1) { // Move a[k] to left part
                        a[k] = a[less];
                        a[less] = ak;
                        ++less;
                    } else if (ak == pivot2) { // Move a[k] to right part
                        while (a[great] == pivot2) {
                            if (great-- == k) {
                                break outer;
                            }
                        }
                        if (a[great] == pivot1) { // a[great] < pivot2
                            a[k] = a[less];
                            /*
                             * a[less] = pivot1 is only valid for integer element types.
                             * For double or float, a[less] = a[great] must be used instead.
                             */
                            a[less] = pivot1;
                            ++less;
                        } else { // pivot1 < a[great] < pivot2
                            a[k] = a[great];
                        }
                        a[great] = ak;
                        --great;
                    }
                }
            }

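            // Sort the center part recursively
            sort(a, less, great, false);

        } else { // Some of the five sample values are equal: partition with a single pivot
            /*
             * Use the middle one of the five sorted sample elements as the pivot;
             * it is an inexpensive approximation of the median.
             */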
            int pivot = a[e3];

            /*
             * Partition using the Dutch national flag approach.
             * The array ends up in three parts:
             * < pivot
             * == pivot
             * > pivot
             * The left and right parts are then sorted recursively.
             *
             *   left part    center part              right part
             * +-------------------------------------------------+
             * |  < pivot  |   == pivot   |     ?    |  > pivot  |
             * +-------------------------------------------------+
             *              ^              ^        ^
             *              |              |        |
             *             less            k      great
             *
             * Invariants:
             *
             *   all in (left, less)   < pivot
             *   all in [less, k)     == pivot
             *   all in (great, right) > pivot
             *
             * Pointer k is the first index of ?-part.
             */
            for (int k = less; k <= great; ++k) {
                if (a[k] == pivot) {
                    continue;
                }
                int ak = a[k];
                if (ak < pivot) { // Move a[k] to left part
                    a[k] = a[less];
                    a[less] = ak;
                    ++less;
                } else { // a[k] > pivot - Move a[k] to right part
                    while (a[great] > pivot) {
                        --great;
                    }
                    if (a[great] < pivot) { // a[great] <= pivot
                        a[k] = a[less];
                        a[less] = a[great];
                        ++less;
                    } else { // a[great] == pivot
                        /*
                         * a[k] = pivot is only valid for integer element types.
                         * For double or float, a[k] = a[great] must be used instead.
                         */
                        a[k] = pivot;
                    }
                    a[great] = ak;
                    --great;
                }
            }

            /*
             * Sort the left and right parts recursively.
             * All elements of the center part equal the pivot, so it needs no further sorting.
             */
            sort(a, left, less - 1, leftmost);
            sort(a, great + 1, right, false);
        }

        
    }
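
To see the three-way partitioning idea without the JDK's optimizations, here is a minimal textbook-style sketch. It is not the JDK implementation: the hypothetical class DualPivotSketch takes the first and last elements as pivots instead of sampling five elements, has no insertion-sort cutoff, and does not treat equal pivots specially.

```java
import java.util.Arrays;

class DualPivotSketch {
    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    // Simplified dual-pivot quicksort on a[left..right] (inclusive bounds).
    private static void sort(int[] a, int left, int right) {
        if (left >= right) return;
        if (a[left] > a[right]) swap(a, left, right);
        int pivot1 = a[left], pivot2 = a[right];   // pivot1 <= pivot2

        // Invariants: a[left+1 .. less-1] < pivot1,
        //             pivot1 <= a[less .. k-1] <= pivot2,
        //             a[great+1 .. right-1] > pivot2.
        int less = left + 1, great = right - 1;
        for (int k = less; k <= great; k++) {
            if (a[k] < pivot1) {
                swap(a, k, less++);
            } else if (a[k] > pivot2) {
                while (k < great && a[great] > pivot2) great--;
                swap(a, k, great--);
                if (a[k] < pivot1) swap(a, k, less++);
            }
        }

        // Put the pivots into their final positions.
        less--; great++;
        swap(a, left, less);
        swap(a, right, great);

        // Sort the three parts; the pivots themselves are excluded.
        sort(a, left, less - 1);
        sort(a, less + 1, great - 1);
        sort(a, great + 1, right);
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 8, 3, 9, 1, 5, 7, 2, 8, 0};
        sort(data);
        System.out.println(Arrays.toString(data)); // [0, 1, 2, 3, 3, 5, 5, 7, 8, 8, 9]
    }
}
```

The partition loop has the same shape as the one in the listing above: k scans the unknown region while less and great mark the boundaries of the left and right parts.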

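II. Arrays.sort(double[] a) and Arrays.sort(Object[] a)

Arrays.sort(double[] a) works in three steps:
1. Move the NaNs to the end
2. Sort the rest, in the same way as Arrays.sort(int[] a)
3. Fix the order of the zeros, so that: negative numbers < -0.0 < 0.0 < positive numbers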
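
The visible effect of these three steps can be checked with a small example. The class name DoubleSortDemo and the helper sortedCopy are hypothetical; only Arrays.sort(double[]) is the API under discussion.

```java
import java.util.Arrays;

class DoubleSortDemo {
    // Sorts a copy so that the caller's array is left untouched.
    static double[] sortedCopy(double[] input) {
        double[] copy = input.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        double[] data = {0.0, Double.NaN, -0.0, 1.5, -2.0};
        // NaN ends up last, and -0.0 is placed before 0.0.
        System.out.println(Arrays.toString(sortedCopy(data)));
        // prints [-2.0, -0.0, 0.0, 1.5, NaN]
    }
}
```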

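Arrays.sort(Object[] a)

What?

TimSort is used by default (users can request the legacy merge sort instead).

How?

1. From the array length, compute minRun, the minimum run length used from then on. For arrays with at least MIN_MERGE = 32 elements it lies between 16 and 32 (MIN_MERGE/2 to MIN_MERGE).

2. Find the next run. If it is descending, reverse it into ascending order. If it is shorter than minRun, use binary insertion sort to insert the following elements into it until its length reaches minRun.

3. Push the run onto the stack.

4. Merge runs on the stack until the following invariants hold (a bit like the game 2048):

    1. runLen[i - 3] > runLen[i - 2] + runLen[i - 1]
    2. runLen[i - 2] > runLen[i - 1]

5. Repeat from step 2 until the whole array has been divided into runs.

6. Finally, merge all runs remaining on the stack.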
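
Step 1 can be sketched as follows. This is my reading of how TimSort derives minRun (keep halving n while it is at least MIN_MERGE, and add 1 if any 1 bit was shifted out); the class name MinRunSketch is hypothetical, and the JDK source remains the authority on the exact code.

```java
class MinRunSketch {
    static final int MIN_MERGE = 32;

    /**
     * Returns n itself when n < MIN_MERGE; otherwise a value k with
     * MIN_MERGE/2 <= k <= MIN_MERGE, chosen so that n / k is close to,
     * but not more than, a power of two.
     */
    static int minRunLength(int n) {
        int r = 0; // becomes 1 if any 1 bit is shifted off
        while (n >= MIN_MERGE) {
            r |= (n & 1);
            n >>= 1;
        }
        return n + r;
    }

    public static void main(String[] args) {
        System.out.println(minRunLength(31));   // 31
        System.out.println(minRunLength(64));   // 16
        System.out.println(minRunLength(65));   // 17
        System.out.println(minRunLength(1000)); // 32
    }
}
```

Choosing minRun this way keeps the number of runs near a power of two, so that the pairwise merges in steps 4 and 6 stay balanced.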

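Why?

Timsort avoids several important drawbacks of quicksort: its worst-case time complexity is O(n log n) (it has none of quicksort's pathological slow cases), and it is stable and adaptive.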
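
Stability is easy to observe: elements that compare as equal keep their original relative order. A minimal sketch, assuming a hypothetical class StableSortDemo that sorts strings by length only:

```java
import java.util.Arrays;
import java.util.Comparator;

class StableSortDemo {
    // Sorts a copy by string length; strings of equal length compare as equal.
    static String[] sortByLength(String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Comparator.comparingInt(String::length));
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"bb", "a", "cc", "dd", "e"};
        // "a" stays before "e", and "bb", "cc", "dd" keep their input order.
        System.out.println(Arrays.toString(sortByLength(words)));
        // prints [a, e, bb, cc, dd]
    }
}
```

An unstable sort would be free to emit the equal-length strings in any order, which matters when a list is sorted by several keys in succession.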

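III. Collections.sort()

As the code shows, Collections.sort() delegates to list.sort(null), whose default implementation copies the list into an array and then calls Arrays.sort on that array.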

    public static <T extends Comparable<? super T>> void sort(List<T> list) {
        list.sort(null);
    }

    default void sort(Comparator<? super E> c) {
        Object[] a = this.toArray();
        Arrays.sort(a, (Comparator) c);
        ListIterator<E> i = this.listIterator();
        for (Object e : a) {
            i.next();
            i.set((E) e);
        }
    }
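
The copy, sort, write-back pattern of the default method can be reproduced by hand and compared with Collections.sort. In this sketch the class CollectionsSortDemo and its two helpers are hypothetical names; a LinkedList is used on the assumption that it relies on the default List.sort shown above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class CollectionsSortDemo {
    // Sorts a copy of the input through the public API.
    static List<Integer> sortedWithCollections(List<Integer> input) {
        List<Integer> copy = new LinkedList<>(input);
        Collections.sort(copy);
        return copy;
    }

    // Hand-written equivalent of the default List.sort(null):
    // copy to an array, sort the array, write the elements back in order.
    static List<Integer> sortedViaArray(List<Integer> input) {
        List<Integer> copy = new LinkedList<>(input);
        Object[] a = copy.toArray();
        Arrays.sort(a);
        ListIterator<Integer> it = copy.listIterator();
        for (Object e : a) {
            it.next();
            it.set((Integer) e);
        }
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(3, 1, 2);
        System.out.println(sortedWithCollections(data)); // [1, 2, 3]
        System.out.println(sortedViaArray(data));        // [1, 2, 3]
    }
}
```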

Appendix: one more article worth noting, on the principle of dual-pivot quicksort and its Java implementation
