深入Arrays.sort源码

xin麒

已于 2023-03-12 11:28:47 修改

阅读量275

点赞数

分类专栏： java 算法文章标签：排序算法 java 算法深入源码

于 2023-03-12 11:23:25 首次发布

本文链接：https://blog.csdn.net/ws_please/article/details/129473550

版权

算法同时被 2 个专栏收录

49 篇文章 0 订阅

订阅专栏

java

48 篇文章 0 订阅

订阅专栏

深入挖掘Arrays.sort()原理

1、先来一个例子吧：

    @Test
    public void testSorted(){
        Integer[] arr = {1,-2,5,-6,3,9};
//        Arrays.sort(arr,(o1 ,o2)->o1-o2);
        Arrays.sort(arr,new Comparator<Integer>(){
                    @Override
                    public int compare(Integer o1, Integer o2) {
                        return o1 -  o2;
                    }
                }
        );
    }

就这个例子。

2、首先啊，先到Arrays这个类的sort方法里

public static <T> void sort(T[] a, Comparator<? super T> c) {
    if (c == null) {
        sort(a);
    } else {
        if (LegacyMergeSort.userRequested)
            legacyMergeSort(a, c);
        else
            TimSort.sort(a, 0, a.length, c, null, 0, 0);
    }
}

在sort方法会怎么样？sort(T[] a, Comparator<? super T> c)这里的a是数组，c是接口类，这里实现的是接口编程。

①先判断c == null，也就是判断有没有接口，那么旧往下走

②判断LegacyMergeSort.userRequested，发现是false，那么就走TimSort.sort(a, 0, a.length, c, null, 0, 0)

3、来到TimSort类，这个类的名字不认识，查一下：是归并排序和插入排序的组合排序

sort(T[] a, int lo, int hi, Comparator<? super T> c,
                     T[] work, int workBase, int workLen)

来看看TimSort.sort详情：

assert c != null && a != null && lo >= 0 && lo <= hi && hi <= a.length;

int nRemaining  = hi - lo;
if (nRemaining < 2)
    return;  // Arrays of size 0 and 1 are always sorted

// If array is small, do a "mini-TimSort" with no merges
if (nRemaining < MIN_MERGE) {
    int initRunLen = countRunAndMakeAscending(a, lo, hi, c);
    binarySort(a, lo, hi, lo + initRunLen, c);
    return;

这里的assert是什么？不知道，查一下：是断言的意思，这个关键字可以判断布尔值的结果是否和预期的一样，如果一样就正常执行，否则会抛出AssertionError。

对于函数参数的解析：

a是数组，lo是0，hi是长度，c是接口实现类，work是null,workBase是0，workLen是0。

①那么先判断nRemaining = hi - lo = len - 0 = 6,一开始肯定是大于2的，跳过；

②再进入if (nRemaining < MIN_MERGE) ，这里min_merge是一个静态final变量，为32，那么可以进入true结构体;

③进入

int initRunLen = countRunAndMakeAscending(a, lo, hi, c);
binarySort(a, lo, hi, lo + initRunLen, c);
return;

4、来countRunAndMakeAscending看看：

private static <T> int countRunAndMakeAscending(T[] a, int lo, int hi,
                                                Comparator<? super T> c)

Ascending看到这个单词想起来asc和dec，分别对应升序和降序。

看看详情：

assert lo < hi;
int runHi = lo + 1;
if (runHi == hi)
    return 1;

// Find end of run, and reverse range if descending
if (c.compare(a[runHi++], a[lo]) < 0) { // Descending
    while (runHi < hi && c.compare(a[runHi], a[runHi - 1]) < 0)
        runHi++;
    reverseRange(a, lo, runHi);
} else {                              // Ascending
    while (runHi < hi && c.compare(a[runHi], a[runHi - 1]) >= 0)
        runHi++;
}

return runHi - lo;

先分析一下参数吧： a是数组, lo是0 ，hi是len,c是接口。

①断言肯定是成功的。

②判断runHi和hi肯定是不相等的，跳过；

③来到一个判断c.compare(a[runHi++], a[lo]) < 0此时runhi是1,lo是0,也就是也就是判断[1] - [0] < 0，也就是[1] < [0 ]是判断元素里前面的元素值大了，在例子里是true的，进入true结构体：

④进入了一个循环体：

while (runHi < hi && c.compare(a[runHi], a[runHi - 1]) < 0)
    runHi++;

这里的意思是判断前面的元素是不是降序的，找到不是降序的那一个runHi出来，然后再进入reverseRange(a, lo, runHi)

5、来到reverseRange

private static void reverseRange(Object[] a, int lo, int hi) {
    hi--;
    while (lo < hi) {
        Object t = a[lo];
        a[lo++] = a[hi];
        a[hi--] = t;
    }
}

这里是将[lo,hi - 1]这些索引对应的元素反转。当要反转的元素个数为奇数时判断条件lo < hi会比lo <= hi多走一步，不过无所谓，有没有等号的结果都一样。

6、再回到来到TimSort类

直接跳到最后一步：

return runHi - lo;

返回了runHi - lo = 2 - 0 = 2。

7、再回到TimSort.sort

来到:

// If array is small, do a "mini-TimSort" with no merges
if (nRemaining < MIN_MERGE) {
    int initRunLen = countRunAndMakeAscending(a, lo, hi, c);
    binarySort(a, lo, hi, lo + initRunLen, c);
    return;
}

进入binarySort方法：

8、来到binarySort：

binary是二进制、二元的意思，这里应该是二分法插入排序。

可以看看注解：

/**
 * Sorts the specified portion of the specified array using a binary
 * insertion sort.  This is the best method for sorting small numbers
 * of elements.  It requires O(n log n) compares, but O(n^2) data
 * movement (worst case).
 *
 * If the initial part of the specified range is already sorted,
 * this method can take advantage of it: the method assumes that the
 * elements from index {@code lo}, inclusive, to {@code start},
 * exclusive are already sorted.
 *
 * @param a the array in which a range is to be sorted
 * @param lo the index of the first element in the range to be sorted
 * @param hi the index after the last element in the range to be sorted
 * @param start the index of the first element in the range that is
 *        not already known to be sorted ({@code lo <= start <= hi})
 * @param c comparator to used for the sort
 */

翻译一下：

/**
     * 使用二进制对指定数组的指定部分进行排序
     * 插入排序。 这是对小数字进行排序的最佳方法
     * 的元素。 它需要 O（n log n） 比较，但 O（n^2） 数据
     *移动（最坏情况）。
     *
     * 如果指定范围的初始部分已经排序，
     * 此方法可以利用它：该方法假定
     * 从索引 {@code lo}（包括）到 {@code 开始} 的元素，
     * 独家已排序。
     *
     * @param要在其中对范围进行排序的数组
     * @param lo 要排序的范围内第一个元素的索引
     * @param 嗨，要排序的范围内最后一个元素之后的索引
     * @param开始范围内第一个元素的索引
     * 尚不知道已排序 （{@code lo <= 开始 <= hi}）
     * @param c比较器用于排序
     */

来看看参数：a时数组, lo是0, hi是len, lo + initRunLen = 0 + 2 = 2, c是接口实现类。

@SuppressWarnings("fallthrough")
private static <T> void binarySort(T[] a, int lo, int hi, int start,
                                   Comparator<? super T> c) {
    assert lo <= start && start <= hi;
    if (start == lo)
        start++;
    for ( ; start < hi; start++) {
        T pivot = a[start];

        // Set left (and right) to the index where a[start] (pivot) belongs
        int left = lo;
        int right = start;
        assert left <= right;
        /*
         * Invariants:
         *   pivot >= all in [lo, left).
         *   pivot <  all in [right, start).
         */
        while (left < right) {
            int mid = (left + right) >>> 1;
            if (c.compare(pivot, a[mid]) < 0)
                right = mid;
            else
                left = mid + 1;
        }
        assert left == right;

        /*
         * The invariants still hold: pivot >= all in [lo, left) and
         * pivot < all in [left, start), so pivot belongs at left.  Note
         * that if there are elements equal to pivot, left points to the
         * first slot after them -- that's why this sort is stable.
         * Slide elements over to make room for pivot.
         */
        int n = start - left;  // The number of elements to move
        // Switch is just an optimization for arraycopy in default case
        switch (n) {
            case 2:  a[left + 2] = a[left + 1];
            case 1:  a[left + 1] = a[left];
                     break;
            default: System.arraycopy(a, left, a, left + 1, n);
        }
        a[left] = pivot;
    }
}

太长了，慢慢分析吧：

①先断言：assert lo <= start && start <= hi，这里的start也就是runHi + lo = 2肯定是可以满足的

②判断start和lo是不通过的，跳；

发现里面又出现之前不认识的内容：

public static void main(String[] args) {
    System.out.println(4 >> 1);
    System.out.println(4 >>> 1);

    System.out.println(4 << 1);
  //  System.out.println(4 <<< 1);
}

就是>>和>>>有什么区别？有没有<<<

发现是没有<<<的，写了的话会出现Expression expected警告，会报错。

<<肯定是左移了，>>是右移。没想到>>>也是右移。查了一下发现>>和>>>不一样的，

对于正数右移，高位补0.

>>是有符号右移，对于负数，那么右移后，高位补1；>>>是无符号右移（又回到了学汇编的那个夏天），也就是无论正数还是负数，右移都是高位补0.

那么继续分析吧③

binarySort这个方法实际上就是对数据进行二分法排序，搬迁数据时会调用c++代码进行调用，下面详细解析一下：

for ( ; start < hi; start++) { //2 < 6? yes
    T pivot = a[start];//pivot = a[2] = 5 Ⅱ、 = -6

    // Set left (and right) to the index where a[start] (pivot) belongs
    int left = lo;//left = 0 Ⅱ、=0
    int right = start;// right = 2 Ⅱ、=3
    assert left <= right;//0 < 2? -> yes Ⅱ、=> 0 <= 3 ok
    /*
     * Invariants:
     *   pivot >= all in [lo, left).
     *   pivot <  all in [right, start).
     */
    while (left < right) {//0 < 2 ? yes Ⅱ、=>yes
        int mid = (left + right) >>> 1;// mid = (0 + 2) / 2 = 1,a = {-2,1,5,-6,3,9} Ⅱ、mid=1
        if (c.compare(pivot, a[mid]) < 0)//5 -( 1) = 4 < 0 ? ->false Ⅱ、-6<1?yes
            right = mid;//Ⅱ、right=1
        else
            left = mid + 1;//left = 1+1=2
    }
    assert left == right; // 2 == 2 ? => okⅡ、right=0
    /*
     * The invariants still hold: pivot >= all in [lo, left) and
     * pivot < all in [left, start), so pivot belongs at left.  Note
     * that if there are elements equal to pivot, left points to the
     * first slot after them -- that's why this sort is stable.
     * Slide elements over to make room for pivot.
     */
    int n = start - left;  // The number of elements to move n = 2 - 2 = 0Ⅱ、n=3-0=3
    // Switch is just an optimization for arraycopy in default case
    switch (n) {//n = 0。注意为什么有3个选项？因为native方法不能频繁调用，影响性能。
        case 2:  a[left + 2] = a[left + 1];//例如[1,2,3,5,6,4],lt=3,st=5,a[5]=6,再穿透到n=1的case
        case 1:  a[left + 1] = a[left];//例如：[1,2,3,4,6,5],lt=4,st=5,5-4=1,此时就不需要调用耗性能的JNI了
                 break;
        default: System.arraycopy(a, left, a, left + 1, n);//(a,2,a,3,0)
            //Ⅱ、(a,0,a,1,3),a = {-2,1,5,-6,3,9}
            //这里的arraycopy是什么意思？看到是native方法，看来是调用了c++应该是搬迁数据？应该是的。可以看看代码解析：arraycopy(src：a, srcPos:0, dest:a, destPos:1, length:3);之前也写过类似的，应该是将[0,2]的内容搬迁到[1,3]里，看后面的就知道了。
            
    }
    a[left] = pivot;//a[2] = 5 这里和二分法很像，看这里就知道了，将a[0] = pivot=-6