Timsort: Notes on Reading the Java Source

Latest recommended article published on 2023-01-30 19:45:08

长白山西红柿

Categories: Algorithms, Source Code Analysis. Tags: java, algorithms

Article link: https://blog.csdn.net/Backee/article/details/115430091

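At work I needed to sort a generic collection. I had assumed it would use dual-pivot quicksort, the way sorting a primitive array does, but following Collections.sort() into the source showed that it ends up in TimSort.sort(). A quick search told me that TimSort is built on merge sort and takes full advantage of the fact that much real-world data is already partially sorted. The best article I found was "The World's Fastest Sorting Algorithm: Timsort". After reading the source I "understood" what TimSort does, but not why it does it that way, and that article cleared up many of my questions. To help others understand TimSort, and to keep my own notes, I wrote this post to share what I learned from reading the TimSort source in JDK 1.8.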

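Following Collections.sort(), we find that it calls list.sort() on the list that was passed in; the code is below. It does three things: it calls the list's own toArray() to produce an array, calls Arrays.sort() to sort that array, and finally writes the sorted result back into the list.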

default void sort(Comparator<? super E> c) {
        Object[] a = this.toArray();
        Arrays.sort(a, (Comparator) c);
        ListIterator<E> i = this.listIterator();
        for (Object e : a) {
            i.next();
            i.set((E) e);
        }
    }
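To see those three steps from the caller's side, here is a minimal sketch that sorts a LinkedList by string length. The class name ListSortDemo, the helper sortByLength and the sample data are illustrative assumptions, not part of the JDK source. Because the list is copied to an array, sorted with a stable algorithm and written back, strings of equal length should keep their original relative order.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class; only Collections.sort and Comparator are real JDK APIs here.
class ListSortDemo {
    // Returns a copy of the input sorted by string length.
    static List<String> sortByLength(List<String> input) {
        List<String> copy = new LinkedList<>(input);
        // Collections.sort(list, c) delegates to list.sort(c), the default method shown above.
        Collections.sort(copy, Comparator.comparingInt(String::length));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortByLength(Arrays.asList("pear", "fig", "kiwi", "plum", "nut")));
    }
}
```

The three-letter words come first, and within each length group the words stay in their input order, which is the stability that TimSort guarantees.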

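Below is the code for Arrays.sort(). If no comparator is passed (c == null), it calls sort(a), which ends up in ComparableTimSort.sort(); if a comparator is passed, it calls TimSort.sort() directly. legacyMergeSort(a, c) is the old implementation, which has been superseded and is only used when explicitly requested.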

public static <T> void sort(T[] a, Comparator<? super T> c) {
        if (c == null) {
            sort(a);
        } else {
            if (LegacyMergeSort.userRequested)
                legacyMergeSort(a, c);
            else
                TimSort.sort(a, 0, a.length, c, null, 0, 0);
        }
    }
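The two branches can be exercised from user code. The sketch below is only an illustration: the class ArraysSortDispatchDemo and its two helpers are made-up names, and it cannot show which internal class does the work, only that a null comparator means natural ordering while a non-null comparator is honored.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical demo class for the two paths in Arrays.sort(T[], Comparator).
class ArraysSortDispatchDemo {
    // c == null: falls back to the elements' natural ordering (the sort(a) branch).
    static Integer[] sortNatural(Integer[] input) {
        Integer[] copy = input.clone();
        Arrays.sort(copy, null);
        return copy;
    }

    // c != null: the comparator-based branch.
    static Integer[] sortDescending(Integer[] input) {
        Integer[] copy = input.clone();
        Arrays.sort(copy, Comparator.reverseOrder());
        return copy;
    }

    public static void main(String[] args) {
        Integer[] data = {3, 1, 2};
        System.out.println(Arrays.toString(sortNatural(data)));
        System.out.println(Arrays.toString(sortDescending(data)));
    }
}
```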

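Let's step straight into TimSort.sort(). The sort method in ComparableTimSort follows the same logic; the only difference is how elements are compared.

This is the core method of the whole sort: all of the sorting work is driven from here.

The other methods it calls are explained in detail below.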

static <T> void sort(T[] a, int lo, int hi, Comparator<? super T> c,
                         T[] work, int workBase, int workLen) {
        assert c != null && a != null && lo >= 0 && lo <= hi && hi <= a.length;

        int nRemaining  = hi - lo;
        if (nRemaining < 2)
            return;  // Arrays of size 0 and 1 are always sorted

        // If array is small, do a "mini-TimSort" with no merges
        //  private static final int MIN_MERGE = 32;
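        // If the range to sort is shorter than this threshold, use binary insertion sort
        if (nRemaining < MIN_MERGE) {
            // First find the length of the run at the start of the range (a descending run is reversed in place)
            int initRunLen = countRunAndMakeAscending(a, lo, hi, c);
            // Then insert the remaining elements into that sorted prefix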
            binarySort(a, lo, hi, lo + initRunLen, c);
            return;
        }

        /**
         * March over the array once, left to right, finding natural runs,
         * extending short natural runs to minRun elements, and merging runs
         * to maintain stack invariant.
         */
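        // Next, construct a TimSort instance
        TimSort<T> ts = new TimSort<>(a, c, work, workBase, workLen);
        // Compute the minimum length of a single run; a run is a sorted segment of the original array, and every run is ascending internally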
        int minRun = minRunLength(nRemaining);
        do {
            // Identify next run
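            // First find the length of the already-ordered run that starts at lo
            int runLen = countRunAndMakeAscending(a, lo, hi, c);

            // If run is short, extend to min(minRun, nRemaining)
            // If the natural run is shorter than the minimum run length
            if (runLen < minRun) {
                // Take the smaller of the remaining length and minRun
                int force = nRemaining <= minRun ? nRemaining : minRun;
                // Binary-insertion-sort the first force elements, of which the first runLen are already sorted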
                binarySort(a, lo, lo + force, lo + runLen, c);
                runLen = force;
            }

            // Push run onto pending-run stack, and maybe merge
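            // At this point every element of the run is in ascending order; push the run onto the stack
            ts.pushRun(lo, runLen);
            // Merge runs where needed to restore the stack invariants
            ts.mergeCollapse();

            // Advance to find next run
            // Move lo forward to look for the next run
            lo += runLen;
            // Update the number of elements still to be sorted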
            nRemaining -= runLen;
        } while (nRemaining != 0);

        // Merge all remaining runs to complete sort
        assert lo == hi;
        // Force-merge all the remaining runs
        ts.mergeForceCollapse();
        assert ts.stackSize == 1;
    }

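The countRunAndMakeAscending method. It shows how strongly TimSort relies on an array containing many stretches that are already ordered, whether ascending or descending. Note that a descending run must be strictly descending (compare < 0), so that reversing it cannot swap equal elements and break stability.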

private static <T> int countRunAndMakeAscending(T[] a, int lo, int hi,
                                                    Comparator<? super T> c) {
        // Assert that the arguments are valid
        assert lo < hi;
        int runHi = lo + 1;
        if (runHi == hi)
            return 1;

        // Find end of run, and reverse range if descending
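        // If the range starts out descending, find the index just past the end of the descending run and reverse it
        if (c.compare(a[runHi++], a[lo]) < 0) { // Descending
            while (runHi < hi && c.compare(a[runHi], a[runHi - 1]) < 0)
                runHi++;
            // Reverse the descending part that was found
            reverseRange(a, lo, runHi);
        } else {                              // Ascending
            // Find the index just past the last element of the ascending run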
            while (runHi < hi && c.compare(a[runHi], a[runHi - 1]) >= 0)
                runHi++;
        }

        return runHi - lo;
    }

/**
* A simple two-pointer (left and right) method that reverses array elements in place
*/
private static void reverseRange(Object[] a, int lo, int hi) {
        hi--;
        while (lo < hi) {
            Object t = a[lo];
            a[lo++] = a[hi];
            a[hi--] = t;
        }
    }
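The run detection is easier to follow on a primitive array. The sketch below is a simplified int[] adaptation of the two methods above (no Comparator, no generics); the class name RunDemo is an assumption made for illustration.

```java
import java.util.Arrays;

// Simplified int[] adaptation of countRunAndMakeAscending and reverseRange.
class RunDemo {
    // Returns the length of the run starting at lo; a strictly descending run is reversed in place.
    static int countRunAndMakeAscending(int[] a, int lo, int hi) {
        int runHi = lo + 1;
        if (runHi == hi)
            return 1;
        if (a[runHi++] < a[lo]) {                     // strictly descending
            while (runHi < hi && a[runHi] < a[runHi - 1])
                runHi++;
            reverseRange(a, lo, runHi);
        } else {                                      // ascending (equal elements allowed)
            while (runHi < hi && a[runHi] >= a[runHi - 1])
                runHi++;
        }
        return runHi - lo;
    }

    // Reverses a[lo, hi) with a left and a right pointer.
    static void reverseRange(int[] a, int lo, int hi) {
        hi--;
        while (lo < hi) {
            int t = a[lo];
            a[lo++] = a[hi];
            a[hi--] = t;
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 4, 3, 1, 2};
        int len = countRunAndMakeAscending(a, 0, a.length);
        System.out.println(len + " " + Arrays.toString(a));
    }
}
```

For {5, 4, 3, 1, 2} the descending prefix 5, 4, 3, 1 has length 4 and is reversed, leaving {1, 3, 4, 5, 2}. For {3, 3, 2} the run length is 2: the equal pair counts as ascending, so nothing is reversed.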

The binarySort method

private static <T> void binarySort(T[] a, int lo, int hi, int start,
                                       Comparator<? super T> c) {
        assert lo <= start && start <= hi;
        if (start == lo)
            start++;
        for ( ; start < hi; start++) {
            // Each iteration first takes the first element of the unsorted part
            T pivot = a[start];

            // Set left (and right) to the index where a[start] (pivot) belongs
            int left = lo;
            int right = start;
            assert left <= right;
            /*
             * Invariants:
             *   pivot >= all in [lo, left).
             *   pivot <  all in [right, start).
             */
            // Then binary-search the already-sorted range for the index where this element should be inserted; this is why the method is called binary insertion sort
            while (left < right) {
                int mid = (left + right) >>> 1;
                if (c.compare(pivot, a[mid]) < 0)
                    right = mid;
                else
                    left = mid + 1;
            }
            assert left == right;

            /*
             * The invariants still hold: pivot >= all in [lo, left) and
             * pivot < all in [left, start), so pivot belongs at left.  Note
             * that if there are elements equal to pivot, left points to the
             * first slot after them -- that's why this sort is stable.
             * Slide elements over to make room for pivot.
             */
            int n = start - left;  // The number of elements to move
            // Switch is just an optimization for arraycopy in default case
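            // Optimization for the cases where only a few elements have to move
            // A small detail: if n == 2, the switch below runs two statements (case 2 falls through into case 1) before it reaches break.
            // I looked at the bytecode the switch compiles to: matching and execution are separate. The case bodies are laid out sequentially in the bytecode; once a value matches, control jumps to the corresponding position and keeps executing downward until it reaches a break (which becomes a goto instruction in the bytecode).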
            switch (n) {
                case 2:  a[left + 2] = a[left + 1];
                case 1:  a[left + 1] = a[left];
                         break;
                // Use arraycopy to shift every element greater than pivot one slot to the right
                default: System.arraycopy(a, left, a, left + 1, n);
            }

            a[left] = pivot;
        }
    }
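As a compact illustration of the same idea, here is a binary insertion sort on int[]. It is a simplified sketch rather than the JDK code: the class name BinaryInsertionSortDemo is an assumption, and the switch optimization is replaced by a single System.arraycopy.

```java
import java.util.Arrays;

// Simplified int[] sketch of binary insertion sort.
class BinaryInsertionSortDemo {
    // Sorts a[lo, hi), assuming a[lo, start) is already sorted.
    static void binarySort(int[] a, int lo, int hi, int start) {
        if (start == lo)
            start++;
        for (; start < hi; start++) {
            int pivot = a[start];
            int left = lo;
            int right = start;
            // Binary search for the insertion point; on ties go right, which keeps the sort stable.
            while (left < right) {
                int mid = (left + right) >>> 1;
                if (pivot < a[mid])
                    right = mid;
                else
                    left = mid + 1;
            }
            // Shift a[left, start) one slot to the right and drop pivot into the gap.
            System.arraycopy(a, left, a, left + 1, start - left);
            a[left] = pivot;
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 4, 9, 3, 7, 2};
        binarySort(a, 0, a.length, 3); // the first three elements are already sorted
        System.out.println(Arrays.toString(a));
    }
}
```

The binary search reduces the number of comparisons to O(log n) per element, but the shifting still costs O(n) per element, which is why TimSort only uses this on short ranges.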

The TimSort constructor

 private TimSort(T[] a, Comparator<? super T> c, T[] work, int workBase, int workLen) {
        this.a = a;
        this.c = c;

        // Allocate temp storage (which may be increased later if necessary)
        int len = a.length;
        // Set up the temporary array used while merging. Its initial length is at most 256; if the array is shorter than 512, half of len is used instead
        // private static final int INITIAL_TMP_STORAGE_LENGTH = 256;
        int tlen = (len < 2 * INITIAL_TMP_STORAGE_LENGTH) ?
            len >>> 1 : INITIAL_TMP_STORAGE_LENGTH;
        if (work == null || workLen < tlen || workBase + tlen > work.length) {
            @SuppressWarnings({"unchecked", "UnnecessaryLocalVariable"})
            T[] newArray = (T[])java.lang.reflect.Array.newInstance
                (a.getClass().getComponentType(), tlen);
            tmp = newArray;
            tmpBase = 0;
            tmpLen = tlen;
        }
        else {
            tmp = work;
            tmpBase = workBase;
            tmpLen = workLen;
        }

        /*
         * Allocate runs-to-be-merged stack (which cannot be expanded).  The
         * stack length requirements are described in listsort.txt.  The C
         * version always uses the same stack length (85), but this was
         * measured to be too expensive when sorting "mid-sized" arrays (e.g.,
         * 100 elements) in Java.  Therefore, we use smaller (but sufficiently
         * large) stack lengths for smaller arrays.  The "magic numbers" in the
         * computation below must be changed if MIN_MERGE is decreased.  See
         * the MIN_MERGE declaration above for more information.
         * The maximum value of 49 allows for an array up to length
         * Integer.MAX_VALUE-4, if array is filled by the worst case stack size
         * increasing scenario. More explanations are given in section 4 of:
         * http://envisage-project.eu/wp-content/uploads/2015/02/sorting.pdf
         */
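        // Set up the stack of pending runs used by the merges. Because run lengths decrease from the bottom of the stack to the top and adjacent runs get merged, the run lengths grow at least as fast as the Fibonacci sequence from top to bottom, so the stack depth stays below roughly log base 1.618 of N. (If this is unclear, see the article recommended at the beginning, or simply treat this as the stack used during sorting.)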
        int stackLen = (len <    120  ?  5 :
                        len <   1542  ? 10 :
                        len < 119151  ? 24 : 49);
        runBase = new int[stackLen];
        runLen = new int[stackLen];
    }
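The thresholds 120, 1542 and 119151 look arbitrary, but a small calculation reproduces them. The sketch below assumes that every run on the stack is at least 16 elements long (MIN_MERGE / 2) and that both stack invariants hold for every entry; under those assumptions it computes the shortest array that can hold a given number of stacked runs. The class StackDepthDemo and the method shortestTotal are hypothetical names. As far as I know, the paper linked in the JDK comment shows that the second assumption can fail, which is why the depths in the source (24 and 49) are larger than this count alone would suggest.

```java
// Hypothetical sketch: shortest total length of `runs` stacked runs that satisfy
//   runLen[i - 2] > runLen[i - 1] + runLen[i]   and   runLen[i - 1] > runLen[i]
// assuming the shortest possible run has 16 elements.
class StackDepthDemo {
    static final int MIN_RUN = 16; // assumption: MIN_MERGE / 2

    static long shortestTotal(int runs) {
        long top = MIN_RUN;        // run on top of the stack
        long below = MIN_RUN + 1;  // must be strictly longer than the top run
        long total = 0;
        for (int i = 0; i < runs; i++) {
            total += top;
            long next = top + below + 1; // must exceed the sum of the two runs above it
            top = below;
            below = next;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int runs : new int[] {4, 9, 18}) {
            System.out.println(runs + " runs need at least " + shortestTotal(runs) + " elements");
        }
    }
}
```

The results for 4, 9 and 18 runs are 119, 1541 and 119150, each exactly one below a threshold in the constructor. The growth is Fibonacci-like, which is where the logarithmic bound on the stack depth comes from.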

The minRunLength method

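// The minimum run length computed here adapts to the size of n. I did not know the rationale when I first read it; the Javadoc of this method in the JDK source says that for n >= MIN_MERGE it returns a k with MIN_MERGE/2 <= k <= MIN_MERGE such that n/k is close to, but strictly less than, an exact power of 2, and refers to listsort.txt for the reasoning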
private static int minRunLength(int n) {
        assert n >= 0;
        int r = 0;      // Becomes 1 if any 1 bits are shifted off
        // While n >= 32, keep halving it; if any 1 bit was shifted off along the way, add one to the result
        // private static final int MIN_MERGE = 32;
        while (n >= MIN_MERGE) {
            r |= (n & 1);
            n >>= 1;
        }
        return n + r;
    }
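A few concrete inputs make the bit manipulation easier to see. The method body below is the one shown above; the wrapper class MinRunDemo and the sample values are assumptions added for illustration.

```java
// Same computation as minRunLength above, wrapped in a hypothetical demo class.
class MinRunDemo {
    static final int MIN_MERGE = 32;

    static int minRunLength(int n) {
        int r = 0; // becomes 1 if any 1 bits are shifted off
        while (n >= MIN_MERGE) {
            r |= (n & 1);
            n >>= 1;
        }
        return n + r;
    }

    public static void main(String[] args) {
        for (int n : new int[] {31, 32, 64, 65, 1000, 2048}) {
            System.out.println("n=" + n + " minRun=" + minRunLength(n));
        }
    }
}
```

For n = 1000 the result is 32, and 1000 / 32 = 31.25 is just under the power of two 32. For n = 2048, an exact power of two, the result is 16 and 2048 / 16 = 128 runs merge in a perfectly balanced way.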

ts.pushRun: pushing a run onto the stack

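// runBase is the index of the run's first element; runLen is the run's length
// Conceptually the runs partition the original array, but in practice the stack only records each run's start index and length
private void pushRun(int runBase, int runLen) {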
        this.runBase[stackSize] = runBase;
        this.runLen[stackSize] = runLen;
        stackSize++;
}

ts.mergeCollapse: merging runs

private void mergeCollapse() {
        while (stackSize > 1) {
            int n = stackSize - 2;
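            // If the third run from the top is no longer than the sum of the top two runs
            if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1]) {
                // Merge the middle run with the shorter of its two neighbors, which keeps run lengths decreasing from the bottom of the stack to the top
                if (runLen[n - 1] < runLen[n + 1])
                    n--;
                mergeAt(n);
            } else if (runLen[n] <= runLen[n + 1]) {
                // If the newly pushed run (the top) is at least as long as the run below it, merge the two directly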
                mergeAt(n);
            } else {
                break; // Invariant is established
            }
        }
    }
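The merge policy can be observed without sorting anything by tracking run lengths only. The sketch below is a simulation under that simplification: the class RunStackDemo, its list-based stack and the simulate helper are assumptions, and mergeAt here just adds two lengths together.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the pending-run stack that tracks lengths only.
class RunStackDemo {
    private final List<Integer> runLen = new ArrayList<>();

    void pushRun(int len) {
        runLen.add(len);
        mergeCollapse();
    }

    // Same decisions as mergeCollapse above, applied to the list of lengths.
    private void mergeCollapse() {
        while (runLen.size() > 1) {
            int n = runLen.size() - 2;
            if (n > 0 && runLen.get(n - 1) <= runLen.get(n) + runLen.get(n + 1)) {
                if (runLen.get(n - 1) < runLen.get(n + 1))
                    n--;
                mergeAt(n);
            } else if (runLen.get(n) <= runLen.get(n + 1)) {
                mergeAt(n);
            } else {
                break; // invariant is established
            }
        }
    }

    // Merging two runs simply adds their lengths in this simulation.
    private void mergeAt(int i) {
        runLen.set(i, runLen.get(i) + runLen.get(i + 1));
        runLen.remove(i + 1);
    }

    // Pushes the given run lengths in order and returns the resulting stack, bottom first.
    static List<Integer> simulate(int... lens) {
        RunStackDemo stack = new RunStackDemo();
        for (int len : lens)
            stack.pushRun(len);
        return stack.runLen;
    }

    public static void main(String[] args) {
        System.out.println(simulate(100, 40, 20));
        System.out.println(simulate(100, 40, 20, 25));
    }
}
```

Pushing 100, 40, 20 triggers no merge because both invariants already hold. Pushing 25 on top breaks them: 20 and 25 are merged into 45, then 40 and 45 into 85, leaving a stack of 100 and 85.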

mergeAt(n): merging the runs at stack positions n and n+1

 private void mergeAt(int i) {
        assert stackSize >= 2;
        assert i >= 0;
        assert i == stackSize - 2 || i == stackSize - 3;

        // Fetch the two runs to be merged from the stack
        int base1 = runBase[i];
        int len1 = runLen[i];
        int base2 = runBase[i + 1];
        int len2 = runLen[i + 1];
        assert len1 > 0 && len2 > 0;
        assert base1 + len1 == base2;

        /*
         * Record the length of the combined runs; if i is the 3rd-last
         * run now, also slide over the last run (which isn't involved
         * in this merge).  The current run (i+1) goes away in any case.
         */
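        // Record the combined length of the two runs
        runLen[i] = len1 + len2;
        // If the runs being merged are the third and second from the top, slide the top run down one slot; either way the stack size shrinks by one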
        if (i == stackSize - 3) {
            runBase[i + 1] = runBase[i + 2];
            runLen[i + 1] = runLen[i + 2];
        }
        stackSize--;

        /*
         * Find where the first element of run2 goes in run1. Prior elements
         * in run1 can be ignored (because they're already in place).
         */
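        // Find the rightmost position in run1 where the first element of run2 could be inserted
        // This prepares for narrowing down the range that actually needs merging
        int k = gallopRight(a[base2], a, base1, len1, 0, c);
        assert k >= 0;
        // The first k elements of run1 need no further work: a[base2] is the smallest element of run2, and those k elements are all less than or equal to it, so they are already in their final positions
        base1 += k;
        len1 -= k;
        // len1 == 0 means every element of run1 is less than or equal to the smallest element of run2; since each run is already sorted, we can return immediately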
        if (len1 == 0)
            return;

        /*
         * Find where the last element of run1 goes in run2. Subsequent elements
         * in run2 can be ignored (because they're already in place).
         */
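        // Likewise, find where the largest element of run1 would be inserted in run2
        // The resulting offset becomes the new length of run2. As above, run2 is ascending, so the position of run1's maximum in run2 splits run2 into a front part that must be merged and a back part that is already in place.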
        len2 = gallopLeft(a[base1 + len1 - 1], a, base2, len2, len2 - 1, c);
        assert len2 >= 0;
        if (len2 == 0)
            return;

        // Merge remaining runs, using tmp array with min(len1, len2) elements
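        // Copy the shorter of the two runs into the temp array, to keep the extra space as small as possible
        if (len1 <= len2)
            // Merge left to right, using len1 elements of extra space
            mergeLo(base1, len1, base2, len2);
        else
            // Merge right to left, starting from the last element of run2 and using len2 elements of extra space; I won't go through this method in detail here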
            mergeHi(base1, len1, base2, len2);
    }
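The trimming step is worth seeing on real numbers, because it often shrinks the merge considerably. The sketch below is an illustration only: it uses plain linear scans where the JDK uses gallopRight and gallopLeft, and the class MergeTrimDemo, the method trim and its return convention are assumptions.

```java
import java.util.Arrays;

// Hypothetical sketch of the trimming done in mergeAt, using linear scans instead of galloping.
class MergeTrimDemo {
    // Returns {newBase1, newLen1, newLen2}; {base2, 0, 0} means nothing is left to merge.
    static int[] trim(int[] a, int base1, int len1, int base2, int len2) {
        // Skip the prefix of run1 that is <= the first element of run2: it is already in place.
        int k = 0;
        while (k < len1 && a[base1 + k] <= a[base2])
            k++;
        base1 += k;
        len1 -= k;
        if (len1 == 0)
            return new int[] {base1, 0, 0};
        // Keep only the prefix of run2 that is < the last element of run1; the rest is already in place.
        int last1 = a[base1 + len1 - 1];
        int keep = 0;
        while (keep < len2 && a[base2 + keep] < last1)
            keep++;
        return new int[] {base1, len1, keep};
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 10, 11, 4, 5, 12, 13}; // run1 = a[0..5), run2 = a[5..9)
        System.out.println(Arrays.toString(trim(a, 0, 5, 5, 4)));
    }
}
```

In this example only 10, 11 from run1 and 4, 5 from run2 still have to be merged; the elements 1, 2, 3 at the front and 12, 13 at the back never move.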

The gallopRight method

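// Find the rightmost insertion point for key in the range of a that starts at index base and has length len, beginning the search at the offset given by hint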
private static <T> int gallopRight(T key, T[] a, int base, int len,
                                       int hint, Comparator<? super T> c) {
        assert len > 0 && hint >= 0 && hint < len;

        int ofs = 1;
        int lastOfs = 0;
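        // If key is less than the element at the hint position, search to the left of hint; otherwise search to the right
        // The hint suggests which region to search first, which can shorten the time needed to locate key's insertion point
        if (c.compare(key, a[base + hint]) < 0) {
            // Gallop left until a[b+hint - ofs] <= key < a[b+hint - lastOfs]
            int maxOfs = hint + 1;
            // Keep growing the offset ofs, moving left, until we find a position with key >= a[base + hint - ofs]
            while (ofs < maxOfs && c.compare(key, a[base + hint - ofs]) < 0) {
                lastOfs = ofs;
                ofs = (ofs << 1) + 1;
                // Guard against int overflow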
                if (ofs <= 0)   // int overflow
                    ofs = maxOfs;
            }
            if (ofs > maxOfs)
                ofs = maxOfs;

            // Make offsets relative to b
            int tmp = lastOfs;
            lastOfs = hint - ofs;
            ofs = hint - tmp;
        } else { // a[b + hint] <= key
            // Gallop right until a[b+hint + lastOfs] <= key < a[b+hint + ofs]
            // Same as above, except that this branch searches to the right
            int maxOfs = len - hint;
            while (ofs < maxOfs && c.compare(key, a[base + hint + ofs]) >= 0) {
                lastOfs = ofs;
                ofs = (ofs << 1) + 1;
                if (ofs <= 0)   // int overflow
                    ofs = maxOfs;
            }
            if (ofs > maxOfs)
                ofs = maxOfs;

            // Make offsets relative to b
            lastOfs += hint;
            ofs += hint;
        }
        assert -1 <= lastOfs && lastOfs < ofs && ofs <= len;

        /*
         * Now a[b + lastOfs] <= key < a[b + ofs], so key belongs somewhere to
         * the right of lastOfs but no farther right than ofs.  Do a binary
         * search, with invariant a[b + lastOfs - 1] <= key < a[b + ofs].
         */
        // As the comment above says, the code so far guarantees a[b + lastOfs] <= key < a[b + ofs]; a binary search then finds the exact index
        lastOfs++;
        while (lastOfs < ofs) {
            int m = lastOfs + ((ofs - lastOfs) >>> 1);

            if (c.compare(key, a[base + m]) < 0)
                ofs = m;          // key < a[b + m]
            else
                lastOfs = m + 1;  // a[b + m] <= key
        }
        // So the binary search over this range shows that key can be inserted at index b + ofs; the offset ofs is returned
        assert lastOfs == ofs;    // so a[b + ofs - 1] <= key < a[b + ofs]
        return ofs;
    }

The gallopLeft method

// The search logic is essentially the same as in gallopRight, except that it finds the leftmost insertion point; this is needed to keep the sort stable
private static <T> int gallopLeft(T key, T[] a, int base, int len, int hint,
                                      Comparator<? super T> c) {
        assert len > 0 && hint >= 0 && hint < len;
        int lastOfs = 0;
        int ofs = 1;
        // If the element at the hint index is less than key, search to the right for an element greater than or equal to key
        if (c.compare(key, a[base + hint]) > 0) {
            // Gallop right until a[base+hint+lastOfs] < key <= a[base+hint+ofs]
            int maxOfs = len - hint;
            while (ofs < maxOfs && c.compare(key, a[base + hint + ofs]) > 0) {
                lastOfs = ofs;
                ofs = (ofs << 1) + 1;
                if (ofs <= 0)   // int overflow
                    ofs = maxOfs;
            }
            if (ofs > maxOfs)
                ofs = maxOfs;

            // Make offsets relative to base
            lastOfs += hint;
            ofs += hint;
        } else { // key <= a[base + hint]
            // Gallop left until a[base+hint-ofs] < key <= a[base+hint-lastOfs]
            // Search to the left
            final int maxOfs = hint + 1;
            while (ofs < maxOfs && c.compare(key, a[base + hint - ofs]) <= 0) {
                lastOfs = ofs;
                ofs = (ofs << 1) + 1;
                if (ofs <= 0)   // int overflow
                    ofs = maxOfs;
            }
            if (ofs > maxOfs)
                ofs = maxOfs;

            // Make offsets relative to base
            int tmp = lastOfs;
            lastOfs = hint - ofs;
            ofs = hint - tmp;
        }
        assert -1 <= lastOfs && lastOfs < ofs && ofs <= len;

        /*
         * Now a[base+lastOfs] < key <= a[base+ofs], so key belongs somewhere
         * to the right of lastOfs but no farther right than ofs.  Do a binary
         * search, with invariant a[base + lastOfs - 1] < key <= a[base + ofs].
         */
        // Again, a binary search finds the leftmost suitable insertion point, which preserves stability
        lastOfs++;
        while (lastOfs < ofs) {
            int m = lastOfs + ((ofs - lastOfs) >>> 1);

            if (c.compare(key, a[base + m]) > 0)
                lastOfs = m + 1;  // a[base + m] < key
            else
                ofs = m;          // key <= a[base + m]
        }
        assert lastOfs == ofs;    // so a[base + ofs - 1] < key <= a[base + ofs]
        return ofs;
    }
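The difference between the two methods only shows up when the key has equal elements in the array. The sketch below is a reduced int[] version that always starts at the left end (the equivalent of hint = 0) and therefore only gallops to the right; the class name GallopDemo is an assumption, and both methods require len > 0.

```java
// Simplified int[] sketch of galloping (exponential search followed by binary search), hint fixed at 0.
class GallopDemo {
    // Rightmost insertion point: offset k such that a[base + k - 1] <= key < a[base + k].
    static int gallopRight(int key, int[] a, int base, int len) {
        if (key < a[base])
            return 0;
        int lastOfs = 0;
        int ofs = 1;
        // Probe offsets 1, 3, 7, 15, ... until an element greater than key is found.
        while (ofs < len && key >= a[base + ofs]) {
            lastOfs = ofs;
            ofs = (ofs << 1) + 1;
            if (ofs <= 0)   // int overflow
                ofs = len;
        }
        if (ofs > len)
            ofs = len;
        // Binary search between the last two probes.
        lastOfs++;
        while (lastOfs < ofs) {
            int m = lastOfs + ((ofs - lastOfs) >>> 1);
            if (key < a[base + m])
                ofs = m;
            else
                lastOfs = m + 1;
        }
        return ofs;
    }

    // Leftmost insertion point: offset k such that a[base + k - 1] < key <= a[base + k].
    static int gallopLeft(int key, int[] a, int base, int len) {
        if (key <= a[base])
            return 0;
        int lastOfs = 0;
        int ofs = 1;
        while (ofs < len && key > a[base + ofs]) {
            lastOfs = ofs;
            ofs = (ofs << 1) + 1;
            if (ofs <= 0)   // int overflow
                ofs = len;
        }
        if (ofs > len)
            ofs = len;
        lastOfs++;
        while (lastOfs < ofs) {
            int m = lastOfs + ((ofs - lastOfs) >>> 1);
            if (key > a[base + m])
                lastOfs = m + 1;
            else
                ofs = m;
        }
        return ofs;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 2, 2, 3};
        System.out.println(gallopLeft(2, a, 0, a.length) + " " + gallopRight(2, a, 0, a.length));
    }
}
```

For the key 2 in {1, 2, 2, 2, 3}, gallopLeft returns 1 (before the equal elements) and gallopRight returns 4 (after them). Choosing the right variant for each run is what keeps equal elements from run1 ahead of equal elements from run2.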

The mergeLo method, where the actual merging happens

private void mergeLo(int base1, int len1, int base2, int len2) {
        assert len1 > 0 && len2 > 0 && base1 + len1 == base2;

        // Copy first run into temp array
        T[] a = this.a; // For performance
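        // Make sure the temp array is large enough to hold run1
        T[] tmp = ensureCapacity(len1);
        // Index of the next unmerged element of run1 in the temp array
        int cursor1 = tmpBase; // Indexes into tmp array
        // Index of the next unmerged element of run2 in a
        int cursor2 = base2;   // Indexes into a
        // End of the merged region, i.e. the next slot to write; it starts at base1 because run1 and run2 are contiguous in a
        int dest = base1;      // Indexes into a
        // Copy all of run1 into the temp array
        System.arraycopy(a, base1, tmp, cursor1, len1);

        // Move first element of second run and deal with degenerate cases
        // The trimming done in mergeAt guarantees that the first element of run2 is smaller than every remaining element of run1
        a[dest++] = a[cursor2++];
        // If run2 is now empty, copy tmp into the remaining space and return
        if (--len2 == 0) {
            System.arraycopy(tmp, cursor1, a, dest, len1);
            return;
        }
        // If run1 has only one element left: mergeAt guarantees that the last element of run1 is the largest of all, so move the rest of run2 forward and put that element last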
        if (len1 == 1) {
            System.arraycopy(a, cursor2, a, dest, len2);
            a[dest + len2] = tmp[cursor1]; // Last elt of run 1 to end of merge
            return;
        }

        Comparator<? super T> c = this.c;  // Use local variable for performance
        // private int minGallop = MIN_GALLOP;
        int minGallop = this.minGallop;    //  "    "       "     "      "

        // Finally, the real merge
    outer:
        while (true) {
            int count1 = 0; // Number of times in a row that first run won
            int count2 = 0; // Number of times in a row that second run won

            /*
             * Do the straightforward thing until (if ever) one run starts
             * winning consistently.
             */
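            // Merge one element at a time. count1 and count2 record how many times in a row each run has won, and minGallop starts at 7. If count1 reaches minGallop, run1 (in tmp) has supplied that many consecutive elements, so the following elements are likely to come from it too; the code then leaves the one-at-a-time loop and enters galloping mode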
            do {
                assert len1 > 1 && len2 > 0;
                if (c.compare(a[cursor2], tmp[cursor1]) < 0) {
                    a[dest++] = a[cursor2++];
                    count2++;
                    count1 = 0;
                    if (--len2 == 0)
                        break outer;
                } else {
                    a[dest++] = tmp[cursor1++];
                    count1++;
                    count2 = 0;
                    if (--len1 == 1)
                        break outer;
                }
            } while ((count1 | count2) < minGallop);

            /*
             * One run is winning so consistently that galloping may be a
             * huge win. So try that, and continue galloping until (if ever)
             * neither run appears to be winning consistently anymore.
             */
            // Galloping mode: find a stretch that is already in order and move it with a single array copy, which is faster
            do {
                assert len1 > 1 && len2 > 0;
                // Find the rightmost insertion point of a[cursor2] (from run2) in tmp, starting at cursor1; gallopRight was explained above
                count1 = gallopRight(a[cursor2], tmp, cursor1, len1, 0, c);
                if (count1 != 0) {
                    // Use a bulk copy, which is more efficient
                    System.arraycopy(tmp, cursor1, a, dest, count1);
                    dest += count1;
                    cursor1 += count1;
                    len1 -= count1;
                    if (len1 <= 1) // len1 == 1 || len1 == 0
                        break outer;
                }
                // a[cursor2] is now the smallest remaining element, so place it directly
                a[dest++] = a[cursor2++];
                if (--len2 == 0)
                    break outer;
                // Next find the leftmost insertion point of tmp[cursor1] in run2
                count2 = gallopLeft(tmp[cursor1], a, cursor2, len2, 0, c);
                // The count2 elements of run2 starting at cursor2 are all less than tmp[cursor1]
                if (count2 != 0) {
                    // Again, use a bulk copy
                    System.arraycopy(a, cursor2, a, dest, count2);
                    dest += count2;
                    cursor2 += count2;
                    len2 -= count2;
                    if (len2 == 0)
                        break outer;
                }
                a[dest++] = tmp[cursor1++];
                if (--len1 == 1)
                    break outer;
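                // Lower the threshold for entering galloping mode a little
                minGallop--;
                // Leave galloping mode once both counts are below MIN_GALLOP = 7
            } while (count1 >= MIN_GALLOP | count2 >= MIN_GALLOP);
            if (minGallop < 0)
                minGallop = 0;
            // Leaving galloping mode costs a penalty of 2
            // My understanding: if run1 and run2 contain many already-ordered stretches and galloping mode was left for some incidental reason, more such stretches are likely to follow, so every galloping iteration does minGallop--; but the threshold must not become too small, so 2 is added on exit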
            minGallop += 2;  // Penalize for leaving gallop mode
        }  // End of "outer" loop
        this.minGallop = minGallop < 1 ? 1 : minGallop;  // Write back to field
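        // The merge is essentially done. As seen above, the loop exits when len1 == 1 or len2 == 0 (or, with a broken comparator, len1 == 0)
        if (len1 == 1) {
            // As noted earlier, the last element of run1 is the largest, so it goes at the end
            assert len2 > 0;
            System.arraycopy(a, cursor2, a, dest, len2);
            a[dest + len2] = tmp[cursor1]; //  Last elt of run 1 to end of merge
        } else if (len1 == 0) {
            // len1 == 0 can only be reached in galloping mode, when every element left in tmp turns out to be no greater than an element of run2. But mergeAt established that the last element of run1 is the largest, so the two comparison results contradict each other; that means the comparator is inconsistent, hence the exception
            throw new IllegalArgumentException(
                "Comparison method violates its general contract!");
        } else {
            // Here len1 > 1 and len2 == 0: the elements left in tmp are the largest ones, so copy them straight over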
            assert len2 == 0;
            assert len1 > 1;
            System.arraycopy(tmp, cursor1, a, dest, len1);
        }
    }
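Stripped of galloping and of the degenerate-case shortcuts, mergeLo reduces to a classic merge that only buffers the left run. The sketch below shows that core on int[]; it is a simplification under those stated omissions, and the class name MergeLoDemo is an assumption.

```java
import java.util.Arrays;

// Simplified int[] sketch of mergeLo: no galloping, no special cases.
class MergeLoDemo {
    // Merges the adjacent sorted runs a[base1, base1 + len1) and a[base2, base2 + len2),
    // where base1 + len1 == base2, using a temp copy of the first run only.
    static void mergeLo(int[] a, int base1, int len1, int base2, int len2) {
        int[] tmp = Arrays.copyOfRange(a, base1, base1 + len1);
        int cursor1 = 0;      // next unmerged element of run1 (in tmp)
        int cursor2 = base2;  // next unmerged element of run2 (in a)
        int dest = base1;     // next slot to write in a
        int end2 = base2 + len2;
        while (cursor1 < len1 && cursor2 < end2) {
            // Take from run2 only if it is strictly smaller, so ties keep run1 first (stability).
            if (a[cursor2] < tmp[cursor1])
                a[dest++] = a[cursor2++];
            else
                a[dest++] = tmp[cursor1++];
        }
        // Leftover run1 elements go to the end; leftover run2 elements are already in place.
        System.arraycopy(tmp, cursor1, a, dest, len1 - cursor1);
    }

    public static void main(String[] args) {
        int[] a = {1, 4, 7, 2, 3, 9};
        mergeLo(a, 0, 3, 3, 3);
        System.out.println(Arrays.toString(a));
    }
}
```

Writing left to right is safe because dest can never overtake cursor2: the number of slots written equals the number of elements consumed from tmp plus those consumed from run2, and at most len1 come from tmp.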

ensureCapacity: making sure the temp array is long enough for the merge

private T[] ensureCapacity(int minCapacity) {
        if (tmpLen < minCapacity) {
            // Compute the smallest power of 2 that is strictly greater than minCapacity
            // Compute smallest power of 2 > minCapacity
            int newSize = minCapacity;
            newSize |= newSize >> 1;
            newSize |= newSize >> 2;
            newSize |= newSize >> 4;
            newSize |= newSize >> 8;
            newSize |= newSize >> 16;
            newSize++;
            
            if (newSize < 0) // Not bloody likely!
                newSize = minCapacity;
            else
                // The temp array never needs to be longer than half of the array
                newSize = Math.min(newSize, a.length >>> 1);

            @SuppressWarnings({"unchecked", "UnnecessaryLocalVariable"})
            T[] newArray = (T[])java.lang.reflect.Array.newInstance
                (a.getClass().getComponentType(), newSize);
            tmp = newArray;
            tmpLen = newSize;
            tmpBase = 0;
        }
        return tmp;
    }
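The chain of shifts and ORs smears the highest set bit into every lower position, so adding one yields the next power of two. The sketch below isolates that step; the class TmpSizeDemo and the method nextPowerOfTwoAbove are hypothetical names.

```java
// Hypothetical sketch isolating the bit trick used in ensureCapacity.
class TmpSizeDemo {
    // Smallest power of 2 strictly greater than minCapacity (for minCapacity >= 0).
    // Overflows to a negative value when minCapacity >= 2^30, the case ensureCapacity guards against.
    static int nextPowerOfTwoAbove(int minCapacity) {
        int n = minCapacity;
        n |= n >> 1;
        n |= n >> 2;
        n |= n >> 4;
        n |= n >> 8;
        n |= n >> 16;  // every bit below the highest set bit is now 1
        return n + 1;
    }

    public static void main(String[] args) {
        for (int c : new int[] {1, 5, 8, 1000}) {
            System.out.println(c + " -> " + nextPowerOfTwoAbove(c));
        }
    }
}
```

Note that an exact power of two such as 8 maps to 16, not 8, which matches the English comment "smallest power of 2 > minCapacity" in the JDK source.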

mergeHi: interested readers can try working through this right-to-left merge on their own

private void mergeHi(int base1, int len1, int base2, int len2) {
        assert len1 > 0 && len2 > 0 && base1 + len1 == base2;

        // Copy second run into temp array
        T[] a = this.a; // For performance
        T[] tmp = ensureCapacity(len2);
        int tmpBase = this.tmpBase;
        System.arraycopy(a, base2, tmp, tmpBase, len2);

        int cursor1 = base1 + len1 - 1;  // Indexes into a
        int cursor2 = tmpBase + len2 - 1; // Indexes into tmp array
        int dest = base2 + len2 - 1;     // Indexes into a

        // Move last element of first run and deal with degenerate cases
        a[dest--] = a[cursor1--];
        if (--len1 == 0) {
            System.arraycopy(tmp, tmpBase, a, dest - (len2 - 1), len2);
            return;
        }
        if (len2 == 1) {
            dest -= len1;
            cursor1 -= len1;
            System.arraycopy(a, cursor1 + 1, a, dest + 1, len1);
            a[dest] = tmp[cursor2];
            return;
        }

        Comparator<? super T> c = this.c;  // Use local variable for performance
        int minGallop = this.minGallop;    //  "    "       "     "      "
    outer:
        while (true) {
            int count1 = 0; // Number of times in a row that first run won
            int count2 = 0; // Number of times in a row that second run won

            /*
             * Do the straightforward thing until (if ever) one run
             * appears to win consistently.
             */
            do {
                assert len1 > 0 && len2 > 1;
                if (c.compare(tmp[cursor2], a[cursor1]) < 0) {
                    a[dest--] = a[cursor1--];
                    count1++;
                    count2 = 0;
                    if (--len1 == 0)
                        break outer;
                } else {
                    a[dest--] = tmp[cursor2--];
                    count2++;
                    count1 = 0;
                    if (--len2 == 1)
                        break outer;
                }
            } while ((count1 | count2) < minGallop);

            /*
             * One run is winning so consistently that galloping may be a
             * huge win. So try that, and continue galloping until (if ever)
             * neither run appears to be winning consistently anymore.
             */
            do {
                assert len1 > 0 && len2 > 1;
                count1 = len1 - gallopRight(tmp[cursor2], a, base1, len1, len1 - 1, c);
                if (count1 != 0) {
                    dest -= count1;
                    cursor1 -= count1;
                    len1 -= count1;
                    System.arraycopy(a, cursor1 + 1, a, dest + 1, count1);
                    if (len1 == 0)
                        break outer;
                }
                a[dest--] = tmp[cursor2--];
                if (--len2 == 1)
                    break outer;

                count2 = len2 - gallopLeft(a[cursor1], tmp, tmpBase, len2, len2 - 1, c);
                if (count2 != 0) {
                    dest -= count2;
                    cursor2 -= count2;
                    len2 -= count2;
                    System.arraycopy(tmp, cursor2 + 1, a, dest + 1, count2);
                    if (len2 <= 1)  // len2 == 1 || len2 == 0
                        break outer;
                }
                a[dest--] = a[cursor1--];
                if (--len1 == 0)
                    break outer;
                minGallop--;
            } while (count1 >= MIN_GALLOP | count2 >= MIN_GALLOP);
            if (minGallop < 0)
                minGallop = 0;
            minGallop += 2;  // Penalize for leaving gallop mode
        }  // End of "outer" loop
        this.minGallop = minGallop < 1 ? 1 : minGallop;  // Write back to field

        if (len2 == 1) {
            assert len1 > 0;
            dest -= len1;
            cursor1 -= len1;
            System.arraycopy(a, cursor1 + 1, a, dest + 1, len1);
            a[dest] = tmp[cursor2];  // Move first elt of run2 to front of merge
        } else if (len2 == 0) {
            throw new IllegalArgumentException(
                "Comparison method violates its general contract!");
        } else {
            assert len1 == 0;
            assert len2 > 0;
            System.arraycopy(tmp, tmpBase, a, dest - (len2 - 1), len2);
        }
    }

mergeForceCollapse: the forced-merge method

    private void mergeForceCollapse() {
        while (stackSize > 1) {
            int n = stackSize - 2;
            if (n > 0 && runLen[n - 1] < runLen[n + 1])
                n--;
            mergeAt(n);
        }
    }

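That completes the walkthrough of the whole TimSort sorting flow. For me, knowing only the theory made it seem mysterious; reading the implementation made it seem admirable; and going back to the theory afterwards, everything clicked, which was a real delight.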
