Timsort排序算法

王野也不野

已于 2022-06-04 18:59:53 修改

阅读量996

点赞数 1

分类专栏： java基础文章标签： java

于 2022-04-14 09:48:06 首次发布

原文链接：https://zhuanlan.zhihu.com/p/374041792

版权

java基础专栏收录该内容

23 篇文章 3 订阅

订阅专栏

Timsort排序算法

引子

我们日常使用Java的时候，经常会对数组使用一个操作叫**排序。**写到这里作者表示好奇，Java针对数组的底层实现是什么呢？我们来看一看吧。

int[] a;
Object[] b;
Arrays.sort(a);
Arrays.sort(b);

这里面我们看源码会发现，其实他们底层是用了两种不同的算法：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-h0uYlBdf-1649900789132)(F:%5CmarkdownImg%5Cv2-792d5a226c4f6d46f798e1b38f5c08d6_720w.jpg)]针对int[]的排序操作

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-AvopaPvn-1649900789133)(F:%5CmarkdownImg%5Cv2-89127c2dbe54a7f5bb43921072f47e2f_720w.jpg)]针对Object[]的排序操作

可见针对int[]数组使用的是DualPivotQuicksort，而针对Object[]数组默认使用了Timsort（当然你也可以自己设置成MergeSort归并排序。）如果你点开ComparableTimSort这个类，会发现上面写了它只是Timsort这个类的没有Comparator比较器的重复实现。因此我们这里聚焦Timsort这个类的实现，也就能说明一般情况了。

这篇文章就来探究一下，这里的Timsort，也就是Timsort这个算法的实现原理。

什么是Timsort？原理是什么？

Timsort是一种混合稳定的排序算法，源自合并排序和插入排序，旨在较好地处理真实世界中各种各样的数据。它使用了Peter Mcllroy的"乐观排序和信息理论上复杂性"中的技术，参见第四届年度ACM-SIAM离散算法研讨会论文集，第467-474页，1993年。它由 Tim Peters 在2002年实现，并应用于Python编程语言。该算法通过查找已经排好序的数据子序列，在此基础上对剩余部分更有效地排序。该算法通过不断地将特定子序列（称为一个 run ）与现有的 run 合并，直到满足某些条件为止来达成的更有效的排序。从 2.3 版本起，Timsort 一直是 Python 的标准排序算法。它还被Java SE7[4],Android platform[5],GNU Octave,[6]谷歌浏览器,[7]和Swift[8]用于对非原始类型的数组排序。来源于维基百科。

一般来说，排序算法的测试都是基于随机排列的数据进行测试的，但实际上在现实场景中，我们几乎看不到随机数据。往往在一组需要排序的数中，数是局部有序的。例如 1 10 9 8 2 3 5 6 4 7 这样的排列中，我们可以拆分成{1, 10}，{9, 8}，{2, 3, 5, 6}，{4, 7}这几组局部有序的分组。Timsort就是利用了现实世界中这样的数据分布特点，利用了归并排序和插入排序进行运算。本质上是一种经过优化的归并排序。

Timsort的逻辑，简单的讲就以下几步。设整个序列长度为n。

定义整数值minRun，使得
刚好是2的整数次幂或比某个2的整数次幂稍小一点的数。最小的有序分组长度。假设最终排序是升序。
从左到右遍历，找到一个严格降序/升序的子序列，记长度为runLen，若runLen < minRun，则用插入排序将后面的minRun - runLen个数扩充，使得minRun长度子序列严格降序/升序。若为严格降序，再反转此序列（保证升序）。记每一个生成的子序列为run。
每生成一次run，考虑合并run与之前的run（之前的run可能是合并过的）。我们记最右侧的三个run的长度为A、B、C。此时合并A、B或B、C，使得合并后的新的最右侧三个满足A > B + C， B > C。若满足条件则不需要合并。

我们拿例子看一下，还是上文的 1 10 9 8 2 3 5 6 4 7 这个例子。注意这里只是算法流程的描述，其中用到的变量在实际使用中数值不是这么小。下文会有细节描述。

首先定义minRun = 3。（实际不会这么小）
找到第一个run1 = {1, 10}，runLen < minRun，向后找到3 - 2 = 1个元素{9}。
针对{9} 插入排序到run1中，得到{1, 9, 10}，不需要合并。
找到下一个run2{8, 2}，插入{3}，得到{8, 3, 2}，反转为{2, 3, 8}，不需要合并。
找到下一个run3{5, 6}，插入{4}，得到{4, 5, 6}，此时A = B = C = 3，合并A、B可以满足条件，因此合并run1，run2，合并后有run1 = {1, 2, 3, 8, 9, 10}，run3 = {4, 5, 6}
找到下一个run4{7}，此时A = 6，B = 3，C = 1，满足条件，不需要合并。
从右向左合并获得最终结果，即先合并run4、run3，然后再合并到run1，最终得到{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

Run细节

run的含义是一串“不降序“或”降序“的序列。也就是：

a0 <= a1 <= a2 <= … 或

a0 > a1 > a2 > …

显然run的最小长度为2。

此处定义严格降序的目的在于，在算法运行时会将严格降序的序列翻转。这里反转的方式就是用双指针首尾交换就可以了。因此保证翻转后的序列仍满足上述条件。

minrun计算细节

在Python原始描述，设置阈值为64，而Java则设置32，也就是说当序列长度n < 32时，直接设置minRun = n，也就是说不需要合并，直接在minRun中确定第一个有序数列，然后针对后面的数进行二分插入排序。

针对n > 32的情况，首先考虑两种情况。

当N是2的整数次幂时，一般选择32。

When N is a power of 2, testing on random data showed that minrun values of
16, 32, 64 and 128 worked about equally well. At 256 the data-movement cost
in binary insertion sort clearly hurt, and at 8 the increase in the number
of function calls clearly hurt. Picking some power of 2 is important
here, so that the merges end up perfectly balanced (see next section). We
pick 32 as a good value in the sweet range; picking a value at the low end
allows the adaptive gimmicks more opportunity to exploit shorter natural
runs.

也就是说考虑到run内部使用二分插入排序的性能，选256会增加数据移动的耗时，而选8会增加方法调用过程的耗时，因此选择32比较靠谱。

当然只选择32满足不了当N不是2的整数次幂的情况。这种情况下考虑定义整数值minRun，使得 [外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-KQB5Rtdy-1649900789134)(F:%5CmarkdownImg%5Cequation)] 刚好是2的整数次幂或比某个2的整数次幂稍小一点的数。原因在于避免最后一个分组长度r过小的情况，导致合并流程出现长短的问题。比如当选择N = 2112，minrun = 32时，最后会出现2048 和 64去合并的情况。

What we want to avoid is picking minrun such that in

q, r = divmod(N, minrun)

q is a power of 2 and r>0 (then the last merge only gets r elements into
place, and r < minrun is small compared to N), or q a little larger than a
power of 2 regardless of r (then we’ve got a case similar to “2112”, again
leaving too little work for the last merge to do).

Instead we pick a minrun in range(32, 65) such that N/minrun is exactly a
power of 2, or if that isn’t possible, is close to, but strictly less than,
a power of 2. This is easier to do than it may sound: take the first 6
bits of N, and add 1 if any of the remaining bits are set. In fact, that
rule covers every case in this section, including small N and exact powers
of 2; merge_compute_minrun() is a deceptively simple function.

为什么要这样设定minRun？在算法最后合并最终结果时，是按照从右向左的方式合并的。这就意味着每次合并的时候考虑序列长度差距尽可能小。例如理想情况是这样，每个序列长度为：16，8，5，2，2。这样从右向左依次合并后，显然每次合并的两个序列长度是差不多的。（先合并2 2，然后合并5 4，合并8 9，合并16 17）。这样设定minRun，其实在理想情况下就可以将n长度的序列拆分成2的整数次幂个大小的长度序列，当然在拆分过程中会触发run之间的合并，例如拆分出了三个相似大小的长度的序列3 4 2，由于需要满足约束A > B + C 且B > C，会触发3 4的合并。因此最终在合并时就会出现上文的这样的排列。

Merge细节-约束和额外空间

插入排序一般使用binary insertion sort，即在确定插入位置时使用二分查找，减少查找次数。

回过头来看合并的约束

A > B + C
B > C

Note that, by induction, #2 implies the lengths of pending runs form a
decreasing sequence. #1 implies that, reading the lengths right to left,
the pending-run lengths grow at least as fast as the Fibonacci numbers.
Therefore the stack can never grow larger than about log_base_phi(N) entries,
where phi = (1+sqrt(5))/2 ~= 1.618. Thus a small # of stack slots suffice
for very large arrays.

也就是说这样的约束保证了从左到右降序，且从右向左合并的长度增长至少和斐波那契数列速度相同。

当A <= B + C时，选择A和C中较小的数与B合并。当A > B + C 且 B <= C时，显然合并B C。

假设合并A、B，**这里需要一点合并算法的基础。**合并时不开辟额外空间是一种比较麻烦的做法，而且耗时。因此合并时开辟一串数组空间temp，空间长度 = min(ALen, BLen)，若A较小，将A复制到temp中，然后合并。这里显然假设A、B是按顺序的，即A在B左侧，此时若A较小，就merge_lo，因为A复制后低ALen位置空出来了，从左到右低位合并即可。若B较小，就merge_hi，从右向左高位合并。

Merge细节-Galloping

一般来说我们在合并前不清楚数据的分布是“均匀的”还是“聚集的“。但好处是许多情况的“聚集”数据都可以自己暴露出来。换句话说，考虑如下两个例子：

{1, 2, 3, 9, 10} {4, 5, 6, 7, 8} 聚集的
{1, 3, 5, 7, 9} {2, 4, 6, 8, 10} 均匀的

一般的合并过程就是拿出A[i]，B[j]，然后比较找到其中较小的一个，放到合并后的位置上。我们将这一次比较的过程叫“one pair at a time“，也就是“一次比较一对“的意思。

有一种有趣情况是，当数据出现“聚集”时，我们会发现A[i]在每次比较中都“胜出”，也就是每次都会将B[j]中的数放入合并位置，比如当前A[i] = 20，而B[j]开始的后续序列为 11，12，13，14，15，… 21，那么只有A[i]和21相比较时，才会被合并。21前的数据就是一个聚集。我们通过设定一个阈值MIN_GALLOP，也就是说当某一个元素胜出次数达到MIN_GALLOP时，我们转换为Galloping流程。Galloping就是每次不拿出一个数，而是找到一个串（chunk），然后放到合并位置，接下来再到另一个序列中找到另一串，放入合并位置。Galloping找到的串大小必须不小于MIN_GALLOP，否则就退出回one-pair-at-a-time模式。

Merging begins in the usual, obvious way, comparing the first element of A
to the first of B, and moving B[0] to the merge area if it’s less than A[0],
else moving A[0] to the merge area. Call that the “one pair at a time”
mode. The only twist here is keeping track of how many times in a row “the
winner” comes from the same run.

If that count reaches MIN_GALLOP, we switch to “galloping mode”. Here
we search B for where A[0] belongs, and move over all the B’s before
that point in one chunk to the merge area, then move A[0] to the merge
area. Then we search A for where B[0] belongs, and similarly move a
slice of A in one chunk. Then back to searching B for where A[0] belongs,
etc. We stay in galloping mode until both searches find slices to copy
less than MIN_GALLOP elements long, at which point we go back to one-pair-
at-a-time mode.

这里我们确定A[i]的位置时采用二分查找的方式。

在实际运行中存在两个问题，会导致Galloping不符合预期。