Python 版 SampleSort 算法

最新推荐文章于 2022-10-23 13:20:15 发布

JaneJ2013

最新推荐文章于 2022-10-23 13:20:15 发布

阅读量603

点赞数

文章标签： python 排序算法

本文链接：https://blog.csdn.net/u012324798/article/details/115047539

版权

Python 版 SampleSort 算法

1 总结
2 源码分析

根据 2.2.3 版本 Cpython 的 listobject.c 源码翻译的 Python 版 SampleSort 代码。

1 总结

检查数组是否属于特殊案例（已经有序、或全部重复、或在有序数组末尾追加几个随机元素的数组）。如果只有很少的无序元素，或者总共只有很少的元素，直接使用二分插入排序就能以很低的成本结束整个排序。
普通案例（没有明显模式的大型数组），会得到完整的 samplesort 处理。它基本上就是加了兴奋剂的快速排序：
随机选择 2**k - 1 个元素作为预选枢轴（PP：preselected pivots），其中 k 等于最接近 n/ln(n) 的 2 幂次方，再减去 1 。把 PP 都交换到数组的最前面，并排好序。要实现的效果就是使用这个 2**k - 1 个 PP 把数组划分成 2**k 个区间。使得每个 PP 左边区间的元素都 < 该PP，右边的元素都 >= 该 PP。
进行分区操作之前，先从多个 PP 中选择位于中间的那个作为本次分区操作的枢轴。把本次枢轴左边的 PP 移动到当前区间的左端，然后把本次枢轴右边的 PP 移动到当前区间的右端，之后就可以忽略这些PP，基于本次枢轴对中间部分进行快速排序的分区操作。
本次分区完成后，先把较长的子区间入栈，然后把较短的子区间作为新的区间，进入下轮处理循环。
每轮处理循环中，如果区间元素数量很少，则没有必要进行分区操作，直接使用二分插入排序。如果没有可用的 PP，则无法进行分区操作，必须使用其他方法（根据待区间元素数量决定，递归调用 samplesort 或者二分插入排序）。如果不是上述两种情况，就可以进行分区操作，方法同上。

2 源码分析

2.1 SamplesortStackNode

Represents a slice of the array, from (& including) lo up to (but excluding) hi. “extra” additional & adjacent elements are pre-selected pivots (PPs), spanning [lo-extra, lo) if extra > 0, or [hi, hi-extra) if extra < 0. The PPs are already sorted, but nothing is known about the other elements in [lo, hi). |extra| is always one less than a power of 2. When extra is 0, we’re out of PPs, and the slice must be sorted by some other means.

堆栈
用于保存待排序的区间，它是虚拟的堆栈，通过 3 个变量来定义：

lo：待排序区间的开始下标。
hi：待排序区间的结束下标。
extra：PP 的下标范围偏移量。 extra > 0 表示 PP 的下标范围是 [lo-extra, lo) ；extra < 0 表示 PP 的下标范围是 [hi, hi-extra)； extra = 0 表示已经用完了 PP，必须通过其他方法对该区间进行排序。

class SamplesortStackNode:
    def __init__(self, stack_size):
        self.lo = [0] * stack_size
        self.hi = [0] * stack_size
        self.extra = [0] * stack_size

2.2 SampleSort 类属性

类属性初始化了各项阈值，以及用于计算 PP 数量的工具。

class SampleSort:
    MINSIZE = 100
    MINPARTITIONSIZE = 40
    MAXMERGE = 15
    STACKSIZE = 60
    CUTOFFBASE = 4
    cutoff = [
        43,        
        106,       
        250,
        576,
        1298,
        2885,
        6339,
        13805,
        29843,
        64116,
        137030,
        291554,
        617916,
        1305130,
        2748295,
        5771662,
        12091672,
        25276798,
        52734615,
        109820537,
        228324027,
        473977813,
        982548444,   
        2034159050   
    ]

MINSIZE = 100

MINSIZE is the smallest array that will get a full-blown samplesort treatment; smaller arrays are sorted using binary insertion. It must be at least 7 for the samplesort implementation to work. Binary insertion does fewer compares, but can suffer O(N**2) data movement. The more expensive compares, the larger MINSIZE should be.

最小的数组长度，小于此长度的数组直接使用二分插入排序，满足此长度的数据能够得到完整的 samplesort 处理。为了使 samplesort 的处理逻辑能够工作，MINSIZE 必须至少设置为 7。二分插入排序的比较次数更少，但需要忍受 O(N**2) 的数据移动次数。因此，比较的成本越高时，MINSIZE 就应该设置的越大。

MINPARTITIONSIZE = 40

MINPARTITIONSIZE is the smallest array slice samplesort will bother to partition; smaller slices are passed to binarysort. It must be at least 2, and no larger than MINSIZE. Setting it higher reduces the # of compares slowly, but increases the amount of data movement quickly. The value here was chosen assuming a compare costs ~25x more than swapping a pair of memory-resident pointers – but under that assumption, changing the value by a few dozen more or less has aggregate effect under 1%. So the value is crucial, but not touchy .

最小区间长度，小于此长度的区间直接使用二分插入排序，满足此长度的区间将继续进行分区操作。它必须至少是 2，并不大于 MINSIZE。将其设置得更高可以缓慢地减少比较次数，但是将快速地增加数据移动次数。这里选择的值假设比较成本比交换一对驻留内存的指针高出25倍 —— 但是在这种假设下，将这个值加减几十的总体影响效果小于1% 。所以这个值是至关重要的，但不是敏感的。

MAXMERGE = 15

MAXMERGE is the largest number of elements we’ll always merge into a known-to-be sorted chunk via binary insertion, regardless of the size of that chunk. Given a chunk of N sorted elements, and a group of K unknowns, the largest K for which it’s better to do insertion (than a full-blown sort) is a complicated function of N and K mostly involving the expected number of compares and data moves under each approach, and the relative cost of those operations on a specific architecure. The fixed value here is conservative, and should be a clear win regardless of architecture or N.

最大无序元素数量，如果检查发现无序元素数量低于 MAXMERGE，则使用二分插入排序将无序元素合并到一个已知的有序区中，与有序区的大小无关。对于有序区长度 N 和无序元素数量 K，存在这样一个最大的 K 值，它使得采用二分插入排序做合并的效率总是优于使用完整的 samplesort 做合并的效率。这个最大的 K 值是 N 和 K 的一个复杂函数，主要涉及两种算法下的预期比较数和数据移动数，以及这些操作在特定架构上运行的相对成本。此处设置的固定值是保守的，并且应该是一个明显的胜利，无论在哪种架构上运行，也无论有序区的大小 N 是多少。

STACKSIZE = 60

STACKSIZE is the size of our work stack. A rough estimate is that this allows us to sort arrays of size N where N / ln(N) = MINPARTITIONSIZE * 2STACKSIZE, so 60 is more than enough for arrays of size 264. Because we push the biggest partition first, the worst case occurs when all subarrays are always partitioned exactly in two.

工作堆栈的大小。一个粗略的估计公式，STACKSIZE 允许我们对大小为 N 的数组排序，其中 N / ln(N) = MINPARTITIONSIZE * 2**STACKSIZE，所以 60 对于大小为2**64 的数组来说已经足够了。因为我们首先入栈的是最大的区间，所以当所有子数组总是精确地分成两部分时，最糟糕的情况才会发生。

用于计算 PP 数量的工具

CUTOFFBASE = 4
cutoff = [
     43,        # smallest N such that k == 4
     106,       # etc
     250,
     576,
     ...
 	 982548444,   # smallest N such that k == 26
     2034159050   # largest N that fits in signed 32-bit; k == 27
    ]

The number of PPs we want is 2k - 1, where 2k is as close to N / ln(N) as possible. So k ~= lg(N / ln(N)). Calling libm routines is undesirable, so cutoff values are canned in the “cutoff” table below: cutoff[i] is the smallest N such that k == CUTOFFBASE + i.

我们需要的 PP 数量是 2**k - 1，其中 2**k 尽可能接近 N / ln(N)，N 是待排序的元素个数。在 cutoff 列表中保存 k 对应的 N: cutoff[i] 是最小的 N，使得 k == CUTOFFBASE + i。

待排序的元素个数 N	PP 数量 = 2 ** k - 1	分区数量 = 2 ** k
[0,42]	7	2 ** 3
[43, 105]	15	2 ** 4
[106, 249]	31	2 ** 5
[250,575]	63	2 ** 6

2.3 binary_sort

binarysort is the best method for sorting small arrays: it does few compares, but can do data movement quadratic in the number of elements. [lo, hi) is a contiguous slice of a list, and is sorted via binary insertion. On entry, must have lo <= start <= hi, and that [lo, start) is already sorted (pass start == lo if you don’t know!). If docompare complains (returns CMPERROR) return -1, else 0. Even in case of error, the output slice will be some permutation of the input (nothing is lost or duplicated).

binary_sort 方法是二分插入排序算法的实现，它是对小数组进行排序的最佳方法：它的比较次数很少，但数据移动次数能达到元素数量的二次方。[ lo，hi) 是列表的一个连续区间，通过二分插入排序算法进行排序。每次调用时，必须满足 lo < = start < = hi，并且 [lo, start) 已经是有序的（如果你不知道，则传参 start == lo ！）。即使在出现错误的情况下，输出区间也将是输入的某种排列（不会丢失或重复）。

    def binary_sort(self, arr, lo, hi, start):
        """
        对数组的指定区间进行二分插入排序，通过 start 参数能跳过已知有序区，直接从无序区开始排序。
        :param arr: 待排序数组
        :param lo: 待排序区间的第一个元素下标, 包含该元素
        :param hi: 待排序区间的最后一个元素下标, 不包含该元素
        :param start: 已知无序区的第一个元素下标
        :return: 无
        """
        assert lo <= start <= hi

        if lo == start:
            start += 1

        while start < hi:
            l = lo
            r = start
            pivot = arr[r]

            while True:
                p = l + ((r - l) >> 1)
                if pivot < arr[p]:
                    r = p
                else:
                    l = p + 1

                if l >= r:
                    break

            for p in range(start, l, -1):
                arr[p] = arr[p-1]
            arr[l] = pivot

            start += 1

        return arr

2.4 sample_sort_slice

samplesortslice is the sorting workhorse. [lo, hi) is a contiguous slice of a list, to be sorted in place. On entry, must have lo <= hi, If docompare complains (returns CMPERROR) return -1, else 0. Even in case of error, the output slice will be some permutation of the input (nothing is lost or duplicated).

sample_sort_slice 方法是入口，是排序的主力。[lo, hi) 是数组的一个连续区间，要对这个区间就地排序。对于待排序区间，必须有 lo < = hi。即使在出现错误的情况下，输出区间也将是输入的某种排列(不会丢失或重复)。

2.4.1 处理特殊案例

Set r to the largest value such that [lo,r) is sorted. This catches the already-sorted case, the all-the-same case, and the appended-a-few-elements-to-a-sorted-list case. If the array is unsorted, we’re very likely to get out of the loop fast, so the test is cheap if it doesn’t pay off.

检查数组是否属于特殊案例（已经有序、或全部重复、或在有序数组末尾追加几个随机元素的数组）。如果只有很少的无序元素，或者总共只有很少的元素，直接使用二分插入排序就能以很低的成本结束整个排序。如果数组是随机的，则很可能很快就退出了检查循环，因此即使没有什么回报，检查的成本也很低。

    def sample_sort_slice(self, arr, lo, hi):
        # 初始化堆栈，用于保存待排序的分区
        stack = SamplesortStackNode(self.STACKSIZE)

        assert lo <= hi
        n = hi - lo

        """
         * ----------------------------------------------------------
         * 特殊案例：已经有序、全部重复、在有序数组末尾追加几个随机元素。
         * --------------------------------------------------------*
        """
        if n < 2:
            return arr
   
        # 检查数组是否为升序或完全重复
        assert lo < hi
        r = lo+1
        for r in range(lo+1, hi):
            if arr[r] < arr[r-1]:
                # 发现降序则退出检测循环
                break

        # [lo,r) 升序或完全重复，[r，hi) 无序. 如果只有很少的无序元素，或者总共只有很少的元素，直接调用二分插入排序。
        if hi - r <= self.MAXMERGE or n < self.MINSIZE:
            return self.binary_sort(arr, lo, hi, start=r)

        # 检查数组是否降序
        assert lo < hi
        for r in range(lo+1, hi):
            if arr[r-1] < arr[r]:
                # 发现升序则退出检测循环
                break

        # [lo,r) 降序，[r，hi) 无序. 如果只有很少的无序元素，反转前面的降序部分，对后面的无序部分做二分插入排序。
        if hi - r <= self.MAXMERGE:
            originalr = r
            l = lo
            while True:
                r -= 1
                arr[l], arr[r] = arr[r], arr[l]
                l += 1
                if l >= r:
                    break
            return self.binary_sort(arr, lo, hi, start=originalr)

2.4.2 处理普通案例

samplesort is basically quicksort on steroids: a power of 2 close to n/ln(n) is computed, and that many elements (less 1) are picked at random from the array and sorted. These 2**k - 1 elements are then used as preselected pivots for an equal number of quicksort partitioning steps, partitioning the slice into 2**k chunks each of size about ln(n). These small final chunks are then usually handled by binarysort. Note that when k=1, this is roughly the same as an ordinary quicksort using a random pivot, and when k=2 this is roughly a median-of-3 quicksort. From that view, using k ~= lg(n/ln(n)) makes this a “median of n/ln(n)” quicksort. You can also view it as a kind of bucket sort, where 2**k-1 bucket boundaries are picked dynamically.

普通案例（没有明显模式的大型数组），会得到完整的 samplesort 处理。它基本上就是加了兴奋剂的快速排序：

随机选择 2**k - 1 个元素作为预选枢轴（PP：preselected pivots），其中 k 等于最接近 n/ln(n) 的 2 幂次方，再减去 1 。把 PP 都交换到数组的最前面，并排好序。
进行快速排序的分区操作，使用这个 2**k - 1 个 PP 把数组划分成 2**k 个区间。使得每个 PP 左边区间的元素都 < 该PP，右边的元素都 >= 该 PP。
分别处理这些子区间：如果元素数量很少，则直接使用二分插入排序；如果元素数量较多，则递归调用 samplesort。

当 k = 1 时，只有 1 个 PP，数组被划分成 2 个区间，再对这些子区间递归。此时 samplesort 与使用随机数轴的普通快速排序基本相同。当 k = 2 时，有 3 个 PP，数组被划分成 4 个区间，再对这些子区间递归。此时 samplesort 有点类似三位取中（median-of-3）版本的快速排序。

从这个观点出发，samplesort 算法可以视为一个“ n/ln(n) 取中”版本的快速排序。还可以将其视为一种桶排序，把数组元素分入 2**k 个桶，而桶边界（PP）是动态选择的。

The large number of samples makes a quadratic-time case almost impossible, and asymptotically drives the average-case number of compares from quicksort’s 2 N ln N (or 12/7 N ln N for the median-of-3 variant) down to N lg N.

由于样本数量（PP数量）比较大，使得二次方时间的情形几乎不可能存在，并且渐近地使得比较的平均次数从普通快速排序的 2 N ln N（或三位取中版的 12/7 N ln N）下降到 N lg N。

2.4.2.1 确定 PP

计算所需 PP 数量，（伪）随机选择出 PP，将它们交换到数组的前端，对这些 PP 递归排序。

        """
         * ----------------------------------------------------------
         * 普通案例: 没有明显模式的大型数组。
         * --------------------------------------------------------
        """
        # 使用变量 extra 计算所需 PP 数量: 2 ** k - 1
        # 先根据元素数量 n 查找 cutoff 列表，此时 extra 是 cutoff 的下标
        # 注意，如果我们没有 break，而是遍历完了整个循环，extra 的值仍然有意义，但是可能比我们希望的要小(在这种情况下，数组有超过 2**31 个元素!)
        extra = 0
        for extra in range(0, len(self.cutoff)):
            if n < self.cutoff[extra]:
                break
                
        # 为了使 samplesort 的处理逻辑能够工作，`MINSIZE` 必须至少设置为 7
        assert self.MINSIZE >= 2 ** (self.CUTOFFBASE-1) - 1
        
        # [0,42] -> cutoff下标 = 0 -> extra = 2 ** 3 - 1 = 7
        # [43, 105]  -> cutoff下标 = 1 -> extra = 2 ** 4 - 1 = 15
		# 以此类推
        extra = (1 << (extra - 1 + self.CUTOFFBASE)) - 1
        assert (extra > 0) and (n >= extra)

        # 随机选择 extra 个 PP，将它们交换到数组的前端。选择过程是伪随机的，相同的数组多次运行的选择不变（这是有意设计的！如果每次对相同的数组运行算法，其计时都变化，是很痛苦的)。
        seed = n // extra 
        for i in range(0, extra):
            seed = seed * 69069 + 7
            # 在数组下标范围 [i, n) 中随机选择一个下标 j，将该元素交换到数组的前端。
            j = int(i + seed % (n - i))
            arr[lo+i], arr[lo+j] = arr[lo+j], arr[lo+i]  

        # 对这些 PP 递归排序
        self.sample_sort_slice(arr, lo, lo + extra)

2.4.2.1 分区操作

每轮循环处理一个区间：

如果待排序元素个数 n 很小，则没有必要进行分区操作，直接使用二分插入排序。
如果没有可用的 PP，无法进行分区操作，必须使用其他方法（根据待排序元素个数决定，递归调用 samplesort 或者二分插入排序）。
如果不是上述两种情况，就可以进行分区操作，先从多个 PP 中选择位于中间的那个作为本次分区操作的枢轴。把本次枢轴左边的 PP 移动到当前区间的左端，然后把本次枢轴右边的 PP 移动到当前区间的右端。
然后基于本次枢轴进行分区。注意：一般快速排序的左右指针都会跳过 = pivot 的元素，最后实现的效果是，左边的元素 <= pivot <= 右边的元素，导致 = pivot 的元素分散在了两个子区间中。而 samplesort 的左指针不会跳过 = pivot 的元素，只有右指针会跳过，最后实现的效果是，左子区间的元素 < pivot <= 右子区间的元素，从而 = pivot 的元素都集中在右子区间中。如果右子区间全都是、或前一部分是 = pivot 的元素，可以通过一个检查循环，跳过它们。
分区完成后，先把较长的子区间入栈，然后把较短的子区间作为新的区间，进入下轮处理循环。

        top = 0         # 堆栈栈顶指针
        lo += extra      # 让 lo 指向第一个无序元素
        extraOnRight = 0  # extraOnRight = 0 表示此时可用 PP 都在待排序区间的左端，= 1 表示在右端

        """
        /* ----------------------------------------------------------
         * 对 [lo, hi) 进行分区操作，重复此步骤，直到没有可处理的区间
         * --------------------------------------------------------*/
        """
        while True:
            assert lo <= hi  
            n = hi - lo
            
            """
            如果待排序元素个数 n 很小，则没有必要进行分区操作，直接使用二分插入排序。
            如果没有可用的 PP（extra == 0），无法进行分区操作，必须使用其他方法。
            """
            if n < self.MINPARTITIONSIZE or extra == 0:
            	# 有一种情况很少见，但是也有可能发生：随机选择出的 PP 刚好都是数组中很小或很大的值，导致划分出了一个很大的子区间和多个很小甚至为空的子区间。对于那个很大的子区间，已经没有可用的 PP 了，此时应对其递归调用 samplesort 排序，再选出新的 PP。
                if n >= self.MINSIZE:
                    assert extra == 0
                    self.sample_sort_slice(arr, lo, hi)
                else:
                    # 只要不是上述情况，使用二分插入排序应该更快，并且我们可以利用已经有序的 PP。
                    # 如果还有未使用的 PP，并且位于区间右端，把它们都交换到左端
                    if extraOnRight and extra:
                        k = extra
                        while True:
                            arr[lo], arr[hi] = arr[hi], arr[lo]
                            lo += 1
                            hi += 1
                            k -= 1
                            if k <= 0:
                                break
                    self.binary_sort(arr, lo-extra, hi, start=lo)

                # 处理完当前区间，再从堆栈中弹出一个区间，使其进入下轮处理循环
                top -= 1
                if top < 0:
                    break   # 没有要处理的了
                
                # 新区间的开始下标    
                lo = stack.lo[top]  
                # 新区间的结束下标
                hi = stack.hi[top] 
                # 新区间中可用 PP 的下标范围偏移量。 `extra > 0` 表示 PP 的下标范围是 `[lo-extra, lo)` ；`extra < 0` 表示 PP 的下标范围是 `[hi, hi-extra)`； `extra = 0` 表示已经用完了 PP，必须通过其他方法对该区间进行排序。
                extra = stack.extra[top] 
                extraOnRight = 0
                if extra < 0:
                    extraOnRight = 1
                    extra = -extra

				# 让新区间进入下轮处理循环
                continue

            """
            从多个 PP 中选择位于中间的那个作为本次分区操作的枢轴。
            例如：这些 PP 的下标为 `0, 1, ..., extra-1`，则选择下标为 `(extra-1)/2` 的那个 PP 作为本次分区操作的枢轴。
            然后，把本次枢轴左边的 PP 移动到当前区间的左端，然后把本次枢轴右边的 PP 移动到当前区间的右端。
            移动完成后，区间的状态：`[lo-extra, lo)` 都是较小的 PP，`arr[lo]` 是本次枢轴，`(lo, hi)` 都是无序元素，`[hi, hi+extra)`都是较大的 PP。            
            """
            extra >>= 1  # 需要移动的 PP 数量，extra = extra // 2
            k = extra
            if extraOnRight:
                # 如果 PP 都在区间右端，则只需把较小的 PP 交换到区间的左端
                # 注意，这个循环实际上移动了 k+1 个元素，最后一个是本次分区操作的枢轴
                while True:
                    arr[lo], arr[hi] = arr[hi], arr[lo]
                    lo += 1
                    hi += 1
                    k -= 1
                    if k < 0:
                        break
            else:
                # 如果 PP 都在区间左端，则只需把较大的 PP 交换到区间的右端，实际上移动了 k 个元素
                while k > 0:
                    lo -= 1
                    hi -= 1
                    arr[lo], arr[hi] = arr[hi], arr[lo]
                    k -= 1

            lo -= 1   # 现在 arr[lo] 就是本次分区操作的枢轴
            pivot = arr[lo]

            """
            接下来是快速排序的分区操作。
            注意：一般快速排序的左右指针都会跳过 = pivot 的元素，最后实现的效果是，左边的元素 <= pivot <= 右边的元素，导致 = pivot 的元素分散在了两个子区间中。
            而 samplesort 的左指针不会跳过 = pivot 的元素，只有右指针会跳过，最后实现的效果是，左边的元素 < pivot <= 右边的元素，从而 = pivot 的元素都集中在右子区间中。如果右子区间全都是、或前一部分是 = pivot 的元素，可以通过一个检查循环，跳过它们。
            """
            # 快速排序分区操作（左右指针版）
            l = lo + 1	# 左指针初始化在无序区开始位置
            r = hi - 1	# 右指针初始化在无序区结束位置
            assert lo < l < r < hi 

            while True:
                # 左指针 l 向右走，寻找 >= pivot 的元素
                while True:
                    if arr[l] < pivot:
                        l += 1
                    else:
                        break
                    if l >= r:
                        break

                # 右指针 r 向左走，寻找 < pivot 的元素
                while l < r:
                    rval = arr[r]
                    r -= 1
                    if rval < pivot:
                    	# 把右指针 r 找到的小值与左指针 l 找到的大值（含等于）交换
                        arr[r+1] = arr[l]
                        arr[l] = rval
                        l += 1
                        break
                # 左右指针相遇时，就遍历完了整个无序区
                if l >= r:
                    break

            assert lo < r <= l < hi
            assert r == l or r+1 == l
            # 现在 l 左侧的元素都 < pivot，r 右侧的元素都 >= pivot

            if l == r:
                if arr[r] < pivot:
                    l += 1
                else:
                    r -= 1

            assert lo <= r and r+1 == l and l <= hi
            assert r == lo or arr[r] < pivot
            assert arr[lo] is pivot
            assert l == hi or arr[l] >= pivot
            
            # 把 pivot 交换到“中间”，以后就可以忽略它了。
            arr[lo] = arr[r]
            arr[r] = pivot

            # 此时以下都为真，并且会保持
            #    [lo,r) 范围内的元素都 < pivot
            #    [r,l) 范围内的元素都 == pivot (因此可以被忽略)
            #    [l,hi) 范围内的元素都 >= pivot

            # 检查右子区间 [l,hi) 是否全都是、或前一部分是 pivot 的重复值。如果没有重复，则这一次比较是浪费的，但如果有重复则可以获益很大。
            # 微秒之处：由于 [l,hi) 范围内的元素都 >= pivot，所以此时能间接推导相等。
            while l < hi:
                # 已知 pivot <= arr[l]
                if pivot < arr[l]:
                    break
                else:
                    # <= 且不是 < 意味着 ==
                    l += 1

            assert lo <= r < l <= hi
            # 分区操作得到了 [lo, r) 和 [l, hi) 两个子区间

			"""
            分区完成后，先把较长的子区间入栈，然后把较短的子区间作为新的区间，进入下轮处理循环
            """
            assert top < self.STACKSIZE
            if r - lo <= hi - l:
                # 如果右子区间更长，将其入栈，注意右子区间的右侧有未使用的 PP
                stack.lo[top] = l  
                stack.hi[top] = hi  
                stack.extra[top] = -extra
                # 把左子区间作为新的区间，进入下轮处理循环，注意左子区间的左侧有未使用的 PP
                hi = r
                extraOnRight = 0
            else:
                # 如果左子区间更长，将其入栈，注意左子区间的左侧有未使用的 PP
                stack.lo[top] = lo   
                stack.hi[top] = r  
                stack.extra[top] = extra
                # 把右子区间作为新的区间，进入下轮处理循环，注意右子区间的右侧还有未使用的 PP
                lo = l
                extraOnRight = 1

			# 更新栈顶指针
            top += 1

        return arr