算法 - 排序

最新推荐文章于 2024-03-12 17:47:05 发布

Ellenjing

最新推荐文章于 2024-03-12 17:47:05 发布

阅读量267

点赞数 1

分类专栏：算法文章标签：算法面试

本文链接：https://blog.csdn.net/Ellenjing/article/details/120589327

版权

算法专栏收录该内容

3 篇文章 0 订阅

订阅专栏

排序算法

最近在面试过程中发现很多人对排序算法了解不多，排序成了面试的重灾区之一。所以特此整理下自己的学习笔记，分享给大家。

1. 快速排序

分而治之：在快排中最重要的过程叫partition, 然后分别对左右两个区间进行递归进行快排

1.1 稳定性

不稳定

1.2 时间复杂度

快速排序最坏运行情况是顺序数列，时间复杂度为 O(n²)。但快排的平均时间复杂度是 O(nlogn)，且 O(nlogn) 中隐含的常数因子很小，比复杂度稳定等于 O(nlogn) 的归并排序要小很多。

对绝大多数顺序性较弱的随机数列而言, 快速排序总是优于归并排序

1.3 空间复杂度

快排的空间复杂度为递归栈的深度：O(logn)

1.4 代码

public static void quickSortV1(int[] arr, int l, int r) {
    if (l >= r) return;
    int head = l, tail = r, pivot = arr[l];
    while (head < tail) {
        while (tail > head && arr[tail] >= pivot) {
            tail--;
        }
        if (head < tail) {
            arr[head++] = arr[tail];
        }
        while (head < tail && arr[head] <= pivot) {
            head++;
        }
        if (head < tail) {
            arr[tail--] = arr[head];
        }
    }
    arr[head] = pivot;
    quickSortV1(arr, l, head - 1);
    quickSortV1(arr, head + 1, r);
}

2. 插入排序

适用场景：数据部分有序，且规模不宜过大

2.1 稳定性

稳定

2.2 时间复杂度

当元素有序时为最优： O(n)，只需要和前面的元素比较一下即可
当元素逆序时最差：O(n * n)
平均： O(n * n)
当用二分查找优化后为： O(nlogn), 但移动次数最差仍为O(n^2)

2.3 空间复杂度

常数阶 O(1)

2.4 代码

public static void insertSort(int[] arr) {
    for (int i = 1; i < arr.length; i++) {
        int temp = arr[i];
        int pre = i - 1;
        while (pre >= 0 && arr[pre] > temp) {
            arr[pre + 1] = arr[pre];
            pre--;
        }
        arr[++pre] = temp;
    }
}

优化后

public static void binaryInsertSort(int[] arr) {
    for (int i = 1; i < arr.length; i++) {
        int temp = arr[i];
        int pos = binarySearch(arr, 0, i - 1, temp);
        arr[pos] = temp;
    }
}

private static int binarySearch(int[] arr, int l, int r, int pivot) {
    if (l == r) {
        if (arr[l] <= pivot) return l + 1;
        return l;
    }
    int head = l, tail = r;
    while (head < tail) {
        int mid = (head + tail) >> 1;
        if (arr[mid] <= pivot) {
            head = mid + 1;
        } else {
            tail = mid;
        }
    }
    if (arr[head] <= pivot) return head + 1;
    int gap = r - head + 1;
    switch (gap) {
        case 2: {
            arr[head + 2] = arr[head + 1];
        }
        case 1: {
            arr[head + 1] = arr[head];
            break;
        }
        default:
            System.arraycopy(arr, head, arr, head + 1, r - head + 1);
    }
    return head;
}

3. 希尔排序

递减增量排序算法，是插入排序的一种更高效的改进版本。
希尔排序是基于插入排序的以下两点性质而提出改进方法的：
插入排序在对几乎已经排好序的数据操作时，效率高，即可以达到线性排序的效率；
但插入排序一般来说是低效的，因为插入排序每次只能将数据移动一位；
希尔排序的基本思想是：先将整个待排序的记录序列分割成为若干子序列分别进行直接插入排序，待整个序列中的记录"基本有序"时，再对全体记录进行依次直接插入排序。
适用场景：中等大小规模表现良好，对规模非常大的数据排序不是最优选择

3.1 稳定性

不稳定

3.2 时间复杂度

时间复杂度为O(n^(1.3 - 2))

3.3 空间复杂度

O(1)

3.4 代码

public static void shellSort(int[] arr) {
    for (int step = arr.length >> 1; step >= 1; step >>= 1) {
        for (int i = step; i < arr.length; i++) {
            int pre = i - step;
            int temp = arr[i];
            while (pre >= 0 && arr[pre] > temp) {
                arr[pre + step] = arr[pre];
                pre -= step;
            }
            arr[pre + step] = temp;
        }
    }
}

4. 堆排序

堆是一个近似完全二叉树的结构，并同时满足堆的性质：即子结点的键值或索引总是小于（或者大于）它的父节点。
大顶堆：每个节点的值都大于或等于其子节点的值，在堆排序算法中用于升序排列；
小顶堆：每个节点的值都小于或等于其子节点的值，在堆排序算法中用于降序排列；

4.1 稳定性

不稳定

4.2 时间复杂度

O(nlogn) 不会随着输入元素的变化而变化

4.3 空间复杂度

O(1)

4.4 代码

public static void heapSort(int[] arr) {
    int length = arr.length;
   
    for (int i = (length >> 1) - 1; i >= 0; i--) {
        //build heap
        heapify(arr, i, length);
    }
    while (length > 1) {
        //swap the two elements located in the top and tail of the heap
        swap(arr, 0, length - 1);
        length--;
        //build heap
        heapify(arr, 0, length);
    }
}

private static void heapify(int[] arr, int i, int length) {
    int ind = 2 * i + 1;
    while (ind < length) {//have child
        //ind point to the largest child
        if (ind + 1 < length && arr[ind + 1] > arr[ind]) ind++;
        if (arr[ind] > arr[i]) {
            swap(arr, i, ind);
        }
        i = ind;// parent pointer to the largest child
        ind = 2 * i + 1;//ind point to the left child
    }
}

private static void swap(int[] arr, int i, int j) {
    int temp = arr[i];
    arr[i] = arr[j];
    arr[j] = temp;
}

5. 归并排序

适用场景：大数据量

5.1 稳定性

稳定

5.2 时间复杂度

O(nlogn) ,时间复杂度稳定，不会随着输入元素不同而变化

5.3 空间复杂度

o(n) ，所以归并排序需要额外的空间。

5.4 代码

public static void mergeSort(int[] arr, int l, int r) {
    if (l >= r) return;
    int mid = (l + r) >> 1;
    mergeSort(arr, l, mid);
    mergeSort(arr, mid + 1, r);
    int[] temp = new int[r - l + 1];
    for (int i = l, j = mid + 1, ind = 0; i <= mid || j <= r; ind++) {
        if (j > r || i <= mid && arr[i] <= arr[j]) {
            temp[ind] = arr[i++];
        } else {
            temp[ind] = arr[j++];
        }
    }
    for (int i = l, ind = 0; i <= r; i++, ind++) {
        arr[i] = temp[ind];
    }
}

6. 计数排序

它的优势在于在对一定范围内的整数排序时，它的复杂度为Ο(n+k)（其中k是整数的范围），快于任何比较排序算法。
适用场景：元素值域范围有限

6.1 稳定性

稳定

6.2 时间复杂度

O(n+k)

6.3 空间复杂度

O(n+k)

6.4 LeetCode 1122 数组的相对排序

public int[] relativeSortArray(int[] arr1, int[] arr2) {
    int[] arr = new int[1005];
    for (int i = 0; i < arr1.length; i++) {
        arr[arr1[i]] += 1;//统计arr1中每个元素出现的次数
    }
    int ind = 0;
    for (int x : arr2) {
        while (arr[x]-- > 0) {
            arr1[ind++] = x;
        }
    }
    for (int i = 0; i < arr.length; i++) {
        for (int j = 0; j < arr[i]; j++) {
            arr1[ind++] = i;
        }
    }
    return arr1;
}

7. 基数排序

适用场景：元素范围超整型表示范围，浮点数等

7.1 稳定性

稳定

7.2 时间复杂度

线性

7.3 空间复杂度

O(n+k)

7.4 代码

private static final int SIZE = 65536;
private static final int HALF_SIZE = 1 << 15;

private int low16(int num) {
    return num & 0xffff;
}

private int high16(int num) {
    int result = (num & 0xffff0000) >>> 16;
    return result > HALF_SIZE - 1 ? result - HALF_SIZE : result + HALF_SIZE;
}

public void radix(int[] arr) {
    int[] cnt = new int[SIZE];
    int[] temp = new int[arr.length];
    //统计arr中低16位出现的次数到数组cnt中
    for (int i = 0; i < arr.length; i++) {
        cnt[low16(arr[i])] += 1;
    }
    //求前缀和
    for (int i = 1; i < SIZE; i++) {
        cnt[i] = cnt[i - 1] + cnt[i];
    }
    //对低16位排序归位
    for (int i = arr.length - 1; i >= 0; i--) {
        temp[--cnt[low16(arr[i])]] = arr[i];
    }
    //初始化cnt
    for (int i = 1; i < SIZE; i++) {
        cnt[i] = 0;
    }
    //统计arr中高16位出现的次数到数组cnt中
    for (int i = 0; i < arr.length; i++) {
        cnt[high16(temp[i])] += 1;
    }
    for (int i = 1; i < SIZE; i++) {
        cnt[i] = cnt[i - 1] + cnt[i];
    }
    for (int i = arr.length - 1; i >= 0; i--) {
        arr[--cnt[high16(temp[i])]] = temp[i];
    }
}

8. TimSort

8.1 为什么要了解TimSort?

Timsort 是一个经过大量优化的归并排序，而归并排序已经到达了最坏情况下，比较排序算法时间复杂度的下界，所以在最坏的情况下，Timsort 时间复杂度为 O(nlogn)。在最佳情况下，即输入已经排好序，它则以线性时间运行O(n)。可以看出Timsort是目前最好的排序方式。

8.2 从jdk的实现开始

在jdk中的util包中的List 接口、List接口的实现类ArrayList，以及工具类Collections中都有sort方法，他们都会调用util包中的Arrays中的sort方法，Arrays中的sort方法，会根据传入的参数中是否有比较器来判断使用TimSort或ComparableTimSort进行排序，这两种排序方法的过程相同。只是在排序过程中，TimeSort使用比较器接口中的compare方法比较元素大小，ComparableTimeSort排序的元素必须实现Comparable接口，所以可以使用元素自己的compareTo方法比较元素大小。基于以上分析，我们可以将注意力集中在TimSort方法上。
TimSort是一种混合的排序算法，内部使用了插入排序和归并排序，但分别对两种经典的排序进行了优化。
下面摘自于JDK中的类说明：

A stable, adaptive, iterative mergesort that requires far fewer than nlg(n) comparisons when running on partially sorted arrays, while offering performance comparable to a traditional mergesort when run on random arrays. Like all proper mergesorts, this sort is stable and runs O(nlogn) time (worst case). In the worst case, this sort requires temporary storage space for n/2 object references; in the best case, it requires only a small constant amount of space. This implementation was adapted from Tim Peters’s list sort for Python, which is described in detail here:
http://svn.python.org/projects/python/trunk/Objects/listsort.txt
Tim’s C code may be found here:
http://svn.python.org/projects/python/trunk/Objects/listobject.c
The underlying techniques are described in this paper (and may have even earlier origins):
“Optimistic Sorting and Information Theoretic Complexity” - Peter McIlroy
SODA (Fourth Annual ACM-SIAM Symposium on Discrete Algorithms), pp 467-474, Austin, Texas, 25-27 January 1993.
While the API to this class consists solely of static methods, it is (privately) instantiable; a TimSort instance holds the state of an ongoing sort, assuming the input array is large enough to warrant the full-blown TimSort. Small arrays are sorted in place, using a binary insertion sort.

翻译：
一种稳定的、自适应的迭代归并排序，当运行在部分排序的数组上时，需要的比较比nlg(n)要少得多，而当运行在随机数组上时，其性能可与传统的归并排序相媲美。与所有合适的归并排序一样，这种排序是稳定的，运行时间为O(nlogn)(最坏情况)。在最坏的情况下，这种排序需要n/2个对象引用的临时存储空间; 在最好的情况下，它只需要很小的常量空间。这个实现改编自Tim Peters的Python列表排序，详细描述如下:
http://svn.python.org/projects/python/trunk/Objects/listsort.txt
Tim的C代码可以在这里找到:
http://svn.python.org/projects/python/trunk/Objects/listobject.c
下面论文描述了使用的底层技术(也可能有更早的起源):
乐观排序与信息理论复杂性 - 彼得McIlroy
第四届ACM-SIAM离散算法年会，第467-474页，德克萨斯州奥斯汀，1993年1月25-27日。
虽然这个类的API仅由一个静态方法组成，但它是可实例化的; TimSort实例对象保存正在进行的排序状态，假设输入数组足够大，足以充分全面的TimSort（猜测这里指归并排序）。小数组则使用内部排序，即，二分插入排序。

8.3 TimSort中的插入排序

    // If array is small, do a "mini-TimSort" with no merges
    if (nRemaining < MIN_MERGE) {
        int initRunLen = countRunAndMakeAscending(a, lo, hi, c);
        binarySort(a, lo, hi, lo + initRunLen, c);
        return;
    }

countRunAndMakeAscending：对二分插入排序的进一步优化，即借助了原数组元素的有序性

private static <T> int countRunAndMakeAscending(T[] a, int lo, int hi,
                                                Comparator<? super T> c) {
    assert lo < hi;
    int runHi = lo + 1;
    if (runHi == hi)
        return 1;

    // Find end of run, and reverse range if descending
    if (c.compare(a[runHi++], a[lo]) < 0) { // Descending
        while (runHi < hi && c.compare(a[runHi], a[runHi - 1]) < 0)
            runHi++;
        reverseRange(a, lo, runHi);
    } else {                              // Ascending
        while (runHi < hi && c.compare(a[runHi], a[runHi - 1]) >= 0)
            runHi++;
    }

    return runHi - lo;
}
private static void reverseRange(Object[] a, int lo, int hi) {
    hi--;
    while (lo < hi) {
        Object t = a[lo];
        a[lo++] = a[hi];
        a[hi--] = t;
    }
}

查找若原数组元素从起始位置开始是否存在连续的有序子数组，若为逆序，变为顺序，若本身为顺序，则什么都不做，并返回该连续有序子数组长度。
下面为二分插入源码：

/**
 * Sorts the specified portion of the specified array using a binary
 * insertion sort.  This is the best method for sorting small numbers
 * of elements.  It requires O(n log n) compares, but O(n^2) data
 * movement (worst case).
 *
 * If the initial part of the specified range is already sorted,
 * this method can take advantage of it: the method assumes that the
 * elements from index {@code lo}, inclusive, to {@code start},
 * exclusive are already sorted.
 *
 * @param a the array in which a range is to be sorted
 * @param lo the index of the first element in the range to be sorted
 * @param hi the index after the last element in the range to be sorted
 * @param start the index of the first element in the range that is
 *        not already known to be sorted ({@code lo <= start <= hi})
 * @param c comparator to used for the sort
 */
/* 比较次数： O(nlogn); 最差移动次数O(n^2)
 * 这里可以作为二分查找的模板代码，也就是所谓的零一模型
 * 该插入位置的确定，也证明了算法的稳定性
 * 例如：对下面数组arr排序，
 *   ____________________________
 *  | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
 *  | 0 | 1 | 2 | 2 | 3 | 4 | 2 |
 *  -----------------------------
 * 该数组从0~5为顺序，所以start为6， 哨兵为arr[6] = 2, 
 * 经过下面代码二分查找(比较3次)，left == right == 4, arr[4] = 3，
 * 该位置即为需移动元素的起始位置，保证了2的稳定性。
*/
private static <T> void binarySort(T[] a, int lo, int hi, int start,
                                   Comparator<? super T> c) {
    assert lo <= start && start <= hi;
    if (start == lo)
        start++;
    for ( ; start < hi; start++) {
        T pivot = a[start];//哨兵为起始位置元素

        // Set left (and right) to the index where a[start] (pivot) belongs
        int left = lo;
        int right = start;
        assert left <= right;
        
        /* 
         * Invariants:
         *   pivot >= all in [lo, left).
         *   pivot <  all in [right, start).
         */
        while (left < right) {
            int mid = (left + right) >>> 1;
            if (c.compare(pivot, a[mid]) < 0)
                right = mid;
            else
                left = mid + 1;
        }
        assert left == right;

        /*
         * The invariants still hold: pivot >= all in [lo, left) and
         * pivot < all in [left, start), so pivot belongs at left.  Note
         * that if there are elements equal to pivot, left points to the
         * first slot after them -- that's why this sort is stable.
         * Slide elements over to make room for pivot.
         */
        int n = start - left;  // The number of elements to move
        // Switch is just an optimization for arraycopy in default case
        switch (n) {
            case 2:  a[left + 2] = a[left + 1];
            case 1:  a[left + 1] = a[left];
                     break;
            default: System.arraycopy(a, left, a, left + 1, n);
        }
        a[left] = pivot;
    }
}

8.4 TimSort中的归并排序

当待排序元素个数大于等于32时，将整个数组拆分为一个个的run，对每个run排序，再对不同的run合并。
整体代码（下面会对代码进行分析）：

    /**
     * March over the array once, left to right, finding natural runs,
     * extending short natural runs to minRun elements, and merging runs
     * to maintain stack invariant.
     */
    TimSort<T> ts = new TimSort<>(a, c, work, workBase, workLen);
    int minRun = minRunLength(nRemaining);
    do {
        // Identify next run
        int runLen = countRunAndMakeAscending(a, lo, hi, c);

        // If run is short, extend to min(minRun, nRemaining)
        if (runLen < minRun) {
            int force = nRemaining <= minRun ? nRemaining : minRun;
            binarySort(a, lo, lo + force, lo + runLen, c);
            runLen = force;
        }

        // Push run onto pending-run stack, and maybe merge
        ts.pushRun(lo, runLen);
        ts.mergeCollapse();

        // Advance to find next run
        lo += runLen;
        nRemaining -= runLen;
    } while (nRemaining != 0);

    // Merge all remaining runs to complete sort
    assert lo == hi;
    ts.mergeForceCollapse();
    assert ts.stackSize == 1;

8.3.1 拆分run

为了达到性能最佳，需将合并后的 run 长度从右至左以指数量级递增，这样从右至左依次进行合并就可以使每次合并的两个 run 的长度大致相同，实现了平衡。
minRun: 划分的run对应的数组的最小阈值，其计算逻辑为：
当数组长度小于32，直接返回数组长度，进行插入排序；
如果数组长度是2的精准乘方（2^m，2的阶乘），返回MIN_MERGE/2，即16；
其余返回整数k，MIN_MERGE/2 <= K <= MIN_MERGE，n/k接近但严格小于2的精准乘方。

private static int minRunLength(int n) {
    assert n >= 0;
    int r = 0;      // Becomes 1 if any 1 bits are shifted off
    while (n >= MIN_MERGE) {
        r |= (n & 1);
        n >>= 1;
    }
    return n + r;
}

8.3.2 对每个run排序

对每个的run进行check，若非连续有序（正序或逆序），则需用二分插入对不满足连续有序的剩余部分排序；

        // Identify next run
        int runLen = countRunAndMakeAscending(a, lo, hi, c);

        // If run is short, extend to min(minRun, nRemaining)
        if (runLen < minRun) {
            int force = nRemaining <= minRun ? nRemaining : minRun;
            binarySort(a, lo, lo + force, lo + runLen, c);
            runLen = force;
        }

8.3.3 合并run

对每个已排序的run入栈；

/**
* Pushes the specified run onto the pending-run stack.
*  * @param runBase index of the first element in the run
* @param runLen  the number of elements in the run
*/
private void pushRun(int runBase, int runLen) {
   this.runBase[stackSize] = runBase;
   this.runLen[stackSize] = runLen;
   stackSize++;
}

判断栈中的run是否需要进行合并。最右边的三个 run 的长度尽量满足两个条件。记最右边的三个 run 的长度从左到右分别是A,B,C，则 Timsort 要求：
A>B+C
B>C
若A≤B+C，则合并A与B&C中的较小者，
若B≤C，则合并B,C

private void mergeCollapse() {
    while (stackSize > 1) {
        int n = stackSize - 2;
        if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1]) {
            if (runLen[n - 1] < runLen[n + 1])
                n--;
            mergeAt(n);
        } else if (runLen[n] <= runLen[n + 1]) {
            mergeAt(n);
        } else {
            break; // Invariant is established
        }
    }
}

计算相邻run的合并区间：
找到run1中run2一个元素的位置，run1中该位置前面的元素不需合并
找到run2中run1最后一个元素的位置，run2中该位置后面的元素不需合并

/*
    * Find where the first element of run2 goes in run1. Prior elements
     * in run1 can be ignored (because they're already in place).
     */
    int k = gallopRight(a[base2], a, base1, len1, 0, c);
    assert k >= 0;
    base1 += k;
    len1 -= k;
    if (len1 == 0)
        return;

    /*
     * Find where the last element of run1 goes in run2. Subsequent elements
     * in run2 can be ignored (because they're already in place).
     */
    len2 = gallopLeft(a[base1 + len1 - 1], a, base2, len2, len2 - 1, c);
    assert len2 >= 0;
    if (len2 == 0)
        return;

合并2个相邻的 run
需要临时存储空间，临时存储空间的大小是2个 run 中较小的 run 的大小。Timsort算法先将较小的 run
复制到这个临时存储空间，然后用原先存储这2个 run 的空间来存储合并后的 run。合并算法是用简单插入排序，依次从左到右或从右到左比较，然后合并2个 run。

9. 总结

name	average	best	worst	memory	in/out-place	stable
insertion sort	O(n^2)	O(n)	O(n^2)	O(1)	in-place	Y
shell sort	O(n^（1.3-2）)	O(nlogn)	O(nlogn)	O(1)	in-place	N
quick sort	O(nlogn)	O(nlogn)	O(n^2)	O(logn)	in-place	N
heap sort	O(nlogn)	O(nlogn)	O(nlogn)	O(1)	in-place	N
merge sort	O(nlogn)	O(nlogn)	O(nlogn)	O(n)	out-place	Y
count sort	O(n+k)	O(n+k)	O(n+k)	O(k)	out-place	Y
radix sort	O(n*k)	O(n*k)	O(n*k)	O(n+k)	out-place	Y
TimSort	O(nlogn)	O(n)	O(nlogn)	O(n)	out-place	Y

参考资料：
Timsort原理学习

Ellenjing

关注

1
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
算法 - 排序

排序算法最近在面试过程中发现很多人对排序算法了解不多，排序成了面试的重灾区之一。所以特此整理下自己的学习笔记，分享给大家。1. 快速排序分而治之：在快排中最重要的过程叫partition, 然后分别对左右两个区间进行递归进行快排1.1 稳定性不稳定1.2 时间复杂度快速排序最坏运行情况是顺序数列，时间复杂度为 O(n²)。但快排的平均时间复杂度是 O(nlogn)，且 O(nlogn) 中隐含的常数因子很小，比复杂度稳定等于 O(nlogn) 的归并排序要小很多。对绝大多数顺序性较弱的随
复制链接

扫一扫

专栏目录