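A classic interview topic is the ideas behind sorting algorithms. This post first reviews the basic internal sorting algorithms, then briefly analyzes external sorting, and closes with a Tencent interview question.
Random number generator
- To sort, we first need data, so we hand-write a random number generator.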
private void generate_random_numbers(int n) {
// try-with-resources closes the writer even if a write fails
try (BufferedWriter writer = new BufferedWriter(new FileWriter("data.txt"))) {
for (int i = 0; i < n; ++i) {
// one value in [0, 10000), followed by a space separator
writer.write((int)(Math.random()*10000) + " ");
}
} catch (IOException e) {
System.out.println("Generate random numbers -- Error!");
}
}
Data reader
- The data now lives in a file, so we hand-write a reader that loads it into memory in preparation for internal sorting.
private Integer[] read_random_numbers() {
// try-with-resources closes the reader on every path
try (BufferedReader reader = new BufferedReader(new FileReader("data.txt"))) {
String data = reader.readLine();
// an empty file yields null; treat it as "no data"
if (data == null || data.trim().isEmpty()) {
return new Integer[]{};
}
// all numbers sit on one line, separated by whitespace
String[] str_arr = data.trim().split("\\s+");
int len = str_arr.length;
Integer[] arr = new Integer[len];
for (int i = 0 ; i < len; ++i) {
arr[i] = Integer.parseInt(str_arr[i]);
}
return arr;
} catch (IOException e) {
System.out.println("Read random numbers -- Error!");
}
return new Integer[]{};
}
Output module
- An output module is needed to display the sorted result.
private void display(Integer[] arr) {
int len = arr.length;
for (int i = 0; i < len; ++i) {
System.out.print(arr[i] + " ");
}
}
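Swap module
- Sorting inevitably involves swapping elements. The version below uses Java reflection to overwrite the private value field of the two Integer objects. This is fragile: it mutates objects that may be shared through the Integer cache (values from -128 to 127), so duplicate small values can be corrupted, and JDK 16 and later reject this reflective access by default. Exchanging the two array slots by index avoids both problems.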
private void swap(Integer num1, Integer num2) {
Class<Integer> integerClass = (Class<Integer>) num1.getClass();
try {
Field value = integerClass.getDeclaredField("value");
value.setAccessible(true);
int tmp = num1;
value.setInt(num1, num2);
value.setInt(num2, tmp);
} catch (Exception e) {
e.printStackTrace();
}
}
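As an alternative to the reflection-based swap above, here is a minimal sketch of an index-based swap. The class name IndexSwapDemo and the three-parameter signature are assumptions made for illustration and are not part of the original code.

```java
import java.util.Arrays;

class IndexSwapDemo {
    // Exchange two slots of the array; the Integer objects themselves are
    // never modified, so cached or shared instances stay intact.
    static void swap(Integer[] arr, int i, int j) {
        Integer tmp = arr[i];
        arr[i] = arr[j];
        arr[j] = tmp;
    }

    public static void main(String[] args) {
        Integer[] arr = {5, 3, 5};   // both 5s may be the same cached object
        swap(arr, 0, 1);
        System.out.println(Arrays.toString(arr)); // prints [3, 5, 5]
    }
}
```

Because only references move, the result does not depend on whether equal values happen to share one Integer instance.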
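Internal sorting algorithms: ideas and Java implementations
Bubble sort
- Bubble sort repeatedly compares adjacent elements and pushes the larger one toward the end (like a rising bubble), which yields an ascending sequence.
- Java code: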
public void bubbleSort(Integer[] arr) {
int len = arr.length;
for (int i = 0; i < len; ++i) {
boolean is_swap = false;
for (int j = 0; j < len-1-i; ++j) {
if (arr[j] > arr[j+1]) {
// swap the two array slots directly
Integer tmp = arr[j];
arr[j] = arr[j+1];
arr[j+1] = tmp;
is_swap = true;
}
}
if (!is_swap) {
return ;
}
}
}
Selection sort
- Selection sort repeatedly places the k-th smallest element at position k, eventually producing a sorted sequence.
- Java code:
public void selectSort(Integer[] arr) {
int len = arr.length;
for (int i = 0; i < len; ++i) {
int min_id = i;
for (int j = i+1; j < len; ++j) {
if (arr[j] < arr[min_id]) {
min_id = j;
}
}
if (min_id != i) {
// move the minimum of the unsorted part into slot i
Integer tmp = arr[i];
arr[i] = arr[min_id];
arr[min_id] = tmp;
}
}
}
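Insertion sort
- Insertion sort works like a pupil helping the teacher order exam papers by seat number: for each number k it finds the right position (not smaller than the number before it and smaller than the one after it) and inserts k there, building up a sorted sequence.
- Java code: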
public void insertSort(Integer[] arr) {
int len = arr.length;
for (int i = 1; i < len; ++i) {
int cur_num = arr[i];
int j = i-1;
for ( ; j >= 0 && cur_num < arr[j]; --j) {
arr[j+1] = arr[j];
}
arr[j+1] = cur_num;
}
}
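Merge sort
- The core idea of merge sort is divide and conquer: split the large sequence into small ones, sort them, and merge the sorted pieces into one sorted sequence.
- Java code:
// Merge sort API (keeps the public interface consistent)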
public void mergeSort(Integer[] arr) {
int len = arr.length;
coreOfMergeSort(arr, 0, len-1);
}
// Core of merge sort
private void coreOfMergeSort(Integer[] arr, int left, int right) {
if (left >= right) {
return ;
}
int mid = (left+right) >> 1;
coreOfMergeSort(arr, left, mid);
coreOfMergeSort(arr, mid+1, right);
merge(arr, left, mid, right);
}
// Merge step
private void merge(Integer[] arr, int left, int mid, int right) {
Integer[] tmp = new Integer[right-left+1];
int i = left, j = mid+1, u = 0;
while (i <= mid && j <= right) {
if (arr[i] <= arr[j]) {
tmp[u] = arr[i];
++i;
} else {
tmp[u] = arr[j];
++j;
}
++u;
}
while (i <= mid) {
tmp[u] = arr[i];
++i; ++u;
}
while (j <= right) {
tmp[u] = arr[j];
++j; ++u;
}
for (i = left; i <= right; ++i) {
arr[i] = tmp[i-left];
}
}
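Quick sort
- Quick sort also borrows the divide-and-conquer idea. Unlike merge sort, which merges sorted sequences bottom-up, quick sort partitions around a pivot first and then recurses, so order is established top-down.
- Java code:
// Quick sort API (exposes a consistent public interface)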
public void quickSort(Integer[] arr) {
int len = arr.length;
coreOfQuickSort(arr, 0, len-1);
}
// Core of quick sort
private void coreOfQuickSort(Integer[] arr, int left, int right) {
if (left >= right) {
return ;
}
int first_num = arr[left];
int i = left, j = right;
while (i < j) {
while (i < j && arr[j] >= first_num) {
--j;
}
while (i < j && arr[i] <= first_num) {
++i;
}
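// exchange the out-of-place pair by swapping array slots
Integer tmp = arr[i];
arr[i] = arr[j];
arr[j] = tmp;
}
// move the pivot into its final position
Integer pivot = arr[left];
arr[left] = arr[j];
arr[j] = pivot;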
coreOfQuickSort(arr, left, j-1);
coreOfQuickSort(arr, j+1, right);
}
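Heap sort
- The core data structure of heap sort is the heap, whose key property is that the root holds the extreme value (the maximum in the max-heap used here). Heap sort repeatedly moves the root into the sorted part of the sequence.
- Java code:
// Heap sort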
public void heapSort(Integer[] arr) {
int len = arr.length;
Integer[] heap = new Integer[len+1];
// Start at index 1 so the children of node i are simply 2*i and 2*i+1
for (int i = 1; i <= len; ++i) {
heap[i] = arr[i-1];
}
for (int i = len/2; i >= 1; --i) {
heapAdjust(heap, i, len);
}
for (int i = len; i >= 1; --i) {
// move the current maximum to the end of the unsorted part
Integer tmp = heap[1];
heap[1] = heap[i];
heap[i] = tmp;
heapAdjust(heap, 1, i-1);
}
for (int i = 0 ; i < len; ++i) {
arr[i] = heap[i+1];
}
}
// Heap adjustment (sift down)
private void heapAdjust(Integer[] arr, int pos, int len) {
while (pos*2 <= len) {
int to_swap = pos;
if (arr[pos] <= arr[pos*2]) {
to_swap = pos*2;
}
if (pos*2+1 <= len && arr[to_swap] <= arr[pos*2+1]) {
to_swap = pos*2+1;
}
if (to_swap == pos) {
return ;
}
Integer tmp = arr[pos];
arr[pos] = arr[to_swap];
arr[to_swap] = tmp;
pos = to_swap;
}
}
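Shell sort
- Shell sort is an improved insertion sort. It first runs insertion sort over elements a fixed gap apart to make the sequence partially ordered, then finishes with one full insertion sort pass (gap 1). Because the sequence is already partially ordered by then, the final pass is fast in practice.
- Java code: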
public void shellSort(Integer[] arr) {
int len = arr.length;
// halve the gap each round; this form also terminates for an empty array
for (int step = len / 2; step >= 1; step /= 2) {
for (int i = step; i < len; ++i) {
int tmp = arr[i];
int j = i;
for (; j >= step && arr[j-step] > tmp; j -= step) {
arr[j] = arr[j-step];
}
arr[j] = tmp;
}
}
}
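Radix sort
- Radix sort exploits the digits of the numbers: it redistributes the array digit by digit, from the least significant to the most significant, and ends with a sorted sequence. The implementation below assumes non-negative integers.
- Java code:
// Radix sort: find the maximum number of digits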
private int maxBits(Integer[] arr) {
int len = arr.length;
int maxBits = 0;
Integer[] tmp = new Integer[len];
for (int i = 0; i < len; ++i) {
tmp[i] = arr[i];
}
for (int i = 0; i < len; ++i) {
int cnt = 0;
while (tmp[i] != 0) {
tmp[i] /= 10;
++cnt;
}
maxBits = Math.max(maxBits, cnt);
}
return maxBits;
}
// Radix sort
public void baseSort(Integer[] arr) {
int mb = maxBits(arr);
Deque<Integer>[] dq = new LinkedList[10];
for (int i = 0; i < 10; ++i) {
dq[i] = new LinkedList<>();
}
int len = arr.length;
Integer[] tmp = new Integer[len];
for (int i = 0; i < len; ++i) {
tmp[i] = arr[i];
}
int base = 10;
while (mb != 0) {
--mb;
for (int i = 0; i < len; ++i) {
dq[ tmp[i]%10 ].addLast(arr[i]);
}
int u = 0;
for (int i = 0 ; i < 10; ++i) {
while (!dq[i].isEmpty()) {
arr[u++] = dq[i].getFirst();
dq[i].pollFirst();
}
}
for (int i = 0; i < len; ++i) {
tmp[i] = arr[i] / base;
}
base *= 10;
}
}
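Summary of internal sort complexity and stability
- Bubble sort: O(n^2) average and worst case, O(n) best case thanks to the early-exit flag; O(1) extra space; stable.
- Selection sort: O(n^2) in all cases; O(1) extra space; unstable.
- Insertion sort: O(n^2) average and worst case, O(n) best case; O(1) extra space; stable.
- Merge sort: O(n log n) in all cases; O(n) extra space; stable.
- Quick sort: O(n log n) on average, O(n^2) worst case; O(log n) stack space on average; unstable.
- Heap sort: O(n log n) in all cases; unstable. The version above copies into a 1-based array, so it uses O(n) extra space rather than the usual O(1).
- Shell sort: depends on the gap sequence, O(n^2) worst case with the halving gaps used above; O(1) extra space; unstable.
- Radix sort: O(d(n + r)) for d digits in radix r; O(n + r) extra space; stable.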
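The idea of external sorting
- External sorting refers to sorting data that is too large to load into memory at once, so secondary storage has to help. Borrowing the divide-and-conquer idea from internal sorting, the usual approach is to split the external data into chunks, load each chunk into memory, sort it, and write it back to external storage. Finally, all sorted chunks are merged into one sorted sequence that is written back to disk.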
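The chunk-then-merge idea can be sketched as below. This is an illustrative sketch, not the original author's code: the class and method names are hypothetical, the input is assumed to hold one integer per line, chunkSize is assumed to be at least 1, and the merge step uses a k-way merge with a PriorityQueue, which is one common choice among several.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class ExternalSortSketch {
    // Phase 1: read up to chunkSize numbers at a time, sort them in memory,
    // and write each sorted run to its own temporary file.
    static List<Path> splitIntoSortedRuns(Path input, int chunkSize) throws IOException {
        List<Path> runs = new ArrayList<>();
        int[] buf = new int[chunkSize];
        int n = 0;
        try (BufferedReader reader = Files.newBufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue;
                buf[n++] = Integer.parseInt(line);
                if (n == chunkSize) {
                    runs.add(writeSortedRun(buf, n));
                    n = 0;
                }
            }
        }
        if (n > 0) runs.add(writeSortedRun(buf, n));
        return runs;
    }

    static Path writeSortedRun(int[] buf, int n) throws IOException {
        Arrays.sort(buf, 0, n);
        Path run = Files.createTempFile("run", ".txt");
        try (BufferedWriter w = Files.newBufferedWriter(run)) {
            for (int i = 0; i < n; ++i) {
                w.write(Integer.toString(buf[i]));
                w.newLine();
            }
        }
        return run;
    }

    // Phase 2: k-way merge. The heap holds {value, runIndex} for the current
    // head of every run, so memory use is proportional to the number of runs.
    static void mergeRuns(List<Path> runs, Path output) throws IOException {
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter w = Files.newBufferedWriter(output)) {
            for (int i = 0; i < runs.size(); ++i) {
                BufferedReader r = Files.newBufferedReader(runs.get(i));
                readers.add(r);
                String line = r.readLine();
                if (line != null) heap.add(new int[]{Integer.parseInt(line), i});
            }
            while (!heap.isEmpty()) {
                int[] top = heap.poll();
                w.write(Integer.toString(top[0]));
                w.newLine();
                String next = readers.get(top[1]).readLine();
                if (next != null) heap.add(new int[]{Integer.parseInt(next), top[1]});
            }
        } finally {
            for (BufferedReader r : readers) r.close();
            for (Path run : runs) Files.deleteIfExists(run);
        }
    }

    // Convenience wrapper for trying the sketch on a small in-memory list.
    static List<Integer> sortViaTempFiles(List<Integer> data, int chunkSize) {
        try {
            Path in = Files.createTempFile("input", ".txt");
            Path out = Files.createTempFile("output", ".txt");
            List<String> lines = new ArrayList<>();
            for (Integer x : data) lines.add(x.toString());
            Files.write(in, lines);
            mergeRuns(splitIntoSortedRuns(in, chunkSize), out);
            List<Integer> sorted = new ArrayList<>();
            for (String s : Files.readAllLines(out)) sorted.add(Integer.parseInt(s));
            Files.delete(in);
            Files.delete(out);
            return sorted;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, sortViaTempFiles(Arrays.asList(5, 3, 9, 1, 3, 7, 2), 3) would produce three runs of at most three numbers each and merge them into one sorted list. In a real setting chunkSize would be chosen from the available memory.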
Complete code: Solution class
import java.io.*;
import java.lang.reflect.Field;
import java.util.Deque;
import java.util.LinkedList;
class Solution {
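// Hand-written swap via reflection (fragile: it mutates Integer objects that may be shared through the Integer cache)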
private void swap(Integer num1, Integer num2) {
Class<Integer> integerClass = (Class<Integer>) num1.getClass();
try {
Field value = integerClass.getDeclaredField("value");
value.setAccessible(true);
int tmp = num1;
value.setInt(num1, num2);
value.setInt(num2, tmp);
} catch (Exception e) {
e.printStackTrace();
}
}
// Generate random numbers; only needed on the first run
private void generate_random_numbers(int n) {
// try-with-resources closes the writer even if a write fails
try (BufferedWriter writer = new BufferedWriter(new FileWriter("data.txt"))) {
for (int i = 0; i < n; ++i) {
// one value in [0, 10000), followed by a space separator
writer.write((int)(Math.random()*10000) + " ");
}
} catch (IOException e) {
System.out.println("Generate random numbers -- Error!");
}
}
// Load the data into memory
private Integer[] read_random_numbers() {
// try-with-resources closes the reader on every path
try (BufferedReader reader = new BufferedReader(new FileReader("data.txt"))) {
String data = reader.readLine();
// an empty file yields null; treat it as "no data"
if (data == null || data.trim().isEmpty()) {
return new Integer[]{};
}
// all numbers sit on one line, separated by whitespace
String[] str_arr = data.trim().split("\\s+");
int len = str_arr.length;
Integer[] arr = new Integer[len];
for (int i = 0 ; i < len; ++i) {
arr[i] = Integer.parseInt(str_arr[i]);
}
return arr;
} catch (IOException e) {
System.out.println("Read random numbers -- Error!");
}
return new Integer[]{};
}
// Output module
private void display(Integer[] arr) {
int len = arr.length;
for (int i = 0; i < len; ++i) {
System.out.print(arr[i] + " ");
}
}
// Bubble sort
public void bubbleSort(Integer[] arr) {
int len = arr.length;
for (int i = 0; i < len; ++i) {
boolean is_swap = false;
for (int j = 0; j < len-1-i; ++j) {
if (arr[j] > arr[j+1]) {
// swap the two array slots directly
Integer tmp = arr[j];
arr[j] = arr[j+1];
arr[j+1] = tmp;
is_swap = true;
}
}
if (!is_swap) {
return ;
}
}
}
// Selection sort
public void selectSort(Integer[] arr) {
int len = arr.length;
for (int i = 0; i < len; ++i) {
int min_id = i;
for (int j = i+1; j < len; ++j) {
if (arr[j] < arr[min_id]) {
min_id = j;
}
}
if (min_id != i) {
// move the minimum of the unsorted part into slot i
Integer tmp = arr[i];
arr[i] = arr[min_id];
arr[min_id] = tmp;
}
}
}
// Insertion sort
public void insertSort(Integer[] arr) {
int len = arr.length;
for (int i = 1; i < len; ++i) {
int cur_num = arr[i];
int j = i-1;
for ( ; j >= 0 && cur_num < arr[j]; --j) {
arr[j+1] = arr[j];
}
arr[j+1] = cur_num;
}
}
// Merge sort API (keeps the public interface consistent)
public void mergeSort(Integer[] arr) {
int len = arr.length;
coreOfMergeSort(arr, 0, len-1);
}
// Core of merge sort
private void coreOfMergeSort(Integer[] arr, int left, int right) {
if (left >= right) {
return ;
}
int mid = (left+right) >> 1;
coreOfMergeSort(arr, left, mid);
coreOfMergeSort(arr, mid+1, right);
merge(arr, left, mid, right);
}
// Merge step
private void merge(Integer[] arr, int left, int mid, int right) {
Integer[] tmp = new Integer[right-left+1];
int i = left, j = mid+1, u = 0;
while (i <= mid && j <= right) {
if (arr[i] <= arr[j]) {
tmp[u] = arr[i];
++i;
} else {
tmp[u] = arr[j];
++j;
}
++u;
}
while (i <= mid) {
tmp[u] = arr[i];
++i; ++u;
}
while (j <= right) {
tmp[u] = arr[j];
++j; ++u;
}
for (i = left; i <= right; ++i) {
arr[i] = tmp[i-left];
}
}
// Quick sort API (exposes a consistent public interface)
public void quickSort(Integer[] arr) {
int len = arr.length;
coreOfQuickSort(arr, 0, len-1);
}
// Core of quick sort
private void coreOfQuickSort(Integer[] arr, int left, int right) {
if (left >= right) {
return ;
}
int first_num = arr[left];
int i = left, j = right;
while (i < j) {
while (i < j && arr[j] >= first_num) {
--j;
}
while (i < j && arr[i] <= first_num) {
++i;
}
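// exchange the out-of-place pair by swapping array slots
Integer tmp = arr[i];
arr[i] = arr[j];
arr[j] = tmp;
}
// move the pivot into its final position
Integer pivot = arr[left];
arr[left] = arr[j];
arr[j] = pivot;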
coreOfQuickSort(arr, left, j-1);
coreOfQuickSort(arr, j+1, right);
}
// Heap sort
public void heapSort(Integer[] arr) {
int len = arr.length;
Integer[] heap = new Integer[len+1];
// Start at index 1 so the children of node i are simply 2*i and 2*i+1
for (int i = 1; i <= len; ++i) {
heap[i] = arr[i-1];
}
for (int i = len/2; i >= 1; --i) {
heapAdjust(heap, i, len);
}
for (int i = len; i >= 1; --i) {
// move the current maximum to the end of the unsorted part
Integer tmp = heap[1];
heap[1] = heap[i];
heap[i] = tmp;
heapAdjust(heap, 1, i-1);
}
for (int i = 0 ; i < len; ++i) {
arr[i] = heap[i+1];
}
}
// Heap adjustment (sift down)
private void heapAdjust(Integer[] arr, int pos, int len) {
while (pos*2 <= len) {
int to_swap = pos;
if (arr[pos] <= arr[pos*2]) {
to_swap = pos*2;
}
if (pos*2+1 <= len && arr[to_swap] <= arr[pos*2+1]) {
to_swap = pos*2+1;
}
if (to_swap == pos) {
return ;
}
Integer tmp = arr[pos];
arr[pos] = arr[to_swap];
arr[to_swap] = tmp;
pos = to_swap;
}
}
// Radix sort: find the maximum number of digits
private int maxBits(Integer[] arr) {
int len = arr.length;
int maxBits = 0;
Integer[] tmp = new Integer[len];
for (int i = 0; i < len; ++i) {
tmp[i] = arr[i];
}
for (int i = 0; i < len; ++i) {
int cnt = 0;
while (tmp[i] != 0) {
tmp[i] /= 10;
++cnt;
}
maxBits = Math.max(maxBits, cnt);
}
return maxBits;
}
// Radix sort
public void baseSort(Integer[] arr) {
int mb = maxBits(arr);
Deque<Integer>[] dq = new LinkedList[10];
for (int i = 0; i < 10; ++i) {
dq[i] = new LinkedList<>();
}
int len = arr.length;
Integer[] tmp = new Integer[len];
for (int i = 0; i < len; ++i) {
tmp[i] = arr[i];
}
int base = 10;
while (mb != 0) {
--mb;
for (int i = 0; i < len; ++i) {
dq[ tmp[i]%10 ].addLast(arr[i]);
}
int u = 0;
for (int i = 0 ; i < 10; ++i) {
while (!dq[i].isEmpty()) {
arr[u++] = dq[i].getFirst();
dq[i].pollFirst();
}
}
for (int i = 0; i < len; ++i) {
tmp[i] = arr[i] / base;
}
base *= 10;
}
}
// Shell sort
public void shellSort(Integer[] arr) {
int len = arr.length;
// halve the gap each round; this form also terminates for an empty array
for (int step = len / 2; step >= 1; step /= 2) {
for (int i = step; i < len; ++i) {
int tmp = arr[i];
int j = i;
for (; j >= step && arr[j-step] > tmp; j -= step) {
arr[j] = arr[j-step];
}
arr[j] = tmp;
}
}
}
public static void main(String[] args) {
Solution s = new Solution();
// s.generate_random_numbers(1000);
Integer[] arr = s.read_random_numbers();
// call a sorting algorithm here
s.display(arr);
}
}
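Interview question
10G integers in random order; find the median. Memory is limited to 2G.
For reference:
- Scan the 10G integers; for each integer, take its high 28 bits and map them to one element of an array cnt (assuming 32-bit integers, this gives 2^28 segments of 16 values each)
- Increment that element by 1 to record one more integer falling into that segment
- After the scan, cnt holds the number of integers in each segment
- Starting from the first segment, accumulate the counts; the median lies in the first segment at which the running total reaches 5G
- Scan the 10G integers again, counting the occurrences of each value inside that segment, and keep accumulating from the total of the preceding segments until it reaches 5G; the value at which this happens is the median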
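The two-pass counting approach above can be sketched as follows. This is a sketch under stated assumptions, not a definitive solution: an in-memory int[] stands in for the on-disk stream that would be scanned twice, the split uses the high 16 bits instead of 28 so both counting arrays stay small, counters are long so they cannot overflow, and the class and method names are hypothetical.

```java
class MedianByBucketCounting {
    static final int HIGH_BITS = 16;            // assumption; the text uses 28
    static final int LOW_BITS = 32 - HIGH_BITS;

    // Flip the sign bit so that signed order matches unsigned bucket order,
    // then keep only the high bits as the segment number.
    static int bucketOf(int x) {
        return (x ^ Integer.MIN_VALUE) >>> LOW_BITS;
    }

    // Returns the k-th smallest value (1-based); assumes 1 <= k <= data.length.
    static int kthSmallest(int[] data, long k) {
        // Pass 1: count how many values fall into each segment.
        long[] bucketCount = new long[1 << HIGH_BITS];
        for (int x : data) {
            bucketCount[bucketOf(x)]++;
        }

        // Find the first segment at which the running total reaches k.
        long seen = 0;
        int target = 0;
        while (seen + bucketCount[target] < k) {
            seen += bucketCount[target];
            target++;
        }

        // Pass 2: count each distinct value inside the target segment only.
        long[] lowCount = new long[1 << LOW_BITS];
        int lowMask = (1 << LOW_BITS) - 1;
        for (int x : data) {
            if (bucketOf(x) == target) {
                lowCount[x & lowMask]++;
            }
        }

        // Continue accumulating inside the segment until the total reaches k.
        int low = 0;
        while (seen + lowCount[low] < k) {
            seen += lowCount[low];
            low++;
        }

        // Reassemble the value and undo the sign-bit flip.
        return ((target << LOW_BITS) | low) ^ Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        int[] data = {7, -3, 100000, 5, 5, -70000, 42};
        // 7 values, so the median is the 4th smallest.
        System.out.println(kthSmallest(data, 4)); // prints 5
    }
}
```

For the interview setting, k would be 5G (the lower median of 10G values) and each pass would stream the data from disk rather than hold it in an array.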