Sorting and Searching: Data Structures

Latest recommended article published on 2024-08-19 23:56:23

栗子瓜

Article tags: data structures, algorithms, sorting algorithms, c++, hash

Article link: https://blog.csdn.net/weixin_50404914/article/details/131196412


Sorting

Insertion Sort

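Insert the current element into its correct place among the previous, already sorted elements.
Among the methods in these notes, this is the only one whose running time drops sharply when the array is already sorted: it reaches $O(n)$ in that case and is $O(n^2)$ otherwise.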
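The idea can be sketched as follows. This is a minimal illustration, assuming ascending order and a `std::vector` container; the name `insertion_sort` is chosen here for the example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts v in ascending order. On already-sorted input the inner loop
// exits immediately for every i, which gives the O(n) best case.
template <typename T>
void insertion_sort(std::vector<T>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        T key = std::move(v[i]);
        std::size_t j = i;
        // Shift larger elements of the sorted prefix one step right.
        while (j > 0 && key < v[j - 1]) {
            v[j] = std::move(v[j - 1]);
            --j;
        }
        v[j] = std::move(key);
    }
}
```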

Bubble Sort

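Traverse backward and swap adjacent elements that are in the wrong priority order.
The time complexity is always $O(n^2)$ (when no early-exit check is used).
It involves purely local operations and can be fully parallelized, which makes it a good illustration of the spirit of parallel processing.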
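A minimal sequential sketch of the backward traversal, assuming ascending order; the name `bubble_sort` is chosen for this example and no early-exit flag is used, so the cost stays quadratic.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Each outer pass walks from the back of the array toward position i,
// so the smallest remaining element "bubbles" down to v[i].
template <typename T>
void bubble_sort(std::vector<T>& v) {
    const std::size_t n = v.size();
    for (std::size_t i = 0; i + 1 < n; ++i) {
        for (std::size_t j = n - 1; j > i; --j) {
            if (v[j] < v[j - 1]) std::swap(v[j], v[j - 1]);
        }
    }
}
```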

Selection Sort

The k-th traversal selects the element with the k-th highest priority.
The time complexity is always $O(n^2)$.
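As a small sketch, assuming ascending order (so "highest priority" means smallest); `selection_sort` is a name picked for this example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Pass k finds the smallest element of v[k..n) and swaps it into v[k].
template <typename T>
void selection_sort(std::vector<T>& v) {
    const std::size_t n = v.size();
    for (std::size_t k = 0; k + 1 < n; ++k) {
        std::size_t best = k;
        for (std::size_t j = k + 1; j < n; ++j) {
            if (v[j] < v[best]) best = j;
        }
        if (best != k) std::swap(v[k], v[best]);
    }
}
```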

The three methods above are classed as adjacent exchange sorts.

Shell Sort

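Hierarchical insertion sort with varying increments.
Start by insertion-sorting elements that are N positions apart, then shrink the gap until adjacent elements are compared.
Common choices are division-by-two increments (1, 2, 4, 8, …) or division-by-three increments (1, 4, 13, 40, …).
The complexity depends on the increment sequence; with a suitable sequence it can be as low as $O(n(\log n)^2)$.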
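A sketch using the division-by-two increments, assuming the gaps are generated as n/2, n/4, …, 1; the function name `shell_sort` is illustrative. Each pass is an insertion sort over elements that are `gap` apart.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

template <typename T>
void shell_sort(std::vector<T>& v) {
    const std::size_t n = v.size();
    for (std::size_t gap = n / 2; gap > 0; gap /= 2) {
        // Insertion sort on every sublist of elements gap positions apart.
        for (std::size_t i = gap; i < n; ++i) {
            T key = std::move(v[i]);
            std::size_t j = i;
            while (j >= gap && key < v[j - gap]) {
                v[j] = std::move(v[j - gap]);
                j -= gap;
            }
            v[j] = std::move(key);
        }
    }
}
```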

Merge Sort

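Uses the divide & conquer idea.
Divide the task into sub-tasks that are much easier and whose results can be merged efficiently.
Merge sorted sub-sequences into one sorted sequence: compare the values at the front of the two sequences and move the one with higher priority into the output array; this merge step costs $O(n)$.
Time complexity: worst, average, and best case (W+A+B) are all $O(n\log n)$.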
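The merge step and the recursion around it might look like this. It is a simple sketch that copies sub-vectors rather than sorting in place; `merge_sorted` and `merge_sort` are names chosen for the example.

```cpp
#include <cstddef>
#include <vector>

// Merges two ascending sequences in O(n) by repeatedly taking the smaller front.
template <typename T>
std::vector<T> merge_sorted(const std::vector<T>& a, const std::vector<T>& b) {
    std::vector<T> out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (b[j] < a[i]) out.push_back(b[j++]);
        else out.push_back(a[i++]);  // ties take from a, keeping the sort stable
    }
    while (i < a.size()) out.push_back(a[i++]);
    while (j < b.size()) out.push_back(b[j++]);
    return out;
}

template <typename T>
std::vector<T> merge_sort(const std::vector<T>& v) {
    if (v.size() <= 1) return v;
    const auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
    const std::vector<T> left(v.begin(), mid), right(mid, v.end());
    return merge_sorted(merge_sort(left), merge_sort(right));
}
```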

BST sort

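Insert the sequence into a BST one element at a time.
Sort via an inorder traversal of the completed BST.
Drawbacks: (1) the BST needs extra space; (2) inserting each new node takes time, so building the tree is costly.
It introduces the idea of using a pivot.
Time complexity: worst case $O(n^2)$, best and average case $O(n\log n)$.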
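A minimal sketch of the two phases, assuming `int` keys and an unbalanced tree (which is why sorted input triggers the $O(n^2)$ worst case). `BstNode`, `bst_insert`, `bst_inorder`, and `bst_sort` are hypothetical names for this illustration; it needs C++14 for `std::make_unique`.

```cpp
#include <memory>
#include <vector>

struct BstNode {
    int key;
    std::unique_ptr<BstNode> left, right;
    explicit BstNode(int k) : key(k) {}
};

// Walks down from the root and attaches the new node at the first empty link.
void bst_insert(std::unique_ptr<BstNode>& root, int key) {
    std::unique_ptr<BstNode>* cur = &root;
    while (*cur) cur = (key < (*cur)->key) ? &(*cur)->left : &(*cur)->right;
    *cur = std::make_unique<BstNode>(key);
}

// Inorder traversal visits keys in ascending order.
void bst_inorder(const BstNode* node, std::vector<int>& out) {
    if (!node) return;
    bst_inorder(node->left.get(), out);
    out.push_back(node->key);
    bst_inorder(node->right.get(), out);
}

std::vector<int> bst_sort(const std::vector<int>& input) {
    std::unique_ptr<BstNode> root;
    for (int x : input) bst_insert(root, x);
    std::vector<int> out;
    out.reserve(input.size());
    bst_inorder(root.get(), out);
    return out;
}
```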

Quicksort

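Uses the divide & conquer idea.
Elements with higher priority than the pivot go to its left and the rest go to its right, giving a partially sorted form.
It refines the idea behind BST sort.
Time complexity: worst case $O(n^2)$, best and average case $O(n\log n)$.
When choosing the pivot, do not simply take the middle element; compare the first, middle, and last elements and pick the one in between (median of first, middle, last).
The recursion need not produce a fully sorted list: set a minimum sublist length below which partitioning stops, then finish with a single insertion sort.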
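The sketch below combines the two refinements, assuming `int` keys and a Hoare-style partition. The threshold `kCutoff = 8` and the names `quick_partial` and `quicksort` are assumptions made for this example, not values from the notes.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

constexpr std::ptrdiff_t kCutoff = 8;  // hypothetical minimum sublist length

// Partitions a[lo..hi] recursively but leaves sublists shorter than kCutoff unsorted.
void quick_partial(int* a, std::ptrdiff_t lo, std::ptrdiff_t hi) {
    if (hi - lo + 1 < kCutoff) return;
    const std::ptrdiff_t mid = lo + (hi - lo) / 2;
    // Order first, middle, last so that the median of the three sits at mid.
    if (a[mid] < a[lo]) std::swap(a[mid], a[lo]);
    if (a[hi] < a[lo]) std::swap(a[hi], a[lo]);
    if (a[hi] < a[mid]) std::swap(a[hi], a[mid]);
    const int pivot = a[mid];
    std::ptrdiff_t i = lo, j = hi;
    while (i <= j) {
        while (a[i] < pivot) ++i;
        while (pivot < a[j]) --j;
        if (i <= j) std::swap(a[i++], a[j--]);
    }
    quick_partial(a, lo, j);
    quick_partial(a, i, hi);
}

void quicksort(std::vector<int>& v) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(v.size());
    if (n < 2) return;
    int* a = v.data();
    quick_partial(a, 0, n - 1);
    // One insertion-sort pass finishes the nearly sorted array cheaply.
    for (std::ptrdiff_t i = 1; i < n; ++i) {
        const int key = a[i];
        std::ptrdiff_t j = i;
        while (j > 0 && key < a[j - 1]) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;
    }
}
```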

Heap Sort

Batch initialization: build the heap by iterating siftdown backward over the array.
Then remove the root iteratively (removeroot).
Worst, average, and best case (W+A+B): $O(n\log n)$.

Heap<T,P> h(nmax,s,n)
// T is the element type, P is the comparison policy, nmax is the maximum capacity, s is the array, n is the array length
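The `Heap<T,P>` class itself is not shown in these notes, so the sketch below does not use it. It works directly on a `std::vector<int>` as a max-heap, with the hypothetical helpers `sift_down` and `heap_sort`, to show the two phases: backward siftdown for batch initialization, then repeated root removal.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Restores the max-heap property for the subtree rooted at pos within h[0..n).
void sift_down(std::vector<int>& h, std::size_t pos, std::size_t n) {
    while (2 * pos + 1 < n) {
        std::size_t child = 2 * pos + 1;
        if (child + 1 < n && h[child] < h[child + 1]) ++child;  // larger child
        if (!(h[pos] < h[child])) return;
        std::swap(h[pos], h[child]);
        pos = child;
    }
}

void heap_sort(std::vector<int>& v) {
    const std::size_t n = v.size();
    // Batch initialization: sift down every internal node, last one first.
    for (std::size_t i = n / 2; i-- > 0;) sift_down(v, i, n);
    // Repeatedly move the root (current maximum) behind the shrinking heap.
    for (std::size_t end = n; end > 1; --end) {
        std::swap(v[0], v[end - 1]);
        sift_down(v, 0, end - 1);
    }
}
```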

Radix Sort

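Sort by one digit at a time, starting from the lowest-order digit.
For decimal keys that means sorting by the ones digit, then the tens, then the hundreds.
With r the radix (the base of the digits) and d the maximum number of digits, the complexity is $O((n + r)d)$; if the n values are densely distributed, radix sort tends to be very efficient.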
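A least-significant-digit sketch, assuming non-negative integer keys and a radix `r >= 2`; the name `radix_sort` and the default `r = 10` are choices made for this example. Each of the d passes is a stable counting sort costing $O(n + r)$.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumes r >= 2. Sorts v in ascending order, one digit per pass.
void radix_sort(std::vector<unsigned>& v, unsigned r = 10) {
    if (v.empty()) return;
    const unsigned max = *std::max_element(v.begin(), v.end());
    std::vector<unsigned> out(v.size());
    for (unsigned long long place = 1; max / place > 0; place *= r) {
        // count[d + 1] first holds how many keys have digit d at this place.
        std::vector<std::size_t> count(r + 1, 0);
        for (unsigned x : v) ++count[(x / place) % r + 1];
        // Prefix sums turn counts into the start position of each digit's bucket.
        for (unsigned d = 0; d < r; ++d) count[d + 1] += count[d];
        // Stable scatter into out, then make out the current array.
        for (unsigned x : v) out[count[(x / place) % r]++] = x;
        v.swap(out);
    }
}
```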

File processing & external sorting

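A sector is the smallest unit of disk I/O; a cluster is the smallest file allocation unit.
A platter contains tracks and a track contains sectors; a cluster is a group of sectors that is smaller than a track, so a track contains clusters.
platter > track > cluster > sector
Computing access time, minimum: the first track costs one random seek, then half a rotation (the average rotational delay) plus one full rotation to read it; each later track costs a track-to-track seek, then half a rotation plus one full rotation.
Maximum: 256*[8+8.33*(0.5+8/256)]
Accessing one sector takes far less time than accessing a whole track but about as long as accessing a single byte, so algorithms are usually designed to process data one sector at a time.
External sorting handles collections of records too large to fit in main memory; one approach uses an index file (the key sort and pointer methodology).
Only the index is sorted; each index entry holds a pointer to its record, so the order of records in the database file never has to change.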

External Mergesort

Read the smallest elements of both sub-sequences and pop the smaller of the two.
Take advantage of sequential processing and double buffering.

External Heapsort

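First fill the heap with data from the input and build it with siftdown, then send the root to the output.
Compare the next input item with the root that was just output: if the root has higher priority (the new item can still go into the current run), place the new item at the root and siftdown.
If the next input item has higher priority (it would be out of order in the current run), move the last heap element to the root and siftdown, store the new item in the freed last position, and disconnect that position from the heap.
When every position has been disconnected, the current run ends; rebuild the heap from the stored items and start the next run.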
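The run-generation procedure described above can be sketched in memory. In this illustration, vectors stand in for the input and output files, a min-heap is used, and "disconnecting" a slot is modeled by shrinking the live heap size `n`. The names `sift_down_min` and `replacement_selection` and the `capacity` parameter are assumptions for this sketch.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Min-heap siftdown over the live part h[0..n).
void sift_down_min(std::vector<int>& h, std::size_t pos, std::size_t n) {
    while (2 * pos + 1 < n) {
        std::size_t child = 2 * pos + 1;
        if (child + 1 < n && h[child + 1] < h[child]) ++child;
        if (!(h[child] < h[pos])) return;
        std::swap(h[pos], h[child]);
        pos = child;
    }
}

// Splits input into sorted runs using a heap of at most `capacity` items.
std::vector<std::vector<int>> replacement_selection(const std::vector<int>& input,
                                                    std::size_t capacity) {
    std::vector<std::vector<int>> runs;
    if (capacity == 0) return runs;
    std::size_t next = 0;
    std::vector<int> heap;
    while (next < input.size() && heap.size() < capacity) heap.push_back(input[next++]);
    while (!heap.empty()) {
        std::size_t n = heap.size();  // live heap is heap[0..n); the rest is parked
        for (std::size_t i = n / 2; i-- > 0;) sift_down_min(heap, i, n);
        runs.emplace_back();
        while (n > 0) {
            const int out = heap[0];
            runs.back().push_back(out);
            if (next < input.size()) {
                const int in = input[next++];
                if (in < out) {
                    // Too small for this run: park it in the last slot, shrink the heap.
                    heap[0] = heap[n - 1];
                    heap[n - 1] = in;
                    --n;
                } else {
                    heap[0] = in;  // still fits in the current run
                }
            } else {
                // Input exhausted: drop the root and keep parked items contiguous.
                heap[0] = heap[n - 1];
                heap[n - 1] = heap.back();
                heap.pop_back();
                --n;
            }
            sift_down_min(heap, 0, n);
        }
    }
    return runs;
}
```

With random input, runs produced this way are on average about twice the heap capacity, which is the usual reason for preferring it over sorting one memory load at a time.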

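A swap function that uses C++ move semantics:

#include <utility>  // std::move

template <typename T> inline void swap(T& a, T& b) {
    T tmp = std::move(a);  // move a out so it can be overwritten
    a = std::move(b);      // move b into a
    b = std::move(tmp);    // move the saved value into b
}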

Searching

Queries are either exact-match queries or range queries.

Binary Search

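Compare the target with the first and last elements first, then with the middle element, and continue in the half that can still contain it.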
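A minimal exact-match sketch, assuming an ascending `std::vector<int>`; the function name `binary_search_index` and the `-1` "not found" convention are choices made for this example.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of target in ascending s, or -1 if it is absent.
std::ptrdiff_t binary_search_index(const std::vector<int>& s, int target) {
    // Head and tail check: a target outside [front, back] cannot be present.
    if (s.empty() || target < s.front() || s.back() < target) return -1;
    std::size_t lo = 0, hi = s.size();  // search the half-open range [lo, hi)
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (s[mid] < target) lo = mid + 1;
        else if (target < s[mid]) hi = mid;
        else return static_cast<std::ptrdiff_t>(mid);
    }
    return -1;
}
```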

Interpolation Search

iM = iL + (iR-iL)*(st-s[iL])/(s[iR]-s[iL])
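The formula can be dropped into a search loop as sketched below. The sketch assumes an ascending array of `int` and reads `st` as the search target; the function name and the `-1` "not found" result are choices made for this example. The guard for `s[iR] == s[iL]` avoids a division by zero.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of st in ascending s, or -1 if it is absent.
std::ptrdiff_t interpolation_search(const std::vector<int>& s, int st) {
    if (s.empty()) return -1;
    std::size_t iL = 0, iR = s.size() - 1;
    while (s[iL] <= st && st <= s[iR]) {
        // All values in [iL, iR] are equal, so st == s[iL] here.
        if (s[iR] == s[iL]) return static_cast<std::ptrdiff_t>(iL);
        // iM = iL + (iR-iL)*(st-s[iL])/(s[iR]-s[iL]), computed in 64 bits.
        const long long num = static_cast<long long>(iR - iL) *
                              (static_cast<long long>(st) - s[iL]);
        const long long den = static_cast<long long>(s[iR]) - s[iL];
        const std::size_t iM = iL + static_cast<std::size_t>(num / den);
        if (s[iM] < st) iL = iM + 1;
        else if (st < s[iM]) iR = iM - 1;
        else return static_cast<std::ptrdiff_t>(iM);
    }
    return -1;
}
```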

List methods

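80% of record accesses go to 20% of the records.

An unsorted array can only be searched by sequential traversal.
A sorted array can be searched with binary search or interpolation search.
Interpolation search only works on numeric arrays, not on strings.
An array can also be ordered by access frequency, because 80% of accesses concentrate on 20% of the records.
Self-organizing lists:
- move-to-front: whenever a record is accessed, move it to the front of the list
- transpose: a refined form of swapping, where the accessed record is swapped with the one just ahead of it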
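The two heuristics can be sketched as follows, assuming `int` records, a `std::list` for move-to-front (so the move is $O(1)$ once the record is found) and a `std::vector` for transpose. The function names are hypothetical.

```cpp
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Sequential search; a hit is spliced to the front of the list.
bool find_move_to_front(std::list<int>& records, int key) {
    for (auto it = records.begin(); it != records.end(); ++it) {
        if (*it == key) {
            records.splice(records.begin(), records, it);
            return true;
        }
    }
    return false;
}

// Sequential search; a hit moves one position toward the front.
bool find_transpose(std::vector<int>& records, int key) {
    for (std::size_t i = 0; i < records.size(); ++i) {
        if (records[i] == key) {
            if (i > 0) std::swap(records[i], records[i - 1]);
            return true;
        }
    }
    return false;
}
```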

Hashing

Direct access by key value: the process of finding a record by mapping its key value to an array position.

hash function h: a function that maps key values to array positions, e.g. taking the remainder (modulo) or the mid-square method
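Both kinds of hash function can be sketched in a few lines. The mid-square version below assumes 4-digit decimal keys and a 100-slot table, so it keeps the middle two digits of the 8-digit square; these sizes and the function names are assumptions for this example.

```cpp
#include <cstddef>

// Remainder method: h(K) = K mod N, for a table with n > 0 slots.
std::size_t hash_mod(unsigned long key, std::size_t n) {
    return key % n;
}

// Mid-square method for 4-digit keys and 100 slots:
// square the key and keep the middle two digits of the result.
std::size_t hash_mid_square(unsigned long key) {
    const unsigned long long sq = static_cast<unsigned long long>(key) * key;
    return static_cast<std::size_t>((sq / 1000) % 100);
}
```

For instance, 4567 squared is 20857489, and the middle two digits give slot 57.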

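hash table (HT): the array that holds the records

slot: a position in the hash table; N: the number of slots

objective: for each key K, use the hash function to find a slot in the range [0, N-1] and store the record there

How to find a record: compute i = h(K) with the hash function, then search starting from slot h(K) (because a collision may have displaced the record)

The hash function should therefore spread keys as randomly as possible to reduce collisions (for example, by designing it around the distribution of K)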

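Types of hashing

open hashing: each table slot heads a linked list; the drawback is heavy use of dynamic memory
closed hashing: on a collision, probe onward for an empty slot; records can pile up in clusters, which pseudo-random probing mitigates (probing follows a fixed pseudo-random sequence, so the step differs at each probe)
- double hashing: hash twice; a second hash function sets the probe step
- record removal: a tombstone marks the deleted slot, which keeps probe sequences from breaking and allows the slot to be reused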
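A minimal closed-hashing sketch with tombstones. For brevity it uses linear probing and h(K) = K mod N rather than pseudo-random probing or double hashing; the class name `ClosedHashTable` and its interface are hypothetical.

```cpp
#include <cstddef>
#include <vector>

class ClosedHashTable {
public:
    explicit ClosedHashTable(std::size_t n) : slots_(n) {}

    // Returns false for a duplicate key or a full table.
    bool insert(unsigned key) {
        const std::size_t n = slots_.size();
        std::size_t free_pos = n;  // n means "no reusable slot seen yet"
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t pos = (key % n + i) % n;
            const Slot& s = slots_[pos];
            if (s.state == State::Full) {
                if (s.key == key) return false;
            } else {
                if (free_pos == n) free_pos = pos;   // first tombstone or empty slot
                if (s.state == State::Empty) break;  // end of the probe sequence
            }
        }
        if (free_pos == n) return false;
        slots_[free_pos].key = key;
        slots_[free_pos].state = State::Full;
        return true;
    }

    bool contains(unsigned key) const { return locate(key) != slots_.size(); }

    // Marks the slot as a tombstone so later probes keep going past it.
    bool remove(unsigned key) {
        const std::size_t pos = locate(key);
        if (pos == slots_.size()) return false;
        slots_[pos].state = State::Tombstone;
        return true;
    }

private:
    enum class State { Empty, Full, Tombstone };
    struct Slot {
        unsigned key = 0;
        State state = State::Empty;
    };

    // Returns the slot holding key, or slots_.size() if it is absent.
    std::size_t locate(unsigned key) const {
        const std::size_t n = slots_.size();
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t pos = (key % n + i) % n;
            const Slot& s = slots_[pos];
            if (s.state == State::Empty) break;  // tombstones do not stop the search
            if (s.state == State::Full && s.key == key) return pos;
        }
        return n;
    }

    std::vector<Slot> slots_;
};
```

If `remove` simply reset the slot to empty, a later search for a key stored further along the same probe sequence would stop too early; the tombstone state prevents that while still letting `insert` reuse the slot.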