Search with Hashing

RaineNa

Last modified: 2024-07-02 15:46:24

First published: 2024-07-02 15:45:52

Article link: https://blog.csdn.net/qq_58325158/article/details/140125237

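This article introduces hash tables. It is based on notes I took while reading Sebastian Wandelt's out-of-print in-house course textbook. Many thanks!
Many thanks to 小赛!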

1 The existing data structures

In computer science, two very important data structures are used to store ordered collections: the linked list and the array.

Linked list: A linear data structure whose elements are connected by pointers. Each node combines an element with a pointer to the next node.

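Thanks to the pointers, inserting into a linked list takes only O(1) time once the node at the insertion point is known. Finding the element at a given position, however, takes O(n).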
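
To make the two costs concrete, here is a minimal sketch of a singly linked list. The names `Node`, `insert_after`, `node_at`, and `to_list` are made up for this illustration and do not come from the textbook.

```python
class Node:
    """One linked-list node: an element plus a pointer to the next node."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def insert_after(node, value):
    """O(1): only pointers change, no other node is touched."""
    node.next = Node(value, node.next)

def node_at(head, index):
    """O(n): walk from the head, following one pointer per step."""
    node = head
    for _ in range(index):
        node = node.next
    return node

def to_list(head):
    """Collect the values in order, for printing."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out

head = Node(1, Node(2, Node(4)))
insert_after(node_at(head, 1), 3)   # find the node (O(n)), then insert (O(1))
print(to_list(head))                # [1, 2, 3, 4]
```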

Array: An array consists of elements that share the same data type and are stored in the order of their indices. The list data type is very similar to an array.

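In contrast to a linked list, finding the element at a given position is very simple (O(1)). Inserting a new element, however, takes linear time: an array is stored as one contiguous block, so inserting a new element means shifting every element after that position.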
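
The shifting can be sketched as follows. A Python list stands in for the array here, and `insert_at` is a hypothetical helper written out by hand to show the work that the built-in `list.insert` does internally.

```python
def insert_at(arr, index, value):
    """Insert value at index by shifting every later element one slot right."""
    arr.append(None)                       # grow the array by one slot
    for i in range(len(arr) - 1, index, -1):
        arr[i] = arr[i - 1]                # shift right: O(n) in the worst case
    arr[index] = value

data = [10, 20, 40, 50]
insert_at(data, 2, 30)
print(data)      # [10, 20, 30, 40, 50]
print(data[2])   # access by index is O(1)
```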

	Sorted linked list	Sorted arrays	Balanced BSTs
Search	O(n)	O(lg n)	O(lg n)
Insert/Delete	O(n)	O(n)	O(lg n)
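
As an illustration of the sorted-array column, the sketch below uses Python's standard `bisect` module: the search is a binary search (O(lg n)), while the insertion still has to shift elements (O(n)). The helper name `contains` is an assumption made for this example.

```python
import bisect

def contains(sorted_arr, x):
    """Binary search: O(lg n) comparisons on a sorted array."""
    i = bisect.bisect_left(sorted_arr, x)
    return i < len(sorted_arr) and sorted_arr[i] == x

data = [3, 8, 15, 23, 42]
bisect.insort(data, 10)     # finds the slot in O(lg n), but shifting costs O(n)
print(data)                 # [3, 8, 10, 15, 23, 42]
print(contains(data, 15))   # True
print(contains(data, 4))    # False
```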

Besides these data structures, this article introduces one with lower average-case time complexity for both searching and insertion/deletion: hash tables.

2 The universe and a selected function

Idea: map every element of the data set into a table called a hash table.

For a more detailed treatment of the basic concepts, see this Zhihu article.

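Put simply, all the data is distributed into buckets. For example, when the modulo operation is used as the hash function and the result is 1, we look in the bucket for result 1. In the best case that bucket holds only one element, so it is found immediately. In most cases, however, a bucket will hold more than one element (a collision). How can this be resolved? Several common methods are described below.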
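
A small sketch of the bucket idea with a modulo hash function. The table size `NUM_BUCKETS = 5` and the sample keys are arbitrary choices for illustration.

```python
NUM_BUCKETS = 5  # assumed table size for this example

def h(key):
    """Hash function: the remainder decides which bucket a key belongs to."""
    return key % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]
for key in [10, 21, 7, 16, 33]:
    buckets[h(key)].append(key)

def find(key):
    """Only the one bucket selected by h(key) has to be searched."""
    return key in buckets[h(key)]

print(buckets)    # [[10], [21, 16], [7], [33], []]
print(find(16))   # True
```

Keys 21 and 16 both leave remainder 1, so they end up in the same bucket: a collision.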

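Separate chaining: all elements with the same hash value are linked into one list. In the worst case, where every element lands in the same chain, searching takes O(n).
Open addressing: an element whose hash position is already taken is moved to another position in the table itself. With linear probing, if position h is occupied the element goes to h+1; in general, if position h+i is occupied, check h+i+1, and so on until a free slot is found. With quadratic probing, if position h+i² is occupied, check h+(i+1)² instead. All positions are taken modulo the table size.
Double hashing: a variant of open addressing that uses two hash functions and probes at h1(x) + i * h2(x), a linear combination of h1(x) and h2(x). This effectively reduces the clustering of elements with equal hash values that linear and quadratic probing suffer from.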
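
The strategies above can be sketched as follows. This is an illustration under stated assumptions, not the textbook's implementation: the table size `M = 7`, the concrete hash functions `h1` and `h2`, and all class and function names are made up for the example, and deletion is left out.

```python
M = 7  # assumed table size; a prime, so any step 1..M-1 visits every slot

def h1(x):
    return x % M

def h2(x):
    # Assumed second hash; it must never return 0, or the probe would not move.
    return 1 + x % (M - 1)

class ChainedTable:
    """Separate chaining; Python lists stand in for the linked chains."""
    def __init__(self):
        self.chains = [[] for _ in range(M)]
    def insert(self, x):
        if x not in self.chains[h1(x)]:
            self.chains[h1(x)].append(x)
    def contains(self, x):
        return x in self.chains[h1(x)]   # O(chain length), O(n) at worst

# Open addressing: slot examined on the i-th attempt (i = 0, 1, 2, ...)
def linear(x, i):
    return (h1(x) + i) % M

def quadratic(x, i):
    return (h1(x) + i * i) % M

def double(x, i):
    return (h1(x) + i * h2(x)) % M

def insert(table, x, probe):
    """Place x in the first free slot along its probe sequence."""
    for i in range(M):
        slot = probe(x, i)
        if table[slot] is None:
            table[slot] = x
            return slot
    # Quadratic probing may miss free slots, so this can happen before the table is full.
    raise RuntimeError("no free slot along the probe sequence")

def search(table, x, probe):
    """Follow the same probe sequence; an empty slot means x is absent."""
    for i in range(M):
        slot = probe(x, i)
        if table[slot] is None:
            return None
        if table[slot] == x:
            return slot
    return None

# 10, 17 and 24 all have h1 = 3, so they collide under every strategy.
for probe in (linear, quadratic, double):
    table = [None] * M
    print(probe.__name__, [insert(table, k, probe) for k in (10, 17, 24)])
# linear [3, 4, 5]
# quadratic [3, 4, 0]
# double [3, 2, 4]
```

With linear probing the three colliding keys occupy neighbouring slots, which is the clustering mentioned above. With double hashing each key gets its own step size, so the keys spread out.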