# Hashing

Properties of hash functions

Static Hashing:

Deficiency:

One solution: periodically reorganize the file with a new hash function.

Better solution: dynamically change the number of buckets.

Extendible Hashing

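Idea: use a directory of pointers to buckets, and double the number of buckets by doubling the directory. Because the directory is much smaller than the file, doubling the directory is much cheaper than doubling the file.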

Global depth of directory p: Max # of bits needed to tell which bucket an entry belongs to.

Local depth of a bucket q: # of bits used to determine if an entry belongs to this bucket.
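As a minimal sketch of how the two depths are used, the snippet below computes a directory slot from the low-order bits of a hash value. Using the least-significant bits is an assumption made here for illustration (the notes do not say which bits are used), and the names `directory_index` and `belongs_to_bucket` are hypothetical.

```python
def directory_index(hash_value: int, global_depth: int) -> int:
    """Directory slot = the low-order `global_depth` bits of the hash value.

    Assumption: least-significant bits are used; some presentations
    use the most-significant bits instead.
    """
    return hash_value & ((1 << global_depth) - 1)


def belongs_to_bucket(hash_value: int, bucket_pattern: int, local_depth: int) -> bool:
    """A bucket with local depth q holds entries whose low q bits match its pattern."""
    return hash_value & ((1 << local_depth) - 1) == bucket_pattern


# 13 is 0b1101: with global depth 2 the low two bits are 0b01, i.e. slot 1.
slot = directory_index(13, 2)
```

A bucket whose local depth is smaller than the global depth ignores the extra directory bits, which is why several directory slots can point to it.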

Each bucket has pointers to it from the directory.

When does bucket split cause directory doubling?

If the bucket has only one pointer to it from the directory (that is, its local depth equals the global depth), double the directory; otherwise, simply redistribute the pointers after splitting.
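The split rule can be sketched as follows. This is an illustrative toy, not a definitive implementation: it assumes distinct non-negative integer keys that serve as their own hash values, uses the low-order bits for directory slots, and keeps everything in memory. The class and attribute names (`ExtendibleHash`, `Bucket`, `capacity`) are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity: int = 2):
        self.capacity = capacity          # max keys per bucket (assumed)
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, key: int) -> int:
        # Low-order global_depth bits of the key select the directory slot.
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key: int) -> None:
        bucket = self.directory[self._slot(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        # Bucket is full. If only one directory pointer reaches it
        # (local depth == global depth), the directory must double first.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split: create the split image and redistribute the pointers.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = image
        # Redistribute the keys (plus the new one) by re-inserting them.
        # Terminates only because keys are assumed to be distinct.
        pending = bucket.keys + [key]
        bucket.keys = []
        for k in pending:
            self.insert(k)


table = ExtendibleHash(capacity=2)
for k in (0, 2, 4):      # third insert overflows a bucket with one pointer
    table.insert(k)
for k in (1, 3, 5):      # third insert overflows a bucket with two pointers
    table.insert(k)
```

In this run the first overflow doubles the directory (global depth goes from 1 to 2), while the second only splits the bucket and repoints one directory slot.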

When should buckets be merged and the directory shrunk during deletion?

Merge bucket: merge a bucket with its split image when it becomes empty.

Shrink directory: if every directory entry points to the same bucket as its split-image entry, halve the directory.
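The shrink condition can be checked directly on the directory. The sketch below assumes low-order-bit indexing, so the split image of slot `i` is slot `i + len(directory) // 2`; the function names `can_shrink` and `shrink` are hypothetical, and buckets are represented by plain lists.

```python
def can_shrink(directory: list) -> bool:
    """True if every slot and its split-image slot point to the same bucket."""
    if len(directory) < 2:
        return False
    half = len(directory) // 2
    return all(directory[i] is directory[i + half] for i in range(half))


def shrink(directory: list) -> list:
    """Halve the directory; only valid when can_shrink(directory) is True."""
    return directory[: len(directory) // 2]


a, b = ["bucket A"], ["bucket B"]
directory = [a, b, a, b]          # both halves are identical
if can_shrink(directory):
    directory = shrink(directory)
```

An equivalent test is that every bucket's local depth is strictly smaller than the global depth.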

Deficiency:

Directory can grow large if the distribution of hash values is skewed.

Multiple entries with same hash value cause problems!

Linear Hashing

1. Handles long overflow chains.

2. Handles duplicates.

3. Idea: use a family of hash functions h_0, h_1, h_2, ..., where h_{i+1} doubles the range of h_i (similar to directory doubling).

4. Splitting proceeds in rounds. A round ends when all of the buckets present at the start of the round have been split.

5. The current round number is Level, and the current hash function is h_Level.

address(level, key) = hash(key) mod (N * 2^level), where N is the initial number of buckets.
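A small sketch of the address function and of round-based splitting follows. It makes several assumptions that the notes do not state: keys are non-negative integers used directly as hash values, a `next` pointer marks the bucket to be split next (the usual way rounds are tracked), a split is triggered whenever an insert overflows a bucket, and overflow chains are modeled by simply letting a bucket list grow past `capacity`. All names are hypothetical.

```python
def address(level: int, key: int, n_initial: int) -> int:
    """h_level(key) = key mod (N * 2^level); h_{level+1} has twice the range."""
    return key % (n_initial * 2 ** level)


class LinearHash:
    def __init__(self, n_initial: int = 2, capacity: int = 2):
        self.n = n_initial            # N: buckets at the start of round 0
        self.capacity = capacity
        self.level = 0                # current round number
        self.next = 0                 # next bucket to split (assumed pointer)
        self.buckets = [[] for _ in range(n_initial)]

    def _address(self, key: int) -> int:
        a = address(self.level, key, self.n)
        if a < self.next:             # bucket already split in this round
            a = address(self.level + 1, key, self.n)
        return a

    def insert(self, key: int) -> None:
        bucket = self.buckets[self._address(key)]
        bucket.append(key)            # growth past capacity = overflow chain
        if len(bucket) > self.capacity:
            self._split()

    def _split(self) -> None:
        # Always split bucket `next`, not necessarily the one that overflowed.
        old = self.buckets[self.next]
        self.buckets[self.next] = []
        self.buckets.append([])       # split image at index next + N * 2^level
        for k in old:
            self.buckets[address(self.level + 1, k, self.n)].append(k)
        self.next += 1
        if self.next == self.n * 2 ** self.level:
            self.level += 1           # round ends: all initial buckets split
            self.next = 0


lh = LinearHash(n_initial=2, capacity=2)
for k in (0, 2, 4, 1, 3, 5):
    lh.insert(k)
```

After these six inserts both initial buckets have been split, so the first round is over and `lh.level` has advanced to 1. Because no directory is needed, a skewed bucket may wait for its turn to be split and hold an overflow chain in the meantime.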

(All of the above comes from the slides of Ji Liping, HIT Shenzhen Graduate School.)