Advanced ADT:
BBST: AVL tree, red-black tree, B-tree, B+ tree
Hashing: unordered dictionary
"In an interview, always ask CAN I USE HASH? "
In C++, the hash table is implemented as std::unordered_map
In Python, it is dict()
How to implement
Keys: for an abstract object, we can use the binary data representing the object as the key and convert it to either a string or a number (e.g., a hex string or base64 encoding)
So we can assume keys are strings
Try to map the keys into some integer number in a certain integer range, say [0, 65535]
This mapping f should be fast to compute, e.g., linear (or at worst quadratic) in the length of the key
Ideally, each key maps to a unique number; then, in the RAM model, we can find/delete/insert an item in O(1) time
To store key string S with value V, we just put V at array position f(S)
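As a toy sketch of this direct-addressing idea (TABLE_SIZE and the placeholder mapping f below are illustrative, not a real hash function):

```python
# Direct addressing: store value V at array position f(S),
# pretending f maps each key to a unique index in [0, 65535].
TABLE_SIZE = 65536

def f(key: str) -> int:
    # Placeholder mapping for illustration only; a real hash
    # function is discussed in the next section.
    return sum(key.encode()) % TABLE_SIZE

table = [None] * TABLE_SIZE

def insert(key, value):
    table[f(key)] = (key, value)   # O(1) array write in the RAM model

def find(key):
    slot = table[f(key)]
    return slot[1] if slot is not None and slot[0] == key else None
```

If f were truly injective on the keys we store, every operation is a single array access; collisions are exactly what breaks this picture.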
Hash function
If F is a function that maps from strings to integers with fixed range, then F is a string hash function
A good hash function should have as few COLLISIONS as possible
Consider mapping a string to an integer: $\left(\sum_j P^j\, s[j]\right) \bmod Q$.
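A minimal sketch of this polynomial hash; the choices P = 131 and Q = 65521 (a prime) are illustrative assumptions, not fixed by the notes:

```python
def poly_hash(s: str, P: int = 131, Q: int = 65521) -> int:
    """Polynomial string hash: (sum over j of P^j * s[j]) mod Q."""
    h, power = 0, 1
    for ch in s:
        h = (h + power * ord(ch)) % Q   # add P^j * s[j], reduced mod Q
        power = (power * P) % Q         # advance to P^(j+1)
    return h
```

Each character contributes one multiply-add, so the hash is computed in time linear in the key length, as desired.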
The best hash function would be a one-to-one (injective) mapping.
Separate Chaining
If a table slot is already occupied, chain the colliding items in a linked list. The average chain length n/m is the load factor. We hope this factor is a constant
Desired property for hash function:
- The hashed keys are nicely spread out so that we do not have too many collisions, since collisions affect the time to perform lookups and deletes
- Table size M = O(N)
- The hash function h is fast to compute
Actually, we want f to be "random enough": if the deterministic function f encodes each input to a nearly random (but deterministic) number, it is good. Functions with this property are called pseudo-random.
For example: MD5
“Almost random function” properties
The function behaves like throwing darts at the target range, i.e. uniformly distributed
If the hash table size is N = the number of keys (N balls thrown into N bins)
- The longest chain is O(log N) in the worst case, but the average chain length is O(1)
Birthday paradox
- When there are n or more people in a room, what is the chance that two people have the same birthday?
- It turns out that for a table of size 365 you need only 23 keys for a 50% chance of a collision, and as few as 57 keys for a 99% chance.
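The paradox can be checked directly by multiplying out the probability that all keys land in distinct slots (a sketch; m = 365 models birthdays):

```python
def collision_prob(n: int, m: int = 365) -> float:
    # P(at least one collision) when n keys are thrown uniformly
    # into m slots: 1 - (m/m) * ((m-1)/m) * ... * ((m-n+1)/m).
    p_no_collision = 1.0
    for i in range(n):
        p_no_collision *= (m - i) / m
    return 1.0 - p_no_collision
```

Evaluating it shows the probability crosses 50% between 22 and 23 keys, and exceeds 99% by 57 keys.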
Open addressing: maintain an array that is some constant factor larger than the number of keys and store all keys directly in this array. Every cell in the array is either empty or contains a key
Load factor λ = n/m, where m is the size of the table and n is the number of stored keys.
Probe sequence: map a key into a sequence instead of a number.
Linear probing: probe sequence = [ hash(key) mod m, (hash(key) + 1) mod m, (hash(key) + 2) mod m, … ]
Best case (keys evenly spread): expected displacement ≈ 0.5. Worst case (all n keys form one contiguous cluster in a table of size m = 2n): expected displacement = (n/(2n))·0 + (1/(2n))·n + (1/(2n))·(n−1) + … + (1/(2n))·1 = (n+1)/4 ≈ n/4
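A linear-probing table, sketched under the assumption that the load factor stays below 1 (deletion is omitted because it requires tombstones; Python's built-in hash stands in for the hash function):

```python
class LinearProbingTable:
    # Open addressing with linear probing: on a collision, try the
    # next cell (mod m) until an empty cell or the key is found.
    def __init__(self, m=16):
        self.m = m
        self.slots = [None] * m   # None marks an empty cell

    def insert(self, key, value):
        i = hash(key) % self.m
        for _ in range(self.m):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
            i = (i + 1) % self.m   # linear probe: step to next cell
        raise RuntimeError("table full")

    def find(self, key):
        i = hash(key) % self.m
        for _ in range(self.m):
            if self.slots[i] is None:
                return None        # an empty cell ends the probe chain
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % self.m
        return None
```

Because keys pile up into contiguous runs, a lookup may have to walk an entire cluster, which is exactly the clustering cost analyzed above.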
Quadratic probing: probe sequence = [ hash(key) mod m, (hash(key) + 1) mod m, (hash(key) + 4) mod m, (hash(key) + 9) mod m, … ]
So quadratic probing can jump over large clusters
But one question remains: can the probe sequence reach enough of the table to find an empty cell?
Claim: if m is prime and the table is at least half empty, then quadratic probing will always find an empty location. Furthermore, no locations are checked twice.
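The reason no location is checked twice: if i² ≡ j² (mod m) with 0 ≤ j < i < m/2, then m divides (i − j)(i + j), which is impossible for prime m since both factors lie strictly between 0 and m. This can be sanity-checked numerically (a sketch):

```python
def quadratic_probe_offsets_distinct(m: int) -> bool:
    # For the claim to hold, the first ceil(m/2) quadratic probe
    # offsets i^2 mod m must be pairwise distinct: with more than
    # m/2 empty cells, one of these distinct cells must be empty.
    offsets = [(i * i) % m for i in range((m + 1) // 2)]
    return len(set(offsets)) == len(offsets)
```

For non-prime m the guarantee fails; e.g. m = 8 already repeats an offset (1² ≡ 3² mod 8).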
Implementations of dictionary with comparable keys: BBST
An AVL tree is a binary search tree in which:
For every node in the tree, the height of the left and right subtrees differ by at most 1.
Rotations are used to maintain the property.
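As a sketch of one such rotation (a right rotation, which restores balance after a left-left imbalance; the minimal Node class here is illustrative):

```python
class Node:
    # Minimal AVL-style node; heights are stored explicitly.
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(height(left), height(right))

def height(n):
    return n.height if n else 0

def rotate_right(y):
    # Right rotation around y: its left child x becomes the new
    # subtree root, and x's old right subtree moves under y.
    x = y.left
    y.left, x.right = x.right, y
    y.height = 1 + max(height(y.left), height(y.right))
    x.height = 1 + max(height(x.left), height(x.right))
    return x
```

Rotating the left-leaning chain 3 → 2 → 1 to the right yields a balanced tree rooted at 2, and only O(1) pointers and heights are updated.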