Data Structure Lecture Note (Week 7, Lecture 19)

Advanced ADT:

BBST: AVL, red-black, B tree, B+ tree

Hashing: unordered dictionary

"In an interview, always ask CAN I USE HASH? "

In C++, a hash table is available as std::unordered_map

In Python, … is dict()

How to implement

Keys: a key is an abstract object. We can take the binary data representing the object and convert it to either a string or a number (for example, a hex string or a base64 encoding)

So we can assume keys are strings

Try to map the keys into some integer number in a certain integer range, say [0, 65535]

This mapping f should be fast to compute, i.e. linear (or at worst quadratic) in the length of the key

Ideally, each key is mapped to a unique number; then, by random access in the RAM model, we can find/delete/insert the item in O(1) time

If we want to store key string S with value V, we just put V in the array position f(S)
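A minimal sketch of this idea, assuming a hypothetical mapping `f` that happens to be unique on the keys we store. The toy `f` below (packing one or two printable ASCII characters into a 16-bit integer) is an illustration only, not a function from the lecture:

```python
TABLE_SIZE = 65536  # the range [0, 65535] used above

def f(key: str) -> int:
    """Toy mapping (an assumption for illustration): pack one or two
    ASCII characters into a 16-bit integer."""
    data = key.encode("ascii")
    if not 1 <= len(data) <= 2:
        raise ValueError("toy f only handles 1- or 2-character ASCII keys")
    return int.from_bytes(data.rjust(2, b"\x00"), "big")

table = [None] * TABLE_SIZE  # None marks an unused position

def insert(key, value):
    table[f(key)] = value    # one array write: O(1)

def find(key):
    return table[f(key)]     # one array read: O(1)

def delete(key):
    table[f(key)] = None

insert("ab", 1)
insert("z", 2)
```

Every operation is a single array access, which is where the O(1) bound comes from. The sketch breaks as soon as two different keys map to the same position, which is the collision problem discussed next.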

Hash function

If F is a function that maps from strings to integers with fixed range, then F is a string hash function

A good hash function should have as few COLLISIONS as possible

Consider mapping a string to an integer $\left(\sum_j P^j \, s[j]\right) \bmod Q$.

The best hash function is a one-to-one (injective) mapping.
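The polynomial mapping above can be sketched as follows. The defaults `P = 31` and `Q = 65536` are assumptions chosen for illustration (Q matches the range [0, 65535] mentioned earlier); the notes do not fix particular values:

```python
def poly_hash(s: str, P: int = 31, Q: int = 65536) -> int:
    """Compute (sum_j P**j * ord(s[j])) mod Q in time linear in len(s)."""
    h = 0
    power = 1  # holds P**j mod Q for the current index j
    for ch in s:
        h = (h + power * ord(ch)) % Q
        power = (power * P) % Q
    return h

print(poly_hash("ab"))  # 97 + 31 * 98 = 3135
```

Reducing modulo Q at every step keeps the intermediate numbers small, so the cost stays linear in the length of the key.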

Separate Chaining

If a table slot is already occupied, store a linked list of all items that hash to that slot. Here, the maximum length of such a list is called the load factor. We hope this factor is a constant
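A minimal sketch of separate chaining, under two assumptions that are not from the lecture: each chain is a Python list rather than a hand-written linked list, and Python's built-in `hash()` stands in for the hash function. The class and method names are hypothetical:

```python
class ChainedHashTable:
    def __init__(self, num_slots: int = 8):
        # One (initially empty) chain per table slot.
        self.slots = [[] for _ in range(num_slots)]

    def _chain(self, key):
        return self.slots[hash(key) % len(self.slots)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:               # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def find(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        return None

    def delete(self, key) -> bool:
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False

    def longest_chain(self) -> int:
        return max(len(chain) for chain in self.slots)
```

Each operation scans a single chain, so its cost is proportional to that chain's length. This is why the length of the longest chain matters.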

Desired property for hash function:

  • The hashed keys are nicely spread out so that we do not have too many collisions, since collisions affect the time to perform lookups and deletes
  • Table size M = O(N)
  • The hash function h is fast to compute

Actually, we want f to be random enough: if the deterministic function f maps each input to a number that looks nearly random (while remaining deterministic), it is good. A function with this property is called pseudo-random.

For example: MD5
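As a hedged illustration of using MD5 this way, the sketch below reduces the 128-bit digest to a table index. The helper name `md5_bucket` and the table size are assumptions; only `hashlib.md5` is a real library call:

```python
import hashlib

def md5_bucket(key: str, m: int) -> int:
    """Map a string key to a slot in [0, m) using its MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()  # 16 bytes
    return int.from_bytes(digest, "big") % m

# The same key always lands in the same slot.
print(md5_bucket("hello", 65536) == md5_bucket("hello", 65536))  # True
```

The output looks random across different keys, yet it is fully deterministic for any single key, which is exactly the property described above.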

“Almost random function” properties

The function behaves just like throwing a dart at the target range, i.e. the outputs are uniformly distributed

If the hash table size N equals the number of keys stored:

  • The load factor (the longest chain) is O(log N) with high probability, but on average a chain has length O(1)

Birthday paradox

  • When there are n or more people in a room, what is the chance that two people have the same birthday?
  • It turns out that for a table of size 365 you need only 23 keys for a 50% chance of a collision, and as few as 60 keys for a 99% chance.
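These numbers can be checked directly. The sketch below assumes keys land uniformly and independently in the m slots, and multiplies out the probability that all k keys are distinct; the function name is hypothetical:

```python
def collision_probability(k: int, m: int = 365) -> float:
    """Probability that k uniformly random keys in m slots collide at least once."""
    p_no_collision = 1.0
    for i in range(k):
        # The (i+1)-th key must avoid the i slots already taken.
        p_no_collision *= (m - i) / m
    return 1.0 - p_no_collision

for k in (22, 23, 60):
    print(k, round(collision_probability(k), 4))
```

With m = 365 the probability first exceeds 50% at k = 23, and it is already above 99% at k = 60.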

Open addressing: maintain an array that is some constant factor larger than the number of keys, and store all keys directly in this array. Every cell in the array is either empty or contains a key

Load factor $\lambda = n/m$, where m is the size of the table and n is the number of keys stored.

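Probe sequence: map a key to a sequence of table positions instead of a single number.

Linear probing: probe(key) = [ hash(key) mod m, (hash(key) + 1) mod m, (hash(key) + 2) mod m, … ]

For a half-full table (n keys in m = 2n cells), the expected number of moves for an insertion is as follows. Best case (occupied and empty cells alternate): 0.5. Worst case (all n keys form one cluster): n/(2n) * 0 + 1/(2n) * n + 1/(2n) * (n-1) + … + 1/(2n) * 1 = (n+1)/4 ≈ n/4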
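A minimal sketch of linear probing, assuming Python's built-in `hash()` as the hash function and a table that stores keys only. Deletion is left out because it needs extra care (for example tombstone markers) in open addressing. The class name and the choice to return the number of moves are illustrative assumptions:

```python
class LinearProbingTable:
    _EMPTY = object()  # sentinel marking an empty cell

    def __init__(self, m: int):
        self.m = m
        self.cells = [self._EMPTY] * m

    def insert(self, key) -> int:
        """Insert key; return how many cells past the home cell it moved."""
        home = hash(key) % self.m
        for i in range(self.m):
            pos = (home + i) % self.m          # wrap around the table
            if self.cells[pos] is self._EMPTY or self.cells[pos] == key:
                self.cells[pos] = key
                return i
        raise RuntimeError("table is full")

    def contains(self, key) -> bool:
        home = hash(key) % self.m
        for i in range(self.m):
            pos = (home + i) % self.m
            if self.cells[pos] is self._EMPTY:  # empty cell ends the search
                return False
            if self.cells[pos] == key:
                return True
        return False

t = LinearProbingTable(8)
moves = [t.insert(k) for k in (0, 8, 16, 7, 15)]
print(moves)  # [0, 1, 2, 0, 4]: key 15 walks through the whole cluster
```

The last insertion shows the cost of clustering: key 15 starts at cell 7 and has to step past every occupied cell of the cluster before it finds a free one.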

Quadratic probing: probe(key) = [ hash(key) mod m, (hash(key) + 1) mod m, (hash(key) + 4) mod m, (hash(key) + 9) mod m, … ]

So quadratic probing can jump over a large cluster

But one question remains: can the probe sequence reach every cell of the table?

Claim: if m is prime and the table is at least half empty, then quadratic probing will always find an empty location. Furthermore, no location is checked twice among the first ⌈m/2⌉ probes.
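The distinctness part of the claim can be checked by brute force for small tables. The sketch below (helper names are hypothetical) lists the first ⌈m/2⌉ quadratic probes from every home position and tests whether they are all different:

```python
def quadratic_probes(home: int, m: int, count: int) -> list:
    """First `count` positions visited by quadratic probing from `home`."""
    return [(home + i * i) % m for i in range(count)]

def first_half_distinct(m: int) -> bool:
    """True if, from every home cell, the first ceil(m/2) probes are distinct."""
    half = (m + 1) // 2  # ceil(m / 2)
    return all(
        len(set(quadratic_probes(home, m, half))) == half
        for home in range(m)
    )

print(quadratic_probes(3, 11, 6))  # [3, 4, 7, 1, 8, 6]
print(first_half_distinct(11))     # True: 11 is prime
print(first_half_distinct(16))     # False: probes 0 and 4 coincide
```

Since the first ⌈m/2⌉ probes hit ⌈m/2⌉ different cells and at most half of the table is occupied, at least one of those cells must be empty. For a non-prime size such as 16 the probes repeat early, so the guarantee is lost.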

Implementations of dictionary with comparable keys: BBST

An AVL tree is a binary search tree in which:

For every node in the tree, the heights of the left and right subtrees differ by at most 1.

Rotations are used to restore this property after insertions and deletions.
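A small sketch of the balance condition and of a single right rotation. The `Node` class and function names are assumptions for illustration. For simplicity, heights are recomputed recursively here; a practical implementation would typically cache the height in each node:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def height(node) -> int:
    """Height of a subtree; an empty subtree has height -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def is_avl_balanced(node) -> bool:
    """Check the AVL condition at every node of the subtree."""
    if node is None:
        return True
    return (abs(height(node.left) - height(node.right)) <= 1
            and is_avl_balanced(node.left)
            and is_avl_balanced(node.right))

def rotate_right(y: Node) -> Node:
    """Right rotation around y; returns the new subtree root."""
    x = y.left
    y.left = x.right   # x's right subtree moves under y
    x.right = y
    return x

# A left-leaning chain 3 -> 2 -> 1 violates the AVL condition at the root.
chain = Node(3, left=Node(2, left=Node(1)))
root = rotate_right(chain)  # 2 becomes the root, with children 1 and 3
```

The rotation keeps the binary search tree order (1 < 2 < 3) while reducing the height of the subtree by one.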
