Introduction to Algorithms (Hashing with Chaining)

Dictionary:

Maintain a set of items, each with a key

  • insert(item)
  • delete(item)
  • search(key): return the item with the given key, or report that it doesn't exist
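Python's built-in `dict` already provides these three operations. Below is a minimal sketch that maps them onto `dict`. The helper names `insert`, `delete`, `search` and the `Item` class are hypothetical, and the sketch assumes each item exposes its key as `item.key`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    key: int
    value: str


def insert(d: dict, item: Item) -> None:
    # Overwrites any existing item with the same key.
    d[item.key] = item


def delete(d: dict, item: Item) -> None:
    # Raises KeyError if the item's key is absent.
    del d[item.key]


def search(d: dict, key: int) -> Optional[Item]:
    # Returns None to report "doesn't exist".
    return d.get(key)


table: dict = {}
insert(table, Item(42, "answer"))
print(search(table, 42))   # Item(key=42, value='answer')
delete(table, Item(42, "answer"))
print(search(table, 42))   # None
```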

Motivation

Dictionaries are perhaps the most popular data structure in CS  

  • built into most modern programming languages (Python, Perl, Ruby, JavaScript, Java, C++, C#, . . . )
  • e.g. best docdist code: word counts & inner product
  • implement databases (e.g. DB_HASH in Berkeley DB):
    • English word → definition (literal dict.)
    • English words: for spelling correction
    • word → all web pages containing that word
    • username → account object
  • compilers & interpreters: names → variables
  • network routers: IP address → wire
  • network server: port number → socket/app.
  • virtual memory: virtual address → physical address

Less obvious, using hashing techniques:

  • substring search (grep, Google)
  • string commonalities (DNA)
  • file or directory synchronization
  • cryptography: file transfer & identification
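The docdist bullet refers to comparing documents by their word-frequency vectors. Below is a minimal sketch that assumes a simple lowercase whitespace tokenizer and uses the hypothetical helpers `word_counts` and `inner_product`. Each word costs one expected O(1) dictionary lookup:

```python
from collections import Counter


def word_counts(text: str) -> Counter:
    # Dictionary: word -> number of occurrences (one hash lookup per word).
    return Counter(text.lower().split())


def inner_product(c1: Counter, c2: Counter) -> int:
    # Iterate over the smaller dictionary and look each word up in the other;
    # a missing word counts as 0 (Counter does not insert it on lookup).
    if len(c1) > len(c2):
        c1, c2 = c2, c1
    return sum(count * c2[word] for word, count in c1.items())


a = word_counts("the cat sat on the mat")
b = word_counts("the dog sat on the log")
print(inner_product(a, b))  # 2*2 (the) + 1*1 (sat) + 1*1 (on) = 6
```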

How do we solve the dictionary problem?

Simple approach: Direct-access table

  • store items in an array indexed by key (the item with key k lives in slot k)
  • problems:
    1. keys must be non-negative integers (or, using two arrays, integers)
    2. large key range =⇒ large space, e.g. one key of 2^{256} is bad news.
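Below is a minimal sketch of a direct-access table. It assumes keys are integers in `range(u)` for a hypothetical universe size `u`, and the class name is illustrative. Every operation is O(1), but the array costs Θ(u) space even when only a few keys are stored, which is problem 2:

```python
from typing import Any, List, Optional


class DirectAccessTable:
    """Array indexed directly by key; keys must lie in range(u)."""

    def __init__(self, u: int) -> None:
        # Space is Theta(u): one slot per possible key, used or not.
        self.slots: List[Optional[Any]] = [None] * u

    def _check(self, key: int) -> None:
        # Reject keys outside the universe (Python would otherwise
        # silently wrap negative indices).
        if not 0 <= key < len(self.slots):
            raise KeyError(key)

    def insert(self, key: int, item: Any) -> None:
        self._check(key)
        self.slots[key] = item          # O(1)

    def delete(self, key: int) -> None:
        self._check(key)
        self.slots[key] = None          # O(1)

    def search(self, key: int) -> Optional[Any]:
        self._check(key)
        return self.slots[key]          # O(1); None means "doesn't exist"


t = DirectAccessTable(16)
t.insert(3, "three")
print(t.search(3))   # three
print(t.search(4))   # None
```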

Solution to 1: “prehash” keys to integers
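Prehashing maps an arbitrary key to a non-negative integer. Ideally, distinct keys get distinct integers. Below is a minimal sketch for string keys using the hypothetical name `prehash`. Python's built-in `hash()` plays a similar role in practice. However, for `str` it is randomized per process unless `PYTHONHASHSEED` is set, so it is not used here:

```python
def prehash(s: str) -> int:
    """Map a string injectively to a non-negative integer.

    The UTF-8 bytes are read as one big-endian base-256 number. A leading
    0x01 byte is prepended so that strings starting with NUL characters
    ("\\x00a" vs "a") still map to different integers.
    """
    return int.from_bytes(b"\x01" + s.encode("utf-8"), "big")


print(prehash("a"))      # 0x0161 = 353
print(prehash("\x00a"))  # 0x010061 = 65633
```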

Solution to 2: hashing

  1. reduce the universe U of all keys (integers) down to a reasonable size m for the table
  2. idea: m = \Theta(n), where n = number of keys in the dictionary
  3. hash function h: U → {0, 1, . . . , m − 1}
  4. two keys k_i, k_j ∈ K collide if h(k_i) = h(k_j)
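Whenever there are more keys than slots, the pigeonhole principle guarantees at least one collision. Below is a small sketch with a hypothetical helper `find_collisions`. It uses h(k) = k mod m purely as an example hash function:

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def find_collisions(keys: Iterable[int], h: Callable[[int], int]) -> Dict[int, List[int]]:
    """Return {slot: keys} for every slot that two or more keys hash to."""
    slots: Dict[int, List[int]] = defaultdict(list)
    for k in keys:
        slots[h(k)].append(k)
    return {s: ks for s, ks in slots.items() if len(ks) > 1}


def h_mod4(k: int) -> int:
    # Example hash function h: U -> {0, 1, 2, 3} (m = 4).
    return k % 4


# 6 keys into 4 slots: at least one slot must receive two keys.
print(find_collisions(range(6), h_mod4))  # {0: [0, 4], 1: [1, 5]}
```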

Chaining:

each slot of the table holds a linked list of all the items whose keys hash to that slot (the colliding items)
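Below is a minimal sketch of hashing with chaining. It assumes Python's built-in `hash()` as the prehash and h(k) = hash(k) mod m as the hash function. Python lists stand in for the linked lists, and the class and method names are illustrative:

```python
from typing import Any, List, Optional, Tuple


class ChainedHashTable:
    """Hash table with chaining; each slot holds a chain of (key, item) pairs.

    Python lists stand in for linked lists: only appends, full scans and
    single-element removals are used, mirroring linked-list operations.
    """

    def __init__(self, m: int = 8) -> None:
        self.m = m
        self.table: List[List[Tuple[Any, Any]]] = [[] for _ in range(m)]

    def _chain(self, key: Any) -> List[Tuple[Any, Any]]:
        # Prehash with hash(), then reduce to a slot with the division method.
        return self.table[hash(key) % self.m]

    def insert(self, key: Any, item: Any) -> None:
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: replace its item
                chain[i] = (key, item)
                return
        chain.append((key, item))

    def delete(self, key: Any) -> None:
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return
        raise KeyError(key)

    def search(self, key: Any) -> Optional[Any]:
        # Scans the whole chain T[h(key)] in the worst case.
        for k, item in self._chain(key):
            if k == key:
                return item
        return None


t = ChainedHashTable(m=4)
for k in range(10):          # 10 keys in 4 slots: some chains have length > 1
    t.insert(k, str(k))
t.delete(5)
print(t.search(9), t.search(5))   # 9 None
```

In the worst case, all n keys land in the same slot, and every operation degrades to Θ(n). The analysis assumes simple uniform hashing to rule this out on average.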

Simple uniform hashing (an assumption):

  1. each key is equally likely to be hashed to any slot of the table,

  2. independently of where the other keys are hashed

Analysis

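  • expected length of a chain for n keys and m slots = n/m = α (the load factor)
  • expected running time = O(1 + α): O(1) to compute the hash and access the slot, plus O(α) to search the chain; this is O(1) when α = O(1), i.e. m = Ω(n)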
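The expected chain length follows from linearity of expectation under simple uniform hashing. The short derivation below uses an indicator X_{ij} (notation introduced here) that is 1 exactly when key k_i hashes to slot j:

```latex
X_{ij} =
\begin{cases}
  1 & \text{if } h(k_i) = j \\
  0 & \text{otherwise}
\end{cases}
\qquad
\Pr\{X_{ij} = 1\} = \frac{1}{m} \quad \text{(simple uniform hashing)}

\mathbb{E}[\text{length of chain } j]
  = \mathbb{E}\Big[\sum_{i=1}^{n} X_{ij}\Big]
  = \sum_{i=1}^{n} \Pr\{X_{ij} = 1\}
  = \frac{n}{m}
  = \alpha
```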

Hash functions

  1. division method: h(k) = k mod m
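  2. multiplication method: h(k) = [(a · k) mod 2^{w}] >> (w − r), where m = 2^{r}, keys are w-bit words, and a is a random odd w-bit integer (not too close to a power of 2)
  3. universal hashing: h(k) = [(ak + b) mod p] mod m, where a ∈ {1, . . . , p − 1} and b ∈ {0, 1, . . . , p − 1} are chosen at random and p is a large prime (> |U|)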
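Below is a minimal sketch of the first two methods. It uses arbitrary illustrative parameters: w = 8-bit words, r = 3 (so m = 8), and the odd multiplier a = 177. Python integers are unbounded, so `mod 2^w` is applied explicitly to emulate w-bit word overflow:

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


def multiplication_hash(k: int, a: int, w: int, r: int) -> int:
    """Multiplication method: h(k) = [(a * k) mod 2^w] >> (w - r), with m = 2^r.

    Keeps the low w bits of the product (one machine word), then takes the
    top r of those bits as the slot index in {0, ..., 2^r - 1}.
    """
    return ((a * k) % (1 << w)) >> (w - r)


print(division_hash(100, 7))                        # 2
print(multiplication_hash(100, a=177, w=8, r=3))    # 1  (17700 mod 256 = 36; 36 >> 5 = 1)
```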

for worst-case keys k1 ≠ k2 (probability taken over the random choice of a and b):
    Pr{h(k1) = h(k2)} ≤ 1/m
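Below is a minimal sketch that draws a random member of this family and estimates, by simulation, the collision probability for one fixed pair of keys. The prime p = 10007, m = 10, and the keys 17 and 27 (which always collide under plain k mod 10) are arbitrary illustrative choices. The helper name `make_universal_hash` is hypothetical:

```python
import random
from typing import Callable


def make_universal_hash(m: int, p: int, rng: random.Random) -> Callable[[int], int]:
    """Draw h(k) = [(a*k + b) mod p] mod m at random from the family.

    p must be a prime larger than every key; a is drawn from {1, ..., p-1}
    (a = 0 would send every key to the same slot) and b from {0, ..., p-1}.
    """
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda k: ((a * k + b) % p) % m


# Estimate Pr{h(k1) = h(k2)} for one fixed pair of keys over many random draws.
rng = random.Random(2024)
p, m = 10_007, 10                   # 10_007 is prime; keys must be < p
k1, k2 = 17, 27                     # collide under k mod 10, an adversarial pair
trials = 100_000
hits = 0
for _ in range(trials):
    h = make_universal_hash(m, p, rng)
    if h(k1) == h(k2):
        hits += 1
print(hits / trials)                # expected to be close to 1/m = 0.1
```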
