Basics of Hash Tables (Data Structures)

Intro:

API: Python: dict; Java: HashMap

Applications: file systems; password verification; storage optimization

IP Addresses: did anybody access the service from this IP during the last hour?

Main Loop

log: array of log lines (time, IP)
C: mapping from IPs to counters
i: first unprocessed log line
j: first line in the current 1h window

i ← 0
j ← 0
C ← ∅
Each second:
    UpdateAccessList(log, i, j, C)

UpdateAccessList(log, i, j, C)
    while log[i].time ≤ Now():
        C[log[i].IP] ← C[log[i].IP] + 1
        i ← i + 1
    while log[j].time ≤ Now() − 3600:
        C[log[j].IP] ← C[log[j].IP] − 1
        j ← j + 1

AccessedLastHour(IP, C)
    return C[IP] > 0
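The main loop and the two procedures above can be sketched in Python with a dict playing the role of C. This is a minimal sketch under assumptions that are not in the original: the log is a list of (time, ip) tuples sorted by time, the current time is passed in as `now` instead of calling Now(), and the bounds checks `i < len(log)` and `j < i` are added so the loops stop at the end of the log.

```python
from collections import defaultdict

WINDOW = 3600  # one hour, in seconds


def update_access_list(log, i, j, counts, now):
    """Advance both pointers of the 1-hour window over a time-sorted log.

    log    -- list of (time, ip) tuples sorted by time (assumed format)
    i      -- index of the first unprocessed log line
    j      -- index of the first line inside the current 1h window
    counts -- mapping ip -> number of accesses inside the window
    now    -- current time in seconds (stands in for Now())
    Returns the updated (i, j).
    """
    # Count every line whose timestamp has already arrived.
    while i < len(log) and log[i][0] <= now:
        counts[log[i][1]] += 1
        i += 1
    # Uncount every counted line that has fallen out of the window.
    while j < i and log[j][0] <= now - WINDOW:
        counts[log[j][1]] -= 1
        j += 1
    return i, j


def accessed_last_hour(ip, counts):
    # .get() does not insert missing keys, unlike counts[ip] on a defaultdict.
    return counts.get(ip, 0) > 0
```

A `defaultdict(int)` lets an IP that has never been seen start at 0, and querying through `counts.get(ip, 0)` avoids adding keys just by asking about them.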


Direct Addressing

We need a data structure for C.
There are 2^32 different IP (v4) addresses.

Convert IP to a 32-bit integer.
Create an integer array A of size 2^32.

Use A[int(IP)] as C[IP].


int(IP)
    return IP[1]·2^24 + IP[2]·2^16 + IP[3]·2^8 + IP[4]
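The int(IP) formula can be checked directly in Python. In this sketch, `ip_to_int` is a hypothetical helper name; IP[1] is taken to be the leftmost octet of a dotted-quad string. The sketch does not validate octet ranges, so the result is cross-checked against the standard library's `ipaddress` module, which does.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Pack a dotted-quad IPv4 string into one 32-bit integer.

    IP[1] is the leftmost octet, IP[4] the rightmost (no range validation).
    """
    o1, o2, o3, o4 = (int(part) for part in ip.split("."))
    return o1 * 2**24 + o2 * 2**16 + o3 * 2**8 + o4


# Cross-check against the standard library's conversion.
assert ip_to_int("192.168.0.1") == int(ipaddress.IPv4Address("192.168.0.1"))
```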

UpdateAccessList(log, i, j, A)
    while log[i].time ≤ Now():
        A[int(log[i].IP)] ← A[int(log[i].IP)] + 1
        i ← i + 1
    while log[j].time ≤ Now() − 3600:
        A[int(log[j].IP)] ← A[int(log[j].IP)] − 1
        j ← j + 1


AccessedLastHour(IP)
    return A[int(IP)] > 0
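A direct-addressing variant replaces the mapping with a plain array indexed by int(IP). A real IPv4 table needs 2^32 counters, which at 4 bytes each is 16 GiB. For that reason this sketch takes a hypothetical `bits` parameter and runs on a small key universe. It also assumes the log holds (time, key) pairs whose keys have already been converted to integers.

```python
WINDOW = 3600  # one hour, in seconds


class DirectAddressCounter:
    """Direct addressing: one counter slot per possible key.

    A real IPv4 table would need 2**32 slots; `bits` is a hypothetical
    parameter so the sketch can run with a small key universe.
    """

    def __init__(self, bits: int = 16):
        self.a = [0] * (1 << bits)  # A: all counters start at 0

    def update(self, log, i, j, now):
        # log holds (time, key) pairs, key already equal to int(IP).
        while i < len(log) and log[i][0] <= now:
            self.a[log[i][1]] += 1  # O(1) array access per line
            i += 1
        while j < i and log[j][0] <= now - WINDOW:
            self.a[log[j][1]] -= 1
            j += 1
        return i, j

    def accessed_last_hour(self, key) -> bool:
        return self.a[key] > 0
```

Every operation is a single array access, which is where the O(1) bounds come from. The cost is memory proportional to the size of the key universe, not to the number of active keys.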


Asymptotics

UpdateAccessList is O(1) per log line.

AccessedLastHour is O(1).
But we need 2^32 memory even for a few IPs.

IPv6: 2^128 won't fit in memory.

In general: O(N) memory, where N = |S| is the number of all possible keys.


List-based Mapping:

Direct addressing requires too much memory.

Let's store only the active IPs.
Store them in a list.
Store only the last occurrence of each IP.

Keep the order of occurrence.


UpdateAccessList(log, i, L)
    while log[i].time ≤ Now():
        log_line ← L.FindByIP(log[i].IP)
        if log_line ≠ NULL:
            L.Erase(log_line)
        L.Append(log[i])
        i ← i + 1
    while L.Top().time ≤ Now() − 3600:
        L.Pop()


AccessedLastHour(IP, L)
    return L.FindByIP(IP) ≠ NULL
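The list-based version can be sketched with `collections.deque`. A deque gives O(1) append at the back and O(1) access and removal at the front, which matches L.Append, L.Top and L.Pop. FindByIP is a linear scan and Erase uses `deque.remove`, so both are Θ(n). The (time, ip) tuple format and the `now` argument are assumptions of this sketch.

```python
from collections import deque

WINDOW = 3600  # one hour, in seconds


def update_access_list(log, i, lst, now):
    """Keep only the last occurrence of each active IP, in order of occurrence.

    lst is a deque of (time, ip) tuples; the oldest entry is lst[0].
    Returns the new value of i.
    """
    while i < len(log) and log[i][0] <= now:
        _, ip = log[i]
        # FindByIP: linear scan over the whole list, Theta(n).
        old = next((entry for entry in lst if entry[1] == ip), None)
        if old is not None:
            lst.remove(old)  # Erase: also Theta(n)
        lst.append(log[i])   # Append at the back: O(1)
        i += 1
    # Top / Pop at the front of the deque: O(1) each.
    while lst and lst[0][0] <= now - WINDOW:
        lst.popleft()
    return i


def accessed_last_hour(ip, lst):
    return any(entry[1] == ip for entry in lst)
```

Since each IP appears in the list at most once, `deque.remove(old)` removes exactly the entry that FindByIP returned.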


Asymptotics

n is the number of active IPs.
Memory usage is Θ(n).
L.Append, L.Top, L.Pop are Θ(1).

L.Find and L.Erase are Θ(n).

UpdateAccessList is Θ(n) per log line.

AccessedLastHour is Θ(n).


Encoding IPs

Encode IPs with small numbers,
i.e. numbers from 0 to 999.
Currently active IPs must get different codes.


Hash Function

Definition

For any set of objects S and any integer m > 0, a function h: S → {0, 1, ..., m − 1} is called a hash function.

Definition

m is called the cardinality of hash function h.


Desirable Properties

h should be fast to compute.

Different objects should get different values.

Then direct addressing needs only O(m) memory.

We want a small cardinality m.

It is impossible for all values to be different if the number of objects |S| is greater than m.


Collisions

Definition

When h(o1) = h(o2) and o1 ≠ o2, this is a collision.
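A toy hash function on IPv4 addresses makes these definitions concrete. It has cardinality m = 1000, matching the 0 to 999 codes mentioned earlier: it packs the address into a 32-bit integer and takes the result modulo m. This choice is purely illustrative and is not claimed to be a good hash function, but it does show two different addresses colliding.

```python
M = 1000  # cardinality m: hash values are 0..999


def ip_to_int(ip: str) -> int:
    """Pack a dotted-quad IPv4 string into a 32-bit integer."""
    o1, o2, o3, o4 = (int(part) for part in ip.split("."))
    return (o1 << 24) | (o2 << 16) | (o3 << 8) | o4


def h(ip: str) -> int:
    """Toy hash function h: IPv4 addresses -> {0, ..., M - 1}."""
    return ip_to_int(ip) % M
```

Here h("192.168.0.1") and h("192.168.3.233") both equal 521. The two packed integers differ by exactly 1000, so this is a collision.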



Map

Store a mapping from objects to other objects:

   Filename → location of the file on disk
   Student ID → student name
   Contact name → contact phone number

Definition

A map from S to V is a data structure with methods HasKey(O), Get(O), Set(O, v), where O ∈ S, v ∈ V.


h: S → {0, 1, ..., m − 1}
O, O' ∈ S
v, v' ∈ V
A ← array of m lists (chains) of pairs (O, v)

HasKey(O)
    L ← A[h(O)]
    for (O', v') in L:
        if O' == O:
            return true
    return false


Get(O)
    L ← A[h(O)]
    for (O', v') in L:
        if O' == O:
            return v'
    return n/a


Set(O, v)
    L ← A[h(O)]
    for p in L:
        if p.O == O:
            p.v ← v
            return
    L.Append(O, v)
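The chaining map above can be sketched as a small Python class. Several choices here belong to this sketch, not to the original pseudocode. Python's built-in `hash()` reduced modulo m stands in for h, and `None` stands in for "n/a". Pairs are stored as two-element lists so that Set can overwrite a value in place.

```python
class ChainedMap:
    """Map with chaining: an array A of m lists (chains) of [key, value] pairs."""

    def __init__(self, m: int = 8):
        self.m = m
        self.chains = [[] for _ in range(m)]

    def _chain(self, key):
        # Built-in hash() stands in for h; % m maps it into 0..m-1.
        return self.chains[hash(key) % self.m]

    def has_key(self, key) -> bool:
        return any(k == key for k, _ in self._chain(key))

    def get(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        return None  # plays the role of "n/a"

    def set(self, key, value) -> None:
        chain = self._chain(key)
        for pair in chain:
            if pair[0] == key:
                pair[1] = value  # key already present: overwrite its value
                return
        chain.append([key, value])  # new key: append to the end of its chain
```

Each operation scans only the single chain that the key hashes to, so its cost grows with that chain's length.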


Set

Definition

A set is a data structure with methods Add(O), Remove(O), Find(O).

Examples

IPs accessed during the last hour
Students on campus
Keywords in a programming language


h: S → {0, 1, ..., m − 1}
O, O' ∈ S
A ← array of m lists (chains) of objects O

Find(O)
    L ← A[h(O)]
    for O' in L:
        if O' == O:
            return true
    return false


Add(O)
    L ← A[h(O)]
    for O' in L:
        if O' == O:
            return
    L.Append(O)


Remove(O)
    if not Find(O):
        return
    L ← A[h(O)]
    L.Erase(O)
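The chaining set can be sketched the same way. As in the map sketch, Python's `hash()` modulo m is an assumed stand-in for h. The `in` test on a Python list compares elements with `==`, which matches the `O' == O` check in the pseudocode.

```python
class ChainedSet:
    """Set with chaining: an array A of m lists (chains) of objects."""

    def __init__(self, m: int = 8):
        self.m = m
        self.chains = [[] for _ in range(m)]

    def _chain(self, obj):
        # Built-in hash() stands in for h; % m maps it into 0..m-1.
        return self.chains[hash(obj) % self.m]

    def find(self, obj) -> bool:
        return obj in self._chain(obj)  # linear scan of one chain

    def add(self, obj) -> None:
        chain = self._chain(obj)
        if obj not in chain:  # never store duplicates
            chain.append(obj)

    def remove(self, obj) -> None:
        chain = self._chain(obj)
        if obj in chain:  # silently ignore absent objects, like Remove(O)
            chain.remove(obj)
```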


Hash Table:

Definition

An implementation of a set or a map using hashing is called a hash table.


Programming Languages:

Set:
  unordered_set in C++
  HashSet in Java
  set in Python

Map:
  unordered_map in C++
  HashMap in Java
  dict in Python
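Python's built-in types expose the same operations directly. In this short sketch, `in` plays the role of Find and HasKey. `set.discard` matches Remove(O) above because it does nothing when the object is absent, whereas `set.remove` would raise `KeyError`.

```python
# Set: Add / Find / Remove
active = set()
active.add("192.168.0.1")
active.add("192.168.0.1")        # adding twice keeps a single copy
found = "192.168.0.1" in active  # Find(O)
active.discard("10.0.0.1")       # Remove(O): no error if absent

# Map: Set / HasKey / Get
counts = {}
counts["192.168.0.1"] = counts.get("192.168.0.1", 0) + 1  # Get, then Set
has_other = "10.0.0.1" in counts                          # HasKey(O)
```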

 

Conclusion

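Chaining is a technique to implement a hash table.

Memory consumption is O(n + m), where n is the number of stored objects and m is the cardinality of the hash function.

Operations work in time O(c + 1), where c is the length of the longest chain.

How can we make both m and c small?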
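A small experiment shows the tension between m and c. By the pigeonhole principle, n objects in m chains force c ≥ ⌈n/m⌉, so a small m means long chains. However, c also depends on how h interacts with the actual keys. This sketch uses the toy hash h(k) = k mod m on integer keys, an assumption chosen for illustration, and reports the longest chain.

```python
def chain_lengths(keys, m):
    """Chain sizes when integer keys are hashed with the toy h(k) = k mod m."""
    sizes = [0] * m
    for k in keys:
        sizes[k % m] += 1
    return sizes


def longest_chain(keys, m):
    """c: the length of the longest chain."""
    return max(chain_lengths(keys, m))
```

With m = 10, the keys 0..99 spread evenly and c = 10. The 100 multiples of 10 all land in chain 0, so c = n = 100. With m = 7, those same keys spread out and c = 15 = ⌈100/7⌉.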



