Intro
The concept, usage, and implementation of hash tables come up constantly in software engineering interviews. Google's interview guidance lists hash tables as a requirement and says: "Hashtables: Arguably the single most important data structure known to mankind." There is a lot of knowledge and technique behind hash tables (hash functions, collisions, etc.), but a short interview cannot test all of it thoroughly. With that in mind, in this post I'd like to review the basics of hash tables and try to implement some sample code.
What is a Hash Table?
This is a very common question in IT interviews. I summarize the concept in my own words: "A hash table is a data structure that stores key-value pairs. A value can be accessed by its key in O(1) time on average, and a hash function is used to map the key to the index of the value."
You can find many definitions of a hash table. Generally speaking, you can think of a hash table as an array. Normally we access an element of an array by its index, e.g. A[1], A[2]. In a hash table, we access an element by its key, e.g. A["Monday"], D["Mary"]. Its great advantage is the speed of looking up an element (O(1) time on average).
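To make access by key concrete, here is a minimal sketch using the C++ standard library's std::unordered_map, which is a hash table. The function name day_number and the table contents are made up for illustration and are not part of the original post.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical helper: looks up a weekday name and returns its number,
// or -1 if the name is not in the table.
int day_number(const std::string& day) {
    assert(!day.empty());  // sanity check: an empty name is never a valid key

    // The key is a string, not an integer index.
    static const std::unordered_map<std::string, int> table = {
        {"Monday", 1}, {"Tuesday", 2}, {"Wednesday", 3},
        {"Thursday", 4}, {"Friday", 5}};

    // find() hashes the key and jumps to the matching bucket,
    // which takes O(1) time on average.
    auto it = table.find(day);
    return it == table.end() ? -1 : it->second;
}
```

Calling day_number("Monday") returns 1, while a name that was never stored, such as "Sunday", returns -1.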
How does a Hash Table work?
Firstly, a hash table can be built on top of several data structures, e.g. a linked list, an array combined with linked lists, a binary search tree, etc. The idea is to store the <key, value> pairs and build a way to access them. For better understanding, just consider an array in which we put each <key, value> pair at a specific position. Locating a <key, value> pair from its key is called hashing. We can think of a hash function as taking the key as input and outputting the location of the <key, value> pair in the array. A simple hash function uses the "mod" operation: "key mod array size" gives the hash, which is the index of the desired value.
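The "key mod array size" idea can be sketched as a small function. The name hash_index is hypothetical, and the handling of negative keys is an addition of this sketch: in C++ the % operator can return a negative result for a negative key, so the result is shifted back into range.

```cpp
#include <cassert>

// Hypothetical helper: maps an integer key to a slot index in [0, table_size).
int hash_index(int key, int table_size) {
    assert(table_size > 0);  // a table needs at least one slot

    int r = key % table_size;
    // For a negative key, % yields a value in (-table_size, 0],
    // so move it back into the valid index range.
    if (r < 0) {
        r += table_size;
    }
    return r;
}
```

With a table of size 5, hash_index(12, 5) gives 2 and hash_index(29, 5) gives 4.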
An example
Let's see a simple example.
We have a storage of size 5:
idx key value
0 -1 0
1 -1 0
2 -1 0
3 -1 0
4 -1 0
key=-1 means the slot is empty.
The hash function is hash(key) = key % 5;
First we insert <12, 12> (first is key, second is value)
Compute the hash(12) = 2;
Store the <key, value> into the storage of idx 2.
idx key value
0 -1 0
1 -1 0
2 12 12
3 -1 0
4 -1 0
Next we insert <29,29>, hash(29)=4;
idx key value
0 -1 0
1 -1 0
2 12 12
3 -1 0
4 29 29
Then we insert <27,27>, whose hash code is 2. When we check location 2, it is already in use.
This is called a collision: different keys are mapped to the same hash code. There are many ways to deal with collisions, such as chaining (keep a linked list at each location) and rehashing (a second function maps the key to another location; this family of methods is also known as open addressing). Usually we need to know at least these two methods.
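The post goes on to use rehashing, so for comparison here is a minimal sketch of the other method, chaining. The class name ChainedTable and its members are assumptions made for this illustration; each slot holds a linked list, and colliding keys simply share the list.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Hypothetical chaining table for int keys and int values.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t bucket_count) : buckets_(bucket_count) {
        assert(bucket_count > 0);
    }

    // Insert a new pair, or overwrite the value if the key already exists.
    void insert(int key, int value) {
        auto& chain = buckets_[index_of(key)];
        for (auto& entry : chain) {
            if (entry.first == key) {
                entry.second = value;
                return;
            }
        }
        chain.emplace_back(key, value);  // a collision just lengthens the chain
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* find(int key) const {
        const auto& chain = buckets_[index_of(key)];
        for (const auto& entry : chain) {
            if (entry.first == key) {
                return &entry.second;
            }
        }
        return nullptr;
    }

    // Number of entries sharing the bucket that this key maps to.
    std::size_t chain_length(int key) const {
        return buckets_[index_of(key)].size();
    }

private:
    // "key mod bucket count", shifted into range for negative keys.
    std::size_t index_of(int key) const {
        int m = static_cast<int>(buckets_.size());
        int r = key % m;
        return static_cast<std::size_t>(r < 0 ? r + m : r);
    }

    std::vector<std::list<std::pair<int, int>>> buckets_;
};
```

With 5 buckets, inserting the keys 12, 27 and 32 puts all three into the chain at index 2, and find(27) still returns the right value by walking that chain. The trade-off is extra memory for the list nodes in exchange for never running out of slots.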
Here we use rehashing.
The rehashing function is: rehash(idx) = (idx+1)%5, applied to the slot index that collided. This simple scheme is known as linear probing.
So, continuing the step above, rehash(2) = 3; location 3 is empty, so we store <27,27> in location 3.
idx key value
0 -1 0
1 -1 0
2 12 12
3 27 27
4 29 29
If we then insert <32,32>, hash(32) = 2; location 2 is in use. rehash(2) = 3, and location 3 is also in use.
Rehash again: rehash(3) = 4, not available; rehash(4) = 0, OK! Store <32,32> in slot 0.
idx key value
0 32 32
1 -1 0
2 12 12
3 27 27
4 29 29
That is the basic insert operation for a hash table.
To retrieve a value, e.g. the value for key 27, compute hash(27) = 2 and check the key stored in location 2. It is 12 != 27, so rehashing is needed: rehash(2) = 3, the key there is 27, so we return the value 27.
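The retrieval walk described above can be sketched on its own, separate from the full class. The function find_slot and the constants kSize and kEmpty are names assumed for this sketch; the array holds only the keys from the final table in the example.

```cpp
#include <array>
#include <cassert>

const int kSize = 5;    // table size used in the example above
const int kEmpty = -1;  // marker for an empty slot, as in the example

// Hypothetical helper: returns the slot index that holds `key`,
// or -1 if the key is not in the table.
int find_slot(const std::array<int, kSize>& keys, int key) {
    assert(key >= 0);  // negative values are reserved as markers

    int idx = key % kSize;  // home slot: hash(key)
    for (int probes = 0; probes < kSize; ++probes) {
        if (keys[idx] == key) {
            return idx;  // found the key
        }
        if (keys[idx] == kEmpty) {
            return -1;  // an empty slot ends the probe sequence
        }
        idx = (idx + 1) % kSize;  // rehash: try the next slot
    }
    return -1;  // every slot was checked without a match
}
```

For the final table in the example (keys 32, -1, 12, 27, 29 in slots 0 to 4), find_slot for key 27 checks slot 2, then returns 3; for key 32 it walks slots 2, 3, 4 and returns 0.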
A simple implementation (in C++)
#include <iostream>

const int sz = 5;        // number of slots in the table
const int EMPTY = -1;    // slot has never been used
const int DELETED = -2;  // slot held an entry that was removed

// One <key, value> entry. Keys must be non-negative because
// -1 and -2 are reserved as slot markers.
struct Entry {
    int id;
    int val;
};

class Hashtable {
    Entry dt[sz];
    int numel;  // number of stored entries
    int find(int id) const;
public:
    Hashtable();
    int hash(int id) const;
    int rehash(int idx) const;
    int insert(const Entry &d);
    int remove(const Entry &d);
    bool retrieve(int id, int &val) const;
    void output() const;
};

Hashtable::Hashtable() : numel(0) {
    for (int i = 0; i < sz; i++) {
        dt[i].id = EMPTY;
        dt[i].val = 0;
    }
}

// Map a key to its home slot.
int Hashtable::hash(int id) const {
    return id % sz;
}

// Linear probing: move to the next slot, wrapping around at the end.
int Hashtable::rehash(int idx) const {
    return (idx + 1) % sz;
}

// Return the slot that holds id, or -1 if it is not stored.
int Hashtable::find(int id) const {
    if (id < 0) {
        return -1;
    }
    int hashid = hash(id);
    for (int i = 0; i < sz; i++) {
        if (dt[hashid].id == id) {
            return hashid;
        }
        if (dt[hashid].id == EMPTY) {
            return -1;  // a never-used slot ends the probe sequence
        }
        hashid = rehash(hashid);  // occupied or DELETED: keep probing
    }
    return -1;
}

// Return 0 on success, -1 if the key is invalid or the table is full.
// Duplicate keys are not checked, as in the original example.
int Hashtable::insert(const Entry &d) {
    if (d.id < 0 || numel >= sz) {
        return -1;
    }
    int hashid = hash(d.id);
    if (dt[hashid].id != EMPTY && dt[hashid].id != DELETED) {
        std::cout << "collision! rehashing..." << std::endl;
    }
    for (int i = 0; i < sz; i++) {
        if (dt[hashid].id == EMPTY || dt[hashid].id == DELETED) {
            dt[hashid] = d;
            numel++;
            return 0;
        }
        hashid = rehash(hashid);
    }
    return -1;
}

// Return 0 on success, -1 if the key is not found.
int Hashtable::remove(const Entry &d) {
    int idx = find(d.id);
    if (idx < 0) {
        return -1;
    }
    dt[idx].id = DELETED;  // mark as deleted so later probes pass through
    dt[idx].val = 0;
    numel--;
    return 0;
}

// Store the value in val and return true if the key is found.
bool Hashtable::retrieve(int id, int &val) const {
    int idx = find(id);
    if (idx < 0) {
        return false;
    }
    val = dt[idx].val;
    return true;
}

void Hashtable::output() const {
    std::cout << "idx id val" << std::endl;
    for (int i = 0; i < sz; i++) {
        std::cout << i << " " << dt[i].id << " " << dt[i].val << std::endl;
    }
}

int main() {
    Hashtable hashtable;
    Entry d;
    d.id = 27;
    d.val = 27;
    hashtable.insert(d);
    hashtable.output();
    d.id = 99;
    d.val = 99;
    hashtable.insert(d);
    hashtable.output();
    d.id = 32;
    d.val = 32;
    hashtable.insert(d);
    hashtable.output();
    d.id = 77;
    d.val = 77;
    hashtable.insert(d);
    hashtable.output();

    // retrieve data
    int id = 77;
    int val = 0;
    std::cout << std::endl;
    std::cout << "Retrieving ... " << std::endl;
    if (hashtable.retrieve(id, val)) {
        std::cout << "hashtable[" << id << "]=" << val << std::endl;
    } else {
        std::cout << "hashtable[" << id << "] not found" << std::endl;
    }
    std::cout << std::endl;

    // delete element
    d.id = 32;
    d.val = 32;
    hashtable.remove(d);
    hashtable.output();
    d.id = 77;
    d.val = 77;
    hashtable.remove(d);
    hashtable.output();
    return 0;
}
The original post is at:
http://yucoding.blogspot.com/2013/08/re-viewhash-table-basics.html