Hash Table - Reviews-CSDN博客

Intro

The concept, usage and implementations of Hash table are always used in Software Engineer interviews. From the interview guidance of Google, there is an requirement of hash table. It is said "Hashtables: Arguably the single most important data structure known to mankind." There is indeed a bunch of knowledge and techniques for hashtables (hash function, collision, etc.), but from the interview perspective, it is not possible to test the thorough and complete skills of hashtables in a short interview. Take this advantage, in this post, I'd like to learn the basics of hash tables, and try to implement sample code.

What is Hash Table?

It is a very common but often occurred question in IT interviews. I generalize the concept in my own words: " Hash table, is a data structure, which stores key-value pairs, the access of value by key can be O(1) time, a hash function is used to map the key to the index of the value."

You can find many many definitions of hash table, generally speaking, you can imagine hash table is an array, originally we access an element in array by using index, e.g. A[1], A[2]. However in hash table, we access element by the key, e.g. A["Monday"], D["Marry"]. The great advantage of it is the speed to look up an element (O(1) time).

How does Hash table works

Firstly, hash tables can be implemented based on many data structures, e.g. Linked list, array and linked list, binary search tree, etc. The idea is to store the <key, value> pair and build a way to access it. For better understanding, just consider an array, we put the <key, value> in a specific order. The way to locate the <key, value> using the key is called hashing. We can consider a hash function takes the key as the input, and output the location of the <key, value> in the array. A simple hash function is to used "mod" operation. Use the "key mod array size" to get the hash, the index of the desired value.

An example

Let's see a simple example.

We have a storage of size 5:

idx key value

0 -1 0

1 -1 0

2 -1 0

3 -1 0

4 -1 0

key=-1 means the slot is empty.

The hash function is hash(key) = key % 5;

First we insert <12, 12> (first is key, second is value)

Compute the hash(12) = 2;

Store the <key, value> into the storage of idx 2.

idx key value

0 -1 0

1 -1 0

2 12 12

3 -1 0

4 -1 0

Next we insert <29,29>, hash(29)=4;

idx key value

0 -1 0

1 -1 0

2 12 12

3 -1 0

4 29 29

Then we insert <27,27>, where the hash code is 2. When we check the location 2, it is already in use.

It is called a collision, where different key are mapped into same hash code. To deal with the collision, there are many methods, such as, chaining (use a linked list for each location), and rehashing (second function is used to map to another location). Usually we need to know at least these two kinds of methods.

Here we use the rehashing.

The rehashing function is: rehash(key) = (key+1)%5;

So, continue the above step, rehash(2) = 3; location 3 is empty, then store the <27,27> to location 3.

idx key value

0 -1 0

1 -1 0

2 12 12

3 27 27

4 29 29

If we further insert <32,32>, hash(32) = 2; location 2 is in use, rehash(2) = 3, location 3 is also in use,

Then rehash again, rehash(3) = 4, no available, rehash(4) = 0, OK! Store <32, 32 > in 0th slot.

idx key value

0 32 32

1 -1 0

2 12 12

3 27 27

4 29 29

That is the basic way of insert operation for a hash table.

To retrieve the value, e.g. we want to find the value of key <27, ?>, hash(27) = 2, check the key stored in location 2 , which is 12 !=27, then rehashing is need, rehash(2) = 3, the key is 27, then return the value 27.

A simple implementation (in C++)

 
        #include <iostream> 
       
        using 
         namespace 
         std; 
       
        const 
         int 
         sz = 5; 
       
        struct 
         data{ 
       
        int 
         id; 
       
        int 
         val; 
       
        }; 
       
        class 
         Hashtable{ 
       
        data dt[sz]; 
       
        int 
         numel; 
       
        public 
        : 
       
        Hashtable(); 
       
        int 
         hash( 
        int 
         &id); 
       
        int 
         rehash( 
        int 
         &id); 
       
        int 
         insert(data &d); 
       
        int 
         remove 
        (data &d); 
       
        int 
         retrieve( 
        int 
         &id);  
       
        void 
         output(); 
       
        }; 
       
        Hashtable::Hashtable(){ 
       
        for 
         ( 
        int 
         i=0;i<sz;i++){ 
       
        dt[i].id = -1; 
       
        dt[i].val = 0; 
       
        } 
       
        numel = 0; 
       
        } 
       
        int 
         Hashtable::hash( 
        int 
         &id){ 
       
        return 
         id%sz; 
       
        } 
       
        int 
         Hashtable::rehash( 
        int 
         &id){ 
       
        return 
         (id+1)%sz; 
       
        } 
       
        int 
         Hashtable::insert(data &d){ 
       
        if 
         (numel<sz){ 
       
        int 
         hashid = hash(d.id); 
       
        if 
         (hashid>=0 && hashid < sz){ 
       
        if 
         (dt[hashid].id==-1 || dt[hashid].id==-2){ 
       
        dt[hashid].id = d.id; 
       
        dt[hashid].val = d.val; 
       
        numel++; 
       
        return 
         0; 
       
        } 
        else 
        { 
       
        cout <<  
        "collision! rehashing..." 
         <<endl; 
       
        int 
         i=0; 
       
        while 
         (i<sz){ 
       
        hashid = rehash(hashid); 
       
        if 
         (dt[hashid].id==-1 || dt[hashid].id==-2){ 
       
        dt[hashid].id = d.id; 
       
        dt[hashid].val = d.val; 
       
        numel++; 
       
        return 
         0; 
       
        } 
       
        if 
         (i==sz){ 
        return 
         -1;} 
       
        i++; 
       
        } 
       
        } 
       
        } 
       
        } 
        else 
        { 
        return 
         -1;} 
       
        } 
       
        int 
         Hashtable:: 
        remove 
        (data &d){ 
       
        int 
         hashid = hash(d.id); 
       
        if 
         (hashid>=0 && hashid < sz){ 
       
        if 
         (dt[hashid].id==d.id){ 
       
        dt[hashid].id = -2; 
       
        dt[hashid].val = 0; 
       
        numel--; 
       
        return 
         0; 
       
        } 
        else 
        { 
       
        int 
         i=0; 
       
        while 
         (i<sz){ 
       
        hashid = rehash(hashid); 
       
        if 
         (dt[hashid].id==d.id){ 
       
        dt[hashid].id = -2; 
       
        dt[hashid].val = 0; 
       
        numel--; 
       
        return 
         0; 
       
        } 
       
        if 
         (i==sz){ 
        return 
         -1;} 
       
        i++; 
       
        } 
       
        } 
       
        } 
       
        } 
       
        int 
         Hashtable::retrieve( 
        int 
         &id){ 
       
        int 
         hashid = hash(id); 
       
        if 
         (hashid>=0 && hashid < sz){ 
       
        if 
         (dt[hashid].id==id){ 
       
        return 
         dt[hashid].val; 
       
        } 
        else 
        { 
       
        int 
         i=0; 
       
        while 
         (i<sz){ 
       
        hashid = rehash(hashid); 
       
        if 
         (dt[hashid].id==id){ 
       
        return 
         dt[hashid].val; 
       
        } 
       
        if 
         (i==sz){ 
        return 
         0;} 
       
        i++; 
       
        } 
       
        } 
       
        } 
       
        } 
       
        void 
         Hashtable::output(){ 
       
        cout <<  
        "idx  id  val" 
         << endl; 
       
        for 
         ( 
        int 
         i=0;i<sz;i++){ 
       
        cout << i <<  
        "    " 
         << dt[i].id <<  
        "    " 
         << dt[i].val << endl;  
       
        } 
       
        } 
       
        int 
         main(){ 
       
        Hashtable hashtable; 
       
        data d; 
       
        d.id = 27; 
       
        d.val = 27; 
       
        hashtable.insert(d); 
       
        hashtable.output(); 
       
        d.id = 99; 
       
        d.val = 99; 
       
        hashtable.insert(d); 
       
        hashtable.output(); 
       
        d.id = 32; 
       
        d.val = 32; 
       
        hashtable.insert(d); 
       
        hashtable.output(); 
       
        d.id = 77; 
       
        d.val = 77; 
       
        hashtable.insert(d); 
       
        hashtable.output(); 
       
        //retrieve data 
       
        int 
         id = 77; 
       
        int 
         val = hashtable.retrieve(id); 
       
        cout << endl; 
       
        cout <<  
        "Retrieving ... " 
         << endl; 
       
        cout <<  
        "hashtable[" 
         << id<<  
        "]=" 
         << val << endl; 
       
        cout << endl; 
       
        //delete element 
       
        d.id = 32; 
       
        d.val = 32; 
       
        hashtable. 
        remove 
        (d); 
       
        hashtable.output(); 
       
        d.id = 77; 
       
        d.val = 77; 
       
        hashtable. 
        remove 
        (d); 
       
        hashtable.output(); 
       
        return 
         0; 
       
        }