1 Concept of Hashing
The problem at hand is to speed up searching. We could search much faster if we knew in advance the index at which a value is located in the array. Suppose we had a magic function that told us the index for a given value. With this function, a search is reduced to a single probe, giving a constant running time of O(1). Such a function is called a hash function, and the data structure built on it is called a hash table. A hash function hashes (converts) a number in a large range into a number in a smaller range. This smaller range corresponds to the index numbers of an array. An array into which data is inserted using a hash function is called a hash table.
Hash tables are significantly faster than trees: insertion and searching (and sometimes deletion) can take close to constant time, O(1) in big O notation, whereas a balanced search tree needs O(log N) time for the same operations.
Hash table disadvantages:
1) Hash tables are based on arrays, and arrays are difficult to expand after they’ve been created. For some kinds of hash tables, performance may degrade catastrophically when the table becomes too full, so the programmer needs a fairly accurate idea of how many data items will be stored, or must be prepared to periodically transfer the data to a larger hash table (rehashing), which is a time-consuming process.
2) There’s no convenient way to visit the items in a hash table in any kind of order (such as from smallest to largest). If you need this capability, you’ll need to look elsewhere.
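The ordering limitation can be seen in a small sketch. It uses java.util.HashSet and java.util.TreeSet as stand-ins for "a hash table" and "a tree"; the class name OrderingDemo and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;

class OrderingDemo {
    // Order in which a tree-based set visits the items: always ascending.
    static List<Integer> treeOrder(List<Integer> items) {
        return new ArrayList<>(new TreeSet<>(items));
    }

    // Order in which a hash-based set visits the items: not specified.
    static List<Integer> hashOrder(List<Integer> items) {
        return new ArrayList<>(new HashSet<>(items));
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(500, 3, 42, 1000, 17);
        System.out.println("HashSet order: " + hashOrder(items)); // no guaranteed order
        System.out.println("TreeSet order: " + treeOrder(items)); // smallest to largest
    }
}
```

The HashSet may happen to print some values in order, but nothing in its contract promises that, which is the point of disadvantage 2.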
2 Use example
A widely used application of hash tables is in computer-language compilers, which maintain a symbol table in a hash table. The symbol table holds all the variable and function names made up by the programmer, along with the addresses where they can be found in memory. The compiler needs to access these names very quickly, so a hash table is the preferred data structure.
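As a rough illustration of the idea, a symbol table can be sketched with java.util.HashMap. This is not how any particular compiler does it; the class name SymbolTableDemo, the identifier names, and the addresses are all invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

class SymbolTableDemo {
    // Maps identifier names to (made-up) memory addresses.
    private final Map<String, Integer> symbols = new HashMap<>();

    // Records a name together with the address where it lives.
    void declare(String name, int address) {
        symbols.put(name, address);
    }

    // Returns the address for a name, or -1 if the name was never declared.
    int lookup(String name) {
        return symbols.getOrDefault(name, -1);
    }

    public static void main(String[] args) {
        SymbolTableDemo table = new SymbolTableDemo();
        table.declare("count", 0x1000);
        table.declare("printTotal", 0x2040);
        System.out.println(table.lookup("count"));   // address of a declared name
        System.out.println(table.lookup("missing")); // -1: not declared
    }
}
```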
3 Hash Process
The following figure describes the hashing process:
The process:
1) Hash code: if the keys are not numeric, use a hash code to convert them into numeric keys;
2) Hash function: hash (convert) a number in a large range into a number in a smaller range;
3) Hash table: this smaller range corresponds to the index numbers of an array. An array into which data is inserted using a hash function is called a hash table.
Pseudocode:
//hash code
digitKey = hashCode(key); //convert the key into a numeric key
//hash function
hashValue = hashFunction(digitKey); //hash the numeric key into a table index
//Insertion
hashTable[hashValue].insert(key); //insert the key at index hashValue
//Delete
hashTable[hashValue].delete(key); //delete the key stored at index hashValue
//Search
key = hashTable[hashValue].find(key); //look up the key at index hashValue
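The pseudocode above can be turned into a small runnable sketch. Several choices here are assumptions rather than part of the text: String.hashCode() stands in for the hash code step, a modulo by the table size stands in for the hash function, and each array slot holds a LinkedList so that hashTable[hashValue] has insert, delete and find operations. The class name SimpleHashTable is hypothetical.

```java
import java.util.LinkedList;

class SimpleHashTable {
    private final LinkedList<String>[] hashTable;

    @SuppressWarnings("unchecked")
    SimpleHashTable(int size) {
        hashTable = new LinkedList[size];
        for (int i = 0; i < size; i++) {
            hashTable[i] = new LinkedList<>();
        }
    }

    // Steps 1 and 2: key -> numeric key -> index in the array.
    private int indexFor(String key) {
        int digitKey = key.hashCode();                     // hash code (assumed: String.hashCode)
        return Math.floorMod(digitKey, hashTable.length);  // hash function (assumed: modulo)
    }

    // Inserts the key into its slot unless it is already there.
    void insert(String key) {
        LinkedList<String> slot = hashTable[indexFor(key)];
        if (!slot.contains(key)) {
            slot.add(key);
        }
    }

    // Removes the key from its slot; returns true if it was present.
    boolean delete(String key) {
        return hashTable[indexFor(key)].remove(key);
    }

    // Returns true if the key is stored in the table.
    boolean find(String key) {
        return hashTable[indexFor(key)].contains(key);
    }

    public static void main(String[] args) {
        SimpleHashTable table = new SimpleHashTable(11);
        table.insert("cat");
        System.out.println(table.find("cat")); // true
        System.out.println(table.find("dog")); // false
    }
}
```

Math.floorMod is used instead of % because a hash code can be negative, and a negative index would not be a valid array position.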
This process raises the following questions:
1) How do we implement a hash code?
2) What size should the array be (a hash table is an array)?
3) How do we implement a hash function?
4) How do we resolve the conflict when two keys have the same hash value?
4 How to implement a hash code?
A hash code is a function that converts a non-numeric key into a numeric key. If the key is not a number, how can we convert it into one?
In the Java world, a non-numeric key is usually a String object. First, consider how numbers are built from digits. For example, 324 = 3*10^2 + 2*10^1 + 4*10^0 (the base is 10). Can a string be written this way? Certainly, provided every char in the string corresponds to a number. Luckily, each char has a corresponding int value, its ASCII code: a is 97, b is 98, and so on, up to 122 for z. For example, abc = 97*10^2 + 98*10^1 + 99*10^0 = 10779.
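The base-10 idea can be checked with a few lines of Java. This is only a sketch of the arithmetic in the paragraph above; the class and method names (Base10KeyDemo, toBase10Key) are made up for illustration.

```java
class Base10KeyDemo {
    // Treats each character code as a "digit" and combines the digits with base 10.
    static int toBase10Key(String key) {
        int digitKey = 0;
        int power10 = 1; // 10^0 for the rightmost character
        for (int i = key.length() - 1; i >= 0; i--) { // right to left
            digitKey += key.charAt(i) * power10;
            power10 *= 10;
        }
        return digitKey;
    }

    public static void main(String[] args) {
        // 97*100 + 98*10 + 99*1
        System.out.println(toBase10Key("abc")); // 10779
    }
}
```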
We now have a way to transform a non-numeric key into a numeric key, but which base should we use when the numeric key is meant for a hash table? In Java, the base is 31. Here is Java code for this hash code:
/**
 * Returns a hash code for this string. The hash code for a String object is computed as
 * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
 * using int arithmetic, where s[i] is the i-th character of the string,
 * n is the length of the string, and ^ indicates exponentiation.
 * @param key the string object
 * @return a hash code value for the string object.
 */
public int hashCode1(String key) {
    int digitKey = 0;
    int power31 = 1; // current power of 31, starting at 31^0
    for (int i = key.length() - 1; i >= 0; i--) { // right to left
        digitKey += key.charAt(i) * power31;
        power31 *= 31;
    } // end for
    return digitKey;
} // hashCode1()
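A quick way to gain confidence in hashCode1() is to compare it with Java's built-in String.hashCode(), which is documented to use the same formula. The sketch below repeats hashCode1() as a static method so that it runs on its own; the class name HashCodeCheck and the sample strings are made up.

```java
class HashCodeCheck {
    // Same computation as hashCode1() above, repeated so this block is self-contained.
    static int hashCode1(String key) {
        int digitKey = 0;
        int power31 = 1;
        for (int i = key.length() - 1; i >= 0; i--) {
            digitKey += key.charAt(i) * power31;
            power31 *= 31;
        }
        return digitKey;
    }

    public static void main(String[] args) {
        String[] samples = {"abc", "hash table", ""};
        for (String s : samples) {
            // Both columns should show the same number for every sample.
            System.out.println("\"" + s + "\": " + hashCode1(s) + " vs " + s.hashCode());
        }
    }
}
```

For "abc" both values are 97*31^2 + 98*31 + 99 = 96354. For long strings the int arithmetic overflows, but it wraps around in the same way in both computations, so the results still agree.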
Problem:
The hashCode1() method is not as efficient as it might be. There are two multiplications and an addition inside the loop. We can eliminate one multiplication by taking advantage of a mathematical identity called Horner’s method, also known as Horner’s rule. (Horner was an English mathematician, 1786–1837.) It states that an expression like
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
can be written as
(...((s[0]*31 + s[1])*31 + s[2])*31 + ...)*31 + s[n-1]
So we have the following code