Data Structures: Hash Tables (4)

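This article introduces the concept of the hash table in detail, stressing its advantage in search speed and the difficulties it can bring, such as array expansion and the lack of ordered access. A concrete application example shows the role hash tables play in compilers. The hashing process comprises hash code generation, hash function design, and collision resolution. In Java, the hash code implementation involves converting characters to numbers, and the article explains why 31 is chosen as the multiplier. It also discusses why the array size should be a prime number and how to design and handle the hash function. Finally, it examines separate chaining and open addressing for collision resolution, and the role of the load factor in tuning hash table performance.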

1 Concept of Hashing

  The problem at hand is to speed up searching. We could search much faster if we knew in advance the index at which a value is located in the array. Suppose we had a magic function that told us the index for a given value. With this function, a search is reduced to a single probe, giving a constant running time of O(1). Such a function is called a hash function, and the data structure built on it is called a hash table. A hash function hashes (converts) a number in a large range into a number in a smaller range. This smaller range corresponds to the index numbers in an array. An array into which data is inserted using a hash function is called a hash table.
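As a minimal sketch of this idea, the block below assumes the simplest possible hash function, the remainder operator, which the text above does not prescribe. The class name SimpleHashDemo and the constant ARRAY_SIZE are hypothetical, and collisions are ignored here.

```java
// Illustrative only: SimpleHashDemo, ARRAY_SIZE, and the use of % are assumptions.
class SimpleHashDemo {
    static final int ARRAY_SIZE = 10; // size of the smaller range (the array)

    // Maps a non-negative key from a large range to an index in 0..ARRAY_SIZE-1.
    static int hashFunction(int key) {
        return key % ARRAY_SIZE;
    }

    public static void main(String[] args) {
        int[] table = new int[ARRAY_SIZE];
        int key = 4327;

        table[hashFunction(key)] = key;                      // insertion: one probe
        System.out.println(hashFunction(key));               // prints 7
        System.out.println(table[hashFunction(key)] == key); // search: one probe, prints true
    }
}
```

Both the insertion and the search touch exactly one array slot, which is where the O(1) running time comes from.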


Hash tables are significantly faster than trees: insertion and searching (and sometimes deletion) can take close to constant time, O(1) in big O notation.



Hash table disadvantages:

1) Hash tables are based on arrays, and arrays are difficult to expand after they've been created. For some kinds of hash tables, performance may degrade catastrophically when a table becomes too full, so the programmer needs to have a fairly accurate idea of how many data items will need to be stored (or be prepared to periodically transfer data to a larger hash table, a time-consuming process known as rehashing).

2) There's no convenient way to visit the items in a hash table in any kind of order (such as from smallest to largest). If you need this capability, you'll need to look elsewhere.
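To make the cost of rehashing concrete, here is a rough sketch under stated assumptions: keys are non-negative ints, each slot holds a list of keys, and the hash function is the remainder by the table size. The names RehashDemo, newTable, and rehash are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: all names and the remainder-based hash function are assumptions.
class RehashDemo {
    // Builds an empty table of the given size; each slot holds a list of keys.
    static List<List<Integer>> newTable(int size) {
        List<List<Integer>> table = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            table.add(new ArrayList<>());
        }
        return table;
    }

    // Copies every key into a larger table. Each index must be recomputed,
    // because it depends on the table size, so the work grows with the item count.
    static List<List<Integer>> rehash(List<List<Integer>> old, int newSize) {
        List<List<Integer>> bigger = newTable(newSize);
        for (List<Integer> slot : old) {
            for (int key : slot) {
                bigger.get(key % newSize).add(key);
            }
        }
        return bigger;
    }

    public static void main(String[] args) {
        List<List<Integer>> small = newTable(5);
        small.get(12 % 5).add(12); // 12 sits in slot 2 of the small table
        small.get(22 % 5).add(22); // 22 also sits in slot 2

        List<List<Integer>> big = rehash(small, 11);
        System.out.println(big.get(1)); // prints [12]
        System.out.println(big.get(0)); // prints [22]
    }
}
```

Every stored key is visited and re-indexed once, which is why the transfer is described as time-consuming.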

2 Use example

  A widely used application for hash tables is in computer-language compilers, which maintain a symbol table in a hash table. The symbol table holds all the variable and function names made up by the programmer, along with the address where they can be found in memory. The program needs to access these names very quickly, so a hash table is the preferred data structure.
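A toy version of such a symbol table can be sketched with the standard java.util.HashMap. The identifier names and addresses below are made up for illustration, and a real compiler's symbol table stores far more than an address.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: SymbolTableDemo and its entries are hypothetical.
class SymbolTableDemo {
    // Maps each identifier name to the (made-up) address where it lives.
    static Map<String, Integer> buildTable() {
        Map<String, Integer> symbols = new HashMap<>();
        symbols.put("count", 0x1000);
        symbols.put("total", 0x1004);
        symbols.put("printReport", 0x2000);
        return symbols;
    }

    public static void main(String[] args) {
        Map<String, Integer> symbols = buildTable();
        System.out.println(Integer.toHexString(symbols.get("total"))); // prints 1004
        System.out.println(symbols.containsKey("missing"));            // prints false
    }
}
```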

3 Hash Process

The following figure describes the hashing process:


The process:

1) Hash code: if the keys are not numeric, use a hash code to convert them into numeric keys;

2) Hash function: hash (convert) a number in a large range into a number in a smaller range;

3) Hash table: this smaller range corresponds to the index numbers in an array. An array into which data is inserted using a hash function is called a hash table.

Pseudocode:

//hash code
digitKey=hashCode(key);

//hash function
hashValue=hashFunction(digitKey); //hash the numeric key

//Insertion
hashTable[hashValue].insert(key); //use the hash table index (hashValue) to insert the key

//Deletion
hashTable[hashValue].delete(key); //delete the key from the slot at hashValue

//Search
key=hashTable[hashValue].find(key); //look up the key in the slot at hashValue
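The pseudocode can be turned into a small runnable sketch. The version below makes several assumptions that the pseudocode leaves open: each slot is a linked list, String.hashCode() serves as the hash code, and the hash function is a non-negative remainder by the table size. The class name ChainedHashTable and the helper slotFor are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative only: one possible reading of the pseudocode, not the author's code.
class ChainedHashTable {
    private final List<LinkedList<String>> hashTable = new ArrayList<>();

    ChainedHashTable(int size) {
        for (int i = 0; i < size; i++) {
            hashTable.add(new LinkedList<>());
        }
    }

    // Hash function: maps any int (even a negative one) to 0..size-1.
    int hashFunction(int digitKey) {
        return Math.floorMod(digitKey, hashTable.size());
    }

    // Hash code followed by hash function: finds the slot a key belongs to.
    private LinkedList<String> slotFor(String key) {
        int digitKey = key.hashCode();
        return hashTable.get(hashFunction(digitKey));
    }

    void insert(String key) {
        LinkedList<String> slot = slotFor(key);
        if (!slot.contains(key)) {
            slot.add(key);
        }
    }

    boolean delete(String key) {
        return slotFor(key).remove(key);
    }

    boolean find(String key) {
        return slotFor(key).contains(key);
    }

    public static void main(String[] args) {
        ChainedHashTable table = new ChainedHashTable(11);
        table.insert("count");
        table.insert("total");
        System.out.println(table.find("count"));   // prints true
        System.out.println(table.delete("count")); // prints true
        System.out.println(table.find("count"));   // prints false
    }
}
```

Math.floorMod is used instead of % because String.hashCode() can be negative, and a negative remainder would not be a valid index.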

Looking at this process, the following questions need to be answered:

1) How do we implement a hash code?

2) What should the size of the array be (a hash table is an array)?

3) How do we implement a hash function?

4) How do we resolve the collision when two keys have the same hash value?

4 How to implement a hash code?

  A hash code is a function that converts a non-numeric key into a numeric key. If the key is not a number, how can we convert it into one?

  In Java, a non-numeric key is usually a String object. First, consider how numbers themselves are built. We can write 324 = 3*10^2 + 2*10^1 + 4*10^0 (the base is 10 in ordinary decimal notation). Can a string be written this way? Certainly: if every char in a string corresponds to a number, the string can be written in the same form. Luckily, each char has a corresponding int value, its ASCII code: a is 97, b is 98, and so on, up to 122 for z. For example, abc = 97*10^2 + 98*10^1 + 99*10^0 = 10779.

  We now know how to transform a non-numeric key into a numeric key, but which base should we use for keys in a hash table? In Java, this base is 31. Here is Java code for the hash code:

	/**
	 * Returns a hash code for this string. The hash code for a String object is computed as
	 * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
	 * using int arithmetic, where s[i] is the i th character of the string,
	 * n is the length of the string, and ^ indicates exponentiation.
	 * @param key the string object
	 * @return a hash code value for the string object.
	 */
	public int hashCode1(String key){
		int digitKey=0;
		int power31=1;                       //the power
		
		for(int i=key.length()-1;i>=0;i--){  //right to left
			digitKey+=key.charAt(i)* power31;
			power31*=31;
		}//end for
		
		return digitKey;
	} //hashCode1()
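As a quick sanity check, the sketch below copies hashCode1 into a hypothetical class HashCodeCheck (made static for convenience) and compares it with Java's built-in String.hashCode(), whose documentation gives the same formula. For "abc" the sum is 97*31^2 + 98*31 + 99 = 93217 + 3038 + 99 = 96354.

```java
// Illustrative only: HashCodeCheck is a hypothetical wrapper class.
class HashCodeCheck {
    // Same algorithm as hashCode1 above: sum of char * 31^position, right to left.
    static int hashCode1(String key) {
        int digitKey = 0;
        int power31 = 1; // current power of 31
        for (int i = key.length() - 1; i >= 0; i--) {
            digitKey += key.charAt(i) * power31;
            power31 *= 31;
        }
        return digitKey;
    }

    public static void main(String[] args) {
        System.out.println(hashCode1("abc"));                     // prints 96354
        System.out.println(hashCode1("abc") == "abc".hashCode()); // prints true
    }
}
```

For long strings the int arithmetic overflows and wraps around, but it wraps in the same way as in String.hashCode(), so the two results still agree.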

Problem:

The hashCode1() method is not as efficient as it might be. There are two multiplications and an addition inside the loop. We can eliminate one multiplication by taking advantage of a mathematical identity called Horner's method (Horner's rule). (Horner was an English mathematician, 1786–1837.) This states that an expression like

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

can be written as

(...((s[0]*31 + s[1])*31 + s[2])*31 + ...)*31 + s[n-1]

So we have the following code
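The original listing is missing at this point, so what follows is only a sketch of how Horner's method might be applied here, not the author's code. The method name hashCode2 and the class name HornerHash are hypothetical.

```java
// Illustrative only: a reconstruction under assumptions, not the original listing.
class HornerHash {
    // Horner's method: one multiplication and one addition per character,
    // scanning the string left to right.
    static int hashCode2(String key) {
        int digitKey = 0;
        for (int i = 0; i < key.length(); i++) {
            digitKey = digitKey * 31 + key.charAt(i);
        }
        return digitKey;
    }

    public static void main(String[] args) {
        System.out.println(hashCode2("abc"));                     // prints 96354
        System.out.println(hashCode2("abc") == "abc".hashCode()); // prints true
    }
}
```

For "abc" the loop computes ((97*31 + 98)*31 + 99) = 96354, the same value as the power-based formula, without keeping a separate running power of 31.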
