Hash Table

JUAN425

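A hash table is an important data structure that is commonly used to implement an associative array. "Associative" means that keys are mapped to values: given a key, you can look up the value that corresponds to it. A hash table uses a hash function to compute an index into an array from the key; once the index is known, the corresponding value can be accessed directly.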

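The defining feature of a hash table is its support for fast dictionary operations (as with a dictionary, you supply a key and look up the corresponding value). These operations are search, insert, and delete. Each takes constant time, O(1), on average; in the worst case, when many keys collide, an operation can take O(n) time.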

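Q: Why do we need a hash table?

A: Because a hash table supports quick search. For example, suppose we have an array full of data (say 100 items). If we know the position (index) at which a specific item is stored in the array, we can access that item quickly. For instance, if we know the item is at index 3, we can access it with the following statement: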

myItem = myArray[3];

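This way we do not have to search through each element in the array; we only need to access position 3.

The question is: how do we know that position 3 holds the data we are looking for?

The solution is hashing.

Given a key, we use a hash function h(x) to compute h(key), and this value is the index (or position) in the array that we want to access.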

Q: What is a hash function?

A: There are many hash functions. Some take an integer key and compute an index from it. A hash function commonly used for integer keys is the division method.

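Here is an example that uses the division method.

Suppose we are given a few integer keys and want to map them into an array of 10 elements.

Each key is divided by the array size, and the remainder is used as the index. The divisor (the modulus) is usually the maximum number of elements in the array.

In this example the keys are inserted into the array at positions 6, 7, 0, and so on.

An important property of the division method is that the keys must be integers.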
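The division method can be sketched in a few lines of C++. The function names `divisionHash` and `mapKeys` are hypothetical, and the sample keys 36, 17 and 20 used in the note after the code are assumptions chosen only because they land at positions 6, 7 and 0; they are not necessarily the keys of the original example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Division method: the index is the remainder of key / tableSize.
// Assumes a non-negative key and a table with at least one slot.
std::size_t divisionHash(unsigned int key, std::size_t tableSize)
{
    assert(tableSize > 0);
    return key % tableSize;
}

// Maps every key in `keys` to its index in a table of `tableSize` slots.
std::vector<std::size_t> mapKeys(const std::vector<unsigned int>& keys,
                                 std::size_t tableSize)
{
    std::vector<std::size_t> indices;
    for (unsigned int key : keys)
        indices.push_back(divisionHash(key, tableSize));
    return indices;
}
```

With these assumed keys, `mapKeys({36, 17, 20}, 10)` yields the indices 6, 7 and 0.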

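Q: What do we do when the keys are not integers?

A: In that case, two steps are needed:

(1) Use a function to convert the key to an integer.

(2) Use a hash method to convert that integer to an index into the array.

What does it mean for keys not to be integers?

For example, the keys might be people's names.

The goal is to map each name to an index of the array so that the associated information can be accessed. The hash function does two things: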

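1. Convert names to integers, that is, turn a string into an integer. One way is to sum the ASCII codes of the characters in the string.

For example, Sarah Jones corresponds to 83 (S) + 97 (a) + 114 (r) + 97 (a) + 104 (h) + 32 (space) + 74 (J) + 111 (o) + 110 (n) + 101 (e) + 115 (s) = 1038, and so on for the other names.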

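2. Use a hash method to obtain the index.

Next, apply the division method: divide by 10 and use the remainder as the index. For Sarah Jones, 1038 % 10 = 8.

Each name is then stored at the position given by its index.

The associative array built this way is a hash table.

The hash table stores the key (which is used to compute the index) along with the associated values.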
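The two steps for string keys can be sketched as follows. This is a minimal illustration, not a definitive implementation; `sumOfCharCodes` and `nameIndex` are hypothetical names, and characters are read as `unsigned char` so that non-ASCII bytes cannot make the sum negative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Step 1: turn the string into an integer by summing its character codes.
int sumOfCharCodes(const std::string& key)
{
    int sum = 0;
    for (unsigned char c : key)
        sum += c;
    return sum;
}

// Step 2: apply the division method to the sum to get the array index.
std::size_t nameIndex(const std::string& key, std::size_t tableSize)
{
    assert(tableSize > 0);
    return static_cast<std::size_t>(sumOfCharCodes(key)) % tableSize;
}
```

For the key "Sarah Jones", `sumOfCharCodes` gives 1038 and `nameIndex` with a table size of 10 gives 8.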

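Again, we use the key and the hash function to compute an index and then insert the item into the hash table at that index. This can lead to a collision: two different keys can hash to the same index.

For example, when we insert John Smith into the hash table above, the calculation is: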
John Smith --> 948 % 10 --> 8

So John Smith should be inserted at index 8. But position 8 already holds Sarah Jones, so a collision occurs.

There are two main ways to resolve a collision:

Linear Probing
Chaining

The rest of this section describes chaining.

Each slot of the hash table stores a linked list (that is, the head of a list). In other words, with chaining the table is an array of linked lists, and all the data in the same list have colliding index values.

Back to the example: Sarah Jones and John Smith collide, so we link them together in a list. John Smith is "chained" or "linked" after Sarah Jones.
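Chaining can be sketched as an array of linked lists. The class below is an illustration only: `ChainedNameTable` and its members are hypothetical names, it stores just the names, and it assumes the character-code sum followed by the division method as the hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

class ChainedNameTable
{
public:
    explicit ChainedNameTable(std::size_t size) : buckets(size)
    {
        assert(size > 0);
    }

    // Appends name to the end of its bucket's chain and returns the index.
    std::size_t insert(const std::string& name)
    {
        std::size_t index = hash(name);
        buckets[index].push_back(name);
        return index;
    }

    // Read-only access to the chain stored at one index.
    const std::list<std::string>& chain(std::size_t index) const
    {
        return buckets.at(index);
    }

private:
    // Sum of character codes, reduced modulo the number of buckets.
    std::size_t hash(const std::string& name) const
    {
        std::size_t sum = 0;
        for (unsigned char c : name)
            sum += c;
        return sum % buckets.size();
    }

    std::vector<std::list<std::string>> buckets;
};
```

In a table of 10 buckets, inserting "Sarah Jones" and then "John Smith" puts both names in the chain at index 8, with John Smith after Sarah Jones.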

Applications:

Hash tables are often used when the amount of data is large and information must be searched for and retrieved quickly. Typical situations include:

For driver's license records. With a hash table, you could quickly get information about the driver (e.g. name, address, age) given the license number (the license number is the key).
For compiler symbol tables. The compiler uses a symbol table to keep track of the user-defined symbols (the keys) in a C++ program. This allows the compiler to quickly look up attributes associated with symbols (for example, variable names).
For internet search engines.
For telephone book databases. You could make use of a hash table implementation to quickly look up John Smith's telephone number (the name is the key and the telephone number is the value).
For electronic library catalogs. Hash table implementations allow for a fast find among the millions of materials stored in the library.
For implementing passwords for systems with multiple users. Hash tables allow for a fast retrieval of the password (the value) that corresponds to a given username (the key).

Typical hash table operations:

The operations associated with a hash table are as follows:

bool isEmpty()
Returns true if the hash table is empty. Otherwise, returns false.
bool isFull()
Returns true if the hash table is full. Otherwise, returns false.
void insert (const DT &newDataItem)
Inserts newDataItem into the appropriate list in the hash table. The location (index) in the hash table is determined by the key and the hash function.
bool remove (KF searchKey)
Searches the hash table for the data item with the key searchKey. If the data item is found, then removes the data item and returns true. Otherwise, returns false.
bool retrieve (KF searchKey, DT &dataItem)
Searches the hash table for the data item with the key searchKey. If the data item is found, then copies the data item to dataItem and returns true. Otherwise, returns false.
void clear()
Removes all the data items from the hash table.
void showStructure()
Outputs all the data items stored in the hash table. If the hash table is empty, outputs "Empty hash table". This is meant for testing/debugging purposes.
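To make these operations concrete, here is a simplified sketch of a chained table that supports most of them. It is not the lab's HashTbl class: `Entry` and `SimpleHashTable` are hypothetical names, keys and values are plain strings instead of the DT and KF template parameters, and replacing an existing item that has the same key is a design choice of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical data item: a key plus its associated value.
struct Entry
{
    std::string key;
    std::string value;
};

class SimpleHashTable
{
public:
    explicit SimpleHashTable(std::size_t size) : buckets(size)
    {
        assert(size > 0);
    }

    bool isEmpty() const
    {
        for (const auto& chain : buckets)
            if (!chain.empty())
                return false;
        return true;
    }

    // Chains can always grow, so this sketch never reports a full table.
    bool isFull() const { return false; }

    // Inserts the item, replacing any existing item with the same key.
    void insert(const Entry& newDataItem)
    {
        auto& chain = buckets[indexOf(newDataItem.key)];
        for (auto& item : chain)
            if (item.key == newDataItem.key)
            {
                item = newDataItem;
                return;
            }
        chain.push_back(newDataItem);
    }

    // Removes the item with the given key; returns false if it is absent.
    bool remove(const std::string& searchKey)
    {
        auto& chain = buckets[indexOf(searchKey)];
        for (auto it = chain.begin(); it != chain.end(); ++it)
            if (it->key == searchKey)
            {
                chain.erase(it);
                return true;
            }
        return false;
    }

    // Copies the matching item into dataItem; returns false if it is absent.
    bool retrieve(const std::string& searchKey, Entry& dataItem) const
    {
        for (const auto& item : buckets[indexOf(searchKey)])
            if (item.key == searchKey)
            {
                dataItem = item;
                return true;
            }
        return false;
    }

    void clear()
    {
        for (auto& chain : buckets)
            chain.clear();
    }

private:
    // Character-code sum followed by the division method.
    std::size_t indexOf(const std::string& key) const
    {
        std::size_t sum = 0;
        for (unsigned char c : key)
            sum += c;
        return sum % buckets.size();
    }

    std::vector<std::list<Entry>> buckets;
};
```

Only the chain selected by the hash function is scanned in insert, remove and retrieve, which is why these operations stay fast as long as the chains remain short.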

Application: Looking up a password:

The following section outlines an algorithm for authenticating a user's password. Later, in the lab exercise, you will be given the skeleton code and asked to add lines to make it work.

One use of a hash table is to store the login usernames of a computer's users together with the corresponding passwords.

The program has two main steps:

1. Build the hash table from the data. The program first loads the username/password pairs from the file password.dat and inserts the passwords into the hash table, using the usernames as keys. Insertion stops when the end of the file is reached. Each line of password.dat holds one username/password pair.

2. The program displays a login prompt and reads a username, then displays a password prompt and reads a password. It then uses the username as the key to look up the stored password (just like looking a word up in a dictionary) and compares the two. If the password the user typed matches the value stored in the hash table for that username, the program prints "Authentication successful"; otherwise it prints "Authentication failure".

Step 2 repeats until the user signals the end of the input data (EOF). On a PC, EOF is entered by typing CTRL+Z.
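The two steps can be sketched with the C++ standard library's own hash table, `std::unordered_map`, standing in for the lab's HashTbl class. This is an illustration under assumptions, not the lab's skeleton code: `PasswordTable`, `loadPasswords` and `authenticate` are hypothetical names, and the input is assumed to hold a username and a password separated by whitespace on each line.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Stand-in for the lab's hash table: username is the key, password the value.
using PasswordTable = std::unordered_map<std::string, std::string>;

// Step 1: read username/password pairs until the end of the input.
PasswordTable loadPasswords(std::istream& in)
{
    PasswordTable table;
    std::string username, password;
    while (in >> username >> password)
        table[username] = password;
    return table;
}

// Step 2: look up the username and compare the stored and typed passwords.
std::string authenticate(const PasswordTable& table,
                         const std::string& username,
                         const std::string& password)
{
    auto it = table.find(username);
    if (it != table.end() && it->second == password)
        return "Authentication successful";
    return "Authentication failure";
}
```

Any input stream can be passed to `loadPasswords`: a `std::ifstream` opened on password.dat, or a `std::istringstream` when testing. An unknown username and a wrong password both produce "Authentication failure".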

When names are the keys, we need a way to convert a string to an integer. The method used here is to add up the ASCII codes of the characters in the string. For example, mary is converted as follows:

109('m') + 97('a') + 114('r') + 121('y')=441

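The following member function implements this conversion from a string to the sum of its ASCII codes:

    // Returns the sum of the character codes of str.
    int hash(const string &str) const
    {
        int val = 0;

        // size_t matches the type of str.length() and avoids a
        // signed/unsigned comparison.
        for (size_t i = 0; i < str.length(); i++)
            val += static_cast<unsigned char>(str[i]);  // never negative
        return val;
    }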

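Once the string has been converted to an integer, the next step is to convert that number to an index into the hash table. For an array of 10 elements, we can divide the number by 10 and take the remainder. The remainder is used as the index, and the password for this string (the username) is stored at that position. Combining these two hash functions, we will get some code that looks like this: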

   int index = dataItem.hash ( searchKey ) % 10;

Therefore mary's index will be:

 441 % 10 = 1.

3. Lab Exercise

Get the zipped files for this exercise.

Extract all of the files to the WORKAREA. Open the WORKAREA and double click on exercise.sln. This will open up the project. There are six files used in this program:
- hashtbl.cpp and hashtbl.h -- contain the implementation of the hashtable class
- listlnk.cpp and listlnk.h -- contain the implementation of the linked list class
- login.cpp -- the main program. This contains the Password structure, which is inserted into the hashtable.
- password.dat -- contains all the users and corresponding passwords.
Your primary tasks for this exercise are to edit the login.cpp to add in lines so that it does the following:
1. insert passwords into the Hash Table
2. retrieve one user's Password structure from the Hash Table
3. compare retrieved user password to input password and print "Authentication failure" or "Authentication successful"
Steps include:
Try to run this program. You should find that it will prompt you for "Login:" and "Password:" (type in random words at these prompts). You will notice that it continually cycles around asking you for this information.
To stop the program from running, at the "Login:" prompt, type CTRL and z (simultaneously) and then the Enter key.
Add in a line to insert passwords into the table. Hint: notice that the name of the hashtable is passwords and that you want to insert a Password structure called tempPass into the hashtable.
Add in a line to print out the hash table. Hint: the hashtable is passwords and there is a member function called showStructure.
Build and Run this program. If all is working well, you should get some output that looks like this:
```
The Hash Table has the following entries
0: _

1: mary

2: _

3: _

4: _

5: bopeep

6: _

7: jill

8: _

9: jack

Login:
```
This shows the hash table that has resulted from inserting data from the password.dat file (mentioned in Section 2). Notice that mary is at index 1, just as we predicted (in Section 2).
Add lines to compare the true password to the input password and print "Authentication failure" or "Authentication successful". Hint: Compare the input password (pass) to the password within the tempPass object (which has been retrieved).

Build and Run your program. You should get results like the following:

Login: mary
Password: contrary
Authentication successful

Login: jim 
Password: contrary
Authentication failure

Login: bopeep
Password: sheeplost
Authentication failure

You now might want to play around with a couple of things and see what happens:
- Modify the following line so that you have 8 elements in your hash table (instead of 10):
```
HashTbl<Password, string> passwords(10);
```
  What happens to the hashtable? Why?
- Edit the password.dat file. This file has been added to the project (under "Resource Files" in the Solution Explorer). Double click on it to open.
  You can add usernames and passwords to test more. Try adding a username "ramy" (this has the same characters as mary, and, therefore, the same integer hash value).
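Both experiments can be previewed with a small sketch before changing the lab code. It assumes the hash is the character-code sum followed by the remainder modulo the table size, as described earlier; `indexFor` and `groupByIndex` are hypothetical helper names and are not part of the lab files.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Character-code sum of the username, reduced modulo the table size.
std::size_t indexFor(const std::string& username, std::size_t tableSize)
{
    assert(tableSize > 0);
    std::size_t sum = 0;
    for (unsigned char c : username)
        sum += c;
    return sum % tableSize;
}

// Groups usernames by index, showing which ones would share a chain.
std::map<std::size_t, std::vector<std::string>>
groupByIndex(const std::vector<std::string>& usernames, std::size_t tableSize)
{
    std::map<std::size_t, std::vector<std::string>> groups;
    for (const auto& name : usernames)
        groups[indexFor(name, tableSize)].push_back(name);
    return groups;
}
```

Under these assumptions, with 10 slots the usernames mary, jill, jack and bopeep land at indices 1, 7, 9 and 5, matching the sample output. With 8 slots, mary and jack both map to index 1 and jill and bopeep both map to index 3, so each pair shares a chain. ramy maps to the same index as mary for any table size, because the two names have the same character-code sum.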