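In this experiment, three hash tables are built from a short passage of text using three hash functions: one based on letter order, one based on pronunciation, and one based on folding. For each table, the ASL (average search length) and the number of searches are computed.
Code link: Hash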
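Hash Table
A hash table is a data structure that implements a mapping from keys to values. It uses a hash function to map each key to an index (position) in an array, which makes lookups and insertions efficient. Hash tables are widely used in programming tasks such as database indexing, caches, and dictionaries.
The main characteristics and working principles of a hash table are as follows:
- Hash function: the core of a hash table is the hash function, which maps input of arbitrary size to output of fixed size (usually an integer; in this experiment, a word is mapped to an integer) that serves as the array index. A good hash function spreads different inputs evenly across the indices to reduce collisions.
- Collision handling: because the output range of a hash function is finite, different inputs may map to the same index. This is a collision. A hash table needs a way to resolve collisions; common approaches are open addressing (for example linear probing or double hashing) and separate chaining.
- Array storage: a hash table usually stores its data in an array. Each index corresponds to a bucket, and each bucket can hold one or more key-value pairs.
- Efficient lookup and insertion: once the index has been computed by the hash function, lookup and insertion take constant time, O(1), on average. When there is no collision, accessing the array at that index finds the value directly; with many collisions, the cost grows with the number of entries that must be examined.
A hash table can serve as the basis for sets, dictionaries, caches, and similar structures. In practice it usually provides fast lookup and insertion, but the hash function and the collision-handling method have to be chosen to fit the application, and issues such as memory management and resizing also need to be considered.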
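The points above (hash function, buckets, collision handling) can be sketched as a small table that uses separate chaining. This is only an illustration under assumptions: the class name `ChainedTable`, its `put` and `get` methods, and the choice of chaining are hypothetical and are not taken from the experiment's code.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal hash table using separate chaining (illustration only).
class ChainedTable {
public:
    explicit ChainedTable(std::size_t bucketCount) : buckets_(bucketCount) {}

    // Insert a key, or overwrite its value if the key is already present.
    void put(const std::string& key, int value) {
        auto& bucket = buckets_[indexOf(key)];
        for (auto& entry : bucket) {
            if (entry.first == key) { entry.second = value; return; }
        }
        bucket.emplace_back(key, value);
    }

    // Return a pointer to the stored value, or nullptr if the key is absent.
    const int* get(const std::string& key) const {
        const auto& bucket = buckets_[indexOf(key)];
        for (const auto& entry : bucket) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

private:
    // Simple polynomial hash reduced to a bucket index.
    std::size_t indexOf(const std::string& key) const {
        std::size_t h = 0;
        for (unsigned char c : key) h = h * 31 + c;
        return h % buckets_.size();
    }

    // Each bucket is a chain of key-value pairs that share the same index.
    std::vector<std::list<std::pair<std::string, int>>> buckets_;
};
```

Keys that land in the same bucket are simply appended to that bucket's chain, so a lookup costs one index computation plus a scan of however many entries share the bucket.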
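Implementation
Because the test text is short, all three hash arrays have a length of 5000, so every computed hash value is reduced modulo 5000. The tables are built from the following passage: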
There's something down there. It's Gollum.
Gollum?
He's been following us for three days.
He escaped the dungeons of Baraddur?
Escaped, or was set loose.Now the Ring has brought him here.He will never be rid of his need for it.He hates and loves the Ring, as he hates and loves himself. Smeagol's life is a sad story.Yes, Smeagol he was once called. Before the Ring found him.Before it drove him mad.
It's a pity Bilbo didn't kill him when he had the chance.
Pity? It is pity that stayed Bilbo's hand. Many that live deserve death. Some that die deserve life.Can you give it to them, Frodo?Do not be too eager to deal out death and judgment.Even the very wise can not see all ends.My heart tells me that Gollum has some part to play yet, for good or ill,before this is over.The pity of Bilbo may rule the fate of many.
I wish the Ring had never come to me.I wish none of this had happened.
So do all who live to see such times. But that is not for them to decide.All we have to decide is what to do with the time that is given to us.There are other forces at work in this world, Frodo, besides the will of evil.Bilbo was meant to find the Ring.In which case, you also were meant to have it.
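The write-up does not show how the passage is split into words or how a raw hash value becomes a slot index. A minimal sketch, assuming a word is a run of ASCII letters that may contain apostrophes, and assuming the names `splitWords`, `toIndex`, and `TABLE_SIZE` (all hypothetical), might look like this:

```cpp
#include <cctype>
#include <string>
#include <vector>

const int TABLE_SIZE = 5000;  // table length stated in the text

// Split text into words. Assumed rule: a word starts with a letter and
// continues through letters and apostrophes; anything else is a separator.
std::vector<std::string> splitWords(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {
        bool partOfWord = std::isalpha(c) || (c == '\'' && !current.empty());
        if (partOfWord) {
            current += static_cast<char>(c);
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Reduce a raw hash value to a slot in [0, TABLE_SIZE).
int toIndex(int rawHash) {
    int r = rawHash % TABLE_SIZE;
    return r < 0 ? r + TABLE_SIZE : r;  // guard against negative remainders
}
```

With this rule, a run such as "loose.Now" in the passage yields two words, because the period acts as a separator even without a following space.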
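Three hash functions are implemented:
Hash function based on letter order
The idea is simple: the alphabet position of each letter (0 for a through 25 for z, ignoring case) is multiplied by the weight G[i] for its position i in the word, and the products are summed to give the hash value. The array G holds successive powers of the prime 31. Only the first six letters are used, and the computation stops at the first non-letter character.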
#include <string>
using namespace std;

// Weights: powers of the prime 31, from 31^0 to 31^6.
int G[7] = { 1, 31, 31 * 31, 31 * 31 * 31, 31 * 31 * 31 * 31, 31 * 31 * 31 * 31 * 31, 31 * 31 * 31 * 31 * 31 * 31 };

int Hash1(string a) {
    int tempKey = 0;
    // Only the first 6 letters are used in the computation.
    for (int i = 0; i < 6 && i < (int)a.size(); i++) {
        if (a[i] >= 'a' && a[i] <= 'z') {
            tempKey += int(a[i] - 'a') * G[i];
        }
        else if (a[i] >= 'A' && a[i] <= 'Z') {
            tempKey += int(a[i] - 'A') * G[i];
        }
        else {
            return tempKey;  // stop at the first non-letter character
        }
    }
    return tempKey;
}
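As a worked check of the weighting scheme, the same computation can be restated with a running power of 31 instead of a lookup array. The function name `letterHash` is hypothetical, and the sketch assumes plain ASCII input; it is meant as an illustration rather than a replacement for `Hash1`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical restatement of the letter-order hash using a running weight.
int letterHash(const std::string& word) {
    int key = 0;
    int weight = 1;  // equals 31^i at position i
    for (std::size_t i = 0; i < word.size() && i < 6; ++i) {
        unsigned char c = static_cast<unsigned char>(word[i]);
        if (!std::isalpha(c)) break;              // stop at the first non-letter
        key += (std::tolower(c) - 'a') * weight;  // case-insensitive position 0..25
        weight *= 31;
    }
    return key;
}
```

For the word "the", the positions are t = 19, h = 7, e = 4, so the value is 19·1 + 7·31 + 4·961 = 4080. For "them", the extra m = 12 adds 12·29791, giving 361572, which is 1572 after the reduction modulo 5000.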
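Hash function based on pronunciation
This is a less conventional idea. It may perform better or worse than traditional methods, because words that sound alike are likely to collide.
The hash value is computed from the consonant-plus-vowel groupings found in English words. The calculation used here is fairly simple and could be refined with a more elaborate scheme.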
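// Returns 1 if key is the alphabet index of a vowel (a e i o u),
// or of y, which sometimes acts as a vowel; returns 0 otherwise.
int boolVowel(int key) {
    if (key == 0 || key == 4 || key == 8 || key == 14 || key == 20 || key == 24)
        return 1;
    else
        return 0;
}

int Hash2(string a) {
    int tempKey = 0;
    // Convert every letter to its alphabet index (0..25), ignoring case.
    for (int i = 0; i < (int)a.size(); i++) {
        if (a[i] >= 'a' && a[i] <= 'z') {
            a[i] -= 'a';
        }
        else if (a[i] >= 'A' && a[i] <= 'Z') {
            a[i] -= 'A';
        }
    }
    int tempnumber = 0;  // sum of the consonants seen since the last vowel
    int j = 0;           // number of vowels processed so far
    for (int i = 0; i < (int)a.size(); i++) {
        if (!boolVowel(int(a[i]))) {
            tempnumber += a[i];  // not a vowel: accumulate the consonant's alphabet index
        }
        else {
            // My own scheme for combining vowels and consonants into the hash value.
            tempKey += (G[++j] * (int(a[i])) + tempnumber);
            tempnumber = 0;
            if (j >= 5) {
                return tempKey;  // at most 5 vowels are used
            }
        }
    }
    return tempKey;  // consonants after the last vowel are not included
}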
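Because consonants that follow the last vowel never reach the key, words such as "the" and "them" end up with the same value. The sketch below restates the pronunciation hash in a compact form to make this visible, and adds an optional flag that folds the leftover consonants into the key. The names `soundHash`, `POW31`, and `isVowelIndex`, and the `addTail` refinement itself, are hypothetical and are not part of the original function.

```cpp
#include <cctype>
#include <string>

// Powers of 31, matching the G array from the text.
const int POW31[7] = {1, 31, 961, 29791, 923521, 28629151, 887503681};

// True for the alphabet indices of a, e, i, o, u and y.
bool isVowelIndex(int k) {
    return k == 0 || k == 4 || k == 8 || k == 14 || k == 20 || k == 24;
}

// Compact restatement of the pronunciation hash for ASCII words.
// If addTail is true, consonants left over after the last vowel are added
// to the key (a hypothetical refinement).
int soundHash(const std::string& word, bool addTail) {
    int key = 0, pending = 0, vowels = 0;
    for (unsigned char c : word) {
        // Letters map to 0..25; other characters keep their character code.
        int k = std::isalpha(c) ? std::tolower(c) - 'a' : c;
        if (!isVowelIndex(k)) {
            pending += k;                          // accumulate consonants
        } else {
            key += POW31[++vowels] * k + pending;  // vowel closes the group
            pending = 0;
            if (vowels >= 5) return key;           // at most 5 vowels are used
        }
    }
    return addTail ? key + pending : key;
}
```

Without the tail, "the" and "them" both evaluate to 31·4 + (19 + 7) = 150. With the tail added, "them" becomes 162 while "the" stays at 150, so this particular pair no longer collides; other similar-sounding pairs may still do so.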
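Hash function based on folding
This is an algorithm covered in the textbook: the word is first folded into fixed-length segments, and the hash value is then computed from the segments.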
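int fold(string a) {
    int tempkey = 0;
    int foldLen = 4;  // fold the word in segments of length 4
    string tmp;
    // Note: only lowercase letters contribute; any other character counts as 0.
    while (true) {
        if ((int)a.length() < foldLen)
        {
            // Last (short) segment: the rightmost letter has weight 1.
            int j = 1;
            for (int i = (int)a.length() - 1; i >= 0; i--)
            {
                if (a[i] >= 'a' && a[i] <= 'z')
                    tempkey += int(a[i] - 'a') * j;
                j *= 10;
            }
            break;
        }
        else
        {
            // Full segment: weights 1000, 100, 10, 1 from left to right.
            int j = 1000;
            tmp = a.substr(0, 4);  // take the next segment
            a.erase(0, 4);
            for (int i = 0; i < 4; i++)
            {
                if (tmp[i] >= 'a' && tmp[i] <= 'z')
                    tempkey += int(tmp[i] - 'a') * j;
                j /= 10;
            }
        }
    }
    return tempkey % 5000;
}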
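To make the folding arithmetic easier to follow, the same computation can be written so that it reads the word in place, one segment at a time, without erasing characters. The name `foldHash` is hypothetical; the sketch keeps the behaviour described above, including the fact that only lowercase letters contribute.

```cpp
#include <cstddef>
#include <string>

// Hypothetical restatement of the folding hash that does not modify the word.
int foldHash(const std::string& word) {
    const std::size_t chunkLen = 4;  // segment length used in the text
    int key = 0;
    for (std::size_t start = 0; start < word.size(); start += chunkLen) {
        int chunkValue = 0;
        // Treat each segment as a base-10 number whose "digits" are letter positions.
        for (std::size_t i = start; i < start + chunkLen && i < word.size(); ++i) {
            int digit = (word[i] >= 'a' && word[i] <= 'z') ? word[i] - 'a' : 0;
            chunkValue = chunkValue * 10 + digit;
        }
        key += chunkValue;  // fold: add the segments together
    }
    return key % 5000;
}
```

For "gollum", the first segment "goll" gives 6·1000 + 14·100 + 11·10 + 11 = 7521 and the remaining "um" gives 20·10 + 12 = 212, so the result is (7521 + 212) mod 5000 = 2733. Under this scheme "the" maps to 1974 and "them" to 4752, so the pair that collides under the pronunciation hash is separated here.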
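Results
The hash tables are first built from the passage above and then checked against test.txt, which consists of the original passage followed by a few arbitrary extra words:
test.txt: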
There's something down there. It's Gollum.
Gollum?
He's been following us for three days.
He escaped the dungeons of Baraddur?
Escaped, or was set loose.Now the Ring has brought him here.He will never be rid of his need for it.He hates and loves the Ring, as he hates and loves himself. Smeagol's life is a sad story.Yes, Smeagol he was once called. Before the Ring found him.Before it drove him mad.
It's a pity Bilbo didn't kill him when he had the chance.
Pity? It is pity that stayed Bilbo's hand. Many that live deserve death. Some that die deserve life.Can you give it to them, Frodo?Do not be too eager to deal out death and judgment.Even the very wise can not see all ends.My heart tells me that Gollum has some part to play yet, for good or ill,before this is over.The pity of Bilbo may rule the fate of many.
I wish the Ring had never come to me.I wish none of this had happened.
So do all who live to see such times. But that is not for them to decide.All we have to decide is what to do with the time that is given to us.There are other forces at work in this world, Frodo, besides the will of evil.Bilbo was meant to find the Ring.In which case, you also were meant to have it.
apple
day
fds
dsfd
rat
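Output
- Method 1: relatively few collisions occur.
- Method 2: collisions are noticeably more frequent. This is a consequence of similar pronunciation, and the algorithm used here to derive the hash value from pronunciation is also incomplete: among other things it does not take trailing consonants into account, so the method performs poorly. For example, "the" and "them" collide.
- Method 3: performs well. Folding the word before computing the hash value produced no collisions.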
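The ASL figures mentioned at the start can be obtained by counting probes while the table is built. The write-up does not state which collision strategy is used, so the sketch below assumes open addressing with linear probing; the struct `ProbeTable` and its members are hypothetical names, and keys are assumed to be non-empty strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical open-addressing table with linear probing (an assumption;
// the original collision strategy is not stated).
struct ProbeTable {
    std::vector<std::string> slots;  // an empty string marks a free slot
    std::size_t keyCount = 0;
    long totalProbes = 0;  // probes a successful search needs, summed over all keys

    explicit ProbeTable(std::size_t size) : slots(size) {}

    // Insert key at the first free slot at or after homeIndex.
    // Returns the number of probes used, or 0 if the key was already
    // stored or the table is full.
    int insert(const std::string& key, std::size_t homeIndex) {
        for (std::size_t step = 0; step < slots.size(); ++step) {
            std::size_t pos = (homeIndex + step) % slots.size();
            if (slots[pos] == key) return 0;  // duplicate word, nothing to do
            if (slots[pos].empty()) {
                slots[pos] = key;
                ++keyCount;
                totalProbes += static_cast<long>(step) + 1;
                return static_cast<int>(step) + 1;
            }
        }
        return 0;
    }

    // ASL for successful searches = total probes / number of stored keys.
    double asl() const {
        return keyCount == 0 ? 0.0 : static_cast<double>(totalProbes) / keyCount;
    }
};
```

If two words share a home slot, the first is found in one probe and the second in two, so a table holding only that pair has an ASL of 1.5. A hash function that spreads words more evenly keeps the ASL closer to 1.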