Hash Tables and String Hashing Algorithms

buaichifanqie

Published 2024-09-30 23:19:29

Tags: hash table, hash algorithm, data structures, C++, algorithms, string hashing

Article link: https://blog.csdn.net/buaichifanqie/article/details/142645122

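Hashing

A hash table is a data structure that uses a hash function to map keys to storage locations, which makes lookup, insertion, and deletion efficient. Its defining property is that lookups and updates run in O(1) time on average, provided that hash collisions are handled well.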

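Basic structure of a hash table

Array: a hash table is usually backed by an array. Each element of the array is called a "bucket" and stores a key-value pair or a single value.

Hash function:

A hash function converts an input key (typically a string or an integer) into an array index. For example, the simple scheme hash(key) % array_size turns a key into a valid index into the array.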
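The hash(key) % array_size scheme can be sketched in C++ as follows. This is a minimal illustration, not the article's own code: the helper name bucket_index is hypothetical, and std::hash is used as the hash function, so the actual index for a given key depends on the standard library implementation.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: maps a string key to a bucket index in [0, bucket_count).
// Assumption: bucket_count is greater than zero.
std::size_t bucket_index(const std::string& key, std::size_t bucket_count) {
    // std::hash produces a std::size_t; the modulo folds it into the array range.
    return std::hash<std::string>{}(key) % bucket_count;
}
```

Calling bucket_index("apple", 8) always yields the same value within one program run, and that value is always smaller than 8.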

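Handling hash collisions:

A hash function may map different keys to the same array position. This is called a hash collision. There are two common ways to handle it:

1. Separate chaining: each bucket stores a linked list, and colliding elements are appended to that list.

2. Open addressing: when a collision occurs, a probing strategy searches for the next free bucket. Common probing methods include linear probing, quadratic probing, and double hashing.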
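To make separate chaining concrete, here is a small sketch of a string-to-int table in which every bucket is a linked list. The class name ChainedMap and its put/get/remove interface are assumptions made for illustration; it has no resizing and is not how std::unordered_map is required to be implemented.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Illustrative class (hypothetical name and interface).
// Assumption: bucket_count is greater than zero.
class ChainedMap {
public:
    explicit ChainedMap(std::size_t bucket_count = 8) : buckets_(bucket_count) {}

    // Inserts the pair, or overwrites the value if the key already exists.
    void put(const std::string& key, int value) {
        auto& chain = buckets_[index_of(key)];
        for (auto& entry : chain) {
            if (entry.first == key) { entry.second = value; return; }
        }
        chain.emplace_back(key, value);  // colliding keys share the same list
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* get(const std::string& key) const {
        const auto& chain = buckets_[index_of(key)];
        for (const auto& entry : chain) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

    // Removes the key if present; returns true when an element was erased.
    bool remove(const std::string& key) {
        auto& chain = buckets_[index_of(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (it->first == key) { chain.erase(it); return true; }
        }
        return false;
    }

private:
    std::size_t index_of(const std::string& key) const {
        return std::hash<std::string>{}(key) % buckets_.size();
    }
    std::vector<std::list<std::pair<std::string, int>>> buckets_;
};
```

Constructing the table with a single bucket, ChainedMap m(1), forces every key to collide, which is a convenient way to check that lookups still return the right values.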

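The C++ standard library provides several hash-based containers. The most commonly used are std::unordered_map and std::unordered_set, which implement a hash-table-based map and set respectively.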

1. std::unordered_map

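std::unordered_map is the hash table implementation in the C++ Standard Template Library (STL) for storing key-value pairs. Its main strength is fast lookup of the value associated with a given key.

Basic usage: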

#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> hashMap;
    // Insert key-value pairs
    hashMap["apple"] = 5;
    hashMap["banana"] = 3;
    // Look up an element; reuse the iterator instead of hashing the key again
    auto it = hashMap.find("apple");
    if (it != hashMap.end()) {
        std::cout << it->second << std::endl;
    }
    // Iterate over the hash table (the iteration order is unspecified)
    for (const auto& pair : hashMap) {
        std::cout << pair.first << " : " << pair.second << std::endl;
    }
    return 0;
}

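Main methods:
insert(): inserts a key-value pair; an existing key is left unchanged.
find(): looks up a key and returns an iterator to the element, or end() if the key is absent.
operator[]: accesses the value for a key, inserting a value-initialized element if the key does not exist.
erase(): removes the key-value pair with the given key.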
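The differences between these methods are easy to miss, so the sketch below exercises all four on one map. The function names value_or and build_stock are hypothetical helpers introduced only for this example.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: reads a value through find(), which never inserts.
int value_or(const std::unordered_map<std::string, int>& m,
             const std::string& key, int fallback) {
    auto it = m.find(key);
    return it != m.end() ? it->second : fallback;
}

// Hypothetical demo that uses insert(), operator[] and erase().
std::unordered_map<std::string, int> build_stock() {
    std::unordered_map<std::string, int> stock;
    stock.insert({"apple", 5});  // inserted
    stock.insert({"apple", 9});  // ignored: insert() does not overwrite an existing key
    stock["banana"] = 3;         // operator[] inserts or overwrites
    stock["cherry"];             // operator[] on a missing key inserts the value 0
    stock.erase("banana");       // erase by key
    return stock;
}
```

After build_stock() the map holds two entries, apple with value 5 and cherry with value 0. Reading through find() rather than operator[] avoids creating entries such as cherry by accident.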

2. std::unordered_set

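std::unordered_set is a container that stores only unique elements and is implemented internally with a hash table. It stores the elements themselves rather than key-value pairs, and each element appears at most once.

Basic usage: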

#include <iostream>
#include <unordered_set>

int main() {
    std::unordered_set<int> hashSet;

    // Insert elements
    hashSet.insert(10);
    hashSet.insert(20);

    // Look up an element
    if (hashSet.find(10) != hashSet.end()) {
        std::cout << "found 10" << std::endl;
    }

    // Iterate over the set (the iteration order is unspecified)
    for (const auto& elem : hashSet) {
        std::cout << elem << std::endl;
    }

    return 0;
}

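Main methods:
insert(): inserts an element.
find(): looks up an element and returns an iterator to it, or end() if it is absent.
erase(): removes an element.
count(): checks whether an element exists; returns 1 if it does and 0 if it does not.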
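A typical use of these methods is duplicate detection. The sketch below is an illustration with a hypothetical function name, has_duplicate; it relies on the fact that std::unordered_set::insert returns a pair whose second member tells whether the element was newly inserted.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns true if any value appears more than once.
bool has_duplicate(const std::vector<int>& values) {
    std::unordered_set<int> seen;
    for (int v : values) {
        // insert() returns {iterator, bool}; the bool is false if v was already present.
        if (!seen.insert(v).second) return true;
    }
    return false;
}
```

Each insert is O(1) on average, so the whole check is linear in the number of values on average, compared with quadratic time for comparing every pair.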

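String Hashing

When checking whether two strings are equal, comparing them character by character takes time proportional to their length, and doing this many times may exceed the time limit. Instead, we convert each string into a number; comparing two numbers is trivial and takes O(1) time. Below we look at how this is done and at the concept of string hashing.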
And if a collision does still occur, how should we resolve it?
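One widely used mitigation, offered here as an assumption rather than as the author's own answer, is to hash each string twice with different bases and moduli and to treat two strings as equal only when both hashes agree. The function name double_hash, the bases 131 and 13331, and the moduli 1000000007 and 998244353 are all choices made for this sketch.

```cpp
#include <cstdint>
#include <string>
#include <utility>

using HashPair = std::pair<std::uint64_t, std::uint64_t>;

// Hypothetical helper: computes two independent polynomial hashes of s.
HashPair double_hash(const std::string& s) {
    // Assumed parameters; intermediate products stay far below 2^64.
    const std::uint64_t base1 = 131, mod1 = 1000000007ULL;
    const std::uint64_t base2 = 13331, mod2 = 998244353ULL;
    std::uint64_t h1 = 0, h2 = 0;
    for (unsigned char c : s) {
        h1 = (h1 * base1 + c) % mod1;
        h2 = (h2 * base2 + c) % mod2;
    }
    return {h1, h2};
}
```

Two different strings now collide only if they collide under both hash functions at once, which is far less likely than a collision under one. When certainty is required, a direct character comparison after the hashes match removes the remaining risk.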

The code for computing hash_code is as follows:

#include <iostream>
#include <string>
using namespace std;
typedef unsigned long long ULL;

const int X = 13331;

int main()
{
	string s = "abcde";
	ULL hash_code = s[0];
	for (size_t i = 1; i < s.size(); i++)
	{
		// Just as the decimal number 123 is 1*100 + 2*10 + 3*1, we treat the string as a number in base X
		hash_code = hash_code * X + s[i];
	}
	cout << hash_code;
	return 0;
}
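Written as a formula, the loop above evaluates a polynomial in X whose coefficients are the character codes. In the notation below, which is introduced here for illustration, s_i is the code of the i-th character and n is the length of the string. The reduction modulo 2^64 comes from unsigned overflow and assumes that unsigned long long is 64 bits wide.

```latex
H(s) = \Bigl( \sum_{i=0}^{n-1} s_i \, X^{\,n-1-i} \Bigr) \bmod 2^{64},
\qquad
H(\texttt{"ab"}) = 97 \cdot 13331 + 98 = 1293205
```

The worked value for "ab" uses the ASCII codes 97 and 98 with X = 13331.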

String hashing implementation

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
typedef unsigned long long ULL;
const int X = 13331;
// h[i] is the hash of the prefix s[0..i]; x[i] is X^i
vector<ULL> h, x;
void BKDR_hash(const string& s)
{
	h[0] = s[0];
	x[0] = 1;
	for (size_t i = 1; i < s.size(); i++)
	{
		h[i] = h[i - 1] * X + s[i];
		// Suppose the string is 1234. How do we get its substring 234? Use prefix hashes:
		//    1      12     123     1234
		//   h[0]   h[1]    h[2]    h[3]
		// 234 = h[3] - h[0] * 10^3
		// 34  = h[3] - h[1] * 10^2
		x[i] = x[i - 1] * X;
	}
}

ULL get_hash(int left, int right)
{
	if (!left)
	{
		return h[right];
	}
	else
	{
		return h[right] - h[left - 1] * x[right - left + 1];
	}
}

int main()
{
	string s1;
	cin >> s1;
	h.resize(s1.size());
	x.resize(s1.size());
	BKDR_hash(s1);
	string s2;
	cin >> s2;
	ULL hash2 = 0;
	for (int i = 0; i < s2.size(); i++)
	{
		hash2 = hash2 * X + s2[i];
	}
	for (int i = 0; i < s1.size(); i++)
	{
		cout << get_hash(i, min(s1.size() - 1, i + s2.size() - 1)) << " ";
	}
	cout << endl;
	cout << hash2;
	return 0;
}
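// Sample run
// abcde
// cd
// 1293205 1306537 1319869 1333201 101
// 1319869
// The hash of s2 ("cd") equals the hash of one substring of s1, so string hashing can also serve as a simpler alternative to the KMP algorithm for substring matching.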
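Building on that observation, the sketch below turns the prefix-hash idea into a substring search that returns every position where the pattern occurs. The function name find_all is hypothetical. Unlike the listing above, it uses 1-based prefix arrays (h[i] is the hash of the first i characters), which removes the special case for left == 0, and it confirms each hash match with a direct comparison so that a collision cannot produce a false hit.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using ULL = unsigned long long;

// Hypothetical helper: returns every start index where pattern occurs in text.
std::vector<std::size_t> find_all(const std::string& text, const std::string& pattern) {
    const ULL X = 13331;  // same base as the code above
    std::vector<std::size_t> hits;
    const std::size_t n = text.size(), m = pattern.size();
    if (m == 0 || m > n) return hits;

    // h[i] = hash of text[0..i), p[i] = X^i, both modulo 2^64 via unsigned overflow.
    std::vector<ULL> h(n + 1, 0), p(n + 1, 1);
    for (std::size_t i = 0; i < n; ++i) {
        h[i + 1] = h[i] * X + static_cast<unsigned char>(text[i]);
        p[i + 1] = p[i] * X;
    }

    // Hash of the whole pattern, computed the same way.
    ULL target = 0;
    for (unsigned char c : pattern) target = target * X + c;

    for (std::size_t i = 0; i + m <= n; ++i) {
        ULL window = h[i + m] - h[i] * p[m];  // hash of text[i..i+m)
        // Equal hashes only suggest a match; compare characters to rule out a collision.
        if (window == target && text.compare(i, m, pattern) == 0) hits.push_back(i);
    }
    return hits;
}
```

For the sample input, find_all("abcde", "cd") reports the single position 2. Preprocessing is linear in the text length and each window hash costs O(1); the confirming comparison costs O(m) per hash match, so it can be dropped when a small collision risk is acceptable.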