哈希原理&实现

lisanndesu

于 2024-08-21 20:55:50 发布

阅读量359

点赞数 3

分类专栏： C/C++语法数据结构文章标签：散列表数据结构哈希算法

本文链接：https://blog.csdn.net/li_peixiansang/article/details/141390520

版权

C/C++语法同时被 2 个专栏收录

18 篇文章 0 订阅

订阅专栏

数据结构

3 篇文章 0 订阅

订阅专栏

本节主要看源代码实现

哈希特点

哈希（Hashing）是一种将数据映射到固定大小的表中以实现快速查找的数据结构和算法方法。哈希的主要特点包括：

1. 高效的查找、插入和删除

时间复杂度：哈希表通常提供近乎常数时间的查找、插入和删除操作，理想情况下为 ( O(1) )。
性能：由于通过哈希函数直接计算位置，数据可以迅速定位，减少了需要遍历的数据量。

2. 哈希函数

定义：哈希函数是将输入数据（键）映射到哈希表中的一个位置的函数。
特点：
- 确定性：相同的输入始终会映射到相同的位置。
- 均匀分布：理想的哈希函数将输入数据均匀分布到哈希表的各个槽位，减少冲突的可能性。
- 快速计算：哈希函数应当能够快速计算，以避免对性能的影响。

3. 冲突处理

冲突：当两个或更多的键被映射到哈希表的同一位置时，称为冲突。
处理方法：
- 链式地址法（Separate Chaining）：使用链表或其他数据结构处理同一位置的多个键。
- 开放地址法（Open Addressing）：通过探测策略（如线性探测、二次探测、双重哈希）寻找其他空槽位。

4. 负载因子

定义：负载因子是哈希表中元素数量与表大小的比率。
影响：
- 性能：负载因子较高时，冲突概率增加，操作性能可能下降。
- 动态调整：通常会在负载因子超过某个阈值时，进行哈希表的扩展和再哈希操作，以保持性能。

开放寻址法

概念：所有元素都存储在哈希表的数组中，冲突发生时，使用探测序列寻找下一个可用位置。

hash节点：

	enum State
	{
		EXIST,
		EMPTY,
		DELETE
	};

	template<class K, class V>
	struct HashData
	{
		pair<K, V> _kv;
		State _state = EMPTY;
	};

扩容代码

负载因子保证哈希表永远有剩余空间：负载因子过大则扩容

因子过大，哈希表扩容时，不能只是改变vector.size（会改变映射关系）

应新申请更大size的hashmap，后交换this&newhasnmap

// 扩容
			if (_n * 10 / _tables.size() >= 7)
			{
				//vector<HashData<K, V>> newTables(_tables.size() * 2);
				 遍历旧表， 将所有数据映射到新表
				 ...
				//_tables.swap(newTables);

				HashTable<K, V, Hash> newHT;
				newHT._tables.resize(_tables.size() * 2);
				for (size_t i = 0; i < _tables.size(); i++)
				{
					if (_tables[i]._state == EXIST)
					{
						newHT.Insert(_tables[i]._kv);  //方法复用， 自动正确插入
					}
				}

				_tables.swap(newHT._tables);   //nb swap
			}

key不是int：先将key映射成int

自定义仿函数

struct StringHashFunc
	{
		size_t operator()(const string& s)
		{
			size_t hash = 0;
			for (auto e : s)
			{
				hash *= 31;
				hash += e;
			}

			return hash;
		}
	};

特化模板

（可默认调用，省去传递仿函数类）

template<class K>
struct HashFunc
{
	size_t operator()(const K& key)
	{
		return (size_t)key;
	}
};

// 特化
template<>
struct HashFunc<string>
{
	size_t operator()(const string& key)
	{
		size_t hash = 0;
		for (auto e : key)
		{
			hash *= 31;
			hash += e;
		}

		return hash;
	}
};

对应默认调用

//当k为string时， Hash默认为特化后的类
template<class K, class V, class Hash = HashFunc<K>>
	class HashTable
	{
	public:
		HashTable()
		{
			_tables.resize(10);
		}
        ....
    }

拉链法：（哈希桶）

库里方法就是哈希桶实现哈希表

每个vector节点存一个链表头结点

单向链表，插入方便（头插）

再优化的话，可把节点换为map地址，防止一条链表过长影响效率

扩容时直接插入原节点，防重复删除又申请相同节点

扩容代码

// 负载因子==1扩容
			if (_n == _tables.size())
			{
                //重复delete & new， 不如直接转移
				/*HashTable<K, V> newHT;
				newHT._tables.resize(_tables.size() * 2);
				for (size_t i = 0; i < _tables.size(); i++)
				{
					Node* cur = _tables[i];
					while(cur)
					{
						newHT.Insert(cur->_kv);
						cur = cur->_next;
					}
				}

				_tables.swap(newHT._tables);*/
                
                // 推荐写法：将原有各节点转移到new的hashmap， 省去了delete&new
				vector<Node*> newtables(_tables.size() * 2, nullptr);
				for (size_t i = 0; i < _tables.size(); i++)
				{
					Node* cur = _tables[i];
					while (cur)
					{
						Node* next = cur->_next;

						// 旧表中节点，挪动新表重新映射的位置
						size_t hashi = hs(cur->_kv.first) % newtables.size();
						// 头插到新表
						cur->_next = newtables[hashi];
						newtables[hashi] = cur;
						
						cur = next;
					}

					_tables[i] = nullptr;
				}

				_tables.swap(newtables);
			}

类模板特化

template<class K>
struct HashFunc
{
	size_t operator()(const K& key)
	{
		return (size_t)key;
	}
};

// 特化
template<>
struct HashFunc<string>
{
	size_t operator()(const string& key)
	{
		size_t hash = 0;
		for (auto e : key)
		{
			hash *= 31;
			hash += e;
		}

		return hash;
	}
};