Hash Tables: Concepts and a Simple Implementation

机械狗pp

Last modified 2022-10-21 10:52:50

Category: Data Structures. Tags: hash table, data structures, hash algorithm

First published 2022-05-30 10:42:53

Article link: https://blog.csdn.net/xd6905/article/details/125040495

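1 What Is a Hash Table

A hash table is built on a sequential (array-based) list. Each value is stored at an array index computed from its key, and the function that maps a key to an index is called the hash function.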

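2 Hash Collisions

Because a hash function can map several different keys to the same position in the array, two values may compete for one slot. This situation is called a hash collision.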
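
To make the mapping and the collision concrete, here is a minimal sketch of the division (remainder) method. The function name hash_index and the table length of 10 are assumptions chosen for illustration; they are not part of the implementation later in this article.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: maps an integer key to a slot index in a table
// that has `table_size` slots, using the division (remainder) method.
std::size_t hash_index(std::size_t key, std::size_t table_size)
{
    assert(table_size > 0); // a table without slots cannot store anything
    return key % table_size;
}
```

With a table of length 10, hash_index(14, 10) and hash_index(44, 10) both return 4, so the two keys collide.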

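2.1 Resolving Collisions

2.1.1 Redesigning the Hash Function

Common hash functions include direct addressing, the division (remainder) method, and several others. Each one is simply a rule, chosen to fit the data, for mapping keys to positions. A well-designed hash function reduces the number of collisions.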
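
As a rough illustration of direct addressing, the sketch below counts lowercase letters by giving every letter its own slot (index = c - 'a'), so two different keys can never collide. The function name count_letters and the restriction to the letters 'a' through 'z' are assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Direct addressing: the key itself (here, a lowercase letter) determines
// a unique slot, so no collision handling is needed.
std::array<int, 26> count_letters(const std::string& s)
{
    std::array<int, 26> counts{}; // value-initialized: all zeros
    for (char c : s)
    {
        assert(c >= 'a' && c <= 'z'); // this sketch only handles a..z
        ++counts[static_cast<std::size_t>(c - 'a')];
    }
    return counts;
}
```

Direct addressing only works when the key range is small and known in advance; the division method is the usual choice otherwise.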

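2.1.2 Closed Hashing (Open Addressing)

When a collision occurs and the table is not full, there must still be an empty slot somewhere in the table, so the key can be stored in the "next" empty slot after the position where the collision occurred.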

Linear probing: starting from the colliding position, probe the following slots one by one until an empty slot is found.

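In the illustrated example, 44 is inserted into the table. The division method maps it to index 4 (44 % 10 = 4 for a table of length 10), which already holds a value, so the slots after it are probed one by one until the empty slot at index 8 is reached, and 44 is stored there.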
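
The example above can be reproduced with a short sketch. The function name linear_probe and the use of a std::vector<bool> occupancy map are assumptions for illustration only; the real table later in this article stores a state per slot instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the first free slot found by linear probing, starting at
// key % occupied.size(). occupied[i] is true when slot i holds a value.
// Returns occupied.size() when every slot is taken.
std::size_t linear_probe(const std::vector<bool>& occupied, std::size_t key)
{
    assert(!occupied.empty());
    const std::size_t n = occupied.size();
    const std::size_t start = key % n;
    for (std::size_t i = 0; i < n; ++i)
    {
        const std::size_t index = (start + i) % n; // wrap around at the end
        if (!occupied[index])
            return index;
    }
    return n; // no free slot
}
```

With slots 4 through 7 occupied in a table of length 10, linear_probe(occupied, 44) returns 8.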

Quadratic probing: starting from the colliding position, probe at offsets that grow quadratically (i*i for i = 1, 2, 3, 4, ...) until an empty slot is found.

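For example, 14 also maps to index 4, which is occupied. On the first attempt i = 1, so the offset is i*i = 1 and the probe lands on index 5, where there is another collision. On the next attempt i = 2, so the offset is i*i = 4 and the probe lands on index 8, four slots past the original position. That slot is empty, so the insertion succeeds.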
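
A minimal sketch of the same probe sequence follows. The name quadratic_probe, the occupancy map, and the choice to give up after n attempts are all assumptions for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Probes start, start + 1, start + 4, start + 9, ... (all modulo the table
// length) and returns the first free slot. Returns occupied.size() if none
// of the first n probes hits a free slot. Quadratic probing does not visit
// every slot, so this can happen even when the table is not full.
std::size_t quadratic_probe(const std::vector<bool>& occupied, std::size_t key)
{
    assert(!occupied.empty());
    const std::size_t n = occupied.size();
    const std::size_t start = key % n;
    for (std::size_t i = 0; i < n; ++i)
    {
        const std::size_t index = (start + i * i) % n;
        if (!occupied[index])
            return index;
    }
    return n;
}
```

With slots 4 and 5 occupied in a table of length 10, quadratic_probe(occupied, 14) tries index 4, then 5, then 8, and returns 8.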

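2.1.3 Open Hashing (Separate Chaining)

The hash function computes an address for every key in the key set. Keys with the same address belong to the same subset, and each subset is called a bucket. The elements of a bucket are linked together in a singly linked list, and the head of each list is stored in the hash table.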
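
The bucket idea can be sketched with standard containers before looking at a hand-written linked list. The class name ChainedSet, its member names, and the use of std::forward_list are assumptions for illustration; the implementation in section 3.2 manages its own nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <iterator>
#include <vector>

// Hypothetical minimal set of unsigned keys using separate chaining.
class ChainedSet
{
public:
    explicit ChainedSet(std::size_t bucket_count) : _buckets(bucket_count)
    {
        assert(bucket_count > 0);
    }

    bool contains(unsigned key) const
    {
        for (unsigned k : _buckets[index_of(key)])
            if (k == key)
                return true;
        return false;
    }

    // Returns false if the key is already present.
    bool insert(unsigned key)
    {
        if (contains(key))
            return false;
        _buckets[index_of(key)].push_front(key); // head insertion
        return true;
    }

    // Number of keys chained in bucket i.
    std::size_t bucket_length(std::size_t i) const
    {
        return static_cast<std::size_t>(
            std::distance(_buckets[i].begin(), _buckets[i].end()));
    }

private:
    std::size_t index_of(unsigned key) const
    {
        return key % _buckets.size(); // division method
    }

    std::vector<std::forward_list<unsigned>> _buckets;
};
```

Inserting 4, 14 and 44 into a ChainedSet with 10 buckets puts all three keys in bucket 4, each linked in front of the previous one.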

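2.2 Growing the Hash Table

The load factor of a hash table is defined as: a = number of elements stored in the table / length of the table
a measures how full the table is. Because the table length is fixed, a is proportional to the number of stored elements: the larger a is, the more elements the table holds and the more likely a collision becomes; the smaller a is, the fewer elements it holds and the less likely a collision becomes. In fact, the average search length of a hash table is a function of the load factor a; the exact function depends on the collision-resolution method.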

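For open addressing the load factor is especially important and should be kept strictly below about 0.7 to 0.8. Above 0.8, CPU cache misses during lookups rise roughly along an exponential curve. When the load factor gets too high, the table should be grown.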

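For separate chaining, the best case is that every bucket holds exactly one node. From that point on, every further insertion causes a collision, so the table can be grown as soon as the number of elements equals the number of buckets.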
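
The two growth rules can be written down as small predicates. The function names below are hypothetical, and the 0.7 threshold for open addressing is simply the lower end of the range mentioned above.

```cpp
#include <cassert>
#include <cstddef>

// Load factor a = stored elements / table length.
double load_factor(std::size_t count, std::size_t capacity)
{
    assert(capacity > 0);
    return static_cast<double>(count) / static_cast<double>(capacity);
}

// Open addressing: grow once a >= 0.7. Comparing products keeps the
// check in integers and avoids the truncation of integer division.
bool should_grow_open_addressing(std::size_t count, std::size_t capacity)
{
    return capacity == 0 || count * 10 >= capacity * 7;
}

// Separate chaining: grow once there are as many elements as buckets.
bool should_grow_chaining(std::size_t count, std::size_t bucket_count)
{
    return count >= bucket_count;
}
```

For a table of length 10, the open-addressing check first becomes true at 7 elements, while the chaining check first becomes true at 10.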

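3 Implementation

3.1 Closed Hashing

This hash table uses the division method as its hash function and linear probing to resolve collisions. The load factor threshold is 0.7, the underlying storage is a vector, and the initial capacity is 10.

A string is not an integer, so it cannot be passed to the division method directly; the Hash functor (the HashFunc template parameter) converts it to an integer first. Each slot in the table carries one of three states (empty, deleted, occupied), which the insert and erase operations below rely on. The private member _size is not the size of the table but the number of valid elements in it; the table size is obtained with _ht.size().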

#pragma once
#include <cstddef>
#include <string>
#include <utility>
#include <vector>
namespace Close_Hash
{
	template<class K>
	struct Hash
	{
		size_t operator()(const K& key) const
		{
			return key;
		}
	};
	template<>
	struct Hash<std::string>
	{
		size_t operator()(const std::string& s) const
		{
			size_t res = 0;
			for (auto c : s)
			{
				res *= 31;
				res += c;
			}
			return res;
		}
	};

	enum State { EMPTY, EXIST, DELETE };
	template<class K, class V,class HashFunc = Hash<K>>
	class HashTable
	{
		struct Elem
		{
			std::pair<K, V> _val;
			State _state = EMPTY;
		};
	public:
		// Insert
		bool Insert(const std::pair<K, V>& val);
		// Find
		std::pair<size_t, bool> Find(const K& key);
		// Erase
		bool Erase(const K& key);
		size_t Size()const
		{
			return _size;
		}
		bool Empty() const
		{
			return _size == 0;
		}
	private:
		std::vector<Elem> _ht;
		size_t _size = 0; // number of valid elements, not the table capacity
	};
}
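
To see what the string specialization computes, the sketch below repeats the multiply-by-31 scheme as a free function. The names hash_string and slot_for are hypothetical, and the cast to unsigned char is an addition that the original functor does not have (it keeps bytes outside the ASCII range from being treated as negative values).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Polynomial string hash: res = res * 31 + next character.
std::size_t hash_string(const std::string& s)
{
    std::size_t res = 0;
    for (char c : s)
    {
        res *= 31;
        res += static_cast<unsigned char>(c);
    }
    return res;
}

// Division method applied to the integer produced by hash_string.
std::size_t slot_for(const std::string& s, std::size_t table_size)
{
    assert(table_size > 0);
    return hash_string(s) % table_size;
}
```

For "ab" the hash is 97 * 31 + 98 = 3105, which lands in slot 5 of a table of length 10.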

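3.1.1 Find

Compute the mapped index, then probe forward using the stored keys and slot states. The while-loop condition must not be _ht[index]._state == EXIST: that would end the loop at the first slot marked DELETE, while the key being searched for may still lie further along the probe sequence.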

		std::pair<size_t, bool> Find(const K& key)
		{
			if (_ht.empty())
				return std::make_pair(0, false);
			HashFunc hs;
			size_t start = hs(key) % _ht.size();
			size_t i = 0;
			size_t index = start;
			// Keep probing past DELETE slots and stop at the first EMPTY slot.
			// The i < _ht.size() bound prevents an endless loop when deleted
			// slots have left no EMPTY slot in the table.
			while (_ht[index]._state != EMPTY && i < _ht.size())
			{
				if (_ht[index]._state == EXIST
					&& _ht[index]._val.first == key)
					return std::make_pair(index, true);
				i++;
				index = (start + i) % _ht.size();
			}
			return std::make_pair(0, false);
		}
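
The reason a lookup has to walk past deleted slots can be shown in a standalone sketch. The names SlotState, Slot and find_slot are hypothetical and simplified to unsigned keys; they only mirror the idea of the three states used by the table above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class SlotState { Empty, Exist, Deleted };

struct Slot
{
    unsigned key = 0;
    SlotState state = SlotState::Empty;
};

// Linear-probing lookup: skips Deleted slots, stops at the first Empty one.
// Returns the index of the key, or slots.size() if it is not present.
std::size_t find_slot(const std::vector<Slot>& slots, unsigned key)
{
    assert(!slots.empty());
    const std::size_t n = slots.size();
    const std::size_t start = key % n;
    for (std::size_t i = 0; i < n; ++i)
    {
        const std::size_t index = (start + i) % n;
        const Slot& s = slots[index];
        if (s.state == SlotState::Empty)
            break; // the key was never stored beyond this point
        if (s.state == SlotState::Exist && s.key == key)
            return index;
    }
    return n;
}
```

If 4 and 14 were inserted into a table of length 10 and 4 was erased afterwards, slot 4 is Deleted and slot 5 still holds 14. A lookup that stopped at the deleted slot would wrongly report 14 as missing; find_slot continues and returns 5.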

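3.1.2 Insert

Insertion has to deal with growth: once the table grows, the mapping from keys to indices changes. The code creates a temporary hash table of the new size, inserts every existing element into it by calling its Insert function, and then swaps the temporary table's storage with the current one, which completes the remapping. After that, the new element is placed in the correct slot and the valid-element count _size is updated.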

		bool Insert(const std::pair<K, V>& val)
		{
			if (Find(val.first).second)
			{
				return false;
			}
			// Grow when the load factor reaches 0.7. Comparing products avoids
			// the truncation of integer division in _size * 10 / _ht.size().
			if (_ht.size() == 0 || _size * 10 >= _ht.size() * 7)
			{
				size_t newsize = _ht.size() == 0 ? 10 : _ht.size() * 2;
				HashTable<K, V, HashFunc> newTable;
				newTable._ht.resize(newsize);
				// Re-insert every live element so it is mapped by the new size
				for (size_t i = 0; i < _ht.size(); i++)
				{
					if (_ht[i]._state == EXIST)
						newTable.Insert(_ht[i]._val);
				}
				_ht.swap(newTable._ht);
			}
			HashFunc hs;
			size_t start = hs(val.first) % _ht.size();
			size_t i = 0;
			size_t index = start;
			// EMPTY and DELETE slots can both be reused
			while (_ht[index]._state == EXIST)
			{
				i++;
				index = (start + i) % _ht.size();
			}
			// Found a free slot
			_ht[index]._val = val;
			_ht[index]._state = EXIST;
			++_size;
			return true;
		}
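
Why every element has to be re-inserted after growth, rather than copied slot by slot, can be seen in a small sketch. The names kEmpty, place and grow are hypothetical, and using -1 as the "empty" marker assumes that all keys are non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kEmpty = -1; // marker for a free slot; assumes keys are >= 0

// Stores key with linear probing. Returns false if the table is full.
bool place(std::vector<int>& table, int key)
{
    assert(key >= 0 && !table.empty());
    const std::size_t n = table.size();
    const std::size_t start = static_cast<std::size_t>(key) % n;
    for (std::size_t i = 0; i < n; ++i)
    {
        const std::size_t index = (start + i) % n;
        if (table[index] == kEmpty)
        {
            table[index] = key;
            return true;
        }
    }
    return false;
}

// Builds a table twice as long and re-inserts every stored key, because
// key % length changes when the length changes.
std::vector<int> grow(const std::vector<int>& old_table)
{
    std::vector<int> bigger(old_table.size() * 2, kEmpty);
    for (int key : old_table)
        if (key != kEmpty)
            place(bigger, key);
    return bigger;
}
```

In a table of length 10 the keys 4, 14 and 44 occupy slots 4, 5 and 6. After grow, 14 moves to its own slot 14, while 44 still collides with 4 (44 % 20 = 4) and ends up in slot 5.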

3.1.3 Erase

Erasing is simple: find the slot and mark its state as deleted. Do not forget to update _size.

		bool Erase(const K& key)
		{
			std::pair<size_t,bool> FindOut = Find(key);
			if (FindOut.second == false)
			{
				return false;
			}
			_ht[FindOut.first]._state = DELETE;
			--_size;
			return true;
		}

3.2 Open Hashing

With open hashing, nearly everything comes down to linked-list operations.

#pragma once

#include <cstddef>
#include <string>
#include <utility>
#include <vector>
namespace Open_Hash
{
	template<class K>
	struct Hash
	{
		size_t operator()(const K& key) const
		{
			return key;
		}
	};
	template<>
	struct Hash<std::string>
	{
		size_t operator()(const std::string& s) const
		{
			size_t res = 0;
			for (auto c : s)
			{
				res *= 31;
				res += c;
			}
			return res;
		}
	};

	template<class K, class V, class HashFunc = Hash<K>>
	class HashTable
	{
		struct Node
		{
			Node(std::pair<K, V> kv)
				:_kv(kv)
				,next(nullptr)
			{}
			std::pair<K, V> _kv;
			Node* next;
		};
	public:
		// Insert
		bool Insert(const std::pair<K, V>& kv);
		// Find
		Node* Find(const K& key);
		// Erase
		bool Erase(const K& key);
		size_t Size()const
		{
			return _size;
		}
		bool Empty() const
		{
			return _size == 0;
		}
	private:
		std::vector<Node*> _ht;
		size_t _size = 0; // number of valid elements
	};
}

3.2.1 Find

Compute the mapped index, then search the bucket's list starting from its head.

		Node* Find(const K& key)
		{
			if (_ht.empty())
				return nullptr;
			HashFunc hs;
			size_t index = hs(key) % _ht.size();
			Node* cur = _ht[index];
			while (cur)
			{
				if (cur->_kv.first == key)
					return cur;
				cur = cur->next;
			}
			return nullptr;
		}

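3.2.2 Insert

The elements within a bucket need not be kept in any order, so inserting at the head of the list is the most efficient choice. Growth uses a temporary array: each node hanging off the current array is relinked at the head of its new bucket in the temporary array, and finally the two arrays are swapped.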

		bool Insert(const std::pair<K, V>& kv)
		{
			if (Find(kv.first))
			{
				return false;
			}
			HashFunc hs;
			if (_ht.size() == _size) // load factor has reached 1
			{
				// Grow
				size_t newsize = _ht.size() == 0 ? 10 : _ht.size() * 2;
				std::vector<Node*> newTable(newsize, nullptr);
				// Relink the existing nodes into the new buckets
				for (size_t i = 0; i < _ht.size(); i++)
				{
					Node* cur = _ht[i];
					while (cur)
					{
						Node* next = cur->next;
						// Hash the node being moved (cur), not the bucket head
						size_t index = hs(cur->_kv.first) % newTable.size();
						cur->next = newTable[index];
						newTable[index] = cur;

						cur = next;
					}
					_ht[i] = nullptr;
				}
				_ht.swap(newTable);
			}
			size_t index = hs(kv.first) % _ht.size();
			Node* newNode = new Node(kv);
			newNode->next = _ht[index];
			_ht[index] = newNode;
			++_size;
			return true;
		}

3.2.3 Erase

Find the node and delete it directly; this is an ordinary linked-list removal.

		bool Erase(const K& key)
		{
			if (_ht.empty())
				return false;
			HashFunc hs;
			size_t index = hs(key) % _ht.size();
			Node* cur = _ht[index];
			Node* pre = nullptr;
			while (cur)
			{
				if (cur->_kv.first == key)
				{
					if (pre == nullptr) // removing the head of the list
						_ht[index] = cur->next;
					else
						pre->next = cur->next;
					delete cur;
					--_size;
					return true;
				}
				pre = cur;
				cur = cur->next;
			}
			return false;
		}
