C++: Hashing

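Function declaration	Description
begin	Returns an iterator to the first element of the unordered_map
end	Returns an iterator to the position one past the last element of the unordered_map
cbegin	Returns a const iterator to the first element of the unordered_map
cend	Returns a const iterator to the position one past the last element of the unordered_map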

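2.4 unordered_map element access

Function declaration	Description
operator[]	Returns a reference to the value mapped to key; if key is not present, it is first inserted with a default-constructed value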

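2.5 unordered_map lookup

Function declaration	Description
iterator find(const K& key)	Returns an iterator to the element whose key is key, or end() if there is none
size_t count(const K& key)	Returns the number of key-value pairs whose key is key (0 or 1, since keys in an unordered_map are unique)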
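
The difference between operator[] and find matters in practice: operator[] inserts a missing key, while find and count only look. The following is a small sketch using std::unordered_map; the function names count_words and occurrences are made up for this illustration.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Builds a word -> occurrence count map. operator[] inserts a
// value-initialized int (0) the first time a word is seen.
std::unordered_map<std::string, int> count_words(const std::vector<std::string>& words)
{
    std::unordered_map<std::string, int> counts;
    for (const std::string& w : words)
    {
        ++counts[w];
    }
    return counts;
}

// Read-only lookup: find() never inserts, so a missing word yields 0
// without changing the map.
int occurrences(const std::unordered_map<std::string, int>& counts, const std::string& word)
{
    auto it = counts.find(word);
    return it == counts.end() ? 0 : it->second;
}
```

Calling counts["kiwi"] on a map that lacks "kiwi" grows the map by one element, which is why read-only code paths usually prefer find or count.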

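2.6 unordered_map modifiers

Function declaration	Description
insert	Inserts a key-value pair into the container
erase	Removes a key-value pair from the container
void clear()	Removes all elements from the container
void swap(unordered_map&)	Swaps the contents of two containers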
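
As a rough illustration of the return values of insert and erase on std::unordered_map, here is a short sketch. The alias Inventory and the helper names add_if_absent and remove_item are hypothetical and exist only for this example.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

typedef std::unordered_map<std::string, int> Inventory; // hypothetical alias

// insert() does not overwrite: the returned bool is false when the key already exists.
bool add_if_absent(Inventory& items, const std::string& name, int quantity)
{
    return items.insert(std::make_pair(name, quantity)).second;
}

// erase(key) returns the number of elements removed (0 or 1 for unordered_map).
bool remove_item(Inventory& items, const std::string& name)
{
    return items.erase(name) == 1;
}
```

clear() and swap() need no wrapper: items.clear() leaves the map empty, and a.swap(b) exchanges the contents of two maps.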

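2.7 unordered_map bucket operations

Function declaration	Description
size_t bucket_count() const	Returns the total number of buckets in the hash table
size_t bucket_size(size_t n) const	Returns the number of elements in bucket n
size_t bucket(const K& key)	Returns the index of the bucket that holds key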
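
The three bucket functions fit together as follows. This sketch walks every bucket of a std::unordered_map and adds up the bucket sizes; the helper name elements_across_buckets is an assumption made for the example.

```cpp
#include <cstddef>
#include <unordered_map>

// Sums bucket_size(i) over all buckets. Every element lives in exactly
// one bucket, so the total equals m.size().
std::size_t elements_across_buckets(const std::unordered_map<int, int>& m)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < m.bucket_count(); ++i)
    {
        total += m.bucket_size(i);
    }
    return total;
}
```

For any key present in the map, m.bucket(key) is a valid index below m.bucket_count(), and the bucket at that index holds at least one element. The actual bucket numbers depend on the standard library implementation.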

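II. Underlying structure

The unordered family of associative containers is efficient because it is built on a hash structure.

1. The concept of hashing

In sequential structures and balanced trees there is no direct relationship between an element's key and its storage location, so finding an element requires several key comparisons. Sequential search takes O(N) time; a balanced tree takes time proportional to its height, that is O(log2 N). In both cases the cost of a search depends on the number of comparisons performed.

Suppose instead we build a storage structure in which some function (hashFunc) maps each element's key to its storage location. A lookup can then go straight to the element through that function.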

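With such a structure:

· Inserting an element

The storage location is computed from the key of the element being inserted, and the element is stored at that location.

· Searching for an element

The same computation is applied to the key, and the result is treated as the element's storage location. The element at that location is fetched and compared; if the keys match, the search succeeds.

This approach is called hashing. The conversion function it uses is called the hash function, and the resulting structure is called a hash table.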

Example: the data set {1, 7, 6, 4, 5, 9}.

The hash function is hash(key) = key % capacity, where capacity is the total size of the underlying storage.
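
To make the example concrete, the sketch below places the sample keys into a table of capacity 10 using hash(key) = key % capacity. The function name place_keys and the use of -1 for an empty slot are choices made for this illustration only; it assumes non-negative keys and no collisions, which holds for the sample set.

```cpp
#include <cstddef>
#include <vector>

// Places each key at index key % capacity; -1 marks an empty slot.
// Assumes non-negative keys and no collisions (true for the sample set).
std::vector<int> place_keys(const std::vector<int>& keys, std::size_t capacity)
{
    std::vector<int> slots(capacity, -1);
    for (int key : keys)
    {
        slots[static_cast<std::size_t>(key) % capacity] = key;
    }
    return slots;
}
```

With capacity 10, each of the keys 1, 7, 6, 4, 5, 9 ends up in the slot whose index equals the key, and slots 0, 2, 3 and 8 stay empty.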

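2. Hash collisions

When different keys are mapped by the same hash function to the same hash address, the situation is called a hash collision. For example, with hash(key) = key % 10, the keys 4 and 44 both map to address 4.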

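3. Hash functions

Design principles for a hash function:

· The domain of the hash function must include every key that needs to be stored, and if the hash table has m addresses, its range must lie between 0 and m-1

· The addresses it produces should be distributed evenly over the whole space

· The hash function should be reasonably simple

Commonly used hash functions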

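(1) Direct addressing

Use a linear function of the key as the hash address: Hash(key) = A*key + B

Advantages: simple and uniform

Disadvantage: the distribution of the keys must be known in advance

Use case: suitable when the keys fall in a small, contiguous range

(2) Division method

Let m be the number of addresses available in the hash table. Choose as divisor the largest prime p that does not exceed m, and convert the key to a hash address with Hash(key) = key % p (p <= m).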
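
A minimal sketch of the division method follows. The helper names is_prime, largest_prime_at_most and division_hash are hypothetical; the prime search uses plain trial division, which is fine for table-sized values of m but is not meant as an optimized implementation. It assumes m >= 2.

```cpp
#include <cstddef>

// Trial-division primality test.
bool is_prime(std::size_t n)
{
    if (n < 2)
        return false;
    for (std::size_t d = 2; d * d <= n; ++d)
    {
        if (n % d == 0)
            return false;
    }
    return true;
}

// Largest prime p with p <= m; returns 0 when m < 2 (no such prime exists).
std::size_t largest_prime_at_most(std::size_t m)
{
    for (std::size_t p = m; p >= 2; --p)
    {
        if (is_prime(p))
            return p;
    }
    return 0;
}

// Hash(key) = key % p, where p is the largest prime not exceeding m.
// Precondition: m >= 2.
std::size_t division_hash(std::size_t key, std::size_t m)
{
    return key % largest_prime_at_most(m);
}
```

For m = 10 the divisor is 7, so key 25 maps to address 4.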

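4. Resolving hash collisions

4.1 Closed hashing

Closed hashing is also called open addressing. When a collision occurs and the hash table is not yet full, there must still be a free slot somewhere in the table, so the key can be stored in the next free slot after the collision position. Linear probing is the simplest way to find that slot: starting from the collision position, check each following slot in turn until a free one is found.

Advantage of linear probing: simple to implement

Disadvantage of linear probing: once collisions occur, the colliding entries pile up next to each other and form clusters. Keys end up occupying slots that other keys would hash to, so locating some keys takes many comparisons and search efficiency drops.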
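
The clustering effect is easy to see on the earlier sample set. In the sketch below, probe_insert is a hypothetical helper that stores non-negative int keys in a fixed-size table, with -1 marking an empty slot; it assumes the table always has at least one free slot.

```cpp
#include <cstddef>
#include <vector>

// Inserts key with linear probing and returns the slot index that was used.
// -1 marks an empty slot. The caller must guarantee non-negative keys and
// at least one empty slot, otherwise the loop would never end.
std::size_t probe_insert(std::vector<int>& slots, int key)
{
    std::size_t index = static_cast<std::size_t>(key) % slots.size();
    while (slots[index] != -1)
    {
        index = (index + 1) % slots.size(); // step to the next slot, wrapping around
    }
    slots[index] = key;
    return index;
}
```

After inserting 1, 7, 6, 4, 5, 9 into a table of 10 slots, inserting 44 starts at slot 4, finds slots 4 through 7 occupied, and lands in slot 8: four extra probes caused by keys that do not even share its hash address.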

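4.2 Open hashing

The concept of open hashing

Open hashing is also called separate chaining. The hash function first computes an address for each key in the key set. Keys with the same address belong to the same subset, and each subset is called a bucket. The elements of a bucket are linked together in a singly linked list, and the head node of each list is stored in the hash table.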
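
Before the full implementation, the idea can be sketched in a few lines with standard containers. ChainedSet is a hypothetical type for this illustration: it stores non-negative int keys, uses key % bucket count as its hash function, and never grows.

```cpp
#include <cstddef>
#include <forward_list>
#include <iterator>
#include <vector>

// Hypothetical minimal set of non-negative ints using separate chaining.
struct ChainedSet
{
    std::vector<std::forward_list<int>> buckets; // one singly linked list per bucket

    explicit ChainedSet(std::size_t bucket_count) : buckets(bucket_count) {}

    std::size_t bucket_of(int key) const
    {
        return static_cast<std::size_t>(key) % buckets.size();
    }

    // Only the one bucket the key hashes to has to be searched.
    bool contains(int key) const
    {
        for (int stored : buckets[bucket_of(key)])
        {
            if (stored == key)
                return true;
        }
        return false;
    }

    // Colliding keys simply share a list; new keys go to the front.
    bool insert(int key)
    {
        if (contains(key))
            return false;
        buckets[bucket_of(key)].push_front(key);
        return true;
    }

    std::size_t bucket_length(std::size_t index) const
    {
        return static_cast<std::size_t>(
            std::distance(buckets[index].begin(), buckets[index].end()));
    }
};
```

With 10 buckets, the keys 4, 44 and 14 all land in bucket 4 and form a list of length 3, while neighbouring buckets are unaffected. This is the main contrast with linear probing, where a collision spills into adjacent slots.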

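III. Implementing a hash table

1. Open addressing (K, V model)

With open addressing, every slot is in exactly one of three states: EMPTY, EXIST, or DELETE.

1.1 The hash data struct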

	// The three slot states described above
	enum State { EMPTY, EXIST, DELETE };

	template<class K, class V>
	struct HashData
	{
		pair<K, V> _kv;
		State _state = EMPTY;
	};

1.2 Member variables of the hash table

		vector<HashData<K, V>> _tables;
		size_t _n = 0; // number of stored elements

1.3 Inserting data

		bool Insert(const pair<K, V>& kv)
		{
			if (Find(kv.first))
				return false;
			// Grow once the load factor reaches 0.7 (or the table is still empty)
			if (_tables.size() == 0 || _n * 10 / _tables.size() >= 7)
			{
				size_t newsize = _tables.size() == 0 ? 10 : _tables.size() * 2;
				HashTable<K, V> newht;
				newht._tables.resize(newsize);

				// Walk the old table and rehash every live entry into the new one
				for (auto& data : _tables)
				{
					if (data._state == EXIST)
					{
						newht.Insert(data._kv);
					}
				}

				_tables.swap(newht._tables);
			}

			size_t hashi = kv.first % _tables.size();

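			// Linear probing: step forward until a slot that is not EXIST is found
			size_t i = 1;
			size_t index = hashi;
			while (_tables[index]._state == EXIST)
			{
				index = hashi + i;
				index %= _tables.size();
				++i;
			}

			// Store at the probed slot (index), not at the home slot (hashi)
			_tables[index]._kv = kv;
			_tables[index]._state = EXIST;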
			_n++;

			return true;
		}

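The load factor above is the ratio of the number of stored elements to the size of the table.

When the load factor reaches the threshold (0.7 here), the hash table is grown and the entries of the old table are rehashed into the new one. When a collision occurs, the probe moves forward one slot at a time and checks each state; the first slot that is not EXIST (that is, EMPTY or DELETE) receives the element and its state is set to EXIST.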
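
The growth test in Insert avoids floating point by comparing _n * 10 / size against 7. Isolated as a free function (the name needs_grow is made up for this sketch), the same check looks like this:

```cpp
#include <cstddef>

// Mirrors the integer test used in Insert: true when the table is empty
// or when count / capacity has reached 0.7.
bool needs_grow(std::size_t count, std::size_t capacity)
{
    return capacity == 0 || count * 10 / capacity >= 7;
}
```

For a table of 10 slots the check is false with 6 stored elements and true with 7, so the 8th insertion triggers the rehash. Checking capacity == 0 first also keeps the division from ever dividing by zero.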

1.4 Lookup

		HashData<K, V>* Find(const K& key)
		{
			if (_tables.size() == 0)
				return nullptr;

			size_t hashi = key % _tables.size();

			// Linear probing
			size_t i = 1;
			size_t index = hashi;
			// Test the slot being probed (index), not the home slot (hashi)
			while (_tables[index]._state != EMPTY)
			{
				if (_tables[index]._state == EXIST
					&& _tables[index]._kv.first == key)
				{
					return &_tables[index];
				}
				index = hashi + i;
				index %= _tables.size();
				++i;

				// Back at the start after a full lap: every slot is EXIST or DELETE
				if (index == hashi)
				{
					break;
				}
			}

			return nullptr;
		}

1.5 Deletion

		bool Erase(const K& key)
		{
			HashData<K, V>* ret = Find(key);
			if (ret)
			{
				ret->_state = DELETE;
				--_n;
				return true;
			}
			else
			{
				return false;
			}
		}

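Deletion only needs to find the element, set its state to DELETE, and decrement the element count. The slot is marked DELETE rather than EMPTY so that later lookups keep probing past it instead of stopping early.

2. Implementing hash buckets

2.1 The hash node struct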
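
Why a separate DELETE state is needed can be shown with a stripped-down table of int keys. Everything in this sketch (SlotState, Slot, find_slot, put) is hypothetical and independent of the HashTable code above; it omits growth and duplicate checks to keep the probing logic visible.

```cpp
#include <cstddef>
#include <vector>

enum SlotState { SLOT_EMPTY, SLOT_EXIST, SLOT_DELETE };

struct Slot
{
    int key = 0;
    SlotState state = SLOT_EMPTY;
};

// Linear-probing lookup over a fixed-size table; returns the slot index or -1.
// Probing stops at an EMPTY slot but continues past DELETE slots.
int find_slot(const std::vector<Slot>& table, int key)
{
    std::size_t start = static_cast<std::size_t>(key) % table.size();
    for (std::size_t i = 0; i < table.size(); ++i)
    {
        std::size_t index = (start + i) % table.size();
        if (table[index].state == SLOT_EMPTY)
            return -1;
        if (table[index].state == SLOT_EXIST && table[index].key == key)
            return static_cast<int>(index);
    }
    return -1;
}

// Inserts without growth or duplicate checks; assumes a free slot exists
// and that keys are non-negative.
void put(std::vector<Slot>& table, int key)
{
    std::size_t index = static_cast<std::size_t>(key) % table.size();
    while (table[index].state == SLOT_EXIST)
    {
        index = (index + 1) % table.size();
    }
    table[index].key = key;
    table[index].state = SLOT_EXIST;
}
```

In a table of 10 slots, put 4 and then 44: key 44 collides and lands in slot 5. If slot 4 is then marked SLOT_DELETE, find_slot still reaches 44 in slot 5. If slot 4 were reset to SLOT_EMPTY instead, the probe for 44 would stop at slot 4 and wrongly report that the key is missing.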

	template<class T>
	struct HashNode
	{
		HashNode<T>* _next;
		T _data;

		HashNode(const T& data)
			:_next(nullptr)
			, _data(data)
		{}
	};

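The node is templated on T because the type of the stored data is not known in advance: the same table will hold plain keys for unordered_set and key-value pairs for unordered_map.

2.2 Member variables of the hash table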

		vector<Node*> _tables;
		size_t _n = 0;

2.3 The iterator struct

	template<class K, class T, class Ref, class Ptr, class KeyOfT, class Hash>
	struct __HashIterator
	{
		typedef HashNode<T> Node;
		typedef HashTable<K, T, KeyOfT, Hash> HT;
		typedef __HashIterator<K, T, Ref, Ptr, KeyOfT, Hash> Self;

		typedef __HashIterator<K, T, T&, T*, KeyOfT, Hash> Iterator;

		Node* _node;
		const HT* _ht;

		__HashIterator(Node* node, const HT* ht)
			:_node(node)
			,_ht(ht)
		{}

		__HashIterator(const Iterator& it)
			:_node(it._node)
			,_ht(it._ht)
		{}

		Ref operator*()
		{
			return _node->_data;
		}

		Ptr operator->()
		{
			return &_node->_data;
		}

		bool operator!=(const Self& s)
		{
			return _node != s._node;
		}

		Self& operator++()
		{
			if (_node->_next != nullptr)
			{
				_node = _node->_next;
			}
			else
			{
				// The current bucket is exhausted: move to the next non-empty bucket
				KeyOfT kot;
				Hash hash;
				// Compute the index of the current bucket
				size_t hashi = hash(kot(_node->_data)) % _ht->_tables.size();
				++hashi;
				while (hashi < _ht->_tables.size())
				{
					if (_ht->_tables[hashi])
					{
						_node = _ht->_tables[hashi];
						break;
					}
					else
					{
						++hashi;
					}
				}

				// No non-empty bucket is left, so the iterator becomes end()
				if (hashi == _ht->_tables.size())
				{
					_node = nullptr;
				}
			}

			return *this;
		}
	};

2.4 The begin() and end() iterators

		iterator begin()
		{
			Node* cur = nullptr;
			for (size_t i = 0; i < _tables.size(); ++i)
			{
				cur = _tables[i];
				if (cur)
				{
					break;
				}
			}
			return iterator(cur, this);
		}

		iterator end()
		{
			return iterator(nullptr, this);
		}

		const_iterator begin()const
		{
			Node* cur = nullptr;
			for (size_t i = 0; i < _tables.size(); ++i)
			{
				cur = _tables[i];
				if (cur)
				{
					break;
				}
			}
			return const_iterator(cur, this);
		}

		const_iterator end()const
		{
			return const_iterator(nullptr, this);
		}

2.5 Destructor

		~HashTable()
		{
			for (auto& cur : _tables)
			{
				while (cur)
				{
					Node* next = cur->_next;
					delete cur;
					cur = next;
				}
				cur = nullptr;
			}
		}

2.6 Find

		iterator Find(const K& key)
		{
			if (_tables.size() == 0)
				return end();

			KeyOfT kot;
			Hash hash;
			size_t hashi = hash(key) % _tables.size();
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (kot(cur->_data) == key)
				{
					return iterator(cur, this);
				}
				cur = cur->_next;
			}
			return end();
		}

2.7 Erase

		bool Erase(const K& key)
		{
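			// An empty table holds nothing, and hash % 0 would be undefined
			if (_tables.size() == 0)
				return false;

			KeyOfT kot;
			Hash hash;
			size_t hashi = hash(key) % _tables.size();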
			Node* prev = nullptr;
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (kot(cur->_data) == key)
				{
					if (prev == nullptr)
					{
						_tables[hashi] = cur->_next;
					}
					else
					{
						prev->_next = cur->_next;
					}
					delete cur;
					--_n; // keep the element count in sync with the table

					return true;
				}
				else
				{
					prev = cur;
					cur = cur->_next;
				}
			}
			return false;
		}

2.8 Insert

		pair<iterator, bool> Insert(const T& data)
		{
			KeyOfT kot;
			iterator it = Find(kot(data));
			if (it != end())
			{
				return make_pair(it, false);
			}

			Hash hash;

			// Grow when the load factor reaches 1 (this also covers the empty table)
			if (_n == _tables.size())
			{
				size_t newsize = GetNextPrime(_tables.size());
				vector<Node*> newtables(newsize, nullptr);
				for (auto& cur : _tables)
				{
					while (cur)
					{
						Node* next = cur->_next;

						size_t hashi = hash(kot(cur->_data)) % newtables.size();

						// Relink the existing node at the head of its new bucket
						cur->_next = newtables[hashi];
						newtables[hashi] = cur;
						cur = next;
					}
				}
				_tables.swap(newtables);
			}

			size_t hashi = hash(kot(data)) % _tables.size();

			// Insert at the head of the bucket
			Node* newnode = new Node(data);
			newnode->_next = _tables[hashi];
			_tables[hashi] = newnode;

			++_n;
			// A new element was inserted, so the flag must be true
			return make_pair(iterator(newnode, this), true);
		}
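
Insert calls GetNextPrime, which is not shown in this section. One possible shape for such a helper is sketched below, purely as an assumption: it returns the first entry of a small table of primes that is larger than the current size, with each entry roughly double the previous one. The table contents and the fallback are illustrative choices, not the original implementation.

```cpp
#include <cstddef>

// Hypothetical helper: returns the first prime in the table that is larger
// than `current`. The table is a short illustrative list of primes that
// roughly double from one entry to the next.
std::size_t GetNextPrime(std::size_t current)
{
    static const std::size_t primes[] = {
        53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317
    };
    for (std::size_t p : primes)
    {
        if (p > current)
            return p;
    }
    // Past the end of the table: fall back to doubling (no longer prime).
    return current * 2;
}
```

With this sketch an empty table grows to 53 buckets on the first insertion, then to 97, 193, and so on. Keeping the bucket count prime tends to spread keys more evenly when the hash values share common factors.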

2.9 Getting the size of the largest bucket

		size_t MaxBucketSize()
		{
			size_t max = 0;
			for (size_t i = 0; i < _tables.size(); ++i)
			{
				auto cur = _tables[i];
				size_t size = 0;
				while (cur)
				{
					++size;
					cur = cur->_next;
				}

				if (size > max)
					max = size;
			}
			// Return only after every bucket has been scanned
			return max;
		}

IV. Wrapping the hash table to implement unordered_map and unordered_set

1. unordered_set

	template<class K, class Hash = HashFunc<K>>
	class unordered_set
	{
	public:
		struct SetKeyOfT
		{
			const K& operator()(const K& key)
			{
				return key;
			}
		};
	public:
		typedef typename HashBucket::HashTable<K, K, SetKeyOfT, Hash>::const_iterator iterator;
		typedef typename HashBucket::HashTable<K, K, SetKeyOfT, Hash>::const_iterator const_iterator;

		iterator begin()
		{
			return _ht.begin();
		}

		iterator end()
		{
			return _ht.end();
		}

		const_iterator begin()const
		{
			return _ht.begin();
		}

		const_iterator end()const
		{
			return _ht.end();
		}

		pair<iterator, bool> insert(const K& key)
		{
			return _ht.Insert(key);
		}

		iterator find(const K& key)
		{
			return _ht.Find(key);
		}

		bool erase(const K& key)
		{
			return _ht.Erase(key);
		}
	private:
		HashBucket::HashTable<K, K, SetKeyOfT, Hash> _ht;
	};
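
The default Hash argument, HashFunc<K>, is not defined in this section. As an assumption about what it might look like, the sketch below converts integer-like keys directly to size_t and specializes std::string with a multiply-and-add over the characters. The multiplier 131 is an illustrative choice, not something taken from the original code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical default: works for keys convertible to size_t (integers, chars, ...).
template<class K>
struct HashFunc
{
    std::size_t operator()(const K& key) const
    {
        return static_cast<std::size_t>(key);
    }
};

// Hypothetical specialization for strings: multiply-and-add over the characters,
// so that strings with the same characters in a different order hash differently.
template<>
struct HashFunc<std::string>
{
    std::size_t operator()(const std::string& s) const
    {
        std::size_t hash = 0;
        for (unsigned char ch : s)
        {
            hash = hash * 131 + ch;
        }
        return hash;
    }
};
```

A functor of this shape is what lets the table compute hash(key) % _tables.size() for keys such as strings, for which the % operator is not defined directly.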

2. unordered_map

	template<class K, class V, class Hash = HashFunc<K>>
	class unordered_map
	{
	public:
		struct MapKeyOfT
		{
			// Take the stored type pair<const K, V> directly to avoid a temporary copy
			const K& operator()(const pair<const K, V>& kv)
			{
				return kv.first;
			}
		};
	public:
		typedef typename HashBucket::HashTable<K, pair<const K, V>, MapKeyOfT, Hash>::iterator iterator;
		typedef typename HashBucket::HashTable<K, pair<const K, V>, MapKeyOfT, Hash>::const_iterator const_iterator;

		iterator begin()
		{
			return _ht.begin();
		}

		iterator end()
		{
			return _ht.end();
		}

		const_iterator begin() const
		{
			return _ht.begin();
		}

		const_iterator end() const
		{
			return _ht.end();
		}

		pair<iterator, bool> insert(const pair<K, V>& kv)
		{
			return _ht.Insert(kv);
		}

		V& operator[](const K& key)
		{
			pair<iterator, bool> ret = insert(make_pair(key, V()));
			return ret.first->second;
		}

		iterator find(const K& key)
		{
			return _ht.Find(key);
		}

		bool erase(const K& key)
		{
			return _ht.Erase(key);
		}

	private:
		HashBucket::HashTable<K, pair<const K, V>, MapKeyOfT, Hash> _ht;
	};

coco冯

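unordered_map is an associative container that stores <key, value> pairs and allows a value to be retrieved quickly through its key. In unordered_map, the key is normally used to identify an element uniquely, while the mapped value is an object whose content is associated with that key. The key and the mapped value may have different types. Internally, unordered_map does not keep the <key, value> pairs in any particular order; to find the value for a key in constant time on average, it places key-value pairs with the same hash value in the same bucket.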