【C++】unordered_set与unordered_map的封装

最新推荐文章于 2024-05-20 15:29:54 发布

平凡的小苏

最新推荐文章于 2024-05-20 15:29:54 发布

阅读量373

点赞数 42

分类专栏： C++修炼内功文章标签： c++ 开发语言

本文链接：https://blog.csdn.net/VHhhbb/article/details/133438370

版权

C++修炼内功专栏收录该内容

24 篇文章 5 订阅

订阅专栏

🌇个人主页：平凡的小苏
📚学习格言：命运给你一个低的起点，是想看你精彩的翻盘，而不是让你自甘堕落，脚下的路虽然难走，但我还能走，比起向阳而生，我更想尝试逆风翻盘。
🛸C++专栏：C++内功修炼基地
> 家人们更新不易，你们的👍点赞👍和⭐关注⭐真的对我真重要，各位路过的友友麻烦多多点赞关注。欢迎你们的私信提问，感谢你们的转发！关注我，关注我，关注我，你们将会看到更多的优质内容！！

在这里插入图片描述

一、unordered序列关联式容器

在C++98中，STL提供了底层为红黑树结构的一系列关联式容器，在查询时效率可达到logN，最差情况下也仅需要比较红黑树的高度次，当树中的节点非常多时，查询效率也不理想。因此在C++11中，STL又提供了4个unordered系列的关联式容器，这四个容器与红黑树结构的关联式容器使用方式基本类似，只是其底层结构不同，unordered系列容器并不是有序的。

二、unordered_map与unordered_set结构

1、unordered_map

在这里插入图片描述

2、unordered_set

在这里插入图片描述

unordered_set并不像set那样支持反向迭代器，它的迭代器底层是一个单向迭代器。 其他功能与set类似unordered_set的插入是无序的，不一定按照插入的顺序进行打印。自带去重功能。

三、红黑树与哈希的性能比较

红黑树的插入删除查找性能都是O(logN),而哈希表的插入删除查找性能理论上都是O(1)，他是相对于稳定的，最差情况下都是高效的。哈希表的插入删除操作的理论上时间复杂度是常数时间的，这有个前提就是哈希表不发生数据碰撞。在发生碰撞的最坏的情况下，哈希表的插入和删除时间复杂度为O(n)。

四、哈希的底层结构

顺序结构以及平衡树中，元素关键码与其存储位置之间没有对应的关系，因此在查找一个元素时，必须要经过关键码的多次比较。顺序查找时间复杂度为O(N)，平衡树中为树的高度，即O(log2N)，搜索的效率取决于搜索过程中元素的比较次数。
理想的搜索方法：可以不经过任何比较，一次直接从表中得到要搜索的元素。
如果构造一种存储结构，通过某种函数(hashFunc)使元素的存储位置与它的关键码之间能够建立一一映射的关系，那么在查找时通过该函数可以很快找到该元素。

4.1常见哈希函数

直接定址法–(常用)
取关键字的某个线性函数为散列地址：Hash（Key） = A * Key + B
优点：简单、均匀
缺点：需要事先知道关键字的分布情况
使用场景：适合查找比较小且连续的情况
除留余数法–(常用)
设散列表中允许的地址数为m，取一个不大于m，但最接近或者等于m的质数p作为除数，
按照哈希函数：Hash(key) = key % p(p <= m), 将关键码转换成哈希地址

4.2哈希扩容机制

在这里插入图片描述

4.3闭散列

闭散列：也叫开放定址法，当发生哈希冲突时，如果哈希表未被装满，说明在哈希表中必然还有空位置，那么可以把key存放到冲突位置中的“下一个” 空位置中去。

线性探测：从发生冲突的位置开始，依次向后探测，直到寻找到下一个空位置为止。

插入：

通过哈希函数获取待插入元素在哈希表中的位置

如果该位置中没有元素则直接插入新元素，如果该位置中有元素发生哈希冲突，使用线性探测找到下一个空位置，插入新元素

在这里插入图片描述

删除

采用闭散列处理哈希冲突时，不能随便物理删除哈希表中已有的元素，若直接删除元素会影响其他元素的搜索。比如删除元素4，如果直接删除掉，44查找起来可能会受影响。因此线性探测采用标记的伪删除法来删除一个元素。

代码实例：

template<class K>
struct DefaultHashFunc//默认仿函数
{
	size_t operator()(const K& key)
	{
		return (size_t)key;
	}
};

template<>
struct DefaultHashFunc<std::string>//string的仿函数特化
{
	size_t operator()(const std::string& str)
	{
		// BKDR
		size_t hash = 0;
		for (auto ch : str)//将哈希值转化为整型数值
		{
			hash *= 131;
			hash += ch;
		}

		return hash;
	}
};
namespace sqy
{
	enum STATE
	{
		EXIST,
		EMPTY,
		DELETE
	};

	template<class K, class V>
	struct HashTableData
	{
		std::pair<K, V> _kv;
		STATE _state = EMPTY;
	};

	template<class K, class V, class HashFunc = DefaultHashFunc<K>>
	class HashTable
	{
	public:
		HashTable()
		{
			_tables.resize(10);//为了有效个数为零时计算负载因子不出错
		}
		bool Insert(const std::pair<K, V>& kv)
		{
			if (Find(kv.first))
			{
				return false;//存在就不插入了
			}

			if (_n * 10 / _tables.size() >= 7)//扩容
			{
				size_t newcapa = _tables.size() * 2;
				HashTable<K, V>newtable;//复用插入接口，否则再写一次插入，代码冗余
				newtable._tables.resize(newcapa);//开好空间
				for (size_t i = 0; i < _tables.size(); i++)//遍历插入
				{
					if (_tables[i]._state == EXIST)
					{
						newtable.Insert(_tables[i]._kv);
					}
				}
				_tables.swap(newtable._tables);//交换
			}
			HashFunc hf;
			size_t hashi = hf(kv.first) % _tables.size();//不能使用capacity，有越界检测；
			while (_tables[hashi]._state == EXIST)//由于除留余数的位置可能已经映射，需要往后遍历找空位置
			{
				++hashi;
				hashi %= _tables.size();
			}

			_tables[hashi]._kv = kv;
			_tables[hashi]._state = EXIST;//设置状态
			_n++;
			return true;
		}

		HashTableData<const K, V>* Find(const K& key)
		{
			HashFunc hf;
			size_t hashi = hf(key) % _tables.size();
			while (_tables[hashi]._state != EMPTY)
			{
				if (_tables[hashi]._state != DELETE && _tables[hashi]._state == EXIST)
				{
					return (HashTableData<const K, V>*) & _tables[hashi];//注意这里强转是因为返回时没有const，做强转是因为有些编译器不支持隐式类型转换
				}
				hashi++;
			}

			return nullptr;
		}

		bool Erase(const K& key)
		{
			HashTableData<const K, V>* ret = Find(key);
			if (ret)
			{
				ret->_state = DELETE;
				--_n;
				return true;
			}
			return false;
		}
	private:
		std::vector<HashTableData<K, V>> _tables;
		size_t _n;//存储有效个数
	};
}

4.4开散列(拉链法/哈希桶)

1.开散列的概念

开散列法又叫链地址法(开链法)，首先对关键码集合用散列函数计算散列地址，具有相同地址的关键码归于同一子集合，每一个子集合称为一个桶，各个桶中的元素通过一个单链表链接起来，各链表的头结点存储在哈希表中。

在这里插入图片描述

从上图可以看出，开散列中每个桶中放的都是发生哈希冲突的元素。

官方文档中规定开散列的负载因子是1，负载因子是1的意思是当哈希表的size等于节点数量时，就需要扩容了。

template<class K>
struct DefaultHashFunc//默认仿函数
{
	size_t operator()(const K& key)
	{
		return (size_t)key;
	}
};

template<>
struct DefaultHashFunc<std::string>//string的仿函数特化
{
	size_t operator()(const std::string& str)
	{
		// BKDR
		size_t hash = 0;
		for (auto ch : str)//将哈希值转化为整型数值
		{
			hash *= 131;
			hash += ch;
		}

		return hash;
	}
};
namespace hash_bucket
{
	template<class K, class V>
	struct HashNode
	{
		std::pair<K, V> _kv;
		HashNode<K, V>* _next;
		HashNode(const std::pair<K,V>& kv)
			:_kv(kv)
			,_next(nullptr)
		{}
	};

	template<class K, class V, class HashFunc = DefaultHashFunc<K>>
	class HashTable
	{
		typedef HashNode<K, V> Node;
	public:
		HashTable()
		{
			_tables.resize(10);//为了有效个数为零时计算负载因子不出错
		}
		bool Insert(const std::pair<K, V>& kv)
		{
			HashFunc hf;
			if (_n == _tables.size())
			{
				size_t newSize = _n * 2;
				std::vector<Node*>newTable;//这里的插入不能在继续复用插入接口了，，每个节点重新创建，又释放旧节点，效率更低了
				newTable.resize(newSize, nullptr);

				//遍历旧表，把节点牵下来挂到新表
				for (size_t i = 0; i < _tables.size(); i++)
				{
					if (_tables[i])//节点不为空
					{
						Node* cur = _tables[i];
						while (cur)
						{
							Node* next = cur->_next;
							size_t hashi = hf(cur->_kv.first) % newSize;//扩容后需要重新映射
							cur->_next = newTable[hashi];
							newTable[hashi] = cur;
							cur = next;
						}
						_tables[i] = nullptr;
					}
				}

				_tables.swap(newTable);
			}
			size_t hashi = hf(kv.first) % _tables.size();

			Node* newnode = new Node(kv);
			newnode->_next = _tables[hashi];
			_tables[hashi] = newnode;
			++_n;
			return true;
		}

		Node* Find(const K& key)
		{
			HashFunc hf;
			size_t hashi = hf(key) % _tables.size();
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (cur->_kv.first == key)
				{
					return cur;
				}
				cur = cur->_next;
			}
			return nullptr;
		}

		bool Erase(const K& key)
		{
			HashFunc hf;
			size_t hashi = hf(key) % _tables.size();
			Node* cur = _tables[hashi];
			Node* pre = nullptr;
			while (cur)
			{
				if (cur->_kv.first == key)
				{
					if (pre == nullptr)
					{
						_tables[hashi] = cur->_next;
					}
					else
					{
						pre->_next = cur->_next;
					}
					delete cur;
					return true;
				}
				pre = cur;
				cur = cur->_next;
			}

			return false;
		}
		void Print()
		{
			for (size_t i = 0; i < _tables.size(); i++)
			{
				printf("[%d]->", i);
				Node* cur = _tables[i];
				while (cur)
				{
					std::cout << cur->_kv.first << ":" << cur->_kv.second << "->";
					cur = cur->_next;
				}
				printf("NULL\n");
			}
			std::cout << std::endl;
		}
	private:
		std::vector<Node*> _tables;
		size_t _n;//存储有效个数
	};
}

两种扩容写法，闭散列写法是把原表的节点拷贝一份插入到新的哈希表，全部元素映射完毕后再交换哈希表。开散列写法是通过修改哈希桶中单链表的指针指向，直接摘取原表元素，同样映射完毕后交换哈希表，没有拷贝构造的过程。不过用的时候记得将原表中的指向置空，否则藕断丝连，原表被交换后会发生析构，因为指向关系没有断，造成刚插入的数据全部被清空。

为啥闭散列不能必须老老实实的拷贝？因为闭散列中的哈希表存的是值，开散列中存的是节点，节点可以摘下来。

五、完整源码(使用哈希桶封装)

unordered_set

namespace sqy
{
	template<class K>
	class unordered_set
	{
	public:
		struct SetKeyOfT
		{
			const K& operator()(const K& key)
			{
				return key;
			}
		};
		typedef typename hash_bucket::HashTable<K, K, SetKeyOfT>::const_iterator iterator;
		typedef typename hash_bucket::HashTable<K, K, SetKeyOfT>::const_iterator const_iterator;


		const_iterator begin() const 
		{
			return _ht.begin();
		}

		const_iterator end() const 
		{
			return _ht.end();
		}

		std::pair<iterator,bool> insert(const K& key)
		{
			std::pair<typename hash_bucket::HashTable<K, K, SetKeyOfT>::iterator, bool> ret = _ht.Insert(key);
			return std::pair<iterator, bool>(ret.first, ret.second);
		}
	private:
		hash_bucket::HashTable<K, K, SetKeyOfT> _ht;
	};
}

unordered_map

namespace sqy
{
	template<class K,class V>
	class unordered_map
	{
	public:

		struct MapKeyOfT
		{
			const K& operator()(const std::pair<K, V>& kv)
			{
				return kv.first;
			}
		};

		typedef typename hash_bucket::HashTable<K, std::pair<const K, V>, MapKeyOfT>::iterator iterator;
		typedef typename hash_bucket::HashTable<K, std::pair<const K, V>, MapKeyOfT>::const_iterator const_iterator;

		iterator begin()
		{
			return _ht.begin();
		}

		iterator end()
		{
			return _ht.end();
		}

		const_iterator begin() const
		{
			return _ht.begin();
		}

		const_iterator end() const
		{
			return _ht.end();
		}

		V& operator[](const K& key)
		{
			pair<iterator, bool> ret = _ht.Insert(std::make_pair(key,V()));
			return ret.first->second;
		}

		std::pair<iterator,bool> insert(const std::pair<K, V>& kv)
		{
			return _ht.Insert(kv);
		}
	private:
		hash_bucket::HashTable<K, std::pair<const K,V>,MapKeyOfT> _ht;
	};
}

hash.h

#include <iostream>
#include <vector>
using namespace std;

template<class K>
struct DefaultHashFunc//默认仿函数
{
	size_t operator()(const K& key)
	{
		return (size_t)key;
	}
};

template<>
struct DefaultHashFunc<std::string>//string的仿函数特化
{
	size_t operator()(const std::string& str)
	{
		// BKDR
		size_t hash = 0;
		for (auto ch : str)//将哈希值转化为整型数值
		{
			hash *= 131;
			hash += ch;
		}

		return hash;
	}
};
namespace sqy
{
	enum STATE
	{
		EXIST,
		EMPTY,
		DELETE
	};

	template<class K, class V>
	struct HashTableData
	{
		std::pair<K, V> _kv;
		STATE _state = EMPTY;
	};

	template<class K, class V, class HashFunc = DefaultHashFunc<K>>
	class HashTable
	{
	public:
		HashTable()
		{
			_tables.resize(10);//为了有效个数为零时计算负载因子不出错
		}
		bool Insert(const std::pair<K, V>& kv)
		{
			if (Find(kv.first))
			{
				return false;//存在就不插入了
			}

			if (_n * 10 / _tables.size() >= 7)//扩容
			{
				size_t newcapa = _tables.size() * 2;
				HashTable<K, V>newtable;//复用插入接口，否则再写一次插入，代码冗余
				newtable._tables.resize(newcapa);//开好空间
				for (size_t i = 0; i < _tables.size(); i++)//遍历插入
				{
					if (_tables[i]._state == EXIST)
					{
						newtable.Insert(_tables[i]._kv);
					}
				}
				_tables.swap(newtable._tables);//交换
			}
			HashFunc hf;
			size_t hashi = hf(kv.first) % _tables.size();//不能使用capacity，有越界检测；
			while (_tables[hashi]._state == EXIST)//由于除留余数的位置可能已经映射，需要往后遍历找空位置
			{
				++hashi;
				hashi %= _tables.size();
			}

			_tables[hashi]._kv = kv;
			_tables[hashi]._state = EXIST;//设置状态
			_n++;
			return true;
		}

		HashTableData<const K, V>* Find(const K& key)
		{
			HashFunc hf;
			size_t hashi = hf(key) % _tables.size();
			while (_tables[hashi]._state != EMPTY)
			{
				if (_tables[hashi]._state != DELETE && _tables[hashi]._state == EXIST)
				{
					return (HashTableData<const K, V>*) & _tables[hashi];//注意这里强转是因为返回时没有const，做强转是因为有些编译器不支持隐式类型转换
				}
				hashi++;
			}

			return nullptr;
		}

		bool Erase(const K& key)
		{
			HashTableData<const K, V>* ret = Find(key);
			if (ret)
			{
				ret->_state = DELETE;
				--_n;
				return true;
			}
			return false;
		}
	private:
		std::vector<HashTableData<K, V>> _tables;
		size_t _n;//存储有效个数
	};
}

namespace hash_bucket
{
	template<class K, class T, class KeyOfT, class HashFunc>
	class HashTable;
	
	template<class T>
	struct HashNode
	{
		T _data;
		HashNode<T>* _next;
		HashNode(const T& data)
			:_data(data)
			,_next(nullptr)
		{}
	};

	template<class K, class T, class Ref, class Ptr,class KeyOfT, class HashFunc>
	struct HTIterator
	{
		typedef HashNode<T> Node;
		typedef HTIterator<K, T, Ref,Ptr, KeyOfT, HashFunc> Self;
		typedef HTIterator<K, T, T&, T*, KeyOfT, HashFunc> Iterator;

		Node* _node;
		const HashTable<K, T, KeyOfT, HashFunc>* _pht;

		HTIterator(Node* node,const HashTable<K, T, KeyOfT, HashFunc>* pht)//这里使用const是为了解决调用const的end时，this时const的，不改为const会权限放大
			:_node(node)
			,_pht(pht)
		{}

		HTIterator(const Iterator& it)
			:_node(it._node)
			,_pht(it._pht)
		{}

		Ptr operator->()
		{
			return &_node->_data;
		}

		Ref operator*()
		{
			return _node->_data;
		}

		Self& operator++()
		{
			if (_node && _node->_next)
			{
				_node = _node->_next;
				return *this;
			}
			else
			{
				HashFunc ht;
				KeyOfT kot;
				size_t hashi = ht(kot(_node->_data)) % _pht->_tables.size();
				//从下一个位置开始遍历
				++hashi;
				while (hashi < _pht->_tables.size())
				{
					if (_pht->_tables[hashi])
					{
						_node = _pht->_tables[hashi];
						return *this;
					}
					else
					{
						hashi++;
					}
				}
				_node = nullptr;
			}
			return *this;
		}
	
		bool operator!=(const Self& s)
		{
			return _node != s._node;

		}

		bool operator==(const Self& s)
		{
			return _node == s._node;

		}
	};

	template<class K, class T, class KeyOfT, class HashFunc = DefaultHashFunc<K>>
	class HashTable
	{
		typedef HashNode<T> Node;
		template<class K, class T, class Ref, class Ptr, class KeyOfT, class HashFunc>
		friend struct HTIterator;
	public:
		typedef HTIterator<K, T, T&, T*, KeyOfT, HashFunc> iterator;
		typedef HTIterator<K, T, const T&, const T*, KeyOfT, HashFunc> const_iterator;


		HashTable()
		{
			_tables.resize(10);//为了有效个数为零时计算负载因子不出错
		}
		iterator begin()
		{
			for (size_t i = 0; i < _tables.size(); i++)
			{
				Node* cur = _tables[i];
				if (cur)
				{
					return iterator(cur, this);
				}
			}
			return iterator(nullptr, this);
		}

		iterator end()
		{
			return iterator(nullptr, this);
		}

		const_iterator begin() const
		{
			for (size_t i = 0; i < _tables.size(); i++)
			{
				Node* cur = _tables[i];
				if (cur)
				{
					return const_iterator(cur, this);
				}
			}
			return iterator(nullptr, this);
		}

		const_iterator end() const
		{
   			return const_iterator(nullptr, this);
		}

		std::pair<iterator,bool> Insert(const T& data)
		{
			HashFunc hf;
			KeyOfT kot;
			iterator it = Find(kot(data));
			if (it != end())
			{
				return std::make_pair(it, false);
			}
			if (_n == _tables.size())
			{
				size_t newSize = _n * 2;
				std::vector<Node*>newTable;//这里的插入不能在继续复用插入接口了，，每个节点重新创建，又释放旧节点，效率更低了
				newTable.resize(newSize, nullptr);

				//遍历旧表，把节点牵下来挂到新表
				for (size_t i = 0; i < _tables.size(); i++)
				{
					if (_tables[i])//节点不为空
					{
						Node* cur = _tables[i];
						while (cur)
						{
							Node* next = cur->_next;
							size_t hashi = hf(kot(cur->_data)) % newSize;//扩容后需要重新映射
							cur->_next = newTable[hashi];
							newTable[hashi] = cur;
							cur = next;
						}
						_tables[i] = nullptr;
					}
				}

				_tables.swap(newTable);
			}
			size_t hashi = hf(kot(data)) % _tables.size();

			Node* newnode = new Node(data);
			newnode->_next = _tables[hashi];
			_tables[hashi] = newnode;
			++_n;
			return std::make_pair(iterator(newnode, this), true);
		}

		iterator Find(const K& key)
		{
			HashFunc hf;
			KeyOfT kot;
			size_t hashi = hf(key) % _tables.size();
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (kot(cur->_data) == key)
				{
					return iterator(cur,this);
				}
				cur = cur->_next;
			}
			return end();
		}

		bool Erase(const K& key)
		{
			HashFunc hf;
			KeyOfT kot;
			size_t hashi = hf(key) % _tables.size();
			Node* cur = _tables[hashi];
			Node* pre = nullptr;
			while (cur)
			{
				if (kot(cur->_data) == key)
				{
					if (pre == nullptr)
					{
						_tables[hashi] = cur->_next;
					}
					else
					{
						pre->_next = cur->_next;
					}
					delete cur;
					--_n;
					return true;
				}
				pre = cur;
				cur = cur->_next;
			}

			return false;
		}
		void Print()
		{
			for (size_t i = 0; i < _tables.size(); i++)
			{
				printf("[%d]->", i);
				Node* cur = _tables[i];
				while (cur)
				{
					std::cout << cur->_kv.first << ":" << cur->_kv.second << "->";
					cur = cur->_next;
				}
				printf("NULL\n");
			}
			std::cout << std::endl;
		}
	private:
		std::vector<Node*> _tables;
		size_t _n;//存储有效个数
	};
}