Hash Tables and Mock Implementations of unordered_map and unordered_set

写bug还得是我

Tags: hash table, data structures

Article link: https://blog.csdn.net/z85123789/article/details/131616143

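I. The Concept of a Hash Table

In sequential structures and balanced trees, there is no correspondence between an element's key and its storage location, so finding an element requires multiple key comparisons. Sequential search takes O(N) time; in a balanced tree the cost is the height of the tree, i.e. O(log_2 N). Search efficiency depends on the number of comparisons made during the search.

If a data structure maps each stored item to its storage location, the item can be found directly at that location without any comparisons.

Such a data structure is called a hash table. A hash function computes a position from the data being inserted, and the data is then stored at that position in the table.

For example, suppose we insert 1, 4, 5, 6, 7, and 9 in order, using the hash function hash(key) = key % capacity. If the table's capacity is 10 and every inserted value is smaller than the capacity, each value lands at the index equal to itself:

Index: 0  1  2  3  4  5  6  7  8  9
Value: -  1  -  -  4  5  6  7  -  9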

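What if we insert a value larger than the capacity, say 35?

35 % 10 = 5, but index 5 is already occupied. What now?

This situation is called a hash collision: different keys are mapped to the same address by the same hash function.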
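The collision between 5 and 35 can be reproduced in a few lines. This is only an illustrative sketch: `hash_index` and `collides` are hypothetical helper names that do not appear in the article's own code, and keys are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstddef>

// Maps a non-negative integer key to a slot: hash(key) = key % capacity.
std::size_t hash_index(std::size_t key, std::size_t capacity)
{
    assert(capacity > 0);  // a table with no slots has no valid index
    return key % capacity;
}

// Two different keys collide when they map to the same slot.
bool collides(std::size_t a, std::size_t b, std::size_t capacity)
{
    return a != b && hash_index(a, capacity) == hash_index(b, capacity);
}
```

With a capacity of 10, `hash_index(35, 10)` is 5, so `collides(5, 35, 10)` is true while `collides(4, 35, 10)` is false.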

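II. Hash Functions

One cause of hash collisions is a poorly designed hash function. Let's first look at the design principles for hash functions:

1. The domain of the hash function must include every key that needs to be stored, and if the hash table allows m addresses, its range must lie within 0 to m-1.

2. The addresses computed by the hash function should be distributed evenly over the whole address space.

3. The hash function should be fairly simple.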

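Common hash functions:

1. Direct addressing. The hash function is hash(key) = A * key + B.

This method is simple but limited: the table has to span the whole range of keys, so it wastes too much space on large or sparse data sets. It suits keys that fall in a small, dense range.

2. Division method (remainder method). Let m be the number of addresses the hash table allows. Choose a prime p that is no larger than m and as close to m as possible, and use hash(key) = key % p to convert the key into a hash address.

3. Mid-square method. Suppose the key is 1234. Its square is 1522756, and the middle three digits, 227, are taken as the hash address. The mid-square method suits cases where the distribution of the keys is unknown and the keys do not have many digits.

There are other methods as well, which are not listed here.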
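The three methods above can be sketched as small functions. The names `direct_address`, `division_hash` and `mid_square` are made up for this illustration, and the choice of exactly three middle decimal digits simply follows the 1234 example; a real table would pick the digit count from its size.

```cpp
#include <cassert>
#include <cstdint>

// Direct addressing: hash(key) = A * key + B.
std::uint64_t direct_address(std::uint64_t key, std::uint64_t a, std::uint64_t b)
{
    return a * key + b;
}

// Division method: hash(key) = key % p, with p a prime no larger than the table size.
std::uint64_t division_hash(std::uint64_t key, std::uint64_t p)
{
    assert(p > 0);
    return key % p;
}

// Mid-square method: square the key and keep three digits from the middle.
std::uint64_t mid_square(std::uint64_t key)
{
    assert(key <= 0xFFFFFFFFu);  // keeps key * key within 64 bits
    std::uint64_t sq = key * key;

    // Count the decimal digits of the square.
    int digits = 0;
    for (std::uint64_t t = sq; t > 0; t /= 10)
        ++digits;

    // Drop low-order digits so that the (roughly) central three digits end up last.
    for (int drop = digits > 3 ? (digits - 3) / 2 : 0; drop > 0; --drop)
        sq /= 10;

    return sq % 1000;
}
```

For the key 1234 the square 1522756 has seven digits, two low-order digits are dropped, and the result is 227, matching the example in the text.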

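III. Resolving Hash Collisions

1. Closed hashing

When a hash collision occurs and the table still has free slots, start from the mapped position, look for the next free slot, and store the data there.

Linear probing is one way to find the next free slot. Take the data set above and insert 35. The hash function gives address 5, but slot 5 is already taken, so the search continues with the slots after 5 until a free one is found. If it reaches the end of the table without finding one, it wraps around to the start of the table (provided the table is known to have a free slot). Here, linear probing stops at address 8 and 35 is stored there.

Deletion needs care. Suppose we now delete 5 by simply emptying its slot, and then try to delete 35. The slot at the hash address of 35 holds no data, so we would wrongly conclude that 35 is not in the table.

Solution: attach a state to every slot in the table. There are three states: empty (EMPTY, the default), occupied (EXIST), and deleted (DELETE). Deleting an element only changes its state to DELETE.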
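The scenario of inserting 35 and then deleting 5 can be traced with a tiny fixed-size table. This is a simplified sketch rather than the article's implementation: `ProbeTable`, `Slot` and `kNotFound` are hypothetical names, the capacity is fixed at 10, keys are non-negative ints, and `insert` does not check for duplicates.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

enum class Slot { Empty, Exist, Deleted };

struct ProbeTable
{
    static constexpr std::size_t kCapacity = 10;
    static constexpr std::size_t kNotFound = kCapacity;  // sentinel: no such slot

    std::array<int, kCapacity> keys{};
    std::array<Slot, kCapacity> state{};  // value-initialized to Slot::Empty

    // Stores the key in the first slot, starting at key % kCapacity, that is not Exist.
    // Returns the slot used, or kNotFound if every slot is occupied.
    std::size_t insert(int key)
    {
        assert(key >= 0);
        std::size_t start = static_cast<std::size_t>(key) % kCapacity;
        for (std::size_t i = 0; i < kCapacity; ++i)
        {
            std::size_t idx = (start + i) % kCapacity;
            if (state[idx] != Slot::Exist)
            {
                keys[idx] = key;
                state[idx] = Slot::Exist;
                return idx;
            }
        }
        return kNotFound;
    }

    // Probing stops only at an Empty slot; Deleted slots are stepped over.
    std::size_t find(int key) const
    {
        assert(key >= 0);
        std::size_t start = static_cast<std::size_t>(key) % kCapacity;
        for (std::size_t i = 0; i < kCapacity; ++i)
        {
            std::size_t idx = (start + i) % kCapacity;
            if (state[idx] == Slot::Empty)
                break;
            if (state[idx] == Slot::Exist && keys[idx] == key)
                return idx;
        }
        return kNotFound;
    }

    // Marks the slot Deleted rather than Empty so later probes continue past it.
    bool erase(int key)
    {
        std::size_t idx = find(key);
        if (idx == kNotFound)
            return false;
        state[idx] = Slot::Deleted;
        return true;
    }
};
```

After inserting 1, 4, 5, 6, 7, 9 and then 35, the key 35 sits in slot 8. Erasing 5 leaves slot 5 in the Deleted state, so a later `find(35)` still walks through slots 5, 6 and 7 and reaches slot 8.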

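What about growing the table?

The load factor of a hash table is defined as α = number of elements stored in the table / length of the table. The more elements there are, the larger the load factor and the more likely a hash collision becomes; the fewer, the less likely. For closed hashing it is best to cap the load factor at around 0.7 to 0.8 and grow the table once that limit is reached.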
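As a small sketch of the growth check, the functions below compute the load factor and test it against a threshold of 0.7. The names `load_factor` and `needs_growth` are hypothetical, and 0.7 is just the lower end of the range mentioned above.

```cpp
#include <cassert>
#include <cstddef>

// Load factor alpha = element count / table length.
double load_factor(std::size_t count, std::size_t capacity)
{
    assert(capacity > 0);
    return static_cast<double>(count) / static_cast<double>(capacity);
}

// True once alpha >= 0.7. Comparing count * 10 with capacity * 7 keeps the
// test in integer arithmetic and avoids floating-point rounding.
bool needs_growth(std::size_t count, std::size_t capacity)
{
    assert(capacity > 0);
    return count * 10 >= capacity * 7;
}
```

With a table of length 10, six stored elements do not trigger growth, while the seventh does.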

Closed hashing implementation:

#include <utility>
#include <vector>
using namespace std;

enum State { EMPTY, EXIST, DELETE };

template<class K, class V>
class HashTable
{
	struct Elem
	{
		pair<K, V> _val;
		State _state = EMPTY;
	};

public:
	static const size_t npos = -1;  // returned by Find when the key is absent

	HashTable(size_t capacity = 3)
		: _ht(capacity), _size(0), _totalSize(0)
	{}

	// Insert; returns false if the key is already present
	bool Insert(const pair<K, V>& val)
	{
		if (Find(val.first) != npos)
			return false;
		CheckCapacity();

		size_t hashi = HashFunc(val.first);
		size_t index = hashi;
		size_t i = 1;

		// Linear probing: both EMPTY and DELETE slots can be reused
		while (_ht[index]._state == EXIST)
		{
			index = (hashi + i) % _ht.size();
			++i;
		}

		if (_ht[index]._state == EMPTY)
			++_totalSize;  // reusing a DELETE slot does not use up a new slot
		_ht[index]._val = val;
		_ht[index]._state = EXIST;
		++_size;

		return true;
	}

	// Find; returns the slot index of the key, or npos
	size_t Find(const K& key) const
	{
		if (_ht.empty())
			return npos;

		size_t hashi = HashFunc(key);
		size_t index = hashi;
		size_t i = 1;
		// A DELETE slot does not end the probe sequence; only EMPTY does
		while (_ht[index]._state != EMPTY)
		{
			if (_ht[index]._state == EXIST && _ht[index]._val.first == key)
				return index;

			index = (hashi + i) % _ht.size();
			++i;

			if (index == hashi)  // wrapped around the whole table
				break;
		}
		return npos;
	}

	// Erase; only marks the slot as DELETE
	bool Erase(const K& key)
	{
		size_t index = Find(key);
		if (index == npos)
			return false;

		_ht[index]._state = DELETE;
		--_size;
		return true;
	}

	size_t Size() const
	{
		return _size;
	}

	bool Empty() const
	{
		return _size == 0;
	}

	void Swap(HashTable<K, V>& ht)
	{
		swap(_size, ht._size);
		swap(_totalSize, ht._totalSize);
		_ht.swap(ht._ht);
	}

private:
	size_t HashFunc(const K& key) const
	{
		// Use size(), not capacity(): only indices below size() are valid
		return key % _ht.size();
	}

	void CheckCapacity()
	{
		// Count DELETE slots too: they lengthen probe sequences just like live elements
		if (_ht.size() == 0 || _totalSize * 10 / _ht.size() >= 7)
		{
			size_t newsize = _ht.size() == 0 ? 10 : _ht.size() * 2;
			HashTable<K, V> newtable(newsize);

			// Re-insert only the live elements; DELETE slots are dropped
			for (auto& e : _ht)
			{
				if (e._state == EXIST)
					newtable.Insert(e._val);
			}
			this->Swap(newtable);
		}
	}

private:
	vector<Elem> _ht;
	size_t _size;       // number of live elements
	size_t _totalSize;  // slots in use, live or deleted; used when deciding to grow
};

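2. Open hashing

Open hashing is also called separate chaining. The hash function first computes a hash address for each key in the key set. Keys with the same address belong to the same subset, and each subset is called a bucket. The elements in a bucket are linked together in a singly linked list, and the head node of each list is stored in the hash table.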
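Before the full implementation, here is a deliberately small sketch of the idea built on standard containers. `ChainedSet` is a hypothetical name; it stores unsigned keys only, uses key % bucket count as its hash, and never grows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <iterator>
#include <vector>

class ChainedSet
{
public:
    explicit ChainedSet(std::size_t bucket_count) : buckets_(bucket_count)
    {
        assert(bucket_count > 0);
    }

    // Adds the key to the front of its bucket's list; returns false for a duplicate.
    bool insert(unsigned key)
    {
        auto& chain = buckets_[key % buckets_.size()];
        if (std::find(chain.begin(), chain.end(), key) != chain.end())
            return false;
        chain.push_front(key);
        return true;
    }

    // Only the one bucket the key hashes to has to be searched.
    bool contains(unsigned key) const
    {
        const auto& chain = buckets_[key % buckets_.size()];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Number of keys chained in the given bucket.
    std::size_t bucket_length(std::size_t index) const
    {
        const auto& chain = buckets_[index];
        return static_cast<std::size_t>(std::distance(chain.begin(), chain.end()));
    }

private:
    std::vector<std::forward_list<unsigned>> buckets_;
};
```

With 10 buckets, inserting 5 and 35 puts both keys in bucket 5, so the colliding keys share one list instead of spilling into neighbouring slots.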

Open hashing implementation:

#include <string>
#include <utility>
#include <vector>
using namespace std;

// Default hash functor: works for keys that convert implicitly to size_t
template<class T>
class HashFunc
{
public:
	size_t operator()(const T& val) const
	{
		return val;
	}
};

// Specialization for string keys (BKDR hash)
template<>
class HashFunc<string>
{
public:
	size_t operator()(const string& s) const
	{
		const char* str = s.c_str();
		unsigned int seed = 131; // 31 131 1313 13131 131313
		unsigned int hash = 0;
		while (*str)
		{
			hash = hash * seed + (*str++);
		}

		return hash;
	}
};

template<class V>
struct HashBucketNode
{
	HashBucketNode(const V& data)
		: _pNext(nullptr), _data(data)
	{}
	HashBucketNode<V>* _pNext;
	V _data;
};

template<class V, class HF>
class HashBucket;

template<class T, class Ref, class Ptr, class HF = HashFunc<T>>
struct HashIterator
{
	typedef HashBucketNode<T> Node;
	typedef HashBucket<T, HF> HT;
	typedef HashIterator<T, Ref, Ptr, HF> Self;
	typedef HashIterator<T, T&, T*, HF> Iterator;

	// Copy constructor for an iterator; converting constructor for a const iterator
	HashIterator(const Iterator& it)
		: _node(it._node)
		, _ht(it._ht)
	{}

	HashIterator(Node* node, const HT* ht)
		: _node(node)
		, _ht(ht)
	{}

	Self& operator++()
	{
		if (_node->_pNext)
			_node = _node->_pNext;
		else
		{
			// Current bucket is exhausted: move to the first node of the next non-empty bucket
			size_t hashi = HF()(_node->_data) % _ht->_table.size() + 1;
			_node = nullptr;  // stays null (== end()) if no later bucket has nodes
			for (; hashi < _ht->_table.size(); ++hashi)
			{
				if (_ht->_table[hashi])
				{
					_node = _ht->_table[hashi];
					break;
				}
			}
		}
		return *this;
	}

	Ref operator*() const
	{
		return _node->_data;
	}

	Ptr operator->() const
	{
		return &_node->_data;
	}

	bool operator==(const Self& it) const
	{
		return _node == it._node;
	}

	bool operator!=(const Self& it) const
	{
		return _node != it._node;
	}

	Node* _node;
	const HT* _ht;
};

// Keys in the hash bucket implemented here are unique
template<class V, class HF = HashFunc<V>>
class HashBucket
{
	// The iterator reads _table, so it must be a friend. Its parameter names
	// must differ from this class's own template parameters.
	template<class T, class Ref, class Ptr, class Hash>
	friend struct HashIterator;
public:
	typedef HashBucketNode<V> Node;
	typedef Node* PNode;
	typedef HashBucket<V, HF> Self;
	typedef HashIterator<V, V&, V*, HF> iterator;

public:
	HashBucket(size_t capacity = 0)
		: _table(GetNextPrime(capacity))
		, _size(0)
	{}

	// The table owns its nodes, so copying is disabled
	HashBucket(const Self&) = delete;
	Self& operator=(const Self&) = delete;

	~HashBucket()
	{
		Clear();
	}

	iterator begin()
	{
		for (size_t hashi = 0; hashi < _table.size(); ++hashi)
		{
			if (_table[hashi])
				return iterator(_table[hashi], this);
		}
		return end();  // every bucket is empty
	}

	iterator end()
	{
		return iterator(nullptr, this);
	}

	// Returns the first prime in the list that is greater than the argument
	static size_t GetNextPrime(size_t prime)
	{
		// Prime list from SGI STL
		static const int num_primes = 28;
		static const unsigned long prime_list[num_primes] =
		{
			53, 97, 193, 389, 769,
			1543, 3079, 6151, 12289, 24593,
			49157, 98317, 196613, 393241, 786433,
			1572869, 3145739, 6291469, 12582917, 25165843,
			50331653, 100663319, 201326611, 402653189, 805306457,
			1610612741, 3221225473, 4294967291
		};

		for (int i = 0; i < num_primes; ++i)
		{
			if (prime_list[i] > prime)
				return prime_list[i];
		}

		return prime_list[num_primes - 1];  // already at the largest prime
	}

	// Elements are unique: returns the existing node if data is already present
	Node* Insert(const V& data)
	{
		PNode node = Find(data);
		if (node)
			return node;

		// Grow first so the bucket index is computed against the final table size
		CheckCapacity();
		size_t hashi = HashFunc(data);
		Node* newnode = new Node(data);
		newnode->_pNext = _table[hashi];  // push to the front of the bucket
		_table[hashi] = newnode;
		++_size;

		return newnode;
	}

	// Removes the element equal to data (there is at most one)
	bool Erase(const V& data)
	{
		size_t hashi = HashFunc(data);
		PNode node = _table[hashi];
		PNode prev = nullptr;
		while (node)
		{
			if (node->_data == data)
			{
				if (prev == nullptr)
					_table[hashi] = node->_pNext;  // removing the bucket head
				else
					prev->_pNext = node->_pNext;
				delete node;
				--_size;
				return true;
			}
			prev = node;
			node = node->_pNext;
		}
		return false;
	}

	Node* Find(const V& data)
	{
		if (_table.size() == 0)
			return nullptr;
		PNode node = _table[HashFunc(data)];
		while (node != nullptr && node->_data != data)
		{
			node = node->_pNext;
		}
		return node;
	}

	size_t Size() const
	{
		return _size;
	}

	bool Empty() const
	{
		return 0 == _size;
	}

	void Clear()
	{
		for (auto& head : _table)
		{
			PNode cur = head;
			while (cur)
			{
				PNode next = cur->_pNext;
				delete cur;
				cur = next;
			}
			head = nullptr;  // do not leave dangling bucket heads
		}
		_size = 0;
	}

	size_t BucketCount() const
	{
		return _table.size();
	}

	void Swap(Self& ht)
	{
		_table.swap(ht._table);
		swap(_size, ht._size);
	}

private:
	size_t HashFunc(const V& data) const
	{
		return HF()(data) % _table.size();
	}

	void CheckCapacity()
	{
		// Grow once the load factor reaches 1
		if (_size == _table.size())
		{
			vector<Node*> newtable(GetNextPrime(_table.size()), nullptr);
			for (auto& head : _table)
			{
				PNode cur = head;
				while (cur)
				{
					PNode next = cur->_pNext;

					// Relink the existing node into the new table instead of copying it
					size_t hashi = HF()(cur->_data) % newtable.size();
					cur->_pNext = newtable[hashi];
					newtable[hashi] = cur;

					cur = next;
				}
				head = nullptr;
			}
			_table.swap(newtable);
		}
	}

private:
	vector<Node*> _table;
	size_t _size;      // number of elements in the hash table
};

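At the top of the code we define a functor. The key being stored is not necessarily an integer; it may be another type, such as string, that cannot be used directly in a modulo operation, so the functor converts it to an integer first.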
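The same idea extends to user-defined key types. In the sketch below, `Date`, `DateHash` and `bucket_index` are hypothetical names invented for illustration; the multiplier 131 is borrowed from the string hash in the listing, and the way the three fields are combined is only one of many reasonable choices.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical key type used only for this illustration.
struct Date
{
    int year;
    int month;
    int day;
};

// Functor that folds the three fields of a Date into one integer.
struct DateHash
{
    std::size_t operator()(const Date& d) const
    {
        std::size_t h = static_cast<std::size_t>(d.year);
        h = h * 131 + static_cast<std::size_t>(d.month);
        h = h * 131 + static_cast<std::size_t>(d.day);
        return h;
    }
};

// Table code can stay generic by taking the functor as a template parameter.
template <class Key, class Hash>
std::size_t bucket_index(const Key& key, std::size_t bucket_count)
{
    assert(bucket_count > 0);
    return Hash()(key) % bucket_count;
}
```

Equal dates always produce the same bucket index, which is the only property the table relies on; how evenly different dates spread over the buckets depends on the quality of the functor.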

IV. unordered_set and unordered_map

Both are built on a hash table underneath, so a mock implementation only needs to wrap the hash table.
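For reference, this is the standard-library interface that the wrappers imitate. The two helper functions, `count_words` and `count_distinct`, are hypothetical examples; the container calls themselves (`operator[]`, `insert`, `size`, `count`) are the standard ones.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Counts how often each word occurs.
std::unordered_map<std::string, int> count_words(const std::vector<std::string>& words)
{
    std::unordered_map<std::string, int> counts;
    for (const std::string& w : words)
        ++counts[w];  // operator[] inserts a zero count the first time a word is seen
    return counts;
}

// Returns the number of distinct values; insert() ignores duplicates.
std::size_t count_distinct(const std::vector<int>& values)
{
    std::unordered_set<int> seen;
    for (int v : values)
        seen.insert(v);
    assert(seen.size() <= values.size());
    return seen.size();
}
```

The mock classes below aim to offer the same operations by forwarding each call to the hash bucket.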

1. Mock implementation of unordered_set

// NOTE: the code in this part assumes a four-parameter hash bucket,
// OpenHash::HashBucket<K, V, KeyOfValue, HF>, which this article does not list.
// That version is assumed to store values of type V, extract the key with
// KeyOfValue, declare HBIterator a friend, and provide iterator/const_iterator,
// Find, Count, Insert (returning pair<iterator, bool>), Erase(iterator), Size,
// Empty, BucketCount and BucketSize.

// To keep things simple, the hash bucket's iterator needs the HashBucket itself
template<class K, class V, class KeyOfValue, class HF>
class HashBucket;

// Note: each bucket is a singly linked list, so the iterator does not need operator--
template <class K, class V, class Ref, class Ptr, class KeyOfValue, class HF>
struct HBIterator
{
	typedef HashBucket<K, V, KeyOfValue, HF> HT;
	typedef HashBucketNode<V>* PNode;
	typedef HBIterator<K, V, Ref, Ptr, KeyOfValue, HF> Self;

	HBIterator(PNode pNode = nullptr, HT* pHt = nullptr)
		: _pNode(pNode)
		, _pHt(pHt)
	{}

	Self& operator++()
	{
		// If the current node has a successor, move to it
		if (_pNode->_pNext)
			_pNode = _pNode->_pNext;
		else
		{
			// Otherwise find the next non-empty bucket and take its first node
			size_t bucketNo = _pHt->HashFunc(KeyOfValue()(_pNode->_data)) + 1;
			_pNode = nullptr;  // becomes end() if every remaining bucket is empty
			for (; bucketNo < _pHt->BucketCount(); ++bucketNo)
			{
				if (_pHt->_ht[bucketNo])
				{
					_pNode = _pHt->_ht[bucketNo];
					break;
				}
			}
		}

		return *this;
	}

	Self operator++(int)
	{
		Self tmp = *this;
		this->operator++();
		return tmp;
	}

	Ref operator*() const
	{
		return _pNode->_data;
	}

	Ptr operator->() const
	{
		return &(_pNode->_data);
	}

	bool operator==(const Self& it) const
	{
		return _pNode == it._pNode;
	}

	bool operator!=(const Self& it) const
	{
		return _pNode != it._pNode;
	}

	PNode _pNode;   // node the iterator currently refers to
	HT* _pHt;       // owning hash bucket, needed to find the next non-empty bucket
};

// unordered_set stores values of type K; HF is the hash function type.
// Implementing it only requires re-wrapping the HashBucket interface.
template<class K, class HF = HashFunc<K>>
class unordered_set
{
	// Extracts the key from a stored value; in a set the value is the key
	struct KeyOfValue
	{
		const K& operator()(const K& data) const
		{
			return data;
		}
	};
	typedef OpenHash::HashBucket<K, K, KeyOfValue, HF> HT;
public:
	// Keys must not be modified through an iterator, so both types are const iterators
	typedef typename HT::const_iterator iterator;
	typedef typename HT::const_iterator const_iterator;

public:
	unordered_set() : _ht()
	{}

	iterator begin() { return _ht.begin(); }
	iterator end() { return _ht.end(); }

	const_iterator begin() const { return _ht.begin(); }
	const_iterator end() const { return _ht.end(); }

	// capacity
	size_t size() const { return _ht.Size(); }
	bool empty() const { return _ht.Empty(); }

	// lookup
	iterator find(const K& key) { return _ht.Find(key); }
	size_t count(const K& key) { return _ht.Count(key); }

	// modify
	pair<iterator, bool> insert(const K& value)
	{
		return _ht.Insert(value);
	}

	iterator erase(iterator position)
	{
		return _ht.Erase(position);
	}

	// bucket
	size_t bucket_count() { return _ht.BucketCount(); }
	size_t bucket_size(const K& key) { return _ht.BucketSize(key); }
private:
	HT _ht;
};

2. Mock implementation of unordered_map

// NOTE: as in the unordered_set part, this code assumes the four-parameter
// OpenHash::HashBucket<K, V, KeyOfValue, HF>, which this article does not list.

// To keep things simple, the hash bucket's iterator needs the HashBucket itself
template<class K, class V, class KeyOfValue, class HF>
class HashBucket;

// Note: each bucket is a singly linked list, so the iterator does not need operator--
template <class K, class V, class KeyOfValue, class HF>
struct HBIterator
{
	typedef HashBucket<K, V, KeyOfValue, HF> HT;
	typedef HashBucketNode<V>* PNode;
	typedef HBIterator<K, V, KeyOfValue, HF> Self;

	HBIterator(PNode pNode = nullptr, HT* pHt = nullptr)
		: _pNode(pNode)
		, _pHt(pHt)
	{}

	Self& operator++()
	{
		// If the current node has a successor, move to it
		if (_pNode->_pNext)
			_pNode = _pNode->_pNext;
		else
		{
			// Otherwise find the next non-empty bucket and take its first node
			size_t bucketNo = _pHt->HashFunc(KeyOfValue()(_pNode->_data)) + 1;
			_pNode = nullptr;  // becomes end() if every remaining bucket is empty
			for (; bucketNo < _pHt->BucketCount(); ++bucketNo)
			{
				if (_pHt->_ht[bucketNo])
				{
					_pNode = _pHt->_ht[bucketNo];
					break;
				}
			}
		}

		return *this;
	}

	Self operator++(int)
	{
		Self tmp = *this;
		this->operator++();
		return tmp;
	}

	V& operator*() const
	{
		return _pNode->_data;
	}

	V* operator->() const
	{
		return &(_pNode->_data);
	}

	bool operator==(const Self& it) const
	{
		return _pNode == it._pNode;
	}

	bool operator!=(const Self& it) const
	{
		return _pNode != it._pNode;
	}

	PNode _pNode;   // node the iterator currently refers to
	HT* _pHt;       // owning hash bucket, needed to find the next non-empty bucket
};

// unordered_map stores pair<K, V> key-value pairs: K is the key type,
// V is the mapped type, and HF is the hash function type.
// Implementing it only requires re-wrapping the HashBucket interface.
template<class K, class V, class HF = HashFunc<K>>
class unordered_map
{
	// Extracts the key from a stored key-value pair
	struct KeyOfValue
	{
		const K& operator()(const pair<K, V>& data) const
		{
			return data.first;
		}
	};
	typedef OpenHash::HashBucket<K, pair<K, V>, KeyOfValue, HF> HT;
public:
	typedef typename HT::iterator iterator;
public:
	unordered_map() : _ht()
	{}

	iterator begin() { return _ht.begin(); }
	iterator end() { return _ht.end(); }

	// capacity
	size_t size() const { return _ht.Size(); }
	bool empty() const { return _ht.Empty(); }

	// access
	// Inserts a value-initialized V if the key is missing, then returns a reference to it.
	// As in std::unordered_map, there is no const overload, because operator[] may insert.
	V& operator[](const K& key)
	{
		pair<iterator, bool> ret = insert(make_pair(key, V()));
		return ret.first->second;
	}

	// lookup
	iterator find(const K& key) { return _ht.Find(key); }
	size_t count(const K& key) { return _ht.Count(key); }

	// modify
	pair<iterator, bool> insert(const pair<K, V>& value)
	{
		return _ht.Insert(value);
	}

	iterator erase(iterator position)
	{
		return _ht.Erase(position);
	}

	// bucket
	size_t bucket_count() { return _ht.BucketCount(); }
	size_t bucket_size(const K& key) { return _ht.BucketSize(key); }
private:
	HT _ht;
};
