Simulated Implementation of the unordered Container Family

Latest recommended article published on 2024-09-14 20:30:05

凪よ

Category: C++. Tags: hash algorithm, hash table, algorithm, C++, programming language

Article link: https://blog.csdn.net/2302_79331124/article/details/142262992

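1. Underlying Structure

The unordered family of associative containers is efficient because it is built on a hash structure.

1.1 The Concept of Hashing

Hashing is an idea rather than a data structure; the hash table is the data structure. Hashing uses a hash function to map an element's key to its storage location (ideally one-to-one), so a lookup can reach the element quickly by evaluating that function.

Inserting an element:
Compute the storage location from the key of the element being inserted, and store the element at that location.
Searching for an element:
Apply the same computation to the key, treat the result as the storage location, and compare the element found there. If the keys match, the search succeeds.

This approach is called hashing. The conversion function it uses is called the hash function, and the structure it builds is called a hash table.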

Example: the data set {1, 6, 7, 4, 9, 5}

The hash function is hash(key) = key % capacity, with capacity = 10.

hash(1) = 1 % 10 = 1, stored at index 1; hash(6) = 6 % 10 = 6, stored at index 6

hash(7) = 7 % 10 = 7, stored at index 7; hash(4) = 4 % 10 = 4, stored at index 4

hash(9) = 9 % 10 = 9, stored at index 9; hash(5) = 5 % 10 = 5, stored at index 5

A hash-based search does not need repeated key comparisons, so lookups are fast.
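The placement just described can be sketched in a few lines. This is only an illustration: place_keys, contains, and the kEmptySlot sentinel are hypothetical names, and the code assumes non-negative keys that do not collide, as in the data set {1, 6, 7, 4, 9, 5}.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sentinel for an unused slot; assumes all keys are non-negative.
const int kEmptySlot = -1;

// Stores every key at index key % capacity.
// Assumes no two keys share an index (no collision handling here).
std::vector<int> place_keys(const std::vector<int>& keys, std::size_t capacity)
{
    std::vector<int> table(capacity, kEmptySlot);
    for (int key : keys)
    {
        table[static_cast<std::size_t>(key) % capacity] = key;
    }
    return table;
}

// A lookup recomputes the same index and needs a single comparison.
bool contains(const std::vector<int>& table, int key)
{
    return table[static_cast<std::size_t>(key) % table.size()] == key;
}
```

Calling place_keys({1, 6, 7, 4, 9, 5}, 10) puts each key at the index equal to its last digit, and contains then answers a query by looking at one slot only.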

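1.2 Hash Collisions

When different keys produce the same hash address under the same hash function, the situation is called a hash collision (or hash conflict). With hash(key) = key % 10, for example, 4 and 14 both map to address 4.
Elements that have different keys but the same hash address are called "synonyms".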
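To make the definition concrete, here is a minimal sketch that counts how many keys of a data set land on an address that is already taken. The function name count_collisions is hypothetical, and the hash is assumed to be key % capacity with non-negative keys.

```cpp
#include <cstddef>
#include <vector>

// Counts the keys whose address was already claimed by an earlier key.
// Illustration only: assumes non-negative keys and capacity > 0.
std::size_t count_collisions(const std::vector<int>& keys, std::size_t capacity)
{
    std::vector<bool> occupied(capacity, false);
    std::size_t collisions = 0;
    for (int key : keys)
    {
        std::size_t address = static_cast<std::size_t>(key) % capacity;
        if (occupied[address])
        {
            ++collisions;  // this key is a synonym of an earlier key
        }
        else
        {
            occupied[address] = true;
        }
    }
    return collisions;
}
```

With capacity 10, the set {1, 6, 7, 4, 9, 5} produces no collisions, while {4, 14, 24, 7} produces two, because 14 and 24 are synonyms of 4.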

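1.3 Hash Functions

Hash collisions cannot be avoided entirely, but a well-chosen hash function lowers their probability.

Design principles for a hash function:

The domain of the hash function must include every key that needs to be stored, and if the hash table has n addresses, its range must be 0 to n - 1.
The addresses it computes should be spread evenly over the whole space.
The function should be simple to compute.

Common hash functions:

Direct addressing (commonly used)
Use a linear function of the key as the hash address: hash(key) = a * key + b
It is simple and uniform, but the distribution of the keys has to be known in advance.
Division method (commonly used)
If the hash table has n addresses, take as the divisor the prime p closest to n that does not exceed n (p may equal n), and convert the key to a hash address with hash(key) = key % p (p <= n).

Note: a cleverer hash function makes collisions less likely, but collisions can only be made rarer, never eliminated.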
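The division method can be sketched as follows. The helper names is_prime, largest_prime_not_above, and division_hash are hypothetical, and the prime search uses plain trial division, which is an assumption made for brevity rather than anything the article prescribes.

```cpp
#include <cstddef>

// Trial-division primality test; adequate for table-sized numbers.
bool is_prime(std::size_t value)
{
    if (value < 2)
    {
        return false;
    }
    for (std::size_t d = 2; d * d <= value; ++d)
    {
        if (value % d == 0)
        {
            return false;
        }
    }
    return true;
}

// Largest prime p with p <= n. Assumes n >= 2.
std::size_t largest_prime_not_above(std::size_t n)
{
    while (!is_prime(n))
    {
        --n;
    }
    return n;
}

// hash(key) = key % p
std::size_t division_hash(std::size_t key, std::size_t p)
{
    return key % p;
}
```

For a table with 10 addresses this picks p = 7, so addresses 7 to 9 are never produced by the hash itself. That is the price of using a prime divisor smaller than the table size.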

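1.4 Resolving Hash Collisions

Two common ways to resolve hash collisions are closed hashing and open hashing.

1.4.1 Closed Hashing

Closed hashing, also called open addressing: if the hash table is not full, it must still contain an empty slot, so the key can be stored in the next empty slot after the position where the collision occurred.

How do we find that next slot?

The method used here is linear probing.

Linear probing: starting from the position of the collision, probe the following slots one by one until an empty slot is found.

Insertion:
- Use the hash function to get the position of the element in the hash table.
- If that position is empty, insert the new element directly. If it is occupied, a hash collision has occurred: use linear probing to find the next empty slot and insert the element there.
Deletion:
- With closed hashing, an existing element cannot simply be physically removed, because doing so would break the search for other elements. Deletion is therefore done lazily (pseudo-deletion): every slot carries a state flag, where EMPTY means the slot is empty, EXIST means it holds an element, and DELETE means its element has been deleted.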
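The reason a physical delete breaks searches is easiest to see in a stripped-down table. The sketch below is not the article's implementation: ProbeTable is a hypothetical fixed-size, int-only table with no resizing, and it assumes non-negative keys and a table that never fills up. Its erase takes a flag so that both deletion styles can be compared.

```cpp
#include <cstddef>
#include <vector>

enum State { EXIST, EMPTY, DELETE };

struct Slot
{
    int key = 0;
    State state = EMPTY;
};

// Hypothetical minimal table: fixed capacity, linear probing, int keys only.
class ProbeTable
{
public:
    explicit ProbeTable(std::size_t capacity) : _slots(capacity) {}

    void insert(int key)
    {
        std::size_t i = home(key);
        while (_slots[i].state == EXIST)  // assumes at least one free slot
        {
            i = (i + 1) % _slots.size();
        }
        _slots[i].key = key;
        _slots[i].state = EXIST;
    }

    // Returns the slot index of key, or -1 once the probe reaches an EMPTY slot.
    int find(int key) const
    {
        std::size_t i = home(key);
        for (std::size_t probes = 0; probes < _slots.size(); ++probes)
        {
            if (_slots[i].state == EMPTY)
            {
                return -1;
            }
            if (_slots[i].state == EXIST && _slots[i].key == key)
            {
                return static_cast<int>(i);
            }
            i = (i + 1) % _slots.size();
        }
        return -1;
    }

    // use_tombstone == false simulates a physical delete (slot reset to EMPTY).
    void erase(int key, bool use_tombstone)
    {
        int i = find(key);
        if (i >= 0)
        {
            _slots[i].state = use_tombstone ? DELETE : EMPTY;
        }
    }

private:
    std::size_t home(int key) const
    {
        return static_cast<std::size_t>(key) % _slots.size();
    }

    std::vector<Slot> _slots;
};
```

With capacity 10, inserting 4, 14, and 24 fills slots 4, 5, and 6. Erasing 14 physically leaves slot 5 EMPTY, so a search for 24 stops there and reports a miss. Marking slot 5 as DELETE lets the probe continue to slot 6 and find 24.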

Implementation of linear probing:

namespace open_adress
{
	enum State
	{
		EXIST,
		EMPTY,
		DELETE
	};
	template<class K, class V>
	struct HashData
	{
		pair<K, V> _kv;
		State _state = EMPTY;
	};
	template<class K, class V, class Hash = HashFunc<K>>
	class HashTable
	{
	public:
		HashTable()
		{
			_tables.resize(10);
		}
		bool Insert(const pair<K, V>& kv)
		{
			if (Find(kv.first))
				return false;
			//Grow once the load factor reaches 0.7 (computed as _n * 10 / size >= 7)
			if (_n * 10 / _tables.size() >= 7)
			{
				HashTable<K, V, Hash> newtable;
				newtable._tables.resize(_tables.size() * 2);
				for (size_t i = 0; i < _tables.size(); i++)
				{
					if (_tables[i]._state == EXIST)
					{
						newtable.Insert(_tables[i]._kv);
					}
				}
				//Hand the new table's _tables over to this table
				_tables.swap(newtable._tables);
			}
			Hash hs;
			size_t hashi = hs(kv.first) % _tables.size();
			while (_tables[hashi]._state == EXIST)
			{
				++hashi;
				hashi %= _tables.size();
			}
			_tables[hashi]._kv = kv;
			_tables[hashi]._state = EXIST;
			++_n;

			return true;
		}
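		HashData<K, V>* Find(const K& key)
		{
			Hash hs;
			size_t hashi = hs(key) % _tables.size();
			//Probe at most one full lap: after many erases every slot may be
			//EXIST or DELETE, and then no EMPTY slot would ever stop the loop
			for (size_t probes = 0; probes < _tables.size() && _tables[hashi]._state != EMPTY; ++probes)
			{
				if ( _tables[hashi]._state == EXIST && _tables[hashi]._kv.first == key)
				{
					return &_tables[hashi];
				}
				hashi++;
				hashi %= _tables.size();
			}
			return nullptr;
		}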
		bool Erase(const K& key)
		{
			HashData<K, V>* ptr = Find(key);
			if (ptr == nullptr)
				return false;
			else
			{
				ptr->_state = DELETE;
				--_n;
				return true;
			}
		}
	private:
		vector<HashData<K, V>> _tables;
		size_t _n = 0; //number of stored elements
	};

	//Test code

	void TestHT1()
	{
		HashTable<int, int> ht;
		int a[] = { 11,21,4,14,24,15,9 };
		for (auto e : a)
		{
			ht.Insert({ e,e });
		}

		ht.Insert({ 19,19 });
		ht.Insert({ 19,190 });
		ht.Insert({ 19,1900 });
		ht.Insert({ 39,1900 });

		cout << ht.Find(24) << endl;
		ht.Erase(4);
		cout << ht.Find(24) << endl;
		cout << ht.Find(4) << endl;
	}

}

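Drawback of linear probing: once hash collisions occur, the colliding elements cluster together and data piles up. Keys end up occupying slots that belong to other keys, so locating a key can take many comparisons and search efficiency drops.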
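The pile-up effect can be observed by counting how many slots each insertion has to examine. The sketch below is a standalone illustration: insert_and_count_probes is a hypothetical helper, -1 is an assumed marker for an empty slot, keys are assumed non-negative, and the table is assumed never to be full.

```cpp
#include <cstddef>
#include <vector>

// Inserts key with linear probing and returns the number of slots examined.
// Illustration only: -1 marks an empty slot, keys must be non-negative,
// and the table must have at least one empty slot.
std::size_t insert_and_count_probes(std::vector<int>& table, int key)
{
    std::size_t i = static_cast<std::size_t>(key) % table.size();
    std::size_t probes = 1;
    while (table[i] != -1)
    {
        i = (i + 1) % table.size();
        ++probes;
    }
    table[i] = key;
    return probes;
}
```

In a table of size 10, inserting 4, 14, 24, and 34 takes 1, 2, 3, and 4 probes. Inserting 5 afterwards also takes 4 probes and lands in slot 8, even though 5 is not a synonym of any earlier key: its home slot was taken by the cluster that started at slot 4.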

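1.4.2 Open Hashing

Open hashing, also called separate chaining: the hash function first computes a hash address for every key. Keys with the same address belong to the same subset, and each subset is called a bucket. The elements of a bucket are linked into a singly linked list, and the head pointer of each list is stored in the hash table.

Every element in a bucket hashes to the same address, so each bucket holds elements that collide with one another.

Implementation of open hashing: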

namespace bucket
{
	template<class K, class V> 
	struct HashData
	{
		pair<K, V> _kv;
		HashData<K, V>* _next;

		//HashData constructor
		HashData(const pair<K, V>& kv)
			:_kv(kv)
			,_next(nullptr)
		{}
	};
	template<class K, class V, class Hash = HashFunc<K>>
	class HashTable
	{
		typedef HashData<K, V> Node;
	public:
		HashTable()
		{
			_tables.resize(10, nullptr);
		}



		bool Insert(const pair<K, V>& kv)
		{
			if (Find(kv.first))
				return false;
			Hash hs;
			size_t hashi = hs(kv.first) % _tables.size();
			//Grow once the load factor reaches 1
			if (_n == _tables.size())
			{
				vector<Node*> newtables(_tables.size() * 2, nullptr);
				for (size_t i = 0; i < _tables.size(); i++)
				{
					Node* cur = _tables[i];
					while (cur)
					{
						Node* next = cur->_next;
						size_t hashi = hs(cur->_kv.first) % newtables.size();
						cur->_next = newtables[hashi];
						newtables[hashi] = cur;
						cur = next;
					}
					_tables[i] = nullptr;
				}
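				_tables.swap(newtables);
				//The bucket count changed, so recompute the bucket index for the new key
				hashi = hs(kv.first) % _tables.size();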
			}
			
			Node* newnode = new Node(kv);
			newnode->_next = _tables[hashi];
			_tables[hashi] = newnode;
			++_n;

			return true;
		}


		Node* Find(const K& key)
		{
			Hash hs;
			size_t hashi = hs(key) % _tables.size();
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (cur->_kv.first == key)
					return cur;
				else
					cur = cur->_next;
			}

			return nullptr;
		}

		bool Erase(const K& key)
		{
			Hash hs;
			size_t hashi = hs(key) % _tables.size();
			Node* prev = nullptr;
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (cur->_kv.first == key)
				{
					if (prev == nullptr)
					{
						_tables[hashi] = cur->_next;
					}
					else
					{
						prev->_next = cur->_next;
					}
					delete cur;
					--_n;
					return true;
				}
				prev = cur;
				cur = cur->_next;
			}

			return false;
		}



	private:
		vector<Node*> _tables;//array of pointers; each pointer is one bucket
		size_t _n = 0;
	};


	//Test code
	void TestHT1()
	{
		HashTable<int, int> ht;
		int a[] = { 11,21,4,14,24,15,9,19,29,39 };
		for (auto e : a)
		{
			ht.Insert({ e,e });
		}


		ht.Insert({ -6, 6 });

		for (auto e : a)
		{
			ht.Erase(e);
		}
	}

	void TestHT2()
	{
		HashTable<string, string> ht;
		ht.Insert({ "sort", "排序" });
		ht.Insert({ "left", "左边" });
	}
}

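That completes the hash table underlying the unordered containers. To build a simulated implementation of the unordered family, the hash table has to be reworked and wrapped.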
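Both tables above default their Hash parameter to HashFunc<K>, which the article uses without showing. The following is a minimal sketch of what such a functor might look like, not the author's definition. It assumes a generic version that casts the key to size_t and a std::string specialization with a BKDR-style multiply-and-add loop; the multiplier 131 is an arbitrary choice made for this sketch.

```cpp
#include <cstddef>
#include <string>

// Assumed shape of HashFunc: the article relies on it but never defines it.
// Generic version: works for keys convertible to size_t (int, char, ...).
template<class K>
struct HashFunc
{
    std::size_t operator()(const K& key) const
    {
        return static_cast<std::size_t>(key);
    }
};

// Specialization for strings: fold every character into the running hash.
template<>
struct HashFunc<std::string>
{
    std::size_t operator()(const std::string& s) const
    {
        std::size_t hash = 0;
        for (unsigned char ch : s)
        {
            hash = hash * 131 + ch;
        }
        return hash;
    }
};
```

A string specialization of some kind is needed because TestHT2 instantiates HashTable<string, string>, and a string cannot be cast to size_t. Negative integer keys such as -6 still work with the generic version: the cast wraps them to a large unsigned value, which the table then reduces with % _tables.size().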

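2. Simulated Implementation

2.1 Reworking the Hash Table

Changes to the template parameter list: K is the key type, T is the stored element type (K for a set, pair<const K, V> for a map), KeyOfT extracts the key from a T, and Hash is the hash functor.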

template<class K, class T, class KeyOfT, class Hash>
	class HashTable;
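The reworked table stores a single element type T in each node and asks KeyOfT for the key. The article does not show the reworked node, so the HashData<T> below is an assumed reconstruction based on how the later code uses it (_data, _next, new Node(data)). The two standalone functors are hypothetical stand-ins for the nested SetKeyOfT and MapKeyOfT that the wrappers define later.

```cpp
#include <string>
#include <utility>

// Assumed node layout, reconstructed from how the article's code uses it.
template<class T>
struct HashData
{
    T _data;
    HashData<T>* _next;

    explicit HashData(const T& data)
        : _data(data)
        , _next(nullptr)
    {}
};

// For a set, the stored element is the key itself.
template<class K>
struct SetKeyOfT
{
    const K& operator()(const K& key) const { return key; }
};

// For a map, the key is the first member of the stored pair.
template<class K, class V>
struct MapKeyOfT
{
    const K& operator()(const std::pair<const K, V>& kv) const { return kv.first; }
};
```

With this split the table never needs to know whether it backs a set or a map. It calls the KeyOfT functor on _data whenever it has to hash or compare a key.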

Adding iterator support:

template<class K, class T, class KeyOfT, class Hash>
	class HashTable;

	template<class K, class T, class Ref, class Ptr, class KeyOfT, class Hash>
	struct HashIterator
	{
		typedef HashData<T> Node;
		typedef HashIterator<K, T, Ref, Ptr, KeyOfT, Hash> Self;

		Node* _node;
		const HashTable<K, T, KeyOfT, Hash>* _htptr;

		HashIterator(Node* node, const HashTable<K, T, KeyOfT, Hash>* htptr)
			: _node(node)
			, _htptr(htptr)
		{}

		Ref operator*()
		{
			return _node->_data;
		}
		Ptr operator->()
		{
			return &_node->_data;
		}
		bool operator==(const Self& v)
		{
			return v._node == _node;
		}
		bool operator!=(const Self& v)
		{
			return v._node != _node;
		}

		Self& operator++()
		{
			if (_node->_next)
			{
				//The current bucket still has elements
				_node = _node->_next;
			}
			else
			{
				//Look for the next non-empty bucket
				KeyOfT kot;
				Hash hs;
				size_t hashi = hs(kot(_node->_data)) % _htptr->_tables.size();
				++hashi;
				while (hashi < _htptr->_tables.size())
				{
					if (_htptr->_tables[hashi])
					{
						break;
					}

					++hashi;
				}
				if (hashi == _htptr->_tables.size())
				{
					_node = nullptr; //equivalent to reaching end()
				}
				else
				{
					_node = _htptr->_tables[hashi];
				}
			}

			return *this;
		}
	};

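Adding support for reaching a value through its key (Insert now returns pair<Iterator, bool>, which operator[] builds on):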

template<class K, class T, class KeyOfT, class Hash>
	class HashTable
	{
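		//Fresh parameter names: reusing K, T, KeyOfT, Hash here would shadow the class's own template parameters
		template<class K1, class T1, class Ref, class Ptr, class KeyOfT1, class Hash1>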
		friend struct HashIterator;
		typedef HashData<T> Node;
	public:
		typedef HashIterator<K, T, T&, T*, KeyOfT, Hash> Iterator;
		typedef HashIterator<K, T, const T&, const T*, KeyOfT, Hash> ConstIterator;
		HashTable()
		{
			_tables.resize(10, nullptr);
		}
		~HashTable()
		{
			for (size_t i = 0; i < _tables.size(); i++)
			{
				//Free the elements of each bucket in turn
				Node* cur = _tables[i];
				while (cur)
				{
					Node* next = cur->_next;
					delete cur;
					cur = next;
				}
				_tables[i] = nullptr;
			}
		}
		Iterator Begin()
		{
			if (_n == 0)
				return Iterator(nullptr, this);
			for (size_t i = 0; i < _tables.size(); i++)
			{
				Node* cur = _tables[i];
				if (cur != nullptr)
				{
					return Iterator(cur, this);
				}
			}

			return Iterator(nullptr, this);
		}
		Iterator End()
		{
			return Iterator(nullptr, this);
		}
		//const iterators
		ConstIterator Begin() const
		{
			if (_n == 0)
				return ConstIterator(nullptr, this);
			for (size_t i = 0; i < _tables.size(); i++)
			{
				Node* cur = _tables[i];
				if (cur != nullptr)
				{
					return ConstIterator(cur, this);
				}
			}

			return ConstIterator(nullptr, this);
		}

		ConstIterator End() const
		{
			return ConstIterator(nullptr, this);
		}




		pair<Iterator, bool> Insert(const T& data)
		{
			KeyOfT kot;
			Iterator it = Find(kot(data));
			if (it != End())
			{
				return make_pair(it, false);
			}
			Hash hs;
			size_t hashi = hs(kot(data)) % _tables.size();
			//Grow once the load factor reaches 1
			if (_n == _tables.size())
			{
				vector<Node*> newtables(_tables.size() * 2, nullptr);
				for (size_t i = 0; i < _tables.size(); i++)
				{
					Node* cur = _tables[i];
					while (cur)
					{
						Node* next = cur->_next;
						size_t hashi = hs(kot(cur->_data)) % newtables.size();
						cur->_next = newtables[hashi];
						newtables[hashi] = cur;
						cur = next;
					}
					_tables[i] = nullptr;
				}
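				_tables.swap(newtables);
				//The bucket count changed, so recompute the bucket index for the new element
				hashi = hs(kot(data)) % _tables.size();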
			}

			Node* newnode = new Node(data);
			newnode->_next = _tables[hashi];
			_tables[hashi] = newnode;
			++_n;

			return make_pair(Iterator(newnode, this), true);
		}


		Iterator Find(const K& key)
		{
			KeyOfT kot;
			Hash hs;
			size_t hashi = hs(key) % _tables.size();
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (kot(cur->_data) == key)
					return Iterator(cur, this);
				else
					cur = cur->_next;
			}

			return Iterator(nullptr, this);
		}

		bool Erase(const K& key)
		{
			Hash hs;
			size_t hashi = hs(key) % _tables.size();
			Node* prev = nullptr;
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (KeyOfT()(cur->_data) == key)
				{
					if (prev == nullptr)
					{
						_tables[hashi] = cur->_next;
					}
					else
					{
						prev->_next = cur->_next;
					}
					delete cur;
					--_n;
					return true;
				}
				prev = cur;
				cur = cur->_next;
			}

			return false;
		}



	private:
		vector<Node*> _tables;//array of pointers; each pointer is one bucket
		size_t _n = 0;
	};

With the hash table reworked, we can wrap it to build unordered_map and unordered_set.

2.2 unordered_map

#pragma once
#include"HashTable.h"

namespace my_unordered_map
{
	template<class K, class V, class Hash = HashFunc<K>>
	class unordered_map
	{
		struct MapKeyOfT
		{
			const K& operator()(const pair<const K, V>& kv)
			{
				return kv.first;
			}
		};
	public:
		typedef typename bucket::HashTable<K, pair<const K, V>, MapKeyOfT, Hash>::Iterator iterator;
		typedef typename bucket::HashTable<K, pair<const K, V>, MapKeyOfT, Hash>::ConstIterator const_iterator;
		//Non-const iterators
		iterator begin()
		{
			return _ht.Begin();
		}
		iterator end()
		{
			return _ht.End();
		}
		//const iterators
		const_iterator begin() const
		{
			return _ht.Begin();
		}
		const_iterator end() const
		{
			return _ht.End();
		}

		pair<iterator, bool> insert(const pair<K, V>& kv)
		{
			return _ht.Insert(kv);
		}
		iterator find(const K& key)
		{
			return _ht.Find(key);
		}
		bool erase(const K& key)
		{
			return _ht.Erase(key);
		}
		V& operator[](const K& key)
		{
			pair<iterator, bool> tmp = _ht.Insert(make_pair(key, V()));
			return tmp.first->second;
		}

	private:
		bucket::HashTable<K, pair<const K, V>, MapKeyOfT, Hash> _ht;
	};

	//Test code
	void test_map()
	{
		unordered_map<string, string> dict;
		dict.insert({ "sort", "排序" });
		dict.insert({ "left", "左边" });
		dict.insert({ "right", "右边" });
		dict.insert({ "end", "结束" });
		dict.insert({ "right", "右边" });
		dict.insert({ "begin", "开始" });

		dict["left"] = "左边，剩余";
		dict["insert"] = "插入";
		dict["string"];
		dict["end"] = "结束，末尾";
		dict["begin"] = "开始，开头";

		dict["string"] = "字符串";

		unordered_map<string, string>::iterator it = dict.begin();
		while (it != dict.end())
		{
			//it->first += 'x';
			//it->second += 'x';

			cout << it->first << ":" << it->second << endl;
			++it;
		}
		cout << endl;
	}
}
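The operator[] above works because Insert either adds {key, V()} or leaves an existing entry alone, and in both cases returns an iterator to the entry. The same idea can be checked against the standard container. In the sketch below, subscript_via_insert is a hypothetical helper written only for this comparison; std::unordered_map::insert is the real standard call.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

typedef std::unordered_map<std::string, std::string> Dict;

// Hypothetical helper that mirrors the insert-based operator[] shown above.
std::string& subscript_via_insert(Dict& dict, const std::string& key)
{
    // insert leaves an existing entry untouched and returns an iterator to it;
    // otherwise it adds {key, ""} and returns an iterator to the new entry.
    std::pair<Dict::iterator, bool> result =
        dict.insert(std::make_pair(key, std::string()));
    return result.first->second;
}
```

This explains the behavior seen in test_map: dict["string"]; creates an entry with an empty value, and a later dict["string"] = ... overwrites that same entry instead of adding a second one.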

2.3 unordered_set

#pragma once
#include"HashTable.h"
namespace my_unordered_set
{
	template<class K, class Hash = HashFunc<K>>
	class unordered_set
	{
		struct SetKeyOfT
		{
			const K& operator()(const K& key)
			{
				return key;
			}
		};
	public:
		typedef typename bucket::HashTable<K, const K, SetKeyOfT, Hash>::Iterator iterator;
		typedef typename bucket::HashTable<K, const K, SetKeyOfT, Hash>::ConstIterator const_iterator;
		//Non-const iterators
		iterator begin()
		{
			return _ht.Begin();
		}
		iterator end()
		{
			return _ht.End();
		}
		//const iterators
		const_iterator begin() const
		{
			return _ht.Begin();
		}
		const_iterator end() const
		{
			return _ht.End();
		}

		pair<iterator, bool> insert(const K& key)
		{
			return _ht.Insert(key);
		}
		iterator find(const K& key)
		{
			return _ht.Find(key);
		}
		bool erase(const K& key)
		{
			return _ht.Erase(key);
		}

	private:
		bucket::HashTable <K, const K, SetKeyOfT, Hash> _ht;
	};

	void Print(const unordered_set<int>& s)
	{
		unordered_set<int>::const_iterator it = s.begin();
		while (it != s.end())
		{
			// *it += 1;
			cout << *it << " ";
			++it;
		}
		cout << endl;
	}

	struct Date
	{
		int _year;
		int _month;
		int _day;

		bool operator==(const Date& d) const
		{
			return _year == d._year
				&& _month == d._month
				&& _day == d._day;
		}
	};

	struct HashDate
	{
		size_t operator()(const Date& key)
		{
			return (key._year * 31 + key._month) * 31 + key._day;
		}
	};

	//Test code
	void test_set()
	{
		unordered_set<int> s;
		int a[] = { 4, 2, 6, 1, 3, 5, 15, 7, 16, 14, 3,3,15 };
		for (auto e : a)
		{
			s.insert(e);
		}

		for (auto e : s)
		{
			cout << e << " ";
		}
		cout << endl;

		unordered_set<int>::iterator it = s.begin();
		while (it != s.end())
		{
			cout << *it << " ";
			++it;
		}
		cout << endl;

		unordered_set<Date, HashDate> us;
		us.insert({ 2024, 7, 25 });
		us.insert({ 2024, 7, 26 });

		Print(s);
	}
}