Exploring the Internals of Hash Tables

ych9527

Category: C++  Tags: closed hashing, open hashing

Article link: https://blog.csdn.net/ych9527/article/details/117839111

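1. unordered_map and unordered_set

1.1 Introduction

C++98 provides four associative containers: map, set, multimap, and multiset. They are built on a red-black tree, so a lookup costs O(log N). For very large data sets that is still not ideal, so C++11 added four more associative containers that are used in the same way but are backed by a hash table: unordered_map, unordered_set, unordered_multimap, and unordered_multiset

Their underlying structure is a table of hash buckets, so their iterators are forward-only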

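1.2 Efficiency comparison

#include <cstdlib>
#include <ctime>
#include <iostream>
#include <set>
#include <unordered_set>
using namespace std;

int main()
{
	srand((unsigned)time(NULL));

	set<int> RBset;
	unordered_set<int> Hash;

	// Time 500000 insertions into the red-black-tree set.
	// rand() % 1000 yields at most 1000 distinct values, so most insertions hit an existing key.
	clock_t begin1 = clock();
	for (int i = 0; i < 500000; i++)
	{
		RBset.insert(rand() % 1000);
	}
	clock_t end1 = clock();

	// Same workload for the hash-table set
	clock_t begin2 = clock();
	for (int i = 0; i < 500000; i++)
	{
		Hash.insert(rand() % 1000);
	}
	clock_t end2 = clock();

	// The results are printed in clock ticks
	cout << "set vs. unordered_set" << endl << endl;
	cout << "red-black tree set: " << end1 - begin1 << endl << endl;
	cout << "hash table unordered_set: " << end2 - begin2 << endl;

	return 0;
}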

1.3 Practice

Common hash interview questions

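2. Hashing

2.1 The concept of hashing

map and set are both built on a red-black tree, so finding or inserting an element takes O(log N) time: the element has to be compared with several others before the operation completes.

The ideal search fetches an element from the table directly, without any comparison. Hashing follows this idea: some function (hashFunc) establishes a mapping between the key of an element and its storage position, so that elements can be inserted and retrieved in O(1) time

Inserting an element into the structure:
Compute the storage position from the key of the element with the hash function, and store the element at that position

Searching for an element in the structure:
Apply the same computation to the key and search at the position the function returns; if the element at that position matches the one being searched for, the search succeeds

This approach is called hashing, the function it uses is called the hash function, and the structure it builds is called a hash table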
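The insert and search steps above can be sketched with a toy table. This is only an illustration under stated assumptions: the name ToyTable, the use of -1 as the empty marker, and hash(key) = key % m are all choices made for this sketch, and collisions are deliberately left unhandled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy table: slot i holds the key whose hash is i, or -1 when empty.
// Keys are assumed to be non-negative; collisions are reported, not resolved.
struct ToyTable {
    std::vector<int> slots;

    explicit ToyTable(std::size_t m) : slots(m, -1) { assert(m > 0); }

    // Maps a key to its storage position.
    std::size_t hashFunc(int key) const {
        assert(key >= 0);
        return static_cast<std::size_t>(key) % slots.size();
    }

    // Stores the key at its computed position; returns false if the slot is taken.
    bool insert(int key) {
        std::size_t pos = hashFunc(key);
        if (slots[pos] != -1) return false;
        slots[pos] = key;
        return true;
    }

    // A search is one computation plus one comparison.
    bool contains(int key) const { return slots[hashFunc(key)] == key; }
};
```

With a table of 10 slots, 4 and 44 both map to position 4, so the second insertion fails; resolving that situation is what the collision-handling sections deal with.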

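2.2 Hash functions

2.2.1 Design principles

A poorly designed hash function causes many hash collisions. Hash functions are generally designed according to the following principles:

1. The domain of the hash function must include every key that needs to be stored, and if the hash table allows m addresses, its range must lie between 0 and m-1

2. The addresses it computes should be distributed evenly over the whole address space

3. The hash function should be fairly simple

2.2.2 Common hash functions

2.2.2.1 Frequently used hash functions

1. Direct addressing (commonly used)

Use a linear function of the key as the hash address: hash(key) = A*key + B

Advantages: simple and uniform
Disadvantages: the distribution of the keys must be known in advance, and the method only suits keys that span a small range and are distributed contiguously

Scenarios:

Suitable: finding the first character that appears only once in a string: build an array in which hash[ch - 'a'] is the address of character ch

Unsuitable: a batch of data such as 1 5 8 100000, where the keys span a wide range and are not contiguous, which easily wastes space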
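A minimal sketch of the suitable scenario, assuming the input consists only of the lowercase letters 'a' to 'z'. The function name firstUniqueChar is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the index of the first character that occurs exactly once, or -1.
// Assumes s contains only 'a'..'z'.
int firstUniqueChar(const std::string& s) {
    int count[26] = {0};
    for (char ch : s) {
        assert(ch >= 'a' && ch <= 'z');
        ++count[ch - 'a'];            // direct addressing: hash(ch) = ch - 'a'
    }
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (count[s[i] - 'a'] == 1) return static_cast<int>(i);
    }
    return -1;
}
```

The table needs only 26 entries because the key range is small and contiguous, which is exactly the condition under which direct addressing wastes no space.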

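2. Division method (commonly used)

Let m be the number of addresses the hash table allows. Usually a prime num that is not larger than m, but equal to m or as close to it as possible, is chosen as the divisor, and the key is converted to a hash address with hash(key) = key % num

With the division method it is best to take the remainder modulo a prime:

const int PRIMECOUNT = 28;
	const size_t primeList[PRIMECOUNT] =
	{
		53ul, 97ul, 193ul, 389ul, 769ul,
		1543ul, 3079ul, 6151ul, 12289ul, 24593ul,
		49157ul, 98317ul, 196613ul, 393241ul, 786433ul,
		1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul,
		50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul,
		1610612741ul, 3221225473ul, 4294967291ul
	};
	
	// Returns the first prime in the list that is larger than prime,
	// or the largest prime in the list if there is none
	size_t GetNextPrime(size_t prime)
	{
		for (size_t i = 0; i < PRIMECOUNT; ++i){
			if (primeList[i] > prime)
				return primeList[i];
		}

		return primeList[PRIMECOUNT - 1];
	}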

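2.2.2.2 Less frequently used hash functions

1. Mid-square method

hash(key) = key*key, then take the middle few digits of the result as the hash address
For example 25^2 = 625; taking the middle digit gives the hash address 2

Suitable when the distribution of the keys is unknown and the keys do not have many digits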
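A small sketch of the mid-square method on decimal digits. The helper name midSquare and the rule for picking the middle digits when the count is uneven (lean to the left) are assumptions of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: squares key and returns the middle `digits` decimal digits.
unsigned midSquare(std::uint32_t key, std::size_t digits) {
    // Square in 64 bits so that the product cannot overflow.
    std::string sq = std::to_string(static_cast<std::uint64_t>(key) * key);
    assert(digits >= 1 && digits <= 9 && digits <= sq.size());
    std::size_t start = (sq.size() - digits) / 2;   // leftmost of the middle digits
    return static_cast<unsigned>(std::stoul(sq.substr(start, digits)));
}
```

For key 25 and one digit this reproduces the example above: 625 gives 2. For key 1234 and three digits, 1234^2 = 1522756 gives 227.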

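2. Folding method

Split the key from left to right into several parts with the same number of digits (the last part may be shorter), add the parts together, and take the last few digits of the sum, according to the length of the hash table, as the hash address

Suitable when the distribution of the keys is unknown and the keys have many digits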
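The folding steps can be sketched as follows, assuming the key is given as a string of decimal digits. The name foldHash and the parameter width (digits per part, which is also the number of digits kept) are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: splits the decimal digits of key into groups of `width`
// from left to right, sums the groups, and keeps the last `width` digits of the sum.
unsigned long foldHash(const std::string& key, std::size_t width) {
    assert(width >= 1 && width <= 9);
    unsigned long modulus = 1;
    for (std::size_t i = 0; i < width; ++i) modulus *= 10;

    unsigned long sum = 0;
    for (std::size_t pos = 0; pos < key.size(); pos += width) {
        sum += std::stoul(key.substr(pos, width));   // the last group may be shorter
    }
    return sum % modulus;
}
```

For the key 123456789 and a width of 3, the parts are 123, 456, and 789; their sum is 1368, and the last three digits give the address 368.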

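3. Random number method

Choose a random function and use its value for the key as the hash address: hash(key) = random(key), where random is the random function

Usually used when the keys have different lengths

4. Digit analysis method

Obtain the hash address by analyzing the keys in advance

For example, take everyone's mobile phone number as the key. If the first three digits are used as the hash address, collisions are very likely. If the middle three digits are used instead, collisions are much less likely

Commonly used when the keys have many digits and the distribution of the keys, and of several of their digits, is known in advance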

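2.3 Hash collisions

A hash collision occurs when different keys are mapped to the same address. The two common ways to resolve hash collisions are closed hashing and open hashing

3. Closed hashing

Closed hashing is also called open addressing. When a hash collision occurs and the table is not full, there must still be an empty position somewhere in the table, so the key can be stored in the "next" empty position after the conflicting one. The next empty position is found by linear probing or quadratic probing

3.1 Linear probing

Starting from the position where the collision occurred, probe the following positions one by one until the next empty position is found

Advantage: very simple to implement
Disadvantage: once collisions occur, all the colliding entries are stored next to each other and data "piles up": keys occupy positions that other keys would have used, so locating a key takes many comparisons and search efficiency drops

Insertion:

Use the hash function to obtain the position of the element in the hash table; if a collision occurs, use linear probing to find the next empty position and insert the element there

Deletion:

When collisions are resolved with closed hashing, an existing element cannot simply be removed from the table, because removing it directly would affect the search for other elements
Linear probing therefore deletes an element by marking its slot, a pseudo-deletion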
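A minimal sketch of linear probing with marker-based deletion, under stated assumptions: the class name LinearProbeTable, the fixed capacity without resizing, and hash(key) = key % capacity are all choices made for this example. Every probe loop is bounded by the capacity so that it terminates even on a full table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum SlotState { EMPTY, EXIST, DELETED };

// Hypothetical fixed-size table of ints using linear probing and tombstones.
class LinearProbeTable {
public:
    explicit LinearProbeTable(std::size_t capacity)
        : keys_(capacity, 0), states_(capacity, EMPTY) { assert(capacity > 0); }

    bool contains(int key) const { std::size_t pos; return locate(key, pos); }

    bool insert(int key) {
        std::size_t pos;
        if (locate(key, pos)) return false;               // already present
        pos = home(key);
        for (std::size_t n = 0; n < keys_.size(); ++n) {
            if (states_[pos] != EXIST) {                  // EMPTY and DELETED slots can be reused
                keys_[pos] = key;
                states_[pos] = EXIST;
                return true;
            }
            pos = (pos + 1) % keys_.size();
        }
        return false;                                     // table is full
    }

    bool erase(int key) {
        std::size_t pos;
        if (!locate(key, pos)) return false;
        states_[pos] = DELETED;                           // mark only, so later keys stay reachable
        return true;
    }

private:
    std::size_t home(int key) const { return static_cast<std::size_t>(key) % keys_.size(); }

    // Finds the slot holding key; a never-used slot ends the probe chain, a DELETED slot does not.
    bool locate(int key, std::size_t& pos) const {
        pos = home(key);
        for (std::size_t n = 0; n < keys_.size(); ++n) {
            if (states_[pos] == EMPTY) return false;
            if (states_[pos] == EXIST && keys_[pos] == key) return true;
            pos = (pos + 1) % keys_.size();
        }
        return false;
    }

    std::vector<int> keys_;
    std::vector<SlotState> states_;
};
```

With a capacity of 10, the keys 4, 14, and 24 land in slots 4, 5, and 6. Erasing 14 only marks slot 5, so a search for 24 still walks past it; had the slot been reset to EMPTY, the search would stop at slot 5 and miss 24.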

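3.2 Growing a closed hash table: the load factor

The load factor of a hash table is defined as α = number of elements stored in the table / length of the table

α indicates how full the table is. The larger α is, the more elements the table holds and the more likely collisions become; conversely, the smaller α is, the fewer elements the table holds, the less likely collisions become, and the lower the space utilization

Closed hashing: the load factor is generally kept below 0.7-0.8. Above 0.8 the cache miss rate during lookups rises along an exponential curve (collisions become more and more likely), so hash libraries generally keep α below 0.8. A closed hash table must never become completely full, otherwise an insertion falls into an endless loop

Open hashing / hash buckets: the load factor is generally kept at 1. Beyond 1 the linked lists get longer and efficiency drops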
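When the load factor is checked with integer arithmetic, the order of the operations matters. The sketch below assumes a threshold of 0.7; the function name needsGrow is made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// True when size / capacity >= 0.7, computed without floating point.
// Multiplying first matters: size / capacity * 10 truncates to 0 whenever size < capacity.
bool needsGrow(std::size_t size, std::size_t capacity) {
    if (capacity == 0) return true;          // an empty table always has to grow first
    return size * 10 >= capacity * 7;
}
```

For a table of 53 slots this reports true from 38 elements on (380 >= 371), whereas the expression 38 / 53 * 10 evaluates to 0 in integer arithmetic and would never trigger.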

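3.3 Quadratic probing

The weakness of linear probing is that colliding data easily piles up, because it looks for the next empty position one slot at a time
Quadratic probing mitigates this problem (it does not eliminate it) by changing how the next empty position is searched for (it jumps):

POS = (H + i^2) % m  or  POS = (H - i^2) % m
where i = 1, 2, 3, ...
H is the position where the hash collision occurred
m is the size of the hash table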
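The probe sequence H, H+1^2, H-1^2, H+2^2, H-2^2, ... can be generated as follows. The function name quadraticProbes is an assumption of this sketch. With unsigned indices, H - i^2 must not be computed directly, because it wraps around when i^2 is larger than H; adding m before subtracting avoids that.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the first `count` positions probed for home slot H in a table of size m.
std::vector<std::size_t> quadraticProbes(std::size_t H, std::size_t m, std::size_t count) {
    assert(m > 0 && H < m);
    std::vector<std::size_t> seq;
    if (count == 0) return seq;
    seq.push_back(H);
    for (std::size_t i = 1; seq.size() < count; ++i) {
        std::size_t step = (i * i) % m;
        seq.push_back((H + step) % m);               // POS = (H + i^2) % m
        if (seq.size() < count)
            seq.push_back((H + m - step) % m);       // POS = (H - i^2) % m, without underflow
    }
    return seq;
}
```

For H = 3 and m = 11 the first five probes are 3, 4, 2, 7, 10.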

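3.4 Average search length

First insert the values into the hash table

Average search length = total number of probes needed to find each value / number of values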
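As a worked example, the sketch below computes the average search length for linear probing with hash(key) = key % m. It assumes distinct, non-negative keys and no deletions, so finding a key later takes exactly as many probes as inserting it did. The function name averageSearchLength is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inserts the keys with linear probing into a table of size m and returns
// (total probes needed to find every key) / (number of keys).
double averageSearchLength(const std::vector<int>& keys, std::size_t m) {
    assert(keys.size() < m);                  // keep at least one free slot
    std::vector<int> table(m, -1);            // -1 marks an empty slot
    std::size_t totalProbes = 0;
    for (int key : keys) {
        assert(key >= 0);
        std::size_t pos = static_cast<std::size_t>(key) % m;
        std::size_t probes = 1;
        while (table[pos] != -1) {            // occupied: move on to the next slot
            pos = (pos + 1) % m;
            ++probes;
        }
        table[pos] = key;
        totalProbes += probes;
    }
    return keys.empty() ? 0.0 : static_cast<double>(totalProbes) / keys.size();
}
```

For the keys 7, 14, 1, 8, 3 and m = 7, the probe counts are 1, 2, 2, 3, 2, so the average search length is 10 / 5 = 2.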

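3.5 Mock implementation

Design process:

1. Overall structure:

2. Growing the table:

3. Algorithmic tricks

4. Wrapping the table as set and map, with code and verification

5. Code and test results

Closed hashing code: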

#ifndef _HASH_HPP_
#define _HASH_HPP_

#include <iostream>
#include <vector>
#include <assert.h>
#include <string>
#include <utility>
using namespace std;


//Closed hashing
//set<K> -> HashTable<K,K>
//map<K,V> -> HashTable<K,pair<K,V>>
namespace YCH_CLOSE_HASH
{

	enum State
	{
		EMPTY,//never used
		DELETE,//deleted (marker left by a pseudo-deletion)
		EXIST//occupied
	};

	template<class T>
	struct HashNode//a slot of the table
	{
		State _state = EMPTY;//slot state, EMPTY by default
		T _t;//value stored in the slot

		HashNode(const T&t=T())
			:_t(t)
		{}

	};

	//Iterator

	//Forward declaration of the hash table
	template <class K, class T, class KeyofT, class Hash>//default arguments may not be given on both the declaration and the definition
	class HashTable;

	template<class K, class T, class KeyofT, class Hash>
	struct  HashIterator
	{
		typedef HashIterator<K, T, KeyofT, Hash> Self;
		typedef HashNode<T> Node;
		typedef HashTable<K, T, KeyofT, Hash> HT;//a distinct alias name; reusing the name HashTable here is rejected by some compilers

		Node *_node;
		HT *_pht;//the iterator also needs the table in order to advance


		HashIterator(Node* node, HT *pht)
			:_node(node)
			, _pht(pht)
		{}

		T& operator*()
		{
			assert(_node != nullptr);
			return _node->_t;
		}

		T* operator ->()
		{
			assert(_node != nullptr);
			return &(_node->_t);
		}

		bool operator == (const Self &s)const
		{
			return _node == s._node;
		}

		bool operator != (const Self &s)const
		{
			return _node != s._node;
		}

		Self& operator++()//move to the next occupied slot after the current one
		{
			assert(_node != nullptr);
			//_node points into _pht->_tables, so its slot index is a pointer difference
			size_t index = _node - &(_pht->_tables[0]);

			//scan the rest of the table for the next occupied slot
			for (size_t i = index + 1; i < _pht->_tables.size(); i++)
			{
				if (_pht->_tables[i]._state == EXIST)
				{
					//only the pointer moves; copying the value into *_node would overwrite the current slot
					_node = &(_pht->_tables[i]);
					return *this;
				}
			}

			//no occupied slot is left: this iterator itself becomes the end iterator
			_node = nullptr;
			return *this;
		}

		Self operator++(int)//postfix ++
		{
			Self copy(*this);//keep the old position
			++(*this);
			return copy;
		}


	};

	//Hash table


	inline size_t GetNextPrime(size_t prime)//inline, because this header may be included in several source files
	{
		static const int PRIMECOUNT = 28;//static, so the list is not rebuilt on every call
		static const size_t primeList[PRIMECOUNT] =
		{
			53ul, 97ul, 193ul, 389ul, 769ul,
			1543ul, 3079ul, 6151ul, 12289ul, 24593ul,
			49157ul, 98317ul, 196613ul, 393241ul, 786433ul,
			1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul,
			50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul,
			1610612741ul, 3221225473ul, 4294967291ul
		};

		for (size_t i = 0; i < PRIMECOUNT; ++i){
			if (primeList[i] > prime)
				return primeList[i];
		}

		return primeList[PRIMECOUNT - 1];//no larger prime in the list
	}

	template<class K, class T, class KeyofT, class Hash>
	class HashTable
	{
	public:
		//Iterator type
		typedef HashIterator<K, T, KeyofT, Hash> Iterator;
		friend  Iterator; //<=>HashIterator<K, T, KeyofT, Hash>; the iterator is a friend of the table so that it can access its private members

		Iterator Begin()
		{
			for (size_t i = 0; i < _tables.size(); i++)//find the first occupied slot
			{
				if (_tables[i]._state == EXIST)
					return Iterator(&_tables[i], this);
			}
			return End();
		}

		Iterator End()
		{
			return Iterator(nullptr, this);
		}

		size_t HashFunc(const K& key)//hash function; the table must not be empty
		{
			Hash hf;
			return hf(key) % _tables.size();
		}

		pair<Iterator,bool> Insert(const T & t)
		{
			//check whether the element already exists
			KeyofT kf;
			Iterator ret = Find(kf(t));
			if (ret!=End())//already present; multiset/multimap would skip this check
				return make_pair(ret,false);
				
			//grow when the table is empty or the load factor reaches 0.7
			//multiply before dividing: _size / _tables.size() truncates to 0 in integer arithmetic
			if (_tables.empty() || _size * 10 / _tables.size() >= 7)
			{
				//size_t newsize = _size == 0 ? 10 : 2 * _tables.size();//alternative: start at 10, then double

				//pick the next prime
				size_t newsize = GetNextPrime(_tables.size());

				//after growing, the position of every element has to be recomputed

				HashTable<K, T, KeyofT, Hash> newtable;
				newtable._tables.resize(newsize);

				for (auto&e : _tables)
				{
					if (e._state == EXIST)
						newtable.Insert(e._t);
				}
				_tables.swap(newtable._tables);//swap the storage
			}


			//find the slot to insert into

			size_t begin = HashFunc(kf(t));//home position
			size_t index = begin;
			size_t i = 1;
			int flag = 1;
			while (_tables[index]._state == EXIST)//collision: keep probing
			{
				index = Probe(begin, i, flag);//quadratic probing

				if (flag == -1)
				{
					i++;//increase the step
					flag = 1;
				}
				else
				{
					flag = -1;//switch direction
				}
			}

			//a free slot has been found: store the element
			
			_tables[index]._t = t;
			_tables[index]._state = EXIST;
			_size++;

			return make_pair(Iterator(&_tables[index],this),true);
		}

		Iterator Find(const K& key)//the key being searched for may not exist
		{
			if (_size == 0)//empty
				return Iterator(nullptr,this);

			KeyofT kf;//extracts the key

			size_t begin = HashFunc(key);//home position
			size_t index = begin;
			size_t i = 1;
			int flag = 1;

			//stop at an EMPTY slot; a DELETE slot does not end the search, which is why every slot carries a state
			while (_tables[index]._state != EMPTY)
			{
				if (_tables[index]._state == EXIST&&key == kf(_tables[index]._t))//occupied and the key matches
				{
					return Iterator(&_tables[index],this);
				}

				index = Probe(begin, i, flag);//quadratic probing

				if (flag == -1)
				{
					i++;//increase the step
					flag = 1;
				}
				else
				{
					flag = -1;//switch direction
				}
			}
			//the key does not exist
			return End();
		}

		bool Erase(const K& key)//find first, then delete
		{
			Iterator it = Find(key);
			if (it != End())
			{
				it._node->_state = DELETE;//pseudo-deletion
				_size--;
				return true;
			}
			return false;
		}


	private:
		//probe position begin + i*i when flag is 1, begin - i*i when flag is -1
		size_t Probe(size_t begin, size_t i, int flag)
		{
			size_t n = _tables.size();
			size_t step = (i*i) % n;
			//n is added before subtracting so that the unsigned value cannot wrap around
			return flag == 1 ? (begin + step) % n : (begin + n - step) % n;
		}

		vector<HashNode<T>> _tables;//underlying storage
		size_t _size = 0;//number of stored elements
	};
}


#endif

Wrappers:

map wrapper:

#pragma once
#include "hash.hpp"


namespace YCH_MAP
{
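	//Built-in hash conversion functors for the common key types int and string
	//If type K does not support the modulo operation, a functor has to be supplied to convert it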
	template<class K>
	struct Hash
	{
		size_t operator() (const K&key)
		{
			return key;
		}
	};

	//string is a common key type, so the template is specialized for it
	template<>
	struct Hash<string>
	{
		size_t operator() (const string &key)
		{
			size_t count = 0;
			for (auto&e : key)
			{
				count = count * 131 + e;//131 is a commonly used multiplier when turning a string into an integer; it helps reduce collisions
			}
			return count;
		}
	};

	template<class K, class V, class hash = Hash<K>>
	class ych_unordered_map
	{

	private:
		struct map_KeyofT
		{
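			const K& operator()(const pair<K, V>& kv)//must match the stored type pair<K, V>; otherwise a temporary is created and the returned reference dangles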
			{
				return kv.first;
			}
		};

	public:
		typedef typename YCH_CLOSE_HASH::HashTable<K, pair<K, V>, map_KeyofT, hash>::Iterator iterator;

		iterator begin()
		{
			return _ht.Begin();
		}

		iterator end()
		{
			return _ht.End();
		}

		pair<iterator, bool> insert(const pair<K, V>& kv)
		{
			return _ht.Insert(kv);
		}


	private:
		YCH_CLOSE_HASH::HashTable<K, pair<K, V>, map_KeyofT, hash> _ht;

	};
};

set wrapper:

#pragma once
#include "hash.hpp"
#include "hashmap.hpp"//provides the default functor Hash<K>


namespace YCH_MAP
{
	
	template<class K, class hash = Hash<K>>
	class ych_unordered_set
	{

	private:
		struct set_KeyofT
		{
			const K& operator()(const K& k)
			{
				return k;
			}
		};

	public:
		typedef typename YCH_CLOSE_HASH::HashTable<K, K, set_KeyofT, hash>::Iterator iterator;

		iterator begin()
		{
			return _ht.Begin();
		}

		iterator end()
		{
			return _ht.End();
		}

		pair<iterator, bool> insert(const K& k)
		{
			return _ht.Insert(k);
		}

		iterator find(const K &k)
		{
			return _ht.Find(k);
		}


	private:
		YCH_CLOSE_HASH::HashTable<K, K, set_KeyofT, hash> _ht;

	};
};

Test:

#include "hash.hpp"
#include "hashmap.hpp"
#include "hashset.hpp"


void test_map()
{

	YCH_MAP::ych_unordered_map<int, int> map;



	map.insert(make_pair(1, 1));
	map.insert(make_pair(2, 2));
	map.insert(make_pair(3, 3));
	map.insert(make_pair(4, 4));

	for (auto&e : map)
	{
		cout << e.first << " " << e.second << endl;
	}

	cout << "_______Test 2_______" << endl;

	YCH_MAP::ych_unordered_map<string, string> map2;
	map2.insert(make_pair("apple", "tasty"));
	map2.insert(make_pair("banana", "not tasty"));
	map2.insert(make_pair("hami melon", "okay"));
	map2.insert(make_pair("pineapple", "also okay"));
	map2.insert(make_pair("peach", "pretty good too"));

	for (auto&e : map2)
	{
		cout << e.first << " " << e.second << endl << endl;
	}
}

void test_set()
{
	cout << "-------set wrapper test----" << endl;
	YCH_MAP::ych_unordered_set<int> set;

	set.insert(1);
	set.insert(12);
	set.insert(13);
	set.insert(10);
	set.insert(8);

	for (auto &e : set)
	{
		cout << e << endl;
	}

	cout << "_____find 10 and print it____" << endl;
	YCH_MAP::ych_unordered_set<int>::iterator it = set.find(10);
	if (it != set.end())
		cout << *it << endl;

}
int main()
{
	test_map();
	test_set();



	return 0;
}

Test output: [screenshot]

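4. Open hashing

4.1 The concept of open hashing

Open hashing is also called hash buckets or separate chaining. A hash function first computes an address for every key in the key set; keys with the same address belong to the same subset, and each subset is called a bucket. The elements of a bucket are linked together in a singly linked list, and the head node of each list is stored in the hash table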

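4.2 Growing an open hash table

The number of buckets is fixed, so as elements keep being inserted, individual buckets become very long, which hurts the performance of the hash table. Ideally every bucket holds exactly one node. The load factor of a bucket table is kept at 1: once it would exceed 1 the table grows, so that on average every bucket holds one node;

**Compared with closed hashing:** storing a pointer in every node looks like a large overhead for hash buckets, but it is not. Closed hashing has to keep its load factor below 0.7 so that enough free space remains to keep collisions unlikely, and a table entry takes far more space than a pointer, so hash buckets actually save space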
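A compact sketch of a bucket table that grows once the load factor would exceed 1. It is an illustration only: the class name BucketTable, the initial 4 buckets, and doubling the bucket count (instead of moving to the next prime) are assumptions of this example, and the chains are std::forward_list<int> objects rather than hand-written nodes, so growing copies the keys instead of relinking nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <vector>

// Hypothetical minimal bucket table for ints.
class BucketTable {
public:
    BucketTable() : buckets_(4), size_(0) {}

    bool insert(int key) {
        if (contains(key)) return false;
        if (size_ == buckets_.size())                       // load factor would exceed 1
            rehash(buckets_.size() * 2);
        buckets_[index(key, buckets_.size())].push_front(key);   // insert at the head of the chain
        ++size_;
        return true;
    }

    bool contains(int key) const {
        const std::forward_list<int>& chain = buckets_[index(key, buckets_.size())];
        for (int k : chain)
            if (k == key) return true;
        return false;
    }

    std::size_t bucketCount() const { return buckets_.size(); }
    std::size_t size() const { return size_; }

private:
    static std::size_t index(int key, std::size_t n) {
        return static_cast<std::size_t>(key) % n;
    }

    // Every key is redistributed, because its bucket depends on the bucket count.
    void rehash(std::size_t newCount) {
        std::vector<std::forward_list<int>> fresh(newCount);
        for (const auto& chain : buckets_)
            for (int k : chain)
                fresh[index(k, newCount)].push_front(k);
        buckets_.swap(fresh);
    }

    std::vector<std::forward_list<int>> buckets_;
    std::size_t size_;
};
```

Inserting 1, 5, 9, and 13 puts all four keys into bucket 1 of the initial 4 buckets. The fifth insertion grows the table to 8 buckets and spreads those keys over buckets 1 and 5.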

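4.3 Summary (open hashing versus closed hashing)

**1. Load factor of open hashing:** α <= 1, that is, on average one node per bucket, so the average time complexity is O(1)

2. When a hash collision occurs: open hashing simply inserts the node at the head of the bucket's linked list, whereas in closed hashing the colliding key takes a slot that other keys map to, which easily sets off a chain of further collisions (quadratic probing only mitigates this)

3. Advantages and disadvantages of open hashing:

Advantage:
Collisions at different positions no longer interfere with each other, and the load factor is generally kept at 1

Disadvantage:
When the table is traversed with an iterator, the elements are not output in order.

**Extension:** to output the elements in order, a separate list has to keep another copy, with the hash table storing an address that points to the corresponding list entry; output then prints the contents of the list. This costs even more space and time

**4. Optimizing open hashing:** what if all the data collides into a single bucket?

1. Hang a red-black tree under the bucket: the worst case is then O(log N), and only for a while, because the situation eases when the table grows

2. Multi-level hashing: use several hash tables; on a collision the element goes into another hash table, whose length is different, so the position it maps to is different

5. The performance cost at the moment an open hash table grows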
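The "Extension" idea from the summary above, a list kept next to the hash table, can be sketched with standard containers. This sketch assumes that "in order" means insertion order; the class name OrderedIntSet is made up, and std::unordered_map stands in for the hand-written table.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <unordered_map>
#include <vector>

// Hypothetical set of ints that remembers insertion order.
class OrderedIntSet {
public:
    bool insert(int key) {
        if (index_.count(key)) return false;
        order_.push_back(key);
        index_[key] = std::prev(order_.end());   // the hash table stores the position in the list
        return true;
    }

    bool erase(int key) {
        auto it = index_.find(key);
        if (it == index_.end()) return false;
        order_.erase(it->second);                // O(1), because the list position is known
        index_.erase(it);
        return true;
    }

    // Elements in insertion order.
    std::vector<int> items() const {
        return std::vector<int>(order_.begin(), order_.end());
    }

private:
    std::list<int> order_;
    std::unordered_map<int, std::list<int>::iterator> index_;
};
```

Every element is stored in the list and referenced from the hash table, which is the extra space the text mentions; in return, traversal follows the list and no longer depends on bucket positions.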

4.4 Code and verification

#ifndef _HASH_HPP_
#define _HASH_HPP_

#include <iostream>
#include <vector>
#include <assert.h>
#include <string>
#include <utility>
using namespace std;

static size_t GetNextPrime(size_t prime)
{
	static const int PRIMECOUNT = 28;//static, so the list is not rebuilt on every call
	static const size_t primeList[PRIMECOUNT] =
	{
		53ul, 97ul, 193ul, 389ul, 769ul,
		1543ul, 3079ul, 6151ul, 12289ul, 24593ul,
		49157ul, 98317ul, 196613ul, 393241ul, 786433ul,
		1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul,
		50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul,
		1610612741ul, 3221225473ul, 4294967291ul
	};

	for (size_t i = 0; i < PRIMECOUNT; ++i){
		if (primeList[i] > prime)
			return primeList[i];
	}

	return primeList[PRIMECOUNT - 1];//no larger prime in the list
}

namespace YCH_OPEN_HASH
{
	//Node of a bucket's linked list
	template<class T>
	struct HashLink
	{
		HashLink<T> *_next;
		T _t;

		HashLink(const T& t)
			:_next(nullptr)
			, _t(t)
		{}
	};

	//Forward declaration
	template<class K, class T, class KeyofT, class Hash>
	class HashTable;

	//Iterator
	template<class K, class T,class Ref,class Ptr ,class KeyofT, class Hash>
	struct HashIterator
	{
		typedef HashLink<T> Node;
		typedef HashTable<K, T, KeyofT, Hash> HT;//a distinct alias name; reusing the name HashTable here is rejected by some compilers
		typedef HashIterator<K, T, Ref, Ptr, KeyofT,Hash> Self;

		Node *_node;//current node
		HT *_pht;//pointer to the table, needed by ++ to compute positions

		HashIterator(Node *node,HT* tables)//takes the node pointer and the table pointer
			:_node(node)
			, _pht(tables)
		{}

		Ref operator*()
		{
			assert(_node);
			return _node->_t;
		}

		Ptr operator->()
		{
			assert(_node);
			return &(_node->_t);//returns an address; in it->member the compiler applies the second -> itself
		}

		KeyofT kf;
		Self &operator++()//prefix
		{
			assert(_node);
			size_t pos = _pht->HashFunc(kf(_node->_t),_pht->_tables);//bucket of the current node
			pos++;
			_node = _node->_next;
			if (_node == nullptr)//this chain is finished: look for the next non-empty bucket
			{
				for (size_t i = pos; i < _pht->_tables.size(); i++)
				{
					if (_pht->_tables[i] != nullptr)
					{
						_node = _pht->_tables[i];
						break;
					}
				}
			}
			return *this;
		}

		bool operator!=(const Self &s) const//const reference, so that a temporary such as end() can be compared
		{
			return _node != s._node;
		}

		bool operator==(const Self &s) const
		{
			return _node == s._node;
		}

	};



	template<class K, class T, class KeyofT,class Hash>
	class HashTable
	{
	public:
		typedef HashLink<T> Node;
		typedef HashIterator<K, T, T&, T*, KeyofT, Hash> Iterator;
		typedef HashIterator<K, T, const T&, const T*, KeyofT, Hash> Const_Iterator;
		
		//the iterators are friends of the table
		friend HashIterator<K, T, T&, T*, KeyofT, Hash>;
		friend Const_Iterator;

		HashTable() = default;
		HashTable(const HashTable&) = delete;//copying would require a deep copy of every chain
		HashTable& operator=(const HashTable&) = delete;

		~HashTable()//free every node
		{
			for (size_t i = 0; i < _tables.size(); i++)
			{
				Node *cur = _tables[i];
				while (cur)
				{
					Node *next = cur->_next;
					delete cur;
					cur = next;
				}
			}
		}

		Iterator Begin()
		{
			for (size_t i = 0; i < _tables.size(); i++)
			{
				if (_tables[i] != nullptr)
					return Iterator(_tables[i], this);
			}
			return Iterator(nullptr, this);
		}

		Iterator End()
		{
			return Iterator(nullptr, this);
		}


		KeyofT kf;//extracts the key

		//hash function; the table is passed by reference to avoid copying it on every call
		size_t HashFunc(const K& key,const vector<Node*>& tables)
		{
			Hash hf;
			return hf(key) % tables.size();
		}

		pair<Iterator,bool> Insert(const T& t)
		{
			//check whether the key already exists
			Iterator fi = Find(kf(t));
			if (fi != End())//already present
				return make_pair(fi, false);

			//check whether the table has to grow
			if (_size == _tables.size())//keeps α<=1
			{
				size_t newsize = GetNextPrime(_tables.size());
				vector<Node*> newtables(newsize, nullptr);//build a new table

				for (size_t i = 0; i < _tables.size(); i++)
				{
					Node* node = _tables[i];
					while (node)//this bucket has nodes
					{
						Node *next = node->_next;//remember the rest of the chain

						size_t index = HashFunc(kf(node->_t),newtables);//position in the new table
						node->_next = newtables[index];//relink the existing node at the head
						newtables[index] = node;

						node = next;
					}
					_tables[i] = nullptr;//clear the old bucket
				}

				//swap the two tables
				newtables.swap(_tables);
			}

			//insert the node

			size_t index = HashFunc(kf(t),_tables);//bucket to insert into
			Node *newnode = new Node(t);//create a node
			newnode->_next = _tables[index];//insert at the head
			_tables[index] = newnode;
			_size++;

			return make_pair(Iterator(_tables[index], this), true);
		}

		//find
		Iterator Find(const K& k)
		{
			if (_tables.empty())//avoids % 0
				return End();

			size_t index=HashFunc(k, _tables);

			Node *cur = _tables[index];
			while (cur&&(kf(cur->_t) !=k))
			{
				cur = cur->_next;
			}
			return Iterator(cur, this);
		}

		//erase
		bool Erase(const K& k)
		{
			if (_tables.empty())//avoids % 0
				return false;

			size_t index = HashFunc(k, _tables);
			Node* cur = _tables[index];
			Node *prev = nullptr;
			while (cur&&kf(cur->_t) != k)
			{
				prev = cur;
				cur = cur->_next;
			}
			if (cur == nullptr)//not found
				return false;

			if (prev == nullptr)//the node is the head of the chain
				_tables[index] = cur->_next;
			else
				prev->_next = cur->_next;
			delete cur;
			_size--;
			return true;
		}

	private:
		vector<Node*> _tables;
		size_t _size = 0;//number of stored elements
	};
}

#endif

#pragma once
#include "hash.hpp"


namespace YCH_MAP
{
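	//Built-in hash conversion functors for the common key types int and string
	//If type K does not support the modulo operation, a functor has to be supplied to convert it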
	template<class K>
	struct Hash
	{
		size_t operator() (const K&key)
		{
			return key;
		}
	};

	//string is a common key type, so the template is specialized for it
	template<>
	struct Hash<string>
	{
		size_t operator() (const string &key)
		{
			size_t count = 0;
			for (auto&e : key)
			{
				count = count * 131 + e;//131 is a commonly used multiplier when turning a string into an integer; it helps reduce collisions
			}
			return count;
		}
	};

	template<class K, class V, class hash = Hash<K>>
	class ych_unordered_map
	{

	private:
		struct map_KeyofT
		{
			const K& operator()(const pair<const K, V>& kv)
			{
				return kv.first;
			}
		};

	public:
		typedef typename YCH_OPEN_HASH::HashTable<K, pair<const K, V>, map_KeyofT, hash>::Iterator iterator;

		iterator begin()
		{
			return _ht.Begin();
		}

		iterator end()
		{
			return _ht.End();
		}

		pair<iterator, bool> insert(const pair<const K, V>& kv)
		{
			return _ht.Insert(kv);
		}

		V& operator[](const K& k)
		{
			return ((insert(make_pair(k, V()))).first)->second;
		}


	private:
		YCH_OPEN_HASH::HashTable<K, pair<const K, V>, map_KeyofT, hash> _ht;

	};
};

#pragma once
#include "hash.hpp"
#include "hashmap.hpp"//provides the default functor Hash<K>


namespace YCH_MAP
{
	
	template<class K, class hash = Hash<K>>
	class ych_unordered_set
	{

	private:
		struct set_KeyofT
		{
			const K& operator()(const K& k)
			{
				return k;
			}
		};

	public:
		//typedef typename YCH_CLOSE_HASH::HashTable<K, K, set_KeyofT, hash>::Iterator iterator;//closed hashing
		typedef typename YCH_OPEN_HASH::HashTable<K, K, set_KeyofT, hash>::Iterator iterator;//open hashing


		iterator begin()
		{
			return _ht.Begin();
		}

		iterator end()
		{
			return _ht.End();
		}

		pair<iterator, bool> insert(const K& k)
		{
			return _ht.Insert(k);
		}

		iterator find(const K &k)
		{
			return _ht.Find(k);
		}


	private:
		//YCH_CLOSE_HASH::HashTable<K, K, set_KeyofT, hash> _ht;
		YCH_OPEN_HASH::HashTable<K, K, set_KeyofT, hash> _ht;


	};
};

#include "hash.hpp"
#include "hashmap.hpp"
#include "hashset.hpp"

void test_open_map()
{
	cout << "____YCH_MAP::ych_unordered_map<int, int> map___" << endl ;
	YCH_MAP::ych_unordered_map<int, int> map;
	
	map.insert(make_pair(1, 1));
	map.insert(make_pair(54, 54));
	map.insert(make_pair(55, 55));
	map.insert(make_pair(56, 56));
	map.insert(make_pair(54, 54));
	map.insert(make_pair(108, 108));

	auto it = map.begin();

	while (it != map.end())
	{
		cout << it->first << " " << it->second << endl;
		++it;
	}
	cout << endl;


	cout << "____operator[] test____" << endl;
	map[500]++;
	map[200]=100;
	map[111] = 153;

	for (auto&e : map)
	{
		cout << e.first<<" "<<e.second<< endl;
	}
}

void test_open_set()
{
	cout << "_____YCH_MAP::ych_unordered_set<string> set______ " << endl;
	YCH_MAP::ych_unordered_set<string> set;
	set.insert("sorting");
	set.insert("string");
	set.insert("algorithm");
	set.insert("algorithm");
	set.insert("string");
	set.insert("hash table");

	for (auto&e : set)
	{
		cout << e << endl;
	}
}

int main()
{
	test_open_map();	
	test_open_set();



	return 0;

}
