Understanding Hashing in C++ in Seconds

玛丽亚后

Published 2024-09-01 19:54:37



Article link: https://blog.csdn.net/fax_player/article/details/141514959



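1. The concept of hashing

Suppose we build a storage structure in which some function (hashFunc) maps each element's key to its storage position. If that mapping is one-to-one, a lookup can use the same function to locate the element almost immediately.

With such a structure:

Inserting an element
Compute the storage position from the element's key with this function and store the element there.
Searching for an element
Apply the same computation to the key, treat the result as the storage position, and compare the key stored there with the one being searched for. If the keys are equal, the search succeeds.

This approach is called hashing. The conversion function it uses is called the hash function, and the resulting structure is called a hash table.

Example: the data set {1, 7, 6, 4, 5, 9}.

The hash function is hash(key) = key % capacity, where capacity is the total size of the underlying storage.

Compared with using the key directly as the index, this division-remainder method avoids wasting space, but it carries the risk that different values map to the same position.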

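2. Hash collisions

If we now insert 14, it maps to the same position as 4 (for example, with a capacity of 10, both 14 % 10 and 4 % 10 equal 4). Storing it there would overwrite the 4 already in that slot. Different keys mapping to the same position is called a hash collision.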
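The collision described above can be checked with a few lines of code. This is only a sketch: the helper names slot_for and collides are made up for illustration, and the capacity of 10 is an assumption, since the article does not state the table size for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Division-remainder hash from the text: hash(key) = key % capacity.
// Hypothetical helper; this sketch only handles non-negative keys.
std::size_t slot_for(int key, std::size_t capacity) {
    assert(capacity > 0 && key >= 0);
    return static_cast<std::size_t>(key) % capacity;
}

// Returns true if `key` maps to a slot that a different key in
// `existing` already maps to, i.e. a collision under plain modulo mapping.
bool collides(const std::vector<int>& existing, int key, std::size_t capacity) {
    for (int k : existing) {
        if (k != key && slot_for(k, capacity) == slot_for(key, capacity)) {
            return true;
        }
    }
    return false;
}
```

With the data set {1, 7, 6, 4, 5, 9} and a capacity of 10, collides reports a collision for 14 (it shares slot 4 with key 4) but not for 3.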

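3. Resolving hash collisions

Closed hashing

Linear probing: starting from the position where the collision occurred, probe the following positions one by one until the next empty position is found.

That takes care of insertion, but what about deletion? If deleting simply emptied the slot, a later lookup would stop at that empty slot and miss any key that had been probed past it.

The state markers below solve this problem.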
// Slot states: EMPTY = never used, EXIST = holds an element, DELETE = element was removed
enum State
{
	EMPTY,
	EXIST,
	DELETE
};

template<class K,class V>
struct HashDate
{
	pair<K, V> _data;
	State _state = EMPTY;
};

template<class K, class V>
class HashTable
{
public:


private:
	vector<HashDate<K,V>> _table;
};
When a lookup reaches a slot marked DELETE, it keeps probing instead of stopping.
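To see why the DELETE marker matters, here is a stripped-down sketch, separate from the article's HashTable. ProbeSet is a hypothetical fixed-size set of non-negative ints that never grows; it exists only to show a lookup walking past a deleted slot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum State { EMPTY, EXIST, DELETE };

struct Slot {
    int key = 0;
    State state = EMPTY;
};

// Hypothetical fixed-size set of non-negative ints using linear probing.
class ProbeSet {
public:
    explicit ProbeSet(std::size_t size) : slots_(size) { assert(size > 0); }

    // Returns the slot index holding `key`, or -1 if it is absent.
    int find(int key) const {
        std::size_t i = home(key);
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            const Slot& s = slots_[i];
            if (s.state == EMPTY) return -1;      // nothing was ever probed past here
            if (s.state == EXIST && s.key == key) return static_cast<int>(i);
            i = (i + 1) % slots_.size();          // other key or DELETE: keep probing
        }
        return -1;
    }

    bool insert(int key) {
        if (find(key) != -1) return false;        // already present
        std::size_t i = home(key);
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            if (slots_[i].state != EXIST) {       // EMPTY and DELETE slots can be reused
                slots_[i].key = key;
                slots_[i].state = EXIST;
                return true;
            }
            i = (i + 1) % slots_.size();
        }
        return false;                             // table is full
    }

    bool erase(int key) {
        int i = find(key);
        if (i < 0) return false;
        slots_[static_cast<std::size_t>(i)].state = DELETE;  // tombstone, not EMPTY
        return true;
    }

private:
    std::size_t home(int key) const {
        return static_cast<std::size_t>(key) % slots_.size();
    }
    std::vector<Slot> slots_;
};
```

In a table of 10 slots, 4 lands in slot 4 and 14 is probed into slot 5. After erasing 4, slot 4 is marked DELETE, so a search for 14 still continues on to slot 5. Had the slot been reset to EMPTY, that search would have stopped early.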

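Insert

	HashTable(size_t size = 10)
	{
		_table.resize(size);
	}
	bool insert(const pair<K, V>& kv)
	{
		// Keep the load factor within the allowed range (grow at 0.7)
		if ((n * 10 / _table.size()) >= 7)
		{
			// Start growing (filled in later, in the resizing section)
			size_t newsize = _table.size() * 2;

		}
		// Index the key maps to
		size_t hashi = kv.first % _table.size();
		// Probe forward while the current slot is occupied
		while (_table[hashi]._state == EXIST)
		{
			hashi++;
			// Wrap around to the start when we reach the end
			hashi %= _table.size();
		}
		// This slot is free: store the element
		_table[hashi]._data = kv;
		_table[hashi]._state = EXIST;
		n++;
		return true;
	}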
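Note: the modulo here must use size(), not capacity().

Taking the modulus by capacity() could produce an index at or beyond size(), and indexing a vector with operator[] at such an index is out of bounds (undefined behavior, which debug builds typically catch with an assertion). So we treat size() as the table's capacity; just make sure the table is never initialized with a size of 0.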
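The difference between size() and capacity() is easy to confirm with a small sketch. The helper names after_reserve, after_resize and safe_slot are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Sizes {
    std::size_t size;
    std::size_t capacity;
};

// reserve() only allocates storage: capacity() grows, size() stays 0,
// so no index is valid for operator[] yet.
Sizes after_reserve(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    return {v.size(), v.capacity()};
}

// resize() actually creates n elements, so indices 0..n-1 become valid.
Sizes after_resize(std::size_t n) {
    std::vector<int> v;
    v.resize(n);
    return {v.size(), v.capacity()};
}

// Taking the modulus by size() always yields an index below size().
std::size_t safe_slot(const std::vector<int>& table, std::size_t hash) {
    assert(!table.empty());
    return hash % table.size();
}
```

This is why the constructor in the article calls resize rather than reserve: the slots have to exist before they can be indexed.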

Next, building on insert, let's think about how to grow the table.

Resizing

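	bool insert(const pair<K, V>& kv)
	{
		// The key already exists: reject the insert
		if (Find(kv.first))
		{
			return false;
		}
		// Keep the load factor within the allowed range (grow at 0.7)
		if ((n * 10 / _table.size()) >= 7)
		{
			// Start growing
			size_t newsize = _table.size() * 2;
			vector<HashDate<K, V>> _newtable(newsize);
			// Walk the old table and insert into the new one
			//..........
			_table.swap(_newtable);
		}
		// Index the key maps to
		size_t hashi = kv.first % _table.size();
		// Probe forward while the current slot is occupied
		while (_table[hashi]._state == EXIST)
		{
			hashi++;
			// Wrap around to the start when we reach the end
			hashi %= _table.size();
		}
		// This slot is free: store the element
		_table[hashi]._data = kv;
		_table[hashi]._state = EXIST;
		n++;
		return true;
	}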

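The awkward part of this approach is that moving the old entries into the new table means repeating the same probing logic that insert already contains further down.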

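	bool insert(const pair<K, V>& kv)
	{
		// The key already exists: reject the insert
		if (Find(kv.first))
		{
			return false;
		}
		// Keep the load factor within the allowed range (grow at 0.7)
		if ((n * 10 / _table.size()) >= 7)
		{
			// Start growing
			HashTable<K, V> newHT(_table.size() * 2);
			// Walk the old table and insert into the new one
			for (auto& e : _table)
			{
				// Reuse insert, copying only live elements
				if (e._state == EXIST)
				{
					newHT.insert(e._data);
				}
			}
			_table.swap(newHT._table);
		}
		// Index the key maps to
		size_t hashi = kv.first % _table.size();
		// Probe forward while the current slot is occupied
		while (_table[hashi]._state == EXIST)
		{
			hashi++;
			// Wrap around to the start when we reach the end
			hashi %= _table.size();
		}
		// This slot is free: store the element
		_table[hashi]._data = kv;
		_table[hashi]._state = EXIST;
		n++;
		return true;
	}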

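Here we create a new HashTable object whose _table is twice the size of the old one. While walking the old table we can then reuse the insert member function to fill the new one, and finally swap the underlying data.

The first approach creates another vector inside the same object and swaps it in. The second uses two objects and swaps data between them. The difference is that the second can call the member function on the new object, which avoids duplicated code.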

Find

	HashDate<K, V>* Find(const K& key)
	{
		// Index the key maps to
		size_t hashi = key % _table.size();
		// Keep probing until an EMPTY slot; DELETE slots do not stop the search
		while (_table[hashi]._state != EMPTY)
		{
			if (_table[hashi]._data.first == key && _table[hashi]._state == EXIST)
			{
				return &_table[hashi];
			}
			hashi++;
			// Wrap around to the start when we reach the end
			hashi %= _table.size();
		}
		return nullptr;
	}

Since we have not implemented an iterator yet, we use Find to check the results of our inserts.

Erase

	bool erase(const K& key)
	{
		// Reuse Find to locate the element
		HashDate<K, V>* ret = Find(key);
		if (ret)
		{
			ret->_state = DELETE;
			n--;
			return true;
		}
		else
		{
			return false;
		}
	}

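Extension

So far we have mapped integer keys to an index with the modulo operator. But if the key is a string or a user-defined class (such as a date class), it cannot simply be cast to an integer and reduced modulo the table size. What should we do then?

We can add one more layer of mapping to the hash table: a functor that converts keys which can be cast to an integer into an integer suitable for the modulo, and gives special handling to keys that cannot be cast.

So how should a string be turned into a value we can take the modulus of?

A common approach is to combine the ASCII values of its characters into a single integer, which can then be reduced modulo the table size to get an index. The result is not guaranteed to be unique per string, so collisions remain possible, but mixing in a multiplication at each step makes strings with the same characters in a different order hash differently.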
// Key type that cannot be cast to an integer
struct HashString
{
	size_t operator()(const string& s)
	{
		size_t hash = 0;
		for (auto e : s)
		{
			hash += e;
			// Multiplying by 5 keeps strings like "abcd" and "acbd" from colliding
			hash *= 5;
		}
		return hash;
	}
};
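The effect of the multiplication is easy to see by comparing it with a plain sum. In the sketch below, sum_hash, mul_hash and slot_of are hypothetical names; mul_hash follows the same add-then-multiply-by-5 shape as HashString.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Naive version: a plain sum of character codes ignores character order.
std::size_t sum_hash(const std::string& s) {
    std::size_t h = 0;
    for (unsigned char c : s) h += c;
    return h;
}

// Same shape as HashString: add each character, then multiply by 5,
// so earlier characters carry more weight than later ones.
std::size_t mul_hash(const std::string& s) {
    std::size_t h = 0;
    for (unsigned char c : s) {
        h += c;
        h *= 5;
    }
    return h;
}

// Final step: reduce the hash to an index in a table of `table_size` slots.
std::size_t slot_of(const std::string& s, std::size_t table_size) {
    assert(table_size > 0);
    return mul_hash(s) % table_size;
}
```

For "abcd" and "acbd" the plain sum gives 394 in both cases, while the multiplying version gives 75850 and 75950. Neither function makes hashes unique; the multiplication only removes one common source of collisions.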
For a commonly used key type like string, though, we usually handle the conversion another way.
// Full specialization for string
template<>
struct HashFunc<string>
{
	size_t operator()(const string& s)
	{
		size_t hash = 0;
		for (auto e : s)
		{
			hash += e;
			hash *= 5;
		}

		return hash;
	}
};
Specializing the template lets the compiler pick the matching functor for the key type automatically.
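The sketch below shows how the default functor, the string specialization, and a custom functor for a user-defined key can coexist. Date, DateHash and bucket_index are hypothetical names introduced for illustration, and the multiplier 31 in DateHash is an arbitrary choice, not something the article prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Default functor: the key can be cast to size_t.
template <class K>
struct HashFunc {
    std::size_t operator()(const K& key) const {
        return static_cast<std::size_t>(key);
    }
};

// Full specialization for std::string, same add-then-multiply as the text.
template <>
struct HashFunc<std::string> {
    std::size_t operator()(const std::string& s) const {
        std::size_t hash = 0;
        for (unsigned char c : s) {
            hash += c;
            hash *= 5;
        }
        return hash;
    }
};

// Hypothetical user-defined key with its own functor.
struct Date {
    int year;
    int month;
    int day;
};

struct DateHash {
    std::size_t operator()(const Date& d) const {
        std::size_t h = static_cast<std::size_t>(d.year);
        h = h * 31 + static_cast<std::size_t>(d.month);
        h = h * 31 + static_cast<std::size_t>(d.day);
        return h;
    }
};

// Hypothetical helper: the functor is a defaulted template parameter,
// mirroring the third parameter of the article's HashTable.
template <class K, class Hash = HashFunc<K>>
std::size_t bucket_index(const K& key, std::size_t table_size) {
    assert(table_size > 0);
    return Hash()(key) % table_size;
}
```

For int and string keys the default argument selects the right functor without any extra typing. Only a key such as Date needs the functor spelled out, as in bucket_index<Date, DateHash>(...).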

Closed hashing: full code

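#pragma once
#include <iostream>
#include <string>
#include <utility>
#include <vector>
using namespace std;

namespace close
{

// Slot states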
enum State
{
	EMPTY,
	EXIST,
	DELETE
};

template<class K,class V>
struct HashDate
{
	pair<K, V> _data;
	State _state = EMPTY;
};

template<class K>
// Default functor: the key can be cast to size_t
struct HashFunc
{
	size_t operator()(const K& key)
	{
		return (size_t)key;
	}
};

// Full specialization for string
template<>
struct HashFunc<string>
{
	size_t operator()(const string& s)
	{
		size_t hash = 0;
		for (auto e : s)
		{
			hash += e;
			hash *= 5;
		}

		return hash;
	}
};

// Key type that cannot be cast to an integer
//struct HashString
//{
//	size_t operator()(const string& s)
//	{
//		size_t hash = 0;
//		for (auto e : s)
//		{
//			hash += e;
//			// Multiplying by 5 keeps strings like "abcd" and "acbd" from colliding
//			hash *= 5;
//		}
//		return hash;
//	}
//};

template<class K, class V,class Hash = HashFunc<K>>
class HashTable
{
public:
	HashTable(size_t size = 10)
	{
		_table.resize(size);
	}

	bool erase(const K& key)
	{
		// Reuse Find to locate the element
		HashDate<K, V>* ret = Find(key);
		if (ret)
		{
			ret->_state = DELETE;
			n--;
			return true;
		}
		else
		{
			return false;
		}
	}

	HashDate<K, V>* Find(const K& key)
	{
		Hash hs;
		// Linear probing
		size_t hashi = hs(key) % _table.size();
		// Keep probing until an EMPTY slot; DELETE slots do not stop the search
		while (_table[hashi]._state != EMPTY)
		{
			if (_table[hashi]._data.first == key && _table[hashi]._state == EXIST)
			{
				return &_table[hashi];
			}
			hashi++;
			// Wrap around to the start when we reach the end
			hashi %= _table.size();
		}
		return nullptr;
	}


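	bool insert(const pair<K, V>& kv)
	{
		// The key already exists: reject the insert
		if (Find(kv.first))
		{
			return false;
		}
		// Keep the load factor within the allowed range (grow at 0.7)
		if ((n * 10 / _table.size()) >= 7)
		{
			// First approach, kept for reference:
			//size_t newsize = _table.size() * 2;
			//vector<HashDate<K, V>> _newtable(newsize);
			// walk the old table and insert into the new one
			//..........
			//_table.swap(_newtable);
			//
			// Second approach: grow by filling a new HashTable
			HashTable<K, V,Hash> newHT(_table.size() * 2);
			// Walk the old table and insert into the new one
			for (auto& e : _table)
			{
				// Reuse insert, copying only live elements
				if (e._state == EXIST)
				{
					newHT.insert(e._data);
				}
			}
			_table.swap(newHT._table);
		}
		Hash hs;
		// Index the key maps to
		size_t hashi = hs(kv.first) % _table.size();
		// Probe forward while the current slot is occupied
		while(_table[hashi]._state == EXIST)
		{
			hashi++;
			// Wrap around to the start when we reach the end
			hashi%= _table.size();
		}
		// This slot is free: store the element
		_table[hashi]._data = kv;
		_table[hashi]._state = EXIST;
		n++;
		return true;
	}

private:
	vector<HashDate<K,V>> _table;
	// Number of stored elements
	size_t n = 0;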

};
void test1()
{
	int a[] = { 1,4,24,34,7,44,17,37 };
	HashTable<int, int> ht;
	for (auto e : a)
	{
		ht.insert(make_pair(e, e));
	}
	ht.erase(1);
	ht.erase(34);

	for (auto e : a)
	{
		auto ret = ht.Find(e);
		if (ret)
		{
			cout << ret->_data.first << ":E" << endl;
		}
		else
		{
			cout << e<< ":D" << endl;
		}
	}
	cout << endl;

}
void test2()
{
	HashTable<string, string> dict;
	dict.insert(make_pair("sort", "排序"));
	dict.insert(make_pair("string", "字符串"));

}
}

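Open hashing

Open hashing is also known as separate chaining. The hash function is first applied to every key to compute its hash address. Keys with the same address belong to the same subset, and each subset is called a bucket. The elements of a bucket are linked together in a singly linked list, and the head node of each list is stored in the hash table.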
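Before writing the nodes by hand, the idea can be sketched with standard containers. ChainedMap below is a hypothetical, simplified map from int to string that uses std::list for each bucket and never resizes; it is an illustration of chaining, not the structure the article builds.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical chained hash map: int keys, string values, fixed bucket count.
class ChainedMap {
public:
    explicit ChainedMap(std::size_t buckets = 10) : table_(buckets) {
        assert(buckets > 0);
    }

    // Inserts at the front of the key's bucket; rejects duplicate keys.
    bool insert(int key, const std::string& value) {
        if (find(key)) return false;
        table_[index(key)].emplace_front(key, value);
        return true;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const std::string* find(int key) const {
        for (const auto& kv : table_[index(key)]) {
            if (kv.first == key) return &kv.second;
        }
        return nullptr;
    }

    // Number of elements sharing a bucket with `key`.
    std::size_t bucket_size(int key) const {
        return table_[index(key)].size();
    }

private:
    std::size_t index(int key) const {
        return static_cast<std::size_t>(key) % table_.size();
    }
    std::vector<std::list<std::pair<int, std::string>>> table_;
};
```

With 10 buckets, keys 4 and 14 end up in the same bucket, yet both remain reachable because the bucket is a list rather than a single slot.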

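namespace open
{
	template<class K, class V>
	struct HashNode
	{
		pair<K, V> _kv;
		HashNode<K, V>* _next;
		HashNode(const pair<K, V>& kv)
			:_kv(kv)
			,_next(nullptr)
		{}
	};
	template<class K,class V>
	class HashTable
	{
	public:
		typedef HashNode<K, V> Node;
		// Start with a non-empty table so that key % _table.size() is well defined
		HashTable(size_t size = 10)
		{
			_table.resize(size, nullptr);
		}

	private:
		// Option 1: reuse list directly
		//vector<list<pair<K, V>>> _table;
		// Option 2: write our own, an array of node pointers
		vector<HashNode<K,V>*> _table;
		size_t n = 0;

	};
}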

We do not use the list container directly because writing our own nodes makes it easier to implement an iterator later.

Insert

		bool insert(const pair<K,V>& kv)
		{
			size_t hashi = kv.first % _table.size();
			Node* newnode = new Node(kv);
			// Push the new node at the head of its bucket
			newnode->_next = _table[hashi];
			_table[hashi] = newnode;
            n++;
			return true;
		}

Find

		// Find
		// The values we are looking for live inside the nodes, so return the node
		Node* Find(const K& key)
		{
			// Map the key to a bucket index
			size_t hashi = key % _table.size();
			Node* cur = _table[hashi];
			while (cur)
			{
				if (cur->_kv.first == key)
				{
					return cur;
				}
				cur = cur->_next;
			}
			return nullptr;
		}

Resizing

		// Insert
		bool insert(const pair<K,V>& kv)
		{
			if (Find(kv.first))
			{
				return false;
			}
			// Grow when the load factor reaches 1
			if (n == _table.size())
			{
				vector<Node*> _newtable(_table.size()*2,nullptr);
				for (size_t i = 0; i < _table.size(); i++)
				{
					// Visit each bucket
					Node* cur = _table[i];
					// Move every node in the bucket to the new table
					while (cur)
					{
						// Remember the next node before relinking
						Node* next = cur->_next;
						// Map the current node to its position in the new table
						size_t hashi = cur->_kv.first % _newtable.size();
						// Push at the head of the new bucket
						cur->_next = _newtable[hashi];
						_newtable[hashi] = cur;
						// Continue with the next node of the old bucket
						cur = next;
					}
					// All nodes moved: clear the old bucket pointer
					_table[i] = nullptr;
				}
				// Swap the two tables to finish resizing
				_table.swap(_newtable);
			}
			size_t hashi = kv.first % _table.size();
			Node* newnode = new Node(kv);
			// Push the new node at the head of its bucket
			newnode->_next = _table[hashi];
			_table[hashi] = newnode;
			n++;
			return true;

		}

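This resize differs from the linear-probing one. There, we built a new table object so that we could reuse insert to fill the new table. Here, to avoid allocating new nodes and throwing the old ones away, we move the existing nodes into the new table instead.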
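Each node has to be re-mapped during the move because its bucket index depends on the table size. The small sketch below (bucket_loads is a hypothetical helper) counts how many keys fall into each bucket for a given bucket count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many of `keys` land in each bucket of a table with
// `bucket_count` buckets, using key % bucket_count as in the text.
// This sketch assumes non-negative keys.
std::vector<std::size_t> bucket_loads(const std::vector<int>& keys,
                                      std::size_t bucket_count) {
    assert(bucket_count > 0);
    std::vector<std::size_t> loads(bucket_count, 0);
    for (int k : keys) {
        ++loads[static_cast<std::size_t>(k) % bucket_count];
    }
    return loads;
}
```

With 10 buckets, the keys 4, 14, 24 and 34 all share bucket 4. After doubling to 20 buckets, 4 and 24 stay in bucket 4 while 14 and 34 move to bucket 14, so the chain is cut in half.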

Erase

		// Erase
		bool erase(const K& key)
		{
			// Find the bucket the key maps to
			size_t hashi = key % _table.size();
			// Walk the nodes of that bucket
			Node* cur = _table[hashi];
			// Track the previous node so the list can be relinked after deletion
			Node* prev = nullptr;
			while (cur)
			{
				if (cur->_kv.first == key)
				{
					// Not the head: bypass the node
					if (prev)
					{
						prev->_next = cur->_next;
					}
					// The head: the bucket now starts at the next node
					else
					{
						_table[hashi] = cur->_next;
					}
					// Free the node
					delete cur;
					n--;
					return true;
				}
				else
				{
					prev = cur;
					cur = cur->_next;
				}
			}
			return false;
		}

When deleting, we have to take care of relinking the neighboring nodes afterward.

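Extension

Finally, let's write the class template that lets string keys be reduced modulo the table size.

	// Default functor: the key can be cast to size_t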
	template<class K>
	struct HashFunc
	{
		size_t operator()(const K& key)
		{
			return (size_t)key;
		}
	};

	// Full specialization for string
	template<>
	struct HashFunc<string>
	{
		size_t operator()(const string& s)
		{
			size_t hash = 0;
			for (auto e : s)
			{
				hash += e;
				hash *= 5;
			}

			return hash;
		}
	};
	template<class K,class V,class Hash = HashFunc<K>>
	class HashTable...