Hashing and Its Simulated Implementation

樂丶x

Last modified 2022-06-05 15:34:01

Category: C++  Tags: hash algorithm, hash table, data structures

First published 2022-05-24 23:42:39

Article link: https://blog.csdn.net/m0_46766926/article/details/124956092

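1. Concepts
2. Closed hashing
3. Simulating closed hashing with linear probing
4. Open hashing
- 4.1 Resizing
5. Simulating open hashing
6. Hash bucket iterator
7. Wrapping unordered_set
8. Wrapping unordered_map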

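1. Concepts

1.1 Hashing

In linear lists and balanced trees, an element's storage position has no relationship to its key, so finding an element requires comparing keys multiple times. Linear search takes O(N) time, and a lookup in a balanced tree costs the height of the tree, i.e. O(log2 N); search efficiency depends on the number of comparisons made along the way. Ideally, elements are stored through a function that relates each element to its storage position, so that a one-to-one mapping is established between an element's key and where it is stored. The element can then be fetched from the table directly through that function, without any key comparisons along the way.
This approach is called the hashing method, the conversion function it uses is called the hash function, and the resulting structure is called a hash table.
Take the following data as an example: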

	int arr[]={1,5,6,3,8,9} ;

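Store them in a sequential table of length 10 with the hash function hash(key) = key % 10. For example, the first value is 1, and 1 % 10 = 1, so it goes to index 1 of the table. The other values are placed the same way:
(Figure: a table with indices 0 to 9, where 1, 5, 6, 3, 8 and 9 each sit at the index equal to their value and indices 0, 2, 4 and 7 are empty.)
To look up a value, apply the hash function to it to get the index and access that position directly.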
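
The example above can be sketched roughly as follows. The table length 10 and hash(key) = key % 10 come from the text; the names kTableLen, kEmpty, hash_index, build_table and contains are made up for illustration, and the sketch assumes non-negative keys and no collisions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t kTableLen = 10;  // table length used in the text
const int kEmpty = -1;             // assumption: keys are non-negative, so -1 marks a free slot

// hash(key) = key % 10
std::size_t hash_index(int key) {
    assert(key >= 0);  // this sketch only handles non-negative keys
    return static_cast<std::size_t>(key) % kTableLen;
}

// Store every value at the index given by the hash function (no collision handling).
std::vector<int> build_table(const std::vector<int>& values) {
    std::vector<int> table(kTableLen, kEmpty);
    for (int v : values)
        table[hash_index(v)] = v;
    return table;
}

// A lookup is one index computation followed by one comparison.
bool contains(const std::vector<int>& table, int key) {
    return table[hash_index(key)] == key;
}
```

Calling build_table({1, 5, 6, 3, 8, 9}) places each value at the index equal to itself, and contains(table, 8) then needs no search through the other slots.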

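1.2 Hash collisions

A hash function cannot always guarantee that every key gets a different hash address. For example, storing 11 in the table above yields the same hash address as 1 under that hash function, and one address cannot hold two values. This is called a hash collision (or hash conflict). Elements with different keys but the same hash address are called "synonyms".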

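1.3 Hash functions

The hash function determines where data is stored in the hash table. Collisions cannot be avoided entirely, but a well-chosen hash function reduces them. A hash function should be as simple as possible; its domain must cover every key that needs to be stored; if the table allows m addresses, its range must lie between 0 and m-1; and it should spread the keys as evenly as possible across that address range. Common hash functions include direct addressing and the division (remainder) method. Direct addressing uses a linear function of the key, Hash(key) = A*key + B, and requires knowing the key distribution in advance. For the division method, let m be the number of addresses the table allows and take the largest prime p not exceeding m as the divisor; the hash function Hash(key) = key % p (p <= m) then converts a key into a hash address.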
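
A minimal sketch of the division method described above, assuming unsigned integer keys. The helper names is_prime, largest_prime_not_above and division_hash are hypothetical, and the trial-division primality test is only meant for small table sizes.

```cpp
#include <cassert>
#include <cstddef>

// Trial division; adequate for the small table sizes used in this illustration.
bool is_prime(std::size_t n) {
    if (n < 2) return false;
    for (std::size_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Largest prime p with p <= m.
std::size_t largest_prime_not_above(std::size_t m) {
    assert(m >= 2);  // there is no prime below 2
    std::size_t p = m;
    while (!is_prime(p)) --p;
    return p;
}

// Hash(key) = key % p, where p is the largest prime not exceeding m.
std::size_t division_hash(std::size_t key, std::size_t m) {
    return key % largest_prime_not_above(m);
}
```

With m = 10 the divisor becomes 7, so the keys 1 and 11 that collided under key % 10 now map to 1 and 4. The price is that addresses 7 to 9 are never produced.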

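2. Closed hashing

Closed hashing, also called open addressing, is one way to resolve hash collisions: if the hash address of the data being inserted is already occupied and the table is not full, the data can be stored in another empty position. There are two ways to find the next position: linear probing and quadratic probing.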

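2.1 Linear probing

Concept: starting from the position where the collision occurred, probe the following positions one by one (wrapping around at the end of the table) until an empty position is found.
Advantage: it is very simple to implement.
Disadvantage: once collisions occur, the colliding entries cluster together and data "piles up": entries with different keys occupy the free positions that other keys would have used, so locating a key may take many comparisons and search efficiency drops.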
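
The probing rule and the clustering it causes can be seen in a small sketch. It assumes non-negative integer keys, hash(key) = key % table size, and -1 as the free-slot marker; linear_probe_insert and kFree are illustrative names, and duplicate keys are not checked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kFree = -1;  // assumption: keys are non-negative, so -1 marks a free slot

// Inserts key with linear probing and returns the slot it landed in,
// or table.size() if every slot is already taken.
std::size_t linear_probe_insert(std::vector<int>& table, int key) {
    assert(key >= 0 && !table.empty());
    const std::size_t m = table.size();
    const std::size_t h0 = static_cast<std::size_t>(key) % m;  // home address
    for (std::size_t i = 0; i < m; ++i) {
        const std::size_t index = (h0 + i) % m;  // step forward, wrapping at the end
        if (table[index] == kFree) {
            table[index] = key;
            return index;
        }
    }
    return m;  // table is full
}
```

In an empty table of 10 slots, inserting 1, 11 and 21 fills slots 1, 2 and 3. A later insert of 2 does not collide on its hash value with any of them, yet it is pushed to slot 4 because the cluster already covers its home address.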

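2.2 Quadratic probing

The weakness of linear probing is that colliding data piles up in one place, which comes from how it looks for the next empty position: one slot after another. To avoid this, quadratic probing computes the next candidate position as Hi = (h0 ± i²) % m, where h0 is the address at which the collision occurred, m is the table size, and i = 1, 2, 3, ... If the new address still collides, increment i and compute the next position, and so on.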
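
As a rough illustration of the formula, the sketch below lists the candidate addresses in probing order. Trying h0 + i² before h0 - i² for each i is an assumption of this sketch (the text does not fix the order), and quadratic_probe_sequence is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Candidate addresses (h0 + i^2) % m and (h0 - i^2) % m for i = 1..max_i.
std::vector<std::size_t> quadratic_probe_sequence(std::size_t h0, std::size_t m,
                                                  std::size_t max_i) {
    assert(m > 0);
    const std::size_t base = h0 % m;
    std::vector<std::size_t> seq;
    for (std::size_t i = 1; i <= max_i; ++i) {
        const std::size_t step = (i * i) % m;
        seq.push_back((base + step) % m);      // h0 + i^2
        seq.push_back((base + m - step) % m);  // h0 - i^2, kept non-negative
    }
    return seq;
}
```

For h0 = 1 and m = 10 the first six candidates are 2, 0, 5, 7, 0, 2. The probes jump around the table instead of marching through neighbouring slots, but with this table size some addresses are visited twice, so the sequence alone does not guarantee that every slot gets tried.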

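2.3 Resizing

With either method, as the number of valid entries in the table grows, that is, as the load factor (number of valid entries / table capacity) increases, collisions become more likely, and the table needs to be resized. On resizing, the hash address of every key must be recomputed against the new capacity and the entries re-inserted one by one.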
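
A small sketch of the load-factor check and the rehash step, under the same assumptions as before (non-negative integer keys, -1 as the free-slot marker, linear probing). The 0.7 threshold mirrors the check used in the closed-hashing code in the next section; needs_resize and rehash are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kFree = -1;  // assumption: keys are non-negative, so -1 marks a free slot

// True when the load factor size / capacity has reached 0.7 (or the table has no slots).
bool needs_resize(std::size_t size, std::size_t capacity) {
    return capacity == 0 || size * 10 >= capacity * 7;
}

// Builds a table of new_capacity slots and re-inserts every key, recomputing
// its address against the new capacity (linear probing on collision).
std::vector<int> rehash(const std::vector<int>& old_table, std::size_t new_capacity) {
    assert(new_capacity > 0 && new_capacity >= old_table.size());
    std::vector<int> fresh(new_capacity, kFree);
    for (int key : old_table) {
        if (key == kFree) continue;  // skip free slots
        std::size_t index = static_cast<std::size_t>(key) % new_capacity;
        while (fresh[index] != kFree)
            index = (index + 1) % new_capacity;
        fresh[index] = key;
    }
    return fresh;
}
```

In a table of 5 slots the keys 6 and 1 share the home address 1, so one of them sits in slot 2. After rehashing to 10 slots they land in slots 6 and 1, which is why the old positions cannot simply be copied over.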

3. Simulating closed hashing with linear probing

Source code:

#pragma once
#include<iostream>
#include<algorithm>
#include <utility>
#include <vector>
using namespace std;
namespace Closed_Hash
{
	enum State { EMPTY, EXIST, DELETE };// state of each slot
	// K must be an integer-like type, because HashFunc applies % to the key
	template<class K, class V>
	class HashTable
	{
		struct Element
		{
			pair<K, V> _val;
			State _state = EMPTY;
		};

	public:
		HashTable(size_t capacity = 3)
			:_ht(capacity), _size(0)
		{}

		// Insert; returns false if the key is already present
		bool Insert(const pair<K, V>& val)
		{
			CheckCapacity();
			size_t n = _ht.size();
			size_t start = HashFunc(val.first);
			size_t target = n;// first reusable slot (EMPTY or DELETE) met while probing
			for (size_t i = 0; i < n; ++i)
			{
				size_t index = (start + i) % n;
				if (_ht[index]._state == EXIST)
				{
					if (_ht[index]._val.first == val.first)
						return false;// duplicate key
				}
				else
				{
					if (target == n)
						target = index;
					if (_ht[index]._state == EMPTY)
						break;// the key cannot be stored past an EMPTY slot
				}
			}
			_ht[target]._state = EXIST;
			_ht[target]._val = val;
			_size++;
			return true;
		}
		// Find; returns the slot index, or -1 if the key is absent
		int Find(const K& key)
		{
			if (_ht.empty()) return -1;
			size_t index = HashFunc(key);
			// visit each slot at most once, so a table without EMPTY slots cannot loop forever
			for (size_t i = 0; i < _ht.size() && _ht[index]._state != EMPTY; ++i)
			{
				if (_ht[index]._state == EXIST && _ht[index]._val.first == key)
					return (int)index;
				index = (index + 1) % _ht.size();// wrap around at the end of the table
			}
			return -1;
		}
		// Erase; a lazy deletion that only marks the slot as DELETE
		bool Erase(const K& key)
		{
			int index = Find(key);
			if (index == -1) return false;
			_ht[index]._state = DELETE;
			_size--;
			return true;
		}
		size_t Size()const
		{
			return _size;
		}

		bool Empty() const
		{
			return _size == 0;
		}

		void Swap(HashTable<K, V>& ht)
		{
			swap(_size, ht._size);
			_ht.swap(ht._ht);
		}

	private:
		size_t HashFunc(const K& key) const
		{
			return key % _ht.size();
		}
		// Capacity check: grow once the load factor reaches 0.7
		void CheckCapacity()
		{
			if (_ht.size() == 0 || _size * 10 / _ht.size() >= 7)
			{
				size_t newSize = _ht.size() == 0 ? 10 : _ht.size() * 2;
				HashTable<K, V> newHT(newSize);
				// re-insert so that every key is hashed against the new size
				for (auto& e : _ht)
				{
					if (e._state == EXIST)
						newHT.Insert(e._val);
				}
				_ht.swap(newHT._ht);
			}
		}
	private:
		vector<Element> _ht;
		size_t _size;// number of EXIST elements
	};
}

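4. Open hashing

Open hashing is also called separate chaining (the chained-address method). The hash function is first applied to the set of keys to compute hash addresses; keys with the same address belong to the same subset, and each subset is called a bucket. All elements in a bucket collide with one another; they are linked together in a singly linked list, and the head pointer of each list is stored in the hash table.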
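
Before the hand-written version later in the article, the idea can be sketched with standard containers. This is only an illustration for int keys with a fixed number of buckets: ChainedSet and its members are hypothetical names, and std::forward_list stands in for the singly linked list of each bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <iterator>
#include <vector>

class ChainedSet {
public:
    explicit ChainedSet(std::size_t bucket_count) : buckets_(bucket_count) {
        assert(bucket_count > 0);
    }

    // Links the key at the head of its bucket; returns false for a duplicate.
    bool insert(int key) {
        if (contains(key)) return false;
        buckets_[index_of(key)].push_front(key);
        return true;
    }

    // Only the one bucket selected by the hash address is searched.
    bool contains(int key) const {
        for (int stored : buckets_[index_of(key)])
            if (stored == key) return true;
        return false;
    }

    // Number of elements chained in bucket i.
    std::size_t bucket_length(std::size_t i) const {
        return static_cast<std::size_t>(
            std::distance(buckets_[i].begin(), buckets_[i].end()));
    }

private:
    std::size_t index_of(int key) const {
        return static_cast<std::size_t>(key) % buckets_.size();
    }
    std::vector<std::forward_list<int>> buckets_;  // one list per bucket
};
```

With 10 buckets, inserting 1 and 11 leaves both in bucket 1 as a two-node chain; neither key displaces the other, unlike in closed hashing.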

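4.1 Resizing

The number of buckets is fixed, so as elements keep being inserted the number of elements per bucket grows. In the extreme case a single bucket's list can hold a great many nodes, which hurts the performance of the hash table. The table can therefore be grown when the number of elements reaches the number of buckets.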

size_t BucketCount()const
{
	// the bucket count is the vector's size(); capacity() may be larger
	return _table.size();
}
void CheckCapacity()
{
	if (_size == BucketCount())
	{
		size_t newSize = GetNextPrime(_size);
		vector<Node*> newHT;
		newHT.resize(newSize, nullptr);
		for (size_t i = 0; i < _table.size(); ++i)
		{
			Node* cur = _table[i];
			while (cur)
			{
				Node* next = cur->_pNext;
				size_t index =cur->_data % newSize;
				cur->_pNext = newHT[index];
				newHT[index] = cur;
				cur = next;
			}
			_table[i] = nullptr;
		}
		newHT.swap(_table);
	}
}

5. Simulating open hashing

#pragma once
#include<vector>
#include<string>
#include<time.h>
using namespace std;
namespace Open_Hash
{
	template<class T>
	class HashFunc
	{
	public:
		size_t operator()(const T& val)
		{
			return val;
		}
	};
	template<>
	class HashFunc<string>
	{
	public:
		size_t operator()(const string& s)
		{
			const char* str = s.c_str();
			unsigned int seed = 131; // 31 131 1313 13131 131313
			unsigned int hash = 0;
			while (*str)
			{
				hash = hash * seed + (*str++);
			}
			return hash;
		}
	};
	template<class V>
	struct HashBucketNode
	{
		HashBucketNode(const V& data)
			: _pNext(nullptr), _data(data)
		{}
		HashBucketNode<V>* _pNext;
		V _data;
	};
	template<class V, class HF = HashFunc<V>>
	class HashBucket
	{
		typedef HashBucketNode<V> Node;
		typedef Node* PNode;
		typedef HashBucket<V, HF> Self;
	public:
		HashBucket(size_t capacity = 0) : _table(GetNextPrime(capacity)), _size(0) {}
		~HashBucket() { Clear(); }
		// Models unique keys with a single element type: elements in the hash bucket cannot repeat
		Node* Insert(const V& data)
		{
			CheckCapacity();
			HF HashFunc;// functor that turns data such as string into an integer hash value
			size_t index = HashFunc(data) % _table.size();
			Node* cur = _table[index];
			// check that the value being inserted is not already present
			while (cur){
				if (cur->_data == data)
					return nullptr;
				else
					cur = cur->_pNext;
			}
			// link the new node at the head of the bucket
			Node* newnode = new Node(data);
			newnode->_pNext = _table[index];
			_table[index] = newnode;
			++_size;
			return newnode;
		}
		// Remove the element equal to data from the hash bucket (data is unique)
		bool Erase(const V& data)
		{
			if (_table.size() == 0)return false;
			HF HashFunc;
			size_t index = HashFunc(data) % _table.size();
			Node* cur = _table[index], * prev = nullptr;
			while (cur){
				if (cur->_data == data)
				{
					// cur is the head node of the bucket
					if (prev == nullptr)
						_table[index] = cur->_pNext;
					else
						prev->_pNext = cur->_pNext;
					delete cur;
					--_size;
					return true;
				}
				// advance prev only after cur has been ruled out
				prev = cur;
				cur = cur->_pNext;
			}
			return false;
		}
		Node* Find(const V& data)
		{
			if (_table.size() == 0)return nullptr;
			HF HashFunc;
			size_t index = HashFunc(data) % _table.size();
			Node* cur = _table[index];
			while (cur){
				if (cur->_data == data)
					return cur;
				else
					cur = cur->_pNext;
			}
			return nullptr;
		}
		size_t Size()const{return _size;}
		bool Empty()const{return 0 == _size;}
		void Clear()
		{
			for (size_t i = 0; i < _table.size(); ++i)
			{
				Node* cur = _table[i];
				while (cur)
				{
					Node* next = cur->_pNext;
					delete cur;
					cur = next;
				}
				_table[i] = nullptr;// do not leave a dangling head pointer
			}
			_size = 0;
		}
		size_t BucketCount()const
		{
			return _table.size();
		}
		void Swap(Self& ht)
		{
			_table.swap(ht._table);
			swap(_size, ht._size);
		}
	private:
		size_t HashFunc(const V& data)
		{
			return HF()(data) % _table.size();
		}
		void CheckCapacity()
		{
			if (_size == BucketCount())
			{
				HF  HashFunc;
				size_t newSize = GetNextPrime(_size);
				vector<Node*> newHT;
				newHT.resize(newSize, nullptr);
				for (size_t i = 0; i < _table.size(); ++i)
				{
					Node* cur = _table[i];
					while (cur)
					{
						Node* next = cur->_pNext;
						size_t index = HashFunc(cur->_data) % newSize;
						cur->_pNext = newHT[index];
						newHT[index] = cur;
						cur = next;
					}
					_table[i] = nullptr;
				}
				newHT.swap(_table);
			}
		}
		// use primes as the divisor for the division (remainder) method
		size_t GetNextPrime(size_t prime)
		{
			const int PRIMECOUNT = 28;
			static const size_t primeList[PRIMECOUNT] =
			{
				53ul, 97ul, 193ul, 389ul, 769ul,
				1543ul, 3079ul, 6151ul, 12289ul, 24593ul,
				49157ul, 98317ul, 196613ul, 393241ul, 786433ul,
				1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul,
				50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul,
				1610612741ul, 3221225473ul, 4294967291ul
			};
			for (size_t i = 0; i < PRIMECOUNT; ++i)
			{
				if (primeList[i] > prime)
					return primeList[i];
			}
			// no larger prime in the list: return the last entry instead of reading past the array
			return primeList[PRIMECOUNT - 1];
		}
	private:
		vector<Node*> _table;
		size_t _size;      // number of valid elements in the hash table
	};
}

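6. Hash bucket iterator

Source code. The iterator refers to HashBucket<K, T, HF, KeyOfT>, that is, a version of the hash bucket from section 5 generalized with a key type K and a KeyOfT functor that extracts the key from a stored element. That version is also assumed to provide an iterator typedef and begin()/end(), and to give the iterator access to _table: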

template<class K, class T, class HF, class KeyOfT>
struct HTIterator
{
	typedef HashBucketNode<T> Node;
	typedef HashBucket<K, T, HF, KeyOfT> HB;
	typedef HTIterator<K, T, HF, KeyOfT> Self;
	Node* _node;
	HB* _ht;
	HTIterator(Node* node, HB* ht) :_node(node), _ht(ht) {}
	bool operator!=(const Self& s) const
	{
		return _node != s._node;
	}
	T& operator*()
	{
		return _node->_data;
	}
	T* operator->()
	{
		return &_node->_data;
	}
	Self& operator++()
	{
		if (_node->_pNext)// the current bucket still has a node after this one
		{
			_node = _node->_pNext;
		}
		else
		{
			KeyOfT kot;// extracts the key when T is a key-value pair
			const K& key = kot(_node->_data);
			HF hf;
			size_t index = hf(key) % _ht->_table.size();
			++index;
			_node = nullptr;// stays nullptr (the end iterator) if no later bucket is non-empty
			while (index < _ht->_table.size()){
				if (_ht->_table[index])
				{
					_node = _ht->_table[index];
					break;
				}
				else
					++index;
			}
		}
		return *this;
	}
};

7. Wrapping unordered_set

#pragma once
#include"Open Hash.h"
namespace Open_Hash
{
	template<class K, class HF = HashFunc<K>>
	class unordered_set
	{
		// extract the key (for a set, the element itself)
		struct GetKeyOfT
		{
			const K& operator()(const K& key) const{return key;}
		};
	public:
		typedef typename HashBucket<K, K, HF, GetKeyOfT>::iterator iterator;
		iterator begin()
		{
			return _ht.begin();
		}
		iterator end()
		{
			return _ht.end();
		}
		bool insert(const K& key)
		{
			return _ht.Insert(key);
		}
		iterator find(const K& key)
		{
			return _ht.Find(key);
		}
	private:
		HashBucket<K, K, HF, GetKeyOfT> _ht;
	};
}

8. Wrapping unordered_map

#pragma once
#include"Open Hash.h"
namespace Open_Hash
{
	template<class K, class V, class HF = HashFunc<K>>
	class unordered_map
	{
		// extract the key (the first member of the key-value pair)
		struct GetKeyOfT
		{
			const K& operator()(const pair<const K, V>& kv) const{return kv.first;}
		};
	public:
		typedef typename HashBucket<K,pair<const K, V>, HF, GetKeyOfT>::iterator iterator;
		iterator begin()
		{
			return _ht.begin();
		}
		iterator end()
		{
			return _ht.end();
		}
		bool insert(const pair<const K, V>& kv)
		{
			return _ht.Insert(kv);
		}
		iterator find(const K& key)
		{
			return _ht.Find(key);
		}
	private:
		HashBucket<K, pair<const K, V>, HF, GetKeyOfT> _ht;
	};
}
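
For comparison, the sketch below exercises the same insert/find/begin/end style of interface on the standard library's std::unordered_map and std::unordered_set, which the two wrappers above imitate. It does not use the Open_Hash classes; count_words and count_distinct are illustrative names.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Counts how often each word occurs, using find() and insert() on the map.
std::unordered_map<std::string, int> count_words(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;
    for (const std::string& w : words) {
        auto it = counts.find(w);
        if (it == counts.end())
            counts.insert(std::make_pair(w, 1));  // first occurrence of this key
        else
            ++it->second;                         // key exists: update the mapped value
    }
    return counts;
}

// Counts distinct words; inserting a duplicate into the set has no effect.
std::size_t count_distinct(const std::vector<std::string>& words) {
    std::unordered_set<std::string> seen;
    for (const std::string& w : words)
        seen.insert(w);
    return seen.size();
}
```

One difference worth noting: the standard insert returns a pair of an iterator and a bool, whereas the wrappers in this article return only a bool.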
