哈希（c++）

莫忘、莫念

已于 2023-11-22 10:55:23 修改

阅读量134

点赞数

分类专栏： c++ 面向对象编程c++ 文章标签： c++ 哈希算法开发语言

于 2023-11-22 10:55:01 首次发布

版权声明：本文为博主原创文章，遵循 CC 4.0 BY-SA 版权协议，转载请附上原文出处链接和本声明。

本文链接：https://blog.csdn.net/m0_63783532/article/details/134495825

版权

面向对象编程c++ 同时被 2 个专栏收录

26 篇文章 13 订阅

订阅专栏

20 篇文章 0 订阅

订阅专栏

目录

一、unordered系列关联式容器

（一）unordered_set

（二）unordered_map

练习：961. 在长度2N的数组中找出重复N次的元素

二、哈希的底层结构

（一）哈希概念

（二）哈希冲突

三、哈希冲突解决（闭散列）

（一）线性探测

1. 仿函数（将key值转整型）

3. 查找和删除

4. 完整代码

（二）二次探测

四、哈希冲突解决（开散列）

（一）插入

（二）删除

（三）桶的数量、最大长度、平均长度

（四）完整代码

一、unordered系列关联式容器

在C++98中，STL提供了底层为红黑树结构的一系列关联式容器，在查询时效率可达到$log_2N$，即最差情况下需要比较红黑树的高度次，当树中的节点非常多时，查询效率也不理想。最好的查询是，进行很少的比较次数就能够将元素找到，因此在C++11中，STL又提供了4个unordered系列的关联式容器，这四个容器与红黑树结构的关联式容器使用方式基本类似，只是其底层结构不同
树状关联式容器（比如C++中的std::map和std::set）通常使用比较函数（仿函数）来确定元素的顺序，这样可以进行元素的大小比较。而哈希容器（比如C++中的std::unordered_map和std::unordered_set）需要将键转换为整数值，通常使用哈希函数（也是通过仿函数实现）来计算键的哈希值，然后将哈希值用于确定存储位置或进行快速查找

（一）unordered_set

#include <iostream>
#include <unordered_set>
using namespace std;
int main()
{
    unordered_set<int>s;
    s.insert(1);
    s.insert(3);
    s.insert(5);
    s.insert(1);
    s.insert(2);

    unordered_set<int>::iterator it = s.begin();
    while (it != s.end())
    {
        cout << *it << " ";
        ++it;
    }
    //1  3  5  2
    return 0;
}

（二）unordered_map

unordered_map是存储<key, value>键值对的关联式容器，其允许通过keys快速的索引到与其对应的value。
在unordered_map中，键值通常用于惟一地标识元素，而映射值是一个对象，其内容与此键关联。键和映射值的类型可能不同。
在内部，unordered_map没有对<kye, value>按照任何特定的顺序排序，为了能在常数范围内找到key所对应的value，unordered_map将相同哈希值的键值对放在相同的桶中

#include <iostream>
#include <unordered_map>
using namespace std;
int main()
{
    unordered_map<string,string>dict;
    dict["sort"] = "排序";
    dict["string"] = "字符串";


    unordered_map<string, string>::iterator it = dict.begin();
    while (it != dict.end())
    {
        cout << it->first << " "<<it->second<<endl;
        ++it;
    }
    //sort 排序
    //string 字符串
    return 0;
}

练习：961. 在长度2N的数组中找出重复N次的元素

力扣（LeetCode）官网 - 全球极客挚爱的技术成长平台

class Solution {
public:
    int repeatedNTimes(vector<int>& nums) {
        int n = nums.size();
        unordered_map<int, int> m;
        for(int i=0;i<nums.size();i++)
        {
            m[nums[i]]++;
        }

        unordered_map<int, int>::iterator it = m.begin();
        while(it!=m.end())
        {
            if(it->second == n/2)
            {
                return it->first;
            }
            ++it;
        }
        return 0;
    }
};

二、哈希的底层结构

（一）哈希概念

理想的搜索方法：可以不经过任何比较，一次直接从表中得到要搜索的元素。
如果构造一种存储结构，通过某种函数(hashFunc)使元素的存储位置与它的关键码之间能够建立一一映射的关系，那么在查找时通过该函数可以很快找到该元素。

当向该结构中：

插入元素：根据待插入元素的关键码，以此函数计算出该元素的存储位置并按此位置进行存放
搜索元素：对元素的关键码进行同样的计算，把求得的函数值当做元素的存储位置，在结构中按此位置取元素比较，若关键码相等，则搜索成功

该方式即为哈希(散列)方法，哈希方法中使用的转换函数称为哈希(散列)函数，构造出来的结构称
为哈希表(Hash Table)(或者称散列表)

（二）哈希冲突

不同关键字通过相同哈希哈数计算出相同的哈希地址，该种现象称为哈希冲突或哈希碰撞。
把具有不同关键码而具有相同哈希地址的数据元素称为“同义词”

三、哈希冲突解决（闭散列）

闭散列：也叫开放定址法，当发生哈希冲突时，如果哈希表未被装满，说明在哈希表中必然还有空位置，那么可以把key存放到冲突位置中的“下一个” 空位置中去。

如何寻找下一个空位置？

（一）线性探测

hashi+i(i>=0) ，i可以是0,1,2,3

从发生冲突的位置开始，依次向后探测，直到寻找到下一个空位置为止

hashi++;
hashi = hashi % _tables.size();

如果发生越界，继续从0开始探测

1. 仿函数（将key值转整型）

模板特化中使用了一个简单的哈希算法，遍历字符串中的每个字符，将哈希值乘以一个常数（31），然后加上当前字符的 ASCII 值。这样的操作可以在一定程度上减少哈希冲突。这个特化的哈希函数适用于处理字符串类型的关键字

//伪函数，关键字转换成无符号整型
template<class K>
struct HashFunc
{
	size_t operator()(const K& key)
	{
		return (size_t)key;
	}
};

template<>//模板特化
struct HashFunc<string>//特化的哈希函数模板，专门用于处理 string 类型的关键字
{
	size_t hash = 0;
	size_t operator()(const string& key)
	{
		for (auto e : key)
		{
			hash *= 31;
			hash += e;
		}
		return hash;
	}
};

2. 插入

bool Insert(const pair<K, V>& kv)
{
	//负载因为0.7就扩容
	if (_n * 10 / _tables.size() == 7)
	{
		size_t newsize = _tables.size() * 2;
		HashTable<K, V> newHash;
		newHash._tables.resize(newsize);

		//遍历旧表依次将数据插入
		for (int i = 0; i < _tables.size(); i++)
		{
			if (_tables[i]._s == EXIST)
			{
				newHash.Insert(_tables[i]._kv);
			}
		}
		_tables.swap(newHash._tables);//交换旧哈希表和新哈希表

	}

	Hash hf;
	// 线性探测
	size_t hashi = hf(kv.first) % _tables.size();
	while (_tables[hashi]._s == EXIST)
	{
		hashi++;
		hashi = hashi % _tables.size();
	}
	_tables[hashi]._kv = kv;
	_tables[hashi]._s = EXIST;
	++_n;
	return true;
}

3. 查找和删除

HashData<K, V>* Find(const K& key)
{
	Hash hf;
	size_t hashi = hf(key) % _tables.size();
	while (_tables[hashi]._s != EMPTY)//容易出错
	{
		if (_tables[hashi]._s == EXIST && _tables[hashi]._kv.first == key)
		{
			return &_tables[hashi];
		}
		hashi++;
		hashi = hashi % _tables.size();//越界的时候，继续从头开始探测（查找）
	}
	return nullptr;
}

bool Erase(const K& key)//删除
{
	HashData<K, V>* ret = Find(key);
	if (ret != nullptr)
	{
		ret->_s = DELETE;//修改状态为删除
		--_n;//数据个数减少
		return true;
	}
	else
	{
		return false;
	}
}

4. 完整代码

HashTable.h

#pragma once
#include<vector>
#include<string>
namespace Imitate_HashTbale
{
	enum Status//哈希表每个空间状态标记
	{
		EMPTY,//空
		EXIST,//存在数据
		DELETE//已被删除
	};

	template<class K, class V>
	struct HashData
	{
		pair<K, V> _kv;
		Status _s;//标记哈希表每个位置的状态
	};

	//伪函数，关键字转换成无符号整型
	template<class K>
	struct HashFunc
	{
		size_t operator()(const K& key)
		{
			return (size_t)key;
		}
	};

	template<>//模板特化
	struct HashFunc<string>//特化的哈希函数模板，专门用于处理 string 类型的关键字
	{
		size_t hash = 0;
		size_t operator()(const string& key)
		{
			for (auto e : key)
			{
				hash *= 31;
				hash += e;
			}
			return hash;
		}
	};


	template<class K, class V, class Hash = HashFunc<K>>
	class HashTable
	{
	public:
		HashTable()//构造函数
		{
			_tables.resize(10);
		}
		bool Insert(const pair<K, V>& kv)
		{
			//负载因为0.7就扩容
			if (_n * 10 / _tables.size() == 7)
			{
				size_t newsize = _tables.size() * 2;
				HashTable<K, V> newHash;
				newHash._tables.resize(newsize);

				//遍历旧表依次将数据插入
				for (int i = 0; i < _tables.size(); i++)
				{
					if (_tables[i]._s == EXIST)
					{
						newHash.Insert(_tables[i]._kv);
					}
				}
				_tables.swap(newHash._tables);//交换旧哈希表和新哈希表

			}

			Hash hf;
			// 线性探测
			size_t hashi = hf(kv.first) % _tables.size();
			while (_tables[hashi]._s == EXIST)
			{
				hashi++;
				hashi = hashi % _tables.size();
			}
			_tables[hashi]._kv = kv;
			_tables[hashi]._s = EXIST;
			++_n;
			return true;
		}
		HashData<K, V>* Find(const K& key)
		{
			Hash hf;
			size_t hashi = hf(key) % _tables.size();
			while (_tables[hashi]._s != EMPTY)//容易出错
			{
				if (_tables[hashi]._s == EXIST && _tables[hashi]._kv.first == key)
				{
					return &_tables[hashi];
				}
				hashi++;
				hashi = hashi % _tables.size();//越界的时候，继续从头开始探测（查找）
			}
			return nullptr;
		}

		bool Erase(const K& key)//删除
		{
			HashData<K, V>* ret = Find(key);
			if (ret != nullptr)
			{
				ret->_s = DELETE;//修改状态为删除
				--_n;//数据个数减少
				return true;
			}
			else
			{
				return false;
			}
		}
		void print()
		{
			for (int i = 0; i < _tables.size(); i++)
			{
				if (_tables[i]._s == EXIST)
				{
					cout << "[" << i << "]" << "->" << _tables[i]._kv.first << ":" << _tables[i]._kv.second << endl;
				}
			}
			cout << endl;
		}
	private:
		vector<HashData<K, V>> _tables;
		size_t _n;//存放关键字的个数
	};


	void test1()
	{
		HashTable<int, int> HT;
		int data[] = { 1,3,6,2,7,9 ,5,67,8,19 };
		for (auto ch : data)
		{
			HT.Insert(make_pair(ch, ch));
		}
		HT.print();


		if (HT.Find(99))
		{
			cout << "99存在" << endl;
		}
		else
		{
			cout << "99不存在" << endl;
		}


		if (HT.Erase(8))
		{
			cout << "成功删除8" << endl;
		}
		else
		{
			cout << "删除失败" << endl;
		}
	}

	void test2()
	{
		string arr[] = { "香蕉", "甜瓜","苹果", "西瓜", "苹果", "西瓜", "苹果", "苹果", "西瓜", "苹果", "香蕉", "苹果", "香蕉" };
		HashTable<string, int> ht;

		for (auto& e : arr)
		{
			HashData<string, int>* ret = ht.Find(e);
			if (ret != nullptr)//已经存在了
			{
				ret->_kv.second++;
			}
			else
			{
				ht.Insert(make_pair(e, 1));
			}
		}
		ht.print();
	}
}

test.c

#include<iostream>
using namespace std;
#include"HashTable.h"
int main()
{
	Imitate_HashTbale::test2();
	return 0;
}

（二）二次探测

hashi+i^2(i>=0)

线性探测的缺陷是产生冲突的数据堆积在一块，这与其找下一个空位置有关系，因为找空位置的方式就是挨着往后逐个去找，因此二次探测为了避免该问题

四、哈希冲突解决（开散列）

开散列法又叫链地址法(开链法)，首先对关键码集合用散列函数计算散列地址，具有相同地址的关键码归于同一子集合，每一个子集合称为一个桶，各个桶中的元素通过一个单链表链接起来，各链表的头结点存储在哈希表中。

（一）插入

bool Insert(const pair<K, V>& kv)
{
	if (Find(kv.first))
		return false;

	Hash hf;

	// 负载因子最大到1
	if (_n == _tables.size())
	{
		vector<Node*> newTables;
		newTables.resize(_tables.size() * 2, nullptr);

		//遍历旧表
		for (size_t i = 0; i < _tables.size(); i++)
		{
			Node* cur = _tables[i];
			while (cur)//循环将某一个桶中的所有结点进行头插
			{
				Node* next = cur->_next;
				size_t hashi = hf(cur->_kv.first) % newTables.size();
				//采用头插
				cur->_next = newTables[hashi];
				newTables[hashi] = cur;
				cur = next;
			}
			_tables[i] = nullptr;//旧表置空
		}

		_tables.swap(newTables);
	}

	//插入结点
	size_t hashi = hf(kv.first) % _tables.size();
	Node* cur = new Node(kv);
	//采用头插
	cur->_next = _tables[hashi];
	_tables[hashi] = cur;
	++_n;
	return true;
}

（二）删除

bool Erase(const K& key)
{
	Hash hf;
	size_t hashi = hf(key) % _tables.size();
	Node* cur = _tables[hashi];
	Node* prev = nullptr;
	while (cur)
	{
		if (cur->_kv.first == key)
		{
			if (cur == _tables[hashi])//删除头结点
			{
				_tables[hashi] = cur->_next;
			}
			else
			{
				prev->_next = cur->_next;
			}
			//释放结点
			delete cur;
			cur = nullptr;
			--_n;
			return true;
		}
		else
		{
			prev = cur;
			cur = cur->_next;
		}
	}
	return false;
}

（三）桶的数量、最大长度、平均长度

void Some()
{
	size_t bucketSize = 0;//使用桶的个数
	size_t maxBucketLen = 0;//桶的最大长度
	double averageBucketLen = 0;//桶的平均长度
	size_t sum = 0;
	for (size_t i = 0; i < _tables.size(); i++)
	{
		Node* cur = _tables[i];
		if (cur != nullptr)
		{
			++bucketSize;
		}

		size_t bucketLen = 0;//记录一个桶的长度
		while (cur)
		{
			++bucketLen;
			cur = cur->_next;
		}
		sum += bucketLen;

		//记录最大的桶的长度
		if (bucketLen > maxBucketLen)
		{
			maxBucketLen = bucketLen;
		}
	}
	averageBucketLen = double(sum) / bucketSize;//桶的平均长度
	printf("all bucketSize:%d\n", _tables.size());
	printf("bucketSize:%d\n", bucketSize);
	printf("maxBucketLen:%d\n", maxBucketLen);
	printf("averageBucketLen:%lf\n\n", averageBucketLen);
}

（四）完整代码

#pragma once
#include<vector>
#include<string>
#include<cstdbool>
//namespace Imitate_hash_bucket
//{
	template<class K, class V>
	struct HashNode//结点定义
	{
		HashNode* _next;
		pair<K, V>_kv;

		HashNode(const pair<K, V>& kv)
			:_kv(kv)
			, _next(nullptr)
		{
			;
		}
	};


	//伪函数，关键字转换成无符号整型
	template<class K>
	struct HashFunc
	{
		size_t operator()(const K& key)
		{
			return (size_t)key;
		}
	};

	template<>//模板特化
	struct HashFunc<string>//特化的哈希函数模板，专门用于处理 string 类型的关键字
	{
		size_t hash = 0;
		size_t operator()(const string& key)
		{
			for (auto e : key)
			{
				hash *= 31;
				hash += e;
			}
			return hash;
		}
	};


	template<class K, class V, class Hash = HashFunc<K>>
	class HashTable
	{
		typedef HashNode<K, V> Node;
	public:
		HashTable()
		{
			_tables.resize(10);
		}
		~HashTable()
		{
			for (size_t i = 0; i < _tables.size(); i++)
			{
				Node* cur = _tables[i];
				while (cur)
				{
					Node* next = cur->_next;//cur的后驱结点
					delete cur;
					cur = next;
				}
				_tables[i] = nullptr;
			}
		}

		bool Insert(const pair<K, V>& kv)
		{
			if (Find(kv.first))
				return false;

			Hash hf;

			// 负载因子最大到1
			if (_n == _tables.size())
			{
				vector<Node*> newTables;
				newTables.resize(_tables.size() * 2, nullptr);

				//遍历旧表
				for (size_t i = 0; i < _tables.size(); i++)
				{
					Node* cur = _tables[i];
					while (cur)//循环将某一个桶中的所有结点进行头插
					{
						Node* next = cur->_next;
						size_t hashi = hf(cur->_kv.first) % newTables.size();
						//采用头插
						cur->_next = newTables[hashi];
						newTables[hashi] = cur;
						cur = next;
					}
					_tables[i] = nullptr;//旧表置空
				}

				_tables.swap(newTables);
			}

			//插入结点
			size_t hashi = hf(kv.first) % _tables.size();
			Node* cur = new Node(kv);
			//采用头插
			cur->_next = _tables[hashi];
			_tables[hashi] = cur;
			++_n;
			return true;
		}

		Node* Find(const K& key)
		{
			Hash hf;
			size_t hashi = hf(key) % _tables.size();
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (cur->_kv.first == key)
				{
					return cur;
				}
				else
				{
					cur = cur->_next;
				}
			}
			return nullptr;
		}

		bool Erase(const K& key)
		{
			Hash hf;
			size_t hashi = hf(key) % _tables.size();
			Node* cur = _tables[hashi];
			Node* prev = nullptr;
			while (cur)
			{
				if (cur->_kv.first == key)
				{
					if (cur == _tables[hashi])//删除头结点
					{
						_tables[hashi] = cur->_next;
					}
					else
					{
						prev->_next = cur->_next;
					}
					//释放结点
					delete cur;
					cur = nullptr;
					--_n;
					return true;
				}
				else
				{
					prev = cur;
					cur = cur->_next;
				}
			}
			return false;
		}

		void Some()
		{
			size_t bucketSize = 0;//使用桶的个数
			size_t maxBucketLen = 0;//桶的最大长度
			double averageBucketLen = 0;//桶的平均长度
			size_t sum = 0;
			for (size_t i = 0; i < _tables.size(); i++)
			{
				Node* cur = _tables[i];
				if (cur != nullptr)
				{
					++bucketSize;
				}

				size_t bucketLen = 0;//记录一个桶的长度
				while (cur)
				{
					++bucketLen;
					cur = cur->_next;
				}
				sum += bucketLen;

				//记录最大的桶的长度
				if (bucketLen > maxBucketLen)
				{
					maxBucketLen = bucketLen;
				}
			}
			averageBucketLen = double(sum) / bucketSize;//桶的平均长度
			printf("all bucketSize:%d\n", _tables.size());
			printf("bucketSize:%d\n", bucketSize);
			printf("maxBucketLen:%d\n", maxBucketLen);
			printf("averageBucketLen:%lf\n\n", averageBucketLen);
		}

	private:
		vector<Node*>_tables;
		size_t _n;
	};

	void test1()
	{
		HashTable<int, int>ht;
		int data[] = { 1,4,2,7,8,12,6 };
		for (auto e : data)
		{
			ht.Insert(make_pair(e, e));
		}



		if (ht.Find(12))
		{
			cout << "12找到了" << endl;
		}
		else
		{
			cout << "未找到" << endl;
		}

		//ht.Erase(4);
		//ht.Erase(2);
	}
	void test2()
	{
		const int N = 100;
		vector<int>v;
		HashTable<int, int> ht;
		srand(time(0));
		for (size_t i = 0; i < N; i++)
		{
			v.push_back(rand() + i); // 重复值相对少
		}

		for (auto e : v)
		{
			ht.Insert(make_pair(e, e));
		}

		ht.Some();
	}

	void test3()
	{
		string arr[] = { "香蕉", "甜瓜","苹果", "西瓜", "苹果", "西瓜", "苹果", "苹果", "西瓜", "苹果", "香蕉", "苹果", "香蕉" };
		HashTable<string, int> ht;
		for (auto& e : arr)
		{
			HashNode<string, int>* ret = ht.Find(e);
			if (ret)
			{
				ret->_kv.second++;
			}
			else
			{
				ht.Insert(make_pair(e, 1));
			}
		}
	}

//}

#include<iostream>
using namespace std;
#include"HashTable.h"
int main()
{
	//Imitate_hash_bucket::test3();
	test3();
	return 0;
}

莫忘、莫念

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
哈希（c++）

遍历字符串中的每个字符，将哈希值乘以一个常数（31），然后加上当前字符的 ASCII 值。理想的搜索方法：可以不经过任何比较，一次直接从表中得到要搜索的元素。如果构造一种存储结构，通过某种函数(hashFunc)使元素的。hashi+i(i>=0) ，i可以是0,1,2,3。这个特化的哈希函数适用于处理字符串类型的关键字。为哈希表(Hash Table)(或者称散列表)，那么在查找时通过该函数可以很快找到该元素。该方式即为哈希(散列)方法，哈希方法中使用的。模板特化中使用了一个简单的。
复制链接

扫一扫

专栏目录

莫忘、莫念 CSDN认证博客专家 CSDN认证企业博客

码龄3年

242: 原创

3万+: 周排名

2万+: 总排名

22万+: 访问

: 等级

3094: 积分

1646: 粉丝

384: 获赞

43: 评论

1566: 收藏

私信

关注

热门文章

分类专栏

最新评论

递归和分治（基础）
啥也不会干的小码: 佬，二分查找那里写漏了函数调用
SQL sever基础使用
xsimah: 博主总结的很到位，我也总结了下：https://kngines.blog.csdn.net/article/details/139014717
主存——Cache地址映射
xiaoluxiaoluC: 我想问一下例题中求 cache 地址位数为什么不用总容量 512 除以字长 32 得到存储单元个数
移位运算符（详细介绍）
mybro12: 楼主，，，那 &0xff 这个能详细解释下吗。。。一直搞不懂。
Linux调试器-gdb使用
originalHSL.: 写的很详细！学到了学到了

您愿意向朋友推荐“博客详情页”吗？

强烈不推荐
不推荐
一般般
推荐
强烈推荐

提交

最新文章

目录

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。