数据结构--散列表（哈希表）

最新推荐文章于 2024-05-05 16:13:36 发布

代码噜噜噜

最新推荐文章于 2024-05-05 16:13:36 发布

阅读量420

点赞数

分类专栏：数据结构文章标签：列表数据结构

本文链接：https://blog.csdn.net/qq_20225851/article/details/104223919

版权

数据结构专栏收录该内容

13 篇文章 2 订阅

订阅专栏

散列表

数据对象集：：符号表是“名字(Name)-属性(Attribute)”对的集合。
操作集：

SymbolTable InitializeTable( int TableSize )：创建一个长度为TableSize的符号表
Boolean IsIn( SymbolTable Table, NameType Name)：查找特定的名字Name是否在符号表Table中
AttributeType Find( SymbolTable Table, NameType Name)：获取Table中指定名字Name对应的属性；
SymbolTable Modefy(SymbolTable Table, NameType Name, AttributeType Attr)：将Table中指定名字Name的属性修改为Attr；
SymbolTable Insert(SymbolTable Table, NameType Name, AttributeType Attr)：向Table中插入一个新名字Name及其属性Attr
SymbolTable Delete(SymbolTable Table, NameType Name)：从Table中删除一个名字Name及其属性。

以关键字key作为自变量，通过定义一个确定的函数h（散列函数），计算出对应的函数值h（key）。作为数据对象的存储地址。可能不同的关键字会再在同一个散列地址上。即“冲突”。需要种种解决冲突的决策。

举个栗子：定义一个数组a[100] ，可我向存储5位数的数据，实现快速查找和插入。例如存放12345

定义一个散列函数h （key） = key %100
将key= 12345 带入函数h得到函数值45
则将a[45]存入12345
查找12345是否存在，是只需调用h函数h（12345）=45 ，然后访问a[45]就可以知道是否有12345存在，就达到了O（1）时间查找的目的。
冲突：比如还想存一个数22245 ，h（22245）=45，可是45已经存了12345了，所以发生了冲突，就需要一些解决冲突的办法。

所以散列查找法的两项基本工作:

1.计算位置：构造散列函数确定关键词存储位置。
2.解决冲突:应用某种策略解决多个关键词位置相同的问题。

散列表查询、插入、删除、的效率都是O（1）

散列函数的构造方法

好的散列函数具备一笑两个因素：

计算简单，以便提高转换速度
关键词对应的地址空间分布均匀，以尽量减少冲突

后面我们会看到，虽然有解决冲突的方法，但是冲突越多会导致效率越低。所以一个好的散列函数至关重要。

数字
1.直接定址法
在这里插入图片描述
key（1991） = 1991-1990 = 1 所以就将1991存入地址1中。

2.除留余数法：
在这里插入图片描述
这个方法比较常用，p取的是散列表的大小，一般取素数，后面解决冲突的时候会解释，最开始举得例子用的就是这个方法。

3.数字分析法：
在这里插入图片描述
分析数字的特点，制定相应的散列函数。

4.折叠法:
在这里插入图片描述

5.平方取中法：
在这里插入图片描述
就是尽量然更多的信息去生成对应的key，这样冲突就会越少。

字符
1.ASCLL码加和法
在这里插入图片描述
这个方法冲突严重：a3 ， b2 ,c1 ,eat ,tea;

2.前三个字符移位法
在这里插入图片描述
任然会有冲突：string , stree ,strong ；

3.移位法：
在这里插入图片描述
将n个字符都进行计算，这个方法是比较好的。

移位法快速计算方法：
在这里插入图片描述
可以转化成:
h（“abcde”） =( (（a*32+b）*32+c)*32+d)+e

Index Hash(const char *key , int tablesize)
{
	int h = 0 ;
	while (*key !='\0') h=(h<<5) + *key++; //a<<5 == a*32
	return h % tablesize ;
}

散列表冲突处理方法

1.线性探测法
遇到冲突后，以增量序列：1，2，3…,Tablesize-1 循环试探下一个存储地址。

例：
在这里插入图片描述
可以看到 7 和29 和84 是冲突的，以及 9 和20是冲突的。

插入47 和 7 没有冲突，当插入29时，对应的散列地址为7，此时冲突了，位置移动1位，地址8时空的，存入。

存入11 9 无冲突

存入84时发生冲突，84对应的地址是7 ,此时从增量位1开始，地址8有了，增量为2，地址9也有了，增量为3 ，地址10为空，存入。
在这里插入图片描述
接下去的元素按照这个方法存入。

散列查找性能分析

成功平均查找长度（ASLs）即查找表中存在的元素的平均查找次数
不成功平均查找长度（ASLu）即判断出元素不再表中需要的平均查找次数

在这里插入图片描述

来看个字符的例子：
在这里插入图片描述

通过上面两个例子可以看到，线性探测法容易造成堆积的现象，导致冲突次数增多，影响查找效率。所以有了升级版，平方探测法。

平方探测法

遇到冲突后，以增量序列：1，-1，4，-4，9，-9…,Tablesize-1 循环试探下一个存储地址。探测方法和线性是一样的，只是变成了平方，一次正一次负。

例：
在这里插入图片描述

还是刚刚的例子，但是改成平方探测法后ASLs 变成了2 ，平均查找次数变少了，效率变高了。

**问题：**线性探测法，只要表不满，总能找到位置插入，平方探测法是跳着查的，当跳到的位置上都存在数据，然而表并没有满，此时就会导致没地方插入。
在这里插入图片描述

**解决方法：**散列表长度TableSize是某个4k+3（k是正整数）形式的素数时，平方探测法就可以探查到整个散列表空间。也就解释了为什么开头提到的mod p 中的p要素数的原因。

平方探测法的实现

1.结构定义
每个结点需要有一个info来判断是否有元素存在，删除元素也不用真正的把结点删掉，把info设为false即代表空。

typedef enum {full , empty , Del} entrytype ; // 满 ， 空 ，删除
typedef struct cell{
	int date ;  //存放数据 
	entrytype  info;  //判定是否为空 
}Cell ,*qcell;

typedef  struct HashTbl{  //定义哈希表 
	int tableSize;  // 哈希表大小 
	qcell Thecells ; //  
}*HashTable;

2.哈希表初始化

int nextprime (int tablesize)
{
	int i,j;
	for (i=tablesize ;;i++)
	{
		int item = i ;
		for ( j = 2;j<=sqrt(item);j++)
			{
				if (item %j==0)
				break ;
			}
		if (j>sqrt(item)) //找到素数 
			return i ;
	}	
} 

HashTable Hashcreat(int tablesize)
{
	HashTable H ;
	tablesize = nextprime(tablesize); // 更改为下一个素数 
	H = (HashTable)malloc(sizeof (struct HashTbl)); 
	H->tableSize = tablesize ;
	H->Thecells = (qcell) malloc (sizeof (Cell) * tablesize);
	for (int i=0;i<tablesize;i++)
	H->Thecells[i].info = empty ;// 全部初始化为空 
	return H ; 
 }

3.查找

先获取对应散列函数值
如果该位置的元素不为空，但是不是要寻找的值，则用平方探测的值接下去找
一直重复2步骤，直到找到date 或者移动到的位置为空，则说明没有date，也退出
返回位置
返回后通过判断该位置是否为空，空则没有date ，非空则最应的位置就是date

 int Find (HashTable H , int date) //返回所在下标位置  如果是不存在则 所对应结点的info 为false  
 {
 	int cNum =0 ;//冲突次数
	int newpos , currentpos ;
	newpos = currentpos = Hash(H,date ); // 获取初始位置
	
	while (H->Thecells[newpos].info != empty && H->Thecells[newpos].date!= date  ) 
	{
		if (++cNum%2) // 基数次冲突 
		{
			newpos = currentpos +(cNum+1)*(cNum+1)/4;
			if (newpos >=H->tableSize) // 超出范围
				newpos %= H->tableSize ;
		}
		else 
		{
			newpos = currentpos- (cNum*cNum) / 4 ;
			while  (newpos <0)
			newpos += H->tableSize ;
		}
	}
 	return newpos ;
 }

4.插入
有了查找，插入就容易了，先通过查找把该插入的位置返回回来，如果非空则说明该元素已经存在，空则直接插入即可。

bool insert(HashTable H , int date )
{
	int pos = Find (H,date);
	
	if (H->Thecells[pos].info != full   ) // 说明不存在元素 则可以插入
	{
		H->Thecells[pos].info = full ; 
		H->Thecells[pos].date = date ;
		return true ;
	} 
	else 
	{
		cout <<"键值已经存在!"<<endl ;
		return false ;
	}
 }

5.删除

void  Delete (HashTable H , int date)
 {
 	int pos = Find (H,date);
 	if (H->Thecells[pos].info != full)
 	{
 		cout <<"元素不存在，无需删除！"<<endl ; 
	 }
	 else 
	 {
	 	H->Thecells[pos].info = Del ;//设为空  
	 	cout <<"删除："<<date ; 
	 }
 }

测试代码：

#include <iostream>
#include <cmath>
#include <stdlib.h>
#include <string>
using namespace std ;

typedef enum {full , empty , Del} entrytype ;
typedef struct cell{
	int date ;  //存放数据 
	entrytype  info;  //判定是否为空 
}Cell ,*qcell;

typedef  struct HashTbl{  //定义哈希表 
	int tableSize;  // 哈希表大小 
	qcell Thecells ; //  
}*HashTable;

int nextprime (int tablesize)
{
	int i,j;
	for (i=tablesize ;;i++)
	{
		int item = i ;
		for ( j = 2;j<=sqrt(item);j++)
			{
				if (item %j==0)
				break ;
			}
		if (j>sqrt(item)) //找到素数 
			return i ;
	}	
} 

HashTable Hashcreat(int tablesize)
{
	HashTable H ;
	tablesize = nextprime(tablesize); // 更改为下一个素数 
	H = (HashTable)malloc(sizeof (struct HashTbl)); 
	H->tableSize = tablesize ;
	H->Thecells = (qcell) malloc (sizeof (Cell) * tablesize);
	for (int i=0;i<tablesize;i++)
	H->Thecells[i].info = empty ;// 全部初始化为空 
	return H ; 
 } 
 
 int Hash(HashTable H , int date) 
 {
 	return date % H->tableSize ;
 }
 
 int Find (HashTable H , int date) //返回所在下标位置  如果是不存在则 所对应结点的info 为false  
 {
 	int cNum =0 ;//冲突次数
	int newpos , currentpos ;
	newpos = currentpos = Hash(H,date ); // 获取初始位置
	
	while (H->Thecells[newpos].info != empty && H->Thecells[newpos].date!= date  ) 
	{
		if (++cNum%2) // 基数次冲突 
		{
			newpos = currentpos +(cNum+1)*(cNum+1)/4;
			if (newpos >=H->tableSize) // 超出范围
				newpos %= H->tableSize ;
		}
		else 
		{
			newpos = currentpos- (cNum*cNum) / 4 ;
			while  (newpos <0)
			newpos += H->tableSize ;
		}
	}
 	return newpos ;
 }


bool insert(HashTable H , int date )
{
	int pos = Find (H,date);
	
	if (H->Thecells[pos].info != full   ) // 说明不存在元素 则可以插入
	{
		H->Thecells[pos].info = full ; 
		H->Thecells[pos].date = date ;
		return true ;
	} 
	else 
	{
		cout <<"键值已经存在!"<<endl ;
		return false ;
	}
 } 
 
 void  Delete (HashTable H , int date)
 {
 	int pos = Find (H,date);
 	if (H->Thecells[pos].info != full)
 	{
 		cout <<"元素不存在，无需删除！"<<endl ; 
	 }
	 else 
	 {
	 	H->Thecells[pos].info = Del ;//设为空  
	 	cout <<"删除："<<date ; 
	 }
 }
int main ()
{
	int size , n ;
	cout <<"输入散列表大小:";
	cin >> size ;
	
	HashTable H = Hashcreat(size);
	
	cout <<"输入插入的元素个数：";
	cin >> n ;
	
	for (int i=0 ;i < n;i++)
	{
		int date ;
		cin >> date ;
		insert (H,date);
	 }  
	 cout << "请输入要查找的元素："<<endl ; 
	int date ;
	Delete (H,29);
	while (cin >> date)
	{
		int pos = Find (H,date);
		if (H->Thecells[pos].info == full )
			cout <<" Yes" <<endl;
		else 
		cout << "No"<<endl ; 
	} 
	return 0; 
} 
/*
11
9
47 7 29 11 9 84 54 20 30
*/

在这里插入图片描述

分离链接法

分离链接法：将相应位置上冲突的所有关键词存储在同一个单链表中
个人比较喜欢这种方法，因为比较简单，如果遇到冲突，不需要什么解决冲突的方法，直接接到之前元素的后面就可以了，也比较容易实现。

这个自己写了，不学习mooc给的了，那个太复杂了，比赛的时候根本记不住。

#include <iostream>
#include <set> 
#define Maxsize 11
using namespace std ;
set<int> a[Maxsize];

void insert( int date )
{
	int pos = date % Maxsize;
	a[pos].insert(date);
}

bool  Find(int date)
{
	int pos = date%Maxsize ;
	if (!a[pos].empty() && a[pos].find(date)!=a[pos].end())
		return true ;
	else 
		return false ;
}

void Delete (int date)
{
	int pos = date % Maxsize ;
	if (!a[pos].empty()&&a[pos].find(date)!=a[pos].end())
		a[pos].erase  (date);
	cout <<" 删除：" <<date<<endl; 
}

int main ()
{
	int n;
	cout <<"输入插入的元素个数：";
	cin >> n ;
	for (int i=0 ;i < n;i++)
	{
		int date ;
		cin >> date ;
		insert (date);
	 }  
	cout << "请输入要查找的元素："<<endl ; 
	int date ;
	Delete (29);
	while (cin >> date)
	{
		if (Find (date))
			cout <<" Yes" <<endl;
		else 
		cout << "No"<<endl ; 
	} 
	return 0; 
	
}
/*
9
47 7 29 11 9 84 54 20 30
*/

在这里插入图片描述

哈希表在程序设计中，当数据范围特别大时可以起到很重要的作用。

习题：电话狂人

给定大量手机用户通话记录，找出其中通话次数最多的聊天狂人。

输入格式:
输入首先给出正整数N（≤10
5
），为通话记录条数。随后N行，每行给出一条通话记录。简单起见，这里只列出拨出方和接收方的11位数字构成的手机号码，其中以空格分隔。

输出格式:
在一行中给出聊天狂人的手机号码及其通话次数，其间以空格分隔。如果这样的人不唯一，则输出狂人中最小的号码及其通话次数，并且附加给出并列狂人的人数。

输入样例:
4
13005711862 13588625832
13505711862 13088625832
13588625832 18087925832
15005713862 13588625832

输出样例:
13588625832 3

这题用个映射map<streing , int > 就可以解决问题啦，因为电话号码位数相同，所以直接string进行比较大小就可以了，不用转化成数字。

#include <iostream>
#include <map>
#include <string>

using namespace std ;

int main ()
{
	map<string ,int> m ;
	
	int n;
	cin >> n;
	for (int i=0;i<n;i++)
	{
		string number1 ,number2 ;
		cin >> number1 >> number2 ;
		if (m.find(number1) !=m.end())
			{
				m[number1] += 1 ;
			}
		else 
			m[number1] = 1 ;
			
			if (m.find(number2) !=m.end())
			{
				m[number2] += 1 ;
			}
		else 
			m[number2] = 1 ;
	}
	string number;
	int thesame = 0 ;
	n = -1;
	for (map<string ,int>::iterator it = m.begin(); it!=m.end();it++)
	{
		if (it->second  > n)
		{
			number = it->first ;
			n = it->second ;
			thesame = 1 ;
		}
		else if (it->second  == n)
		{
			if (number > it->first)
				number = it->first ;
			thesame ++ ;
		}
	}
	if (thesame >1)
	{
		cout << number <<" "<<n <<" "<<thesame ;
	}
	else 
	cout << number <<" "<< n ;
	
}

习题：Hash

7-16 Hashing (25分)
The task of this problem is simple: insert a sequence of distinct positive integers into a hash table, and output the positions of the input numbers. The hash function is defined to be H(key)=key%TSize where TSize is the maximum size of the hash table. Quadratic probing (with positive increments only) is used to solve the collisions.

Note that the table size is better to be prime. If the maximum size given by the user is not prime, you must re-define the table size to be the smallest prime number which is larger than the size given by the user.

Input Specification:
Each input file contains one test case. For each case, the first line contains two positive numbers: MSize (≤10
4
) and N (≤MSize) which are the user-defined table size and the number of input numbers, respectively. Then N distinct positive integers are given in the next line. All the numbers in a line are separated by a space.

Output Specification:
For each test case, print the corresponding positions (index starts from 0) of the input numbers in one line. All the numbers in a line are separated by a space, and there must be no extra space at the end of the line. In case it is impossible to insert the number, print “-” instead.

Sample Input:
4 4
10 6 4 15

Sample Output:
0 1 4 -

#include <iostream>
#include <cmath>
#include <string.h>
using namespace std ;

int maxsize ;

int nextprime (int n)
{
	if (n<= 2 )
	return 2 ;
	n = (n%2)? n : n+1 ;
	for (int i=n;;i+=2)
	{
		int j ;
		for ( j = 2 ; j<=sqrt(i);j++)
		{
			if (i%j==0)
				break ;
		}
		if (j >sqrt(i)) return i ;
	}
}

int insert ( int H[],int date )
{
	int pos , pos1;
	pos1 = pos  = date%maxsize  ;
	int flag = 0 ;
	while (flag < 10000 &&H[pos] !=-1)
	{
		flag ++ ;
		pos = (pos1 + flag * flag)%maxsize ; 
	}
	if (flag >=10000)
		return -1 ;
	else 
		{
			H[pos] = date ;
			return pos ;
		}
}

int main ()  
{
	int n ,m ;
	cin >> n >> m ;
	maxsize = nextprime(n);
	int a[maxsize] ;
	memset(a, -1 ,sizeof (a));
	int item ;
	for (int i=0 ;i<m;i++)
	{
		cin >> item ;
		int b = insert(a,item);
		if (b == -1)
		cout <<"-";
		else 
		cout <<b ;
		if (i!=m-1)
			cout <<" " ;
	}
	
	
}

代码噜噜噜

关注

0
点赞
踩
2

收藏

觉得还不错? 一键收藏
0
评论
数据结构--散列表（哈希表）

散列表数据对象集：：符号表是“名字(Name)-属性(Attribute)”对的集合。操作集：SymbolTable InitializeTable( int TableSize )：创建一个长度为TableSize的符号表Boolean IsIn( SymbolTable Table, NameType Name)：查找特定的名字Name是否在符号表Table中AttributeTy...
复制链接

扫一扫