week14 String Hashing and KMP

TangerineICE

Article link: https://blog.csdn.net/YangYANGlate/article/details/106647813

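Problem Description
ZJM is preparing for his final exams at Hogwarts and decides to memorize the dictionary of magic spells so he can ace the spell-translation questions.
Dictionary format: [spell] function
After memorizing the dictionary, ZJM starts practicing. There are N questions; each gives a string that is either a [spell] or a function description.
ZJM must tell whether the given string is a [spell] or a function and output its translation. If it cannot be found in the dictionary, output "what?".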

Input
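The input first lists the spell dictionary: at most 100000 distinct spells, one per line, each in the format:

[spell] function

where "spell" and "function" are strings of length at most 20 and 80 respectively. They are guaranteed not to contain the characters "[" and "]", and there is exactly one space between "]" and the string that follows it. The dictionary ends with the line "@END@", which is not itself a dictionary entry.
The line after the dictionary contains a positive integer N (N ≤ 1000), followed by N test cases. Each test case occupies one line and gives either "[spell]" or "function".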

Output
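Each test case produces one line of output: the function of the given spell, or the spell for the given function. If it is not in the dictionary, output "what?".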

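Approach:
String hashing:
Convert each string into an integer (its hash value) and store that integer in a table.
The scheme is shown below (the BKDR hash); the seed, i.e. the multiplier, is commonly chosen as 17 or 131.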

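#include <map>
#include <string>
using namespace std;

map<unsigned,int> mp;                  // hash value -> position of the string in the arrays
unsigned hashstring(const string &_s){ // BKDR hash
    unsigned int seed=17,_hash=0;
    for (size_t i=0; i<_s.size(); i++)
        _hash=_hash*seed+(unsigned char)(_s[i]); // unsigned overflow wraps modulo 2^32
    return _hash;
}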
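To see the recurrence at work, here is a self-contained version of the same BKDR hash with a hand-checkable value. For "ab" the loop computes 0·17 + 97 = 97, then 97·17 + 98 = 1747 (97 and 98 are the ASCII codes of 'a' and 'b'). Because `unsigned` arithmetic wraps modulo 2^32, long strings overflow harmlessly. The name `bkdrHash` and its optional `seed` parameter are illustrative, not part of the original code.

```cpp
#include <string>

// Illustrative BKDR hash, same recurrence as hashstring: h = h * seed + c (mod 2^32).
unsigned bkdrHash(const std::string &s, unsigned seed = 17) {
    unsigned h = 0;
    for (unsigned char c : s)
        h = h * seed + c;  // unsigned overflow wraps around, which is well-defined
    return h;
}
```

Character order matters: `bkdrHash("ab")` is 1747 while `bkdrHash("ba")` is 98·17 + 97 = 1763, so anagrams normally get different keys.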

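Associate each string's hash value with the position where the string is stored. If a query's hash value is present, it leads directly to the array position of the matching entry, which tells us whether a mapping exists and what it maps to.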

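#include <iostream>
#include <string>
#include <vector>
#include <map>
using namespace std;

map<unsigned,int> spellIdx, funcIdx; // hash -> entry index; separate maps so a spell never collides with a function
vector<string> spells(1), funcs(1);  // index 0 unused
// Note: two different strings may still share a hash value; this solution accepts that risk.

// BKDR hash (seed 17): h = h*seed + c, wrapping modulo 2^32
unsigned hashstring(const string &s)
{
	unsigned seed = 17, h = 0;
	for (unsigned char c : s)
		h = h*seed + c;
	return h;
}

// getline that also drops a trailing '\r' (input saved with Windows line endings)
bool readLine(string &line)
{
	if (!getline(cin, line)) return false;
	if (!line.empty() && line.back() == '\r') line.pop_back();
	return true;
}

// Read "[spell] function" lines until "@END@". Reading whole lines
// (instead of cin >> a) also handles spells that contain spaces.
void readDictionary()
{
	string line;
	while (readLine(line) && line != "@END@")
	{
		size_t close = line.find(']');
		if (close == string::npos || close + 2 > line.size()) continue; // skip malformed lines
		spells.push_back(line.substr(1, close - 1));  // text between '[' and ']'
		funcs.push_back(line.substr(close + 2));      // skip "] " (exactly one space)
		int id = spells.size() - 1;
		spellIdx[hashstring(spells[id])] = id;
		funcIdx[hashstring(funcs[id])] = id;
	}
}

// Translate one query: "[spell]" -> function, anything else -> spell.
void query(const string &s)
{
	if (!s.empty() && s[0] == '[')
	{
		auto it = spellIdx.find(hashstring(s.substr(1, s.size() - 2)));
		cout << (it != spellIdx.end() ? funcs[it->second] : "what?") << '\n';
	}
	else
	{
		auto it = funcIdx.find(hashstring(s));
		cout << (it != funcIdx.end() ? spells[it->second] : "what?") << '\n';
	}
}

int main()
{
	readDictionary();
	string line;
	if (!readLine(line)) return 0;
	int n = stoi(line);  // number of queries
	while (n-- > 0 && readLine(line))
		query(line);
	return 0;
}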
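A single 32-bit hash can still collide: two different strings landing on the same key would make the program above answer with the wrong entry. A common mitigation, sketched here as an assumption rather than part of the original solution, is to hash with both seeds mentioned earlier (17 and 131) and pack the two 32-bit results into one 64-bit map key, so a wrong answer requires both hashes to collide at once. The names `doubleHash` and `DoubleHashIndex` are hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper: two BKDR hashes (seeds 17 and 131) packed into one 64-bit key.
std::uint64_t doubleHash(const std::string &s) {
    std::uint32_t h1 = 0, h2 = 0;
    for (unsigned char c : s) {
        h1 = h1 * 17u + c;   // both wrap modulo 2^32
        h2 = h2 * 131u + c;
    }
    return (static_cast<std::uint64_t>(h1) << 32) | h2;
}

// Hypothetical index: maps a string (through its double hash) to a 1-based entry id.
struct DoubleHashIndex {
    std::map<std::uint64_t, int> ids;
    void add(const std::string &s, int id) { ids[doubleHash(s)] = id; }
    int lookup(const std::string &s) const {  // 0 means "not found"
        auto it = ids.find(doubleHash(s));
        return it == ids.end() ? 0 : it->second;
    }
};
```

In the dictionary program, `spellIdx` and `funcIdx` could each be replaced by one such index; the rest of the logic stays the same.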

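Problem Description
ZJM's girlfriend is a calligrapher who likes writing beautiful English calligraphy. One day ZJM picks up a note she wrote, and its words hint that she wants to give him a birthday present. ZJM wants to know whether the present he received is the one she meant, so he counts how many times the present's name appears in the note.

Input
The first line contains an integer giving the number of test cases.

For each test case, the first line is a string P, the present ZJM wants, consisting of the English letters {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |P| ≤ 10,000 (|P| denotes the length of P).

The next line is a string S, the note from ZJM's girlfriend, also consisting of {'A', 'B', 'C', …, 'Z'}, with |P| ≤ |S| ≤ 1,000,000.

Output
Output one line with a single integer: the number of times P occurs in S (occurrences may overlap).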

Sample input

3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

Sample output

1
3
0
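A brute-force counter is a handy reference for checking a faster solution against the samples. This sketch (the function name `countNaive` is hypothetical) tries every start position in S and compares up to |P| characters, so it costs O(|S|·|P|) time: fine for the samples, but up to roughly 10^10 character comparisons at the stated limits, which is what motivates KMP below. It counts overlapping occurrences, as the second sample requires (AZA starts at offsets 0, 2 and 4 of AZAZAZA).

```cpp
#include <cstddef>
#include <string>

// Hypothetical brute-force reference: counts (possibly overlapping) occurrences of p in s.
int countNaive(const std::string &p, const std::string &s) {
    if (p.empty() || p.size() > s.size()) return 0;
    int count = 0;
    for (std::size_t i = 0; i + p.size() <= s.size(); i++)
        if (s.compare(i, p.size(), p) == 0)  // does s[i, i+|P|) equal p?
            count++;
    return count;
}
```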

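Approach
Among string-matching algorithms, KMP runs in O(m+n) time. Instead of re-comparing characters, it jumps ahead in the pattern and skips alignments that cannot possibly match. The key is an array that records, for each prefix of the pattern, how many characters its beginning and end have in common, i.e. the length of the longest proper prefix that is also a suffix (a "border"). It is computed in a DP-like way: let next[i] be the border length of the first i characters; then next[i] depends on next[i-1]. Suppose next[i-1] = k. If A[k+1] equals A[i] (1-based), then next[i] = next[i-1] + 1 = k + 1. Otherwise the search for a prefix can be restricted to A[1..k], so we look up next[k], set k to it, and repeat until a match is found or k reaches 0.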

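a b c d a b d
0 0 0 0 1 2 0
When computing the value for the final d: next[i-1] = 2, so we compare A[3] = c with d. They differ, so the range shrinks to the first 2 characters and we look up next[2] = 0. The search has reached 0, and A[1] = a does not match d either, so no prefix matches and the entry for d is set to 0.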
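The recurrence can be checked on the example string. The sketch below is a 0-based restatement, chosen for convenience: `pi[i]` holds the border length of the first i+1 characters, i.e. next[i+1] in the 1-based notation above. The `while` loop is exactly the "shrink the range to 1..k and look up next[k]" step. The function name `borderTable` is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: pi[i] = length of the longest proper prefix of s[0..i]
// that is also a suffix of s[0..i] (next[i+1] in 1-based notation).
std::vector<int> borderTable(const std::string &s) {
    std::vector<int> pi(s.size(), 0);
    for (std::size_t i = 1; i < s.size(); i++) {
        int k = pi[i - 1];              // border of the previous prefix
        while (k > 0 && s[k] != s[i])   // mismatch: shrink to the border of s[0..k-1]
            k = pi[k - 1];
        if (s[k] == s[i])               // next character matches: border grows by one
            k++;
        pi[i] = k;
    }
    return pi;
}
```

For "abcdabd" this yields 0 0 0 0 1 2 0, the row shown in the table; for "AZAZAZA" it yields 0 0 1 2 3 4 5.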
Algorithm for the next array:

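const int maxn = 10000 + 10;
int nxt[maxn];  // nxt[i] = border length of ptr[1..i]

// 1-based: the pattern is stored in ptr[1..len] (ptr[0] unused)
void get_next(const char ptr[],int len){
    nxt[1]=0;
    for (int i=2,j=0; i<=len; i++){            // j = border length of ptr[1..i-1]
        while(j && ptr[j+1]!=ptr[i]) j=nxt[j]; // mismatch: shrink to the border of ptr[1..j]
        if(ptr[j+1]==ptr[i]) j++;              // next character matches: extend by one
        nxt[i]=j;
    }
}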

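Matching uses the same idea. Take the text (top) and the pattern (bottom):
a b c d a b d
a b c d a b c
The first six characters match, then the pattern's c meets the text's d. The matched part abcdab has a border of length 2 (next[6] = 2), so instead of restarting, the text position stays where it is and comparison resumes at pattern position 3 (the c right after ab). That c still differs from d, so we fall back again to next[2] = 0 and compare d with the first character of the pattern.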
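To make the jump concrete, this sketch (hypothetical name `matchTrace`, 0-based like the full program below) records how many pattern characters are matched after reading each text character. For text abcdabd and pattern abcdabc the count climbs to 6, then the final d fails against c, falls back to border length 2, fails again against the pattern's c, and ends at 0. The text position never moves backwards.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: trace[i] = number of pattern characters matched right after
// reading text[i] (the variable j of the matching loop; pat.size() means a full match).
std::vector<int> matchTrace(const std::string &text, const std::string &pat) {
    if (pat.empty()) return {};
    // 0-based border table: nxt[i] = border length of pat[0..i].
    std::vector<int> nxt(pat.size(), 0);
    for (std::size_t i = 1, j = 0; i < pat.size(); i++) {
        while (j && pat[j] != pat[i]) j = nxt[j - 1];
        if (pat[j] == pat[i]) j++;
        nxt[i] = static_cast<int>(j);
    }
    std::vector<int> trace;
    std::size_t j = 0;
    for (char c : text) {
        while (j && pat[j] != c) j = nxt[j - 1];  // fall back; the text position stays put
        if (pat[j] == c) j++;
        trace.push_back(static_cast<int>(j));
        if (j == pat.size()) j = nxt[j - 1];      // full match: continue from its border
    }
    return trace;
}
```

With text AZAZAZA and pattern AZA the trace is 1 2 3 2 3 2 3: each 3 is a full match, and continuing from the border lets the overlapping matches be found.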

#include <cstdio>
#include <cstring>

const int maxn=10000+10;    // |P| <= 10,000
const int maxm=1000000+10;  // |S| <= 1,000,000
int nxt[maxn];
char ptr[maxn],str[maxm];

// 0-based: nxt[i] = length of the longest proper prefix of ptr[0..i] that is also its suffix
void get_next(const char ptr[],int len){
    nxt[0]=0;
    for (int i=1,j=0; i<len; i++){
        while(j && ptr[j]!=ptr[i]) j=nxt[j-1]; // fall back to a shorter border
        if(ptr[j]==ptr[i]) j++;
        nxt[i]=j;
    }
}

// Count (possibly overlapping) occurrences of ptr in str.
int kmp(const char str[],const char ptr[]){
    int len1=strlen(str);
    int len2=strlen(ptr);
    int j=0,ans=0;          // j = number of pattern characters currently matched
    get_next(ptr,len2);
    for (int i=0; i<len1; i++){
        while(j && ptr[j]!=str[i]) j=nxt[j-1];
        if(ptr[j]==str[i]) j++;
        if(j==len2){        // full match ending at str[i]
            ans++;
            j=nxt[j-1];     // keep the border so overlapping matches are found
        }
    }
    return ans;
}

int main(){
    int n;
    if(scanf("%d",&n)!=1) return 0;
    while(n--){
        scanf("%s",ptr);
        scanf("%s",str);
        printf("%d\n",kmp(str,ptr));
    }
    return 0;
}
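Since this post pairs string hashing with KMP, the counting problem can also be solved with a rolling hash (the Rabin-Karp technique). This is an alternative sketch, not the author's solution. It keeps the BKDR hash of the current |P|-character window and updates it in O(1) per shift by removing the leftmost character's contribution, seed^(|P|-1)·c, multiplying by the seed, and adding the new character. Each hash hit is confirmed with a direct comparison, so collisions cannot inflate the count; the cost is O(|S| + |P|) plus |P| per hash hit. The name `countRabinKarp` and the choice of seed 131 are assumptions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical alternative: count (possibly overlapping) occurrences of p in s
// with a rolling BKDR hash (seed 131, arithmetic modulo 2^32 via unsigned overflow).
int countRabinKarp(const std::string &p, const std::string &s) {
    const unsigned seed = 131;
    std::size_t m = p.size(), n = s.size();
    if (m == 0 || m > n) return 0;

    unsigned hp = 0, hs = 0, power = 1;  // power = seed^(m-1) mod 2^32
    for (std::size_t i = 0; i < m; i++) {
        hp = hp * seed + (unsigned char)p[i];
        hs = hs * seed + (unsigned char)s[i];
        if (i > 0) power *= seed;
    }

    int count = 0;
    for (std::size_t i = 0;; i++) {      // current window is s[i, i+m)
        if (hs == hp && s.compare(i, m, p) == 0)  // verify to rule out collisions
            count++;
        if (i + m == n) break;
        // slide the window: drop s[i], append s[i+m]
        hs = (hs - (unsigned char)s[i] * power) * seed + (unsigned char)s[i + m];
    }
    return count;
}
```

On the samples this returns 1, 3 and 0, the same as the KMP program; KMP remains the safer choice when the worst case matters, since it needs no verification step.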