HDU1686 P3375 HDU1711(kmp入门)

大白QQly成长日记

于 2019-01-21 15:49:45 发布

阅读量262

点赞数 2

分类专栏： KMP 文章标签： KMP

本文链接：https://blog.csdn.net/pleasantly1/article/details/86570005

版权

KMP 专栏收录该内容

2 篇文章 0 订阅

订阅专栏

一直不理解kmp，近期学习一下。

简单来说kmp算法就是利用匹配串的自身匹配进行预处理，从而使暴力匹配的O(n*m)降到了O(n+m).

但这句话是很绕的，什么是自身匹配，又是怎么预处理。

我们先来考虑一下对暴力匹配的优化以便于理解。(个人理解，可能有误)

假如有一个前提：匹配串中的字符全部唯一。那么在进行暴力匹配时就可以跳过已匹配部分。例如：

总串：abdefghabc 匹配串：abc

那么枚举找子串时，匹配到第三个位置d！=c，这时因为匹配串字符唯一，所以前面ab的长度可以跳过，外层枚举可以直接加ab的长度，从而实现一定程度的优化。

因为这个优化是有前提条件的，所以我们想办法从特殊扩展到一般。我们可以换一个角度来看刚才跳过的部分，以匹配串长度为单位枚举，此时复杂度从最坏n*m变成了稳定n*m,那么我们跳过(优化)的就可以理解成2个串的前缀与后缀相同的部分？例如：总串：aaabbbbabaaca 匹配串：abaa 那么在bbab与abaa进行匹配时，因为bbab的后缀ab与匹配串的前缀ab相同，所以可以直接跳过bb部分枚举与ab部分的匹配直接跳到2个串第三个字符开始。

那么我们整理一下这个匹配方法。因为跳过的是前缀与后缀相同部分，所以我们需要先处理出前后缀，子串的前缀与模块串的后缀，但这是难以实现的，因为枚举出的模块串太多，这样反而增加了复杂度。这时候就利用找相同部分的性质进行自身匹配 我们找的是相等前缀和后缀（我们跳过的就是这部分），那么它们一定可以在自身找到。

例如：匹配串：ababc 它的前缀为 a ab aba abab 把每个前缀当成独立的子串，再找出每个子串的最长公共前后缀。

保存每个位置对应子串的最长公共前后缀长度，则为 0 0 1 2 用数组next来存储。

如何利用next数组，在暴力匹配时，若不匹配位置为匹配串的第i个时跳转到匹配串的第next[i]位置，即跳过了相同前后缀部分

例如：(别人的图)

这里写图片描述

以此类推，实现全部匹配。

预处理就是求出匹配串的next数组，实现了预处理那kmp就基本完成了，因为匹配只是串的目标变了，都是指针移动。

如何预处理

void fun(int z)
{
	p[0]=-1;//单个字符是没有前后缀的
	int k=-1,j=0;//从最小子串的前缀和后缀开始，j为后缀位置，k为前缀位置
	while(j<z)
	{
		if(k==-1||s[k]==s[j]) k++,j++,p[j]=k; 
		else k=p[k];
	}
}

例题一：

HDU 1686

Problem Description

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:
Tout avait Pair normal, mais tout s’affirmait faux. Tout avait Fair normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : stir son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…
Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.
So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:
One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.

Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

Sample Input

BAPC

AZA

AZAZAZA

VERDI

AVERDXIVYERDIAN

Sample Output

题目大意：给2个串s1,s2，输出s2中s1的个数。

思路：kmp入门题，直接套即可。

代码如下：

#include<iostream>
#include<algorithm>
#include<stdio.h>
#include<string.h>
#include<string>
#include<math.h>
#include<queue>
#include<stack>
#define per(i,a,b) for(int i=a;i<=b;++i)
#define rep(i,a,b) for(inr i=a;i>=b;--i)
#define inf 0xf3f3f3f
using namespace std;
int p[10005],n;
string s,s1;
void fun(int z)
{
	p[0]=-1;
	int k=-1,j=0;
	while(j<z)
	{
		if(k==-1||s[k]==s[j]) k++,j++,p[j]=k; 
		else k=p[k];
	}
}
int kmp(int z)
{
	int k=0,j=0,num=0;
	while(j<z)
	{
		if(k==-1||s[k]==s1[j]) ++k,++j;
		else k=p[k];
		if(k==s.size())
		{
			num++;
			k=p[k];
		}
	}
	return num;
}
int main()
{
	scanf("%d",&n);
	while(n--)
	{
		memset(p,0,sizeof(p));
		cin>>s>>s1;
		fun(s.size());
		cout<<kmp(s1.size())<<endl;
	}
	return 0;
}

例题二：

洛谷3375

题目描述

如题，给出两个字符串s1和s2，其中s2为s1的子串，求出s2在s1中所有出现的位置。

为了减少骗分的情况，接下来还要输出子串的前缀数组next。

（如果你不知道这是什么意思也不要问，去百度搜[kmp算法]学习一下就知道了。）

输入输出格式

输入格式：

第一行为一个字符串，即为s1

第二行为一个字符串，即为s2

输出格式：

若干行，每行包含一个整数，表示s2在s1中出现的位置

接下来1行，包括length(s2)个整数，表示前缀数组next[i]的值。

输入输出样例

输入样例#1： 复制

ABABABC
ABA

输出样例#1： 复制

1
3
0 0 1

说明

时空限制：1000ms,128M

数据规模：

设s1长度为N，s2长度为M

对于30%的数据：N<=15，M<=5

对于70%的数据：N<=10000，M<=100

对于100%的数据：N<=1000000，M<=1000000

样例说明：

所以两个匹配位置为1和3，输出1、3

思路：还是入门题。

代码如下：

#include<iostream>
#include<algorithm>
#include<stdio.h>
#include<string.h>
#include<string>
#include<math.h>
#include<queue>
#include<stack>
#define per(i,a,b) for(int i=a;i<=b;++i)
#define rep(i,a,b) for(inr i=a;i>=b;--i)
#define inf 0xf3f3f3f
using namespace std;
int p[1000005],n;
string s,s1;
void fun(int z)
{
	p[0]=-1;
	int k=-1,j=0;
	while(j<z)
	{
		if(k==-1||s[k]==s[j]) k++,j++,p[j]=k; 
		else k=p[k];
	}
}
void kmp(int z)
{
	int k=0,j=0;
	while(j<z)
	{
		if(k==-1||s[k]==s1[j]) ++k,++j;
		else k=p[k];
		if(k==s.size())
		{
			k=p[k];
			cout<<j-s.size()+1<<endl;
		}
	}
	per(i,1,s.size()) cout<<p[i]<<" ";
}
int main()
{
	memset(p,0,sizeof(p));
	cin>>s1>>s;
	fun(s.size());
	kmp(s1.size());
	return 0;
}

例题三：

HDU1711

Problem Description

Given two sequences of numbers : a[1], a[2], ...... , a[N], and b[1], b[2], ...... , b[M] (1 <= M <= 10000, 1 <= N <= 1000000). Your task is to find a number K which make a[K] = b[1], a[K + 1] = b[2], ...... , a[K + M - 1] = b[M]. If there are more than one K exist, output the smallest one.

Input

The first line of input is a number T which indicate the number of cases. Each case contains three lines. The first line is two numbers N and M (1 <= M <= 10000, 1 <= N <= 1000000). The second line contains N integers which indicate a[1], a[2], ...... , a[N]. The third line contains M integers which indicate b[1], b[2], ...... , b[M]. All integers are in the range of [-1000000, 1000000].

Output

For each test case, you should output one line which only contain K described above. If no such K exists, output -1 instead.

Sample Input

13 5

1 2 1 2 3 1 2 3 1 3 2 1 2

1 2 3 1 3

13 5

1 2 1 2 3 1 2 3 1 3 2 1 2

1 2 3 2 1

Sample Output

-1

题目大意：给出2个数组，求二个数组在第一个数组中出现最早的下标，未出现输出-1.

代码如下：

#include<iostream>
#include<algorithm>
#include<stdio.h>
#include<string.h>
#include<string>
#include<math.h>
#include<queue>
#include<stack>
#define per(i,a,b) for(int i=a;i<=b;++i)
#define rep(i,a,b) for(inr i=a;i>=b;--i)
#define inf 0xf3f3f3f
#define ll long long int
using namespace std;
ll n,a,b,s[1000005],s1[10005],p[10005];
int main()
{
	std::ios::sync_with_stdio(false);
	cin>>n;
	while(n--)
	{
		memset(p,0,sizeof(p));
		cin>>a>>b;
		per(i,1,a) cin>>s[i];
		per(i,1,b) cin>>s1[i];
		p[1]=0;ll k=0,j=1,fag=0;
		while(j<=b)
		{
			if(k==0||s1[k]==s1[j]) ++j,++k,p[j]=k;
			else k=p[k];
		}
		k=1;j=1;
		while(j<=a)
		{
			if(k==0||s1[k]==s[j]) ++j,++k;
			else k=p[k];
			if(k==b+1)
			{
				fag=1;
				cout<<j-b<<endl;
				break;
			}
		}
		if(fag==0) printf("-1\n");
	}
	return 0;
}

大白QQly成长日记

关注

2
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
HDU1686 P3375 HDU1711(kmp入门)

一直不理解kmp，近期学习一下。简单来说kmp算法就是利用匹配串的自身匹配进行预处理，从而使暴力匹配的O(n*m)降到了O(n+m).但这句话是很绕的，什么是自身匹配，又是怎么预处理。我们先来考虑一下对暴力匹配的优化以便于理解。(个人理解，可能有误)假如有一个前提：匹配串中的字符全部唯一。那么在进行暴力匹配时就可以跳过已匹配部分。例如：总串：abdefghabc 匹配串：a...
复制链接

扫一扫