KMP, the String Pattern-Matching Algorithm

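I first learned KMP in the first half of 2015. Today is December 20, 2019, so almost five years have passed, and in that time I have seriously studied KMP four times: first in my data structures course, second while reviewing before the exam, third when I got serious about ACM, and fourth today. I am about to finish a year-long stint in sales and go back to being a programmer, so I have recently been relearning almost everything I picked up in ACM. For most topics a glance at my old code was enough; KMP was the one exception, because it is comparatively complicated.
After all these years, writing this post today is the first time I feel I have truly digested KMP. Many things are only really understood once they have been stated in rigorous language.
This post may not be as clear as what others have written, but the point is the process of writing it myself: after writing it out carefully my thinking became far clearer. That is the most important thing I have gained from writing more than a thousand blog posts.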

I. Source

Data Structures (C Language Edition) by Yan Weimin

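II. String pattern matching

Given a text string S of length n and a pattern string T of length m, how do we find the position of T in S?

Method 1: brute-force enumeration, time complexity O(n*m)

Method 2: KMP, time complexity O(n+m)

KMP was discovered at around the same time by three people, Knuth, Morris, and Pratt, whose initials are K, M, and P.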

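III. The core idea of KMP

In the brute-force method (with 1-based indices, as in the textbook), when a comparison fails, that is S[i]!=T[j], j has to go back to 1 so that matching restarts from the first character of T, and i also has to go back to i-j+2.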
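For comparison, here is a minimal sketch of the brute-force method. It uses 0-based C++ indices, so the fallback described above (j back to 1, i back to i-j+2) becomes j = 0 and i = i - j + 1. The function name bruteForceFind is my own choice for illustration, not something from the textbook.

```cpp
#include <string>

// Brute-force search: returns the 0-based index of the first occurrence
// of t in s, or -1 if there is none. Worst case O(n*m).
int bruteForceFind(const std::string& s, const std::string& t) {
    int n = (int)s.size(), m = (int)t.size();
    int i = 0, j = 0;
    while (i < n && j < m) {
        if (s[i] == t[j]) {
            ++i, ++j;
        } else {
            i = i - j + 1;  // text pointer goes back to the next start position
            j = 0;          // pattern pointer goes back to the beginning
        }
    }
    return j == m ? i - m : -1;
}
```

The two assignments in the else branch are exactly the backtracking that KMP avoids.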

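The core idea of KMP is to mine the pattern T for information in advance, so that on a mismatch i stays where it is and only j moves back to some earlier position. This works because the j-1 characters before the mismatch all matched, so that stretch of S is identical to the corresponding stretch of T; in other words, all the information needed is already contained in T.

The information to be mined is exactly how to fall back after a mismatch, and that information is the next array of T.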

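IV. The next array

Note the definition of the next array: for each j, next[j] is the largest k that satisfies formula (4-4) of the textbook; written out as a single expression, this is formula (4-5).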
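For reference, the two formulas can be written out as follows. This is my own transcription of the standard textbook definition for a 1-based pattern T = t_1 t_2 ... t_m; the formula numbers are given as I remember them from the book, so treat the numbering as an assumption.

```latex
% (4-4): the first k-1 characters of T equal the k-1 characters just before t_j
t_1 t_2 \cdots t_{k-1} = t_{j-k+1} t_{j-k+2} \cdots t_{j-1}

% (4-5): definition of next[j]
next[j] =
\begin{cases}
0, & j = 1 \\
\max\{\, k \mid 1 < k < j,\ t_1 \cdots t_{k-1} = t_{j-k+1} \cdots t_{j-1} \,\}, & \text{if this set is non-empty} \\
1, & \text{otherwise}
\end{cases}
```

For example, for T = abaabcac this gives next = 0 1 1 2 2 3 1 2.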

How next is actually computed:
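A minimal sketch of the computation, assuming the 1-based convention of the textbook (entry 0 of the result is unused). The name buildNext is hypothetical; the logic is the same recurrence as getNext in my template further down.

```cpp
#include <string>
#include <vector>

// Returns the 1-based next array of pattern t: result[1..m] is meaningful,
// result[0] is unused. t is an ordinary 0-based std::string, so the
// 1-based character t_j is t[j - 1].
std::vector<int> buildNext(const std::string& t) {
    int m = (int)t.size();
    std::vector<int> nxt(m + 1, 0);
    int i = 1, j = 0;  // nxt[1] = 0 by definition
    while (i < m) {
        if (j == 0 || t[i - 1] == t[j - 1]) {
            ++i, ++j;
            nxt[i] = j;   // t_1..t_{j-1} is the longest border of t_1..t_{i-1}
        } else {
            j = nxt[j];   // fall back inside the pattern itself
        }
    }
    return nxt;
}
```

Calling buildNext("abaabcac") yields 0 1 1 2 2 3 1 2 in entries 1..8.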

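V. The KMP method

Once the next array is available, the search is still a two-pointer scan. When a comparison fails, that is S[i]!=T[j], j falls back to next[j] and matching continues (if j becomes 0, both i and j advance by one). When the whole pattern has been matched, we have found a full occurrence.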
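The scan can be sketched like this, again with the 1-based convention. The names nextOf and kmpFind are my own for this illustration, and returning 0 for "not found" is a choice I made here, not part of the textbook.

```cpp
#include <string>
#include <vector>

// 1-based next array of t (index 0 unused).
std::vector<int> nextOf(const std::string& t) {
    int m = (int)t.size();
    std::vector<int> nxt(m + 1, 0);
    for (int i = 1, j = 0; i < m;) {
        if (j == 0 || t[i - 1] == t[j - 1]) nxt[++i] = ++j;
        else j = nxt[j];
    }
    return nxt;
}

// Returns the 1-based position of the first occurrence of t in s, or 0 if none.
int kmpFind(const std::string& s, const std::string& t) {
    int n = (int)s.size(), m = (int)t.size();
    std::vector<int> nxt = nextOf(t);
    int i = 1, j = 1;
    while (i <= n && j <= m) {
        if (j == 0 || s[i - 1] == t[j - 1]) ++i, ++j;  // match, or restart at t_1
        else j = nxt[j];                               // i never moves backwards
    }
    return j > m ? i - m : 0;
}
```

With S = ababcabcacbab and T = abcac, kmpFind returns 6.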

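On typical data the brute-force algorithm also runs in time close to n+m. Besides its guaranteed O(n+m) time, KMP has another advantage: because i never moves backwards, in many scenarios the text S does not need to be stored at all. Only the pattern T is kept, and S is processed character by character as it is read. In that case the space complexity is O(m).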
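A sketch of this streaming use, under a few assumptions of mine: the class name StreamMatcher and its feed() interface are hypothetical, and for brevity the table stores 0-based border lengths rather than the textbook's 1-based next values (the two differ by one).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Online matcher: stores only the pattern and its failure table (O(m) memory).
// Feed text characters one at a time; feed() returns true whenever the
// characters seen so far end with a full occurrence of the pattern.
class StreamMatcher {
public:
    explicit StreamMatcher(const std::string& pattern)
        : t_(pattern), fail_(pattern.size() + 1, 0) {
        // fail_[k] = length of the longest proper border of the first k characters
        for (std::size_t k = 1, b = 0; k < t_.size(); ++k) {
            while (b > 0 && t_[k] != t_[b]) b = fail_[b];
            if (t_[k] == t_[b]) ++b;
            fail_[k + 1] = b;
        }
    }
    bool feed(char ch) {
        if (t_.empty()) return true;
        if (matched_ == t_.size()) matched_ = fail_[matched_];  // continue after a hit
        while (matched_ > 0 && ch != t_[matched_]) matched_ = fail_[matched_];
        if (ch == t_[matched_]) ++matched_;
        return matched_ == t_.size();
    }
private:
    std::string t_;
    std::vector<std::size_t> fail_;
    std::size_t matched_ = 0;  // how many pattern characters currently match
};
```

Feeding the text ababa into a matcher for aba reports a hit after the third and after the fifth character, without the text ever being stored.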

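VI. The improved next array

When a comparison fails, we move j back to next[j] and compare with S[i] again, hoping to reach a full match as quickly as possible.

However, satisfying formula (4-5) is only a necessary condition for a fallback position to be worth trying, not a sufficient one.

If T[j]==T[next[j]], then after moving j back to next[j] the comparison with S[i] is bound to fail again, because we already know S[i]!=T[j]. In that case we have to keep falling back.

This gives the nextval array: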
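A minimal sketch of the computation in the 1-based convention (entry 0 unused). The name buildNextVal is hypothetical; the recurrence mirrors getNextVal in my template further down.

```cpp
#include <string>
#include <vector>

// Returns the 1-based nextval array of pattern t (index 0 unused).
std::vector<int> buildNextVal(const std::string& t) {
    int m = (int)t.size();
    std::vector<int> nv(m + 1, 0);
    int i = 1, j = 0;
    while (i < m) {
        if (j == 0 || t[i - 1] == t[j - 1]) {
            ++i, ++j;                              // j is now the plain next[i]
            if (t[i - 1] != t[j - 1]) nv[i] = j;   // t_j may still match S[i]
            else nv[i] = nv[j];                    // t_j == t_i would fail again: skip it
        } else {
            j = nv[j];
        }
    }
    return nv;
}
```

For T = aaaab the plain next array is 0 1 2 3 4, while this function gives 0 0 0 0 4: a mismatch on any of the first four a's cannot be repaired by trying an earlier a.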

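The improved nextval array can be regarded as both necessary and sufficient. That is, j can only fall back to nextval[j] and no further, because T[nextval[j]] may genuinely match S[i]. Whether it does depends only on what S[i] is and not on T, so the nextval array cannot be optimized any further.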

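VII. Summary of the difficult points

1. To match a text S against a pattern T with KMP, the next array of T must be computed first. Computing next is itself a process of matching T, starting from an arbitrary position, against T itself, and that process again uses KMP.

Is this circular? No. This inner KMP only relies on the part of the next array that has already been computed.

In other words, using the first i entries of next to compute entry i+1 is itself a KMP step, that is, an instance of using the next array to match quickly

2. Because of the special structure of the problem, a curious thing happens quite often: code that is logically wrong still produces correct output. This gets in the way when learning the algorithm.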

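VIII. Similarities and differences between next and nextval

1. Similarities:
(1) get_next and get_nextval both match the pattern P against itself, and the entries are computed one by one during that matching
(2) Both next and nextval are computed by recurrence
(3) Both next and nextval can drive KMP, and the time complexity is O(n+m) in both cases

2. Differences:
(1) The recurrence for next has two steps. Step 1: starting from j = next[i], repeat j = next[j] (that is, j = next[......next[next[j]]......]) until P[i]==P[j] or j becomes 0. Step 2: set next[i+1] = j+1 directly, which completes the recurrence.

The recurrence for nextval also has two steps. Step 1 is the same search, except that the fallbacks go through nextval: starting from the j reached for position i, repeat j = nextval[j] until P[i]==P[j] or j becomes 0. Step 2: starting from the candidate j+1, repeat nextval[i+1] = nextval[......nextval[nextval[j+1]]......] until the resulting nextval[i+1] satisfies P[i+1]!=P[nextval[i+1]]

PS: since j+1<i+1, nextval[j+1] has already been computed, so P[j+1]!=P[nextval[j+1]] already holds. The loop in step 2 only continues when P[i+1]==P[j+1], and in that case it follows that P[i+1]!=P[nextval[j+1]]. So the loop in step 2 actually runs 0 or 1 times: nextval[i+1] is either j+1 or nextval[j+1], and no loop needs to be written. This is really a small dynamic-programming step.

(2) nextval does not satisfy definition (4-5) of the next array. To be precise, next[j] is defined as the largest k satisfying formula (4-4), whereas nextval[j] is defined as the largest k that satisfies formula (4-4) and also T[j]!=T[k]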

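IX. Applications of KMP and the next array

Roughly speaking there are two broad classes of problems: string matching problems, solved with KMP, and string period problems, solved with next or nextval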

template <typename T = char>
class KMP {
public:
	// Returns the next array of length m+1; next[0] is unused
	static vector<int> getNext(const T *p, int m)
	{
		int i = 1, j = 0;
		vector<int>next(m + 1);
		next[0] = next[1] = 0;
		while (i < m)
		{
			if (j == 0 || p[i - 1] == p[j - 1])next[++i] = ++j;
			else j = next[j];
		}
		return next;
	}
	// Returns the improved next array (nextval) of length m+1; next[0] is unused
	static vector<int> getNextVal(const T *p, int m)
	{
		int i = 1, j = 0;
		vector<int>next(m + 1);
		next[0] = next[1] = 0;
		while (i < m)
		{
			if (j == 0 || p[i - 1] == p[j - 1])
			{
				if (p[i++] != p[j++])next[i] = j;
				else next[i] = next[j];
			}
			else j = next[j];
		}
		return next;
	}
	// Minimum number of characters to append so that the string becomes periodic (at least 2 full periods)
	static int addToCircle(const T *p, int m)
	{
		vector<int> next = getNext(p, m);
		int k = next[m];
		while (k && p[m - 1] != p[k - 1])k = next[k];
		int dif = m - k;
		return (dif - m % dif) % dif + (!k)*m;
	}
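	// Minimal period: returns the period length if p consists of whole repetitions of it, otherwise m
	static int minCircle(const T *p, const vector<int> &next, int m)
	{
		int dif = m - next[m];
		// next[m] is 0 only when m == 1; the guard avoids reading p[-1]
		if (next[m] && m%dif == 0 && p[m - 1] == p[next[m] - 1])return dif;
		return m;
	}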
	// Minimal period (computes the next array itself)
	static int minCircle(const T *p, int m)
	{
		return minCircle(p, getNext(p, m), m);
	}
	// Given text and pattern, returns the maximum number of matches (matched occurrences must not overlap)
	static int maxMatchNum(const T *pn, int n, const T* pm, int m)
	{
		int sum = 0, x = -m;
		vector<int> v = allMatchId(pn, n, pm, m);
		for (auto vi : v) {
			if (vi >= x + m)sum++, x = vi;
		}
		return sum;
	}
	// Given text and pattern, returns the maximum number of matches (matched occurrences may overlap)
	static int maxMatchNum2(const T *pn, int n, const T* pm, int m)
	{
		return allMatchId(pn, n, pm, m).size();
	}
	// Given text and pattern, returns the maximum number of consecutive back-to-back matches
	static int maxMatchNum3(const T *pn, int n, const T* pm, int m)
	{
		return longestSubsequence(allMatchId(pn, n, pm, m),m);
	}
	// Position of the first match (1-based, in the range 1..n-m+1), or -1 if there is none
	static int firstMatchId(const T *pn, int n, const T* pm, int m)
	{
		vector<int> v = allMatchId(pn, n, pm, m);
		return v.size() ? v[0] : -1;
	}
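	// Given text and pattern, returns all match positions (1-based, in the range 1..n-m+1).
	// Public because the multiSearch solution below calls it directly.
	static vector<int> allMatchId(const T *pn, int n, const T* pm, int m)
	{
		vector<int>ans;
		vector<int>next = getNext(pm, m);
		// full = length of the longest proper border of the whole pattern;
		// next[m] alone only describes the first m-1 characters, so using it
		// directly after a hit would report false matches (e.g. "ab" in "abb")
		int full = next[m];
		while (full && pm[m - 1] != pm[full - 1])full = next[full];
		int i = 0, j = 0;
		while (i < n)
		{
			if (pn[i] == pm[j])
			{
				i++;
				if (++j >= m)ans.push_back(i - m + 1), j = full;
			}
			else
			{
				if (next[j + 1])j = next[j + 1] - 1;
				else i++, j = 0;
			}
		}
		return ans;
	}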
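private:
	// Longest chain of positions spaced exactly `difference` apart; static because maxMatchNum3 is static
	static int longestSubsequence(const vector<int>& arr, int difference) {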
		map<int, int>ans;
		int res = 0;
		for (int i = 0; i < arr.size(); i++)
		{
			ans[arr[i]] = max(ans[arr[i]], ans[arr[i] - difference] + 1);
			res = max(res, ans[arr[i]]);
		}
		return res;
	}
};


struct Node {
	int p[100005];
	int num[100005][26]; //num[i][j] is the number of values not exceeding j among the first i elements of p
};
Node nodes, nodet;

class OrderMatch {
public:
	OrderMatch(Node &nodes, Node &nodet)
	{
		this->nodes = nodes, this->nodet = nodet;
	}
	vector<int> match(int n, int m)
	{
		vector<int>next = getNext(nodet, m);
		int i = 0, j = 0;
		vector<int>ans;
		while (i <= n)
		{
			if (j == 0)i++, j++;
			else if (isSame(nodes, i, nodet, j))
			{
				i++, j++;
				if (j > m)--i, --j, ans.push_back(i - j + 1), j = next[j];
			}
			else j = next[j];
		}
		return ans;
	}
private:
	bool isSame(Node &nodes, int x, Node &nodet, int y)
	{
		int kx = nodes.p[x], ky = nodet.p[y];
		return nodet.num[y][ky] == nodes.num[x][kx] - nodes.num[x - y][kx] &&
			nodet.num[y][ky - 1] == nodes.num[x][kx - 1] - nodes.num[x - y][kx - 1];
	}
	vector<int> getNext(Node &nodet, int m)		//m is the length of the sequence
	{
		int i = 1, j = 0;
		vector<int>next(m + 1);
		next[0] = next[1] = 0;
		while (i < m)
		{
			if (j == 0 || isSame(nodet, i, nodet, j))next[++i] = ++j;
			else j = next[j];
		}
		return next;
	}
private:
	Node nodes, nodet;
};

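1. String period problems

(1) Given a string, find the minimum number of characters to add so that it becomes a periodic string

Either next or nextval can be used; my template library implements it with next.

HDU 3746 Cyclic Nacklace

Problem: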

Description

CC always becomes very depressed at the end of this month, he has checked his credit card yesterday, without any surprise, there are only 99.9 yuan left. he is too distressed and thinking about how to tide over the last days. Being inspired by the entrepreneurial spirit of "HDU CakeMan", he wants to sell some little things to make money. Of course, this is not an easy task.

As Christmas is around the corner, Boys are busy in choosing christmas presents to send to their girlfriends. It is believed that chain bracelet is a good choice. However, Things are not always so simple, as is known to everyone, girl's fond of the colorful decoration to make bracelet appears vivid and lively, meanwhile they want to display their mature side as college students. after CC understands the girls demands, he intends to sell the chain bracelet called CharmBracelet. The CharmBracelet is made up with colorful pearls to show girls' lively, and the most important thing is that it must be connected by a cyclic chain which means the color of pearls are cyclic connected from the left to right. And the cyclic count must be more than one. If you connect the leftmost pearl and the rightmost pearl of such chain, you can make a CharmBracelet. Just like the pictrue below, this CharmBracelet's cycle is 9 and its cyclic count is 2:

Now CC has brought in some ordinary bracelet chains, he wants to buy minimum number of pearls to make CharmBracelets so that he can save more money. but when remaking the bracelet, he can only add color pearls to the left end and right end of the chain, that is to say, adding to the middle is forbidden.
CC is satisfied with his ideas and ask you for help.

Input

The first line of the input is a single integer T ( 0 < T <= 100 ) which means the number of test cases.
Each test case contains only one line describe the original ordinary chain to be remade. Each character in the string stands for one pearl and there are 26 kinds of pearls being described by 'a' ~'z' characters. The length of the string Len: ( 3 <= Len <= 100000 ).

Output

For each case, you are required to output the minimum count of pearls added to make a CharmBracelet.

Sample Input

3
aaa
abca
abcde

Sample Output

0
2
5

Summary: given a string, find the minimum number of characters that must be added to turn it into a periodic string.


#include<iostream>
#include<string.h>
#include<vector>
#include<map>
using namespace std;

char c[100005];

......

int main()
{
	int t;
	cin >> t;
	while (t--)
	{
		// gets() has been removed from the language; each case is a single lowercase word, so cin >> is enough
		cin >> c;
		int len = strlen(c);
		cout << KMP<char>::addToCircle(c, len) << endl;
	}
	return 0;
}

CSU 1353 Guessing the Number

Problem:

Description

    Alice thought out a positive integer x, and x does not have leading zeros. She writes x several times in a line one by one and gets a digits sequence.
    For example, suppose the number Alice thought out is 231 and she writes it 4 times, then she will get the digits sequence 231231231231.
    Now Alice just shows you a continuous part of that digits sequence, and could you find out the minimum probable value of x?
    For example, if Alice showed you 31231, x may be 312, because you will get 312312 if you writes it 2 times, and 31231 is a continuous part of 312312. Obviously x may also be 231, 1231, 31231, 231731, etc. But the minimum probable value of x is 123, write it 3 times and you will see 31231 is a continuous part of 123123123.

Input

    The first line has one integer T, means there are T test cases.
    For each test case, there is only one digits sequence in a line. The number of digits is in range [1, 10^5].
    The size of the input file will not exceed 5MB.

Output

For each test case, print the minimum probable value of x in one line.

Sample Input

Sample Output

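Approach:

First use KMP to find the period of the string; see HDU 3746 Cyclic Nacklace

Then find the minimal representation of the string (its smallest rotation). There is a dedicated method for that, and its idea is very similar to KMP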
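The minimal-representation step can be sketched as below. This is the plain two-candidate version for an arbitrary string; it does not handle this problem's extra rule that x has no leading zero, which my full solution adds on top. The name minRotationStart is hypothetical.

```cpp
#include <algorithm>
#include <string>

// Returns the start index of the lexicographically smallest rotation of s.
int minRotationStart(const std::string& s) {
    int n = (int)s.size();
    int i = 0, j = 1, k = 0;  // two candidate starts and the matched length
    while (i < n && j < n && k < n) {
        char a = s[(i + k) % n], b = s[(j + k) % n];
        if (a == b) { ++k; continue; }
        if (a > b) i += k + 1;   // starts i..i+k cannot be optimal
        else       j += k + 1;   // starts j..j+k cannot be optimal
        if (i == j) ++j;
        k = 0;
    }
    return std::min(i, j);
}
```

Like KMP, a failed comparison after k matching characters rules out k+1 start positions at once, which is why the whole scan is linear.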

Code:

#include<iostream>
#include<string.h>
using namespace std;

char c[200005];
int next_[100005];
int l;

void get_next(char *t, int m, int next[])
{
	int i = 1, j = 0;
	next[0] = next[1] = 0;
	while (i < m)
	{
		if (j == 0 || t[i] == t[j])next[++i] = ++j;
		else j = next[j];
	}
}

int getmin(int n)
{
	int i = 1, j = 2, k = 0;
	while (i <= n && j <= n && k <= n)
	{
		if (c[i] == '0')i++;
		else if (c[j] == '0')j++;
		else if (i == j)j++;
		else if (c[i + k] == c[j + k])k++;
		else if (c[i + k] > c[j + k])i += k + (!k), k = 0;
		else j += k + (!k), k = 0;
	}
	return (i <= n) ? i : j;
}

int main()
{
	int t;
	scanf("%d", &t);
	c[0] = '0';
	while (t--)
	{
		scanf("%s", &c[1]);
		l = strlen(c + 1);
		bool flag = true;//all zeros
		for (int i = 1; i <= l; i++)if (c[i] != '0')flag = false;
		if (flag)
		{
			printf("1%s\n", c + 1);
			continue;
		}
		get_next(c, l, next_);
		int k = next_[l];
		while (k && c[l] != c[k])k = next_[k];
		l -= k;
		for (int i = l + 1; i <= l * 2; i++)c[i] = c[i - l];
		int key = getmin(l);
		c[key + l] = '\0';
		printf("%s\n", c + key);
	}
	return 0;
}

(2) Given a string, find its minimal period

This can be done with next. I am not sure whether it can be done with nextval; it seems not.

OpenJ_Bailian - 2406, POJ 2406 Power Strings

Problem:

Description

Given two strings a and b we define a*b to be their concatenation. For example, if a = "abc" and b = "def" then a*b = "abcdef". If we think of concatenation as multiplication, exponentiation by a non-negative integer is defined in the normal way: a^0 = "" (the empty string) and a^(n+1) = a*(a^n).

Input

Each test case is a line of input representing s, a string of printable characters. The length of s will be at least 1 and will not exceed 1 million characters. A line containing a period follows the last test case.

Output

For each s you should print the largest n such that s = a^n for some string a.

Sample Input

abcd
aaaa
ababab
.

Sample Output

1
4
3

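Summary: given a string, find its minimal period and output how many times it repeats, that is, the total length divided by the length of the minimal period.

For string period problems the plain next array is enough; the improved next array is not needed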
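The relation between period and border can be sketched independently of my template. The version below uses 0-based border lengths rather than the textbook's 1-based next, and the name minimalFullPeriod is hypothetical.

```cpp
#include <string>
#include <vector>

// Smallest p such that s equals its first p characters repeated s.size()/p
// times; returns s.size() when s is not a repetition of a shorter block.
int minimalFullPeriod(const std::string& s) {
    int n = (int)s.size();
    if (n == 0) return 0;
    std::vector<int> border(n + 1, 0);  // border[k]: longest proper border of s[0..k-1]
    for (int k = 1, b = 0; k < n; ++k) {
        while (b > 0 && s[k] != s[b]) b = border[b];
        if (s[k] == s[b]) ++b;
        border[k + 1] = b;
    }
    int p = n - border[n];              // shortest shift that maps s onto itself
    return n % p == 0 ? p : n;
}
```

For Power Strings the answer is then s.size() / minimalFullPeriod(s): 1 for abcd, 4 for aaaa, 3 for ababab.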

Code:

#include<iostream>
#include<string.h>
#include<vector>
#include<map>
using namespace std;

......

char c[1000001];

int main()
{
	while (cin >> c)
	{
		if (c[0] == '.')break;
		int len = strlen(c);
		cout << len / KMP<char>::minCircle(c, len) << endl;
	}
	return 0;
}

HDU - 1358, POJ 1961 Period

Problem:

Description

For each prefix of a given string S with N characters (each character has an ASCII code between 97 and 126, inclusive), we want to know whether the prefix is a periodic string. That is, for each i (2 <= i <= N) we want to know the largest K > 1 (if there is one) such that the prefix of S with length i can be written as A^K, that is A concatenated K times, for some string A. Of course, we also want to know the period K.

Input

The input file consists of several test cases. Each test case consists of two lines. The first one contains N (2 <= N <= 1 000 000) – the size of the string S. The second line contains the string S. The input file ends with a line, having the number zero on it.

Output

For each test case, output “Test case #” and the consecutive test case number on a single line; then, for each prefix with length i that has a period K > 1, output the prefix size i and the period K separated by a single space; the prefix sizes must be in increasing order. Print a blank line after each test case.

Sample Input

3
aaa
12
aabaabaabaab
0

Sample Output

Test case #1
2 2
3 3

Test case #2
2 2
6 2
9 3
12 4

Code:

#include<iostream>
#include<string>
#include<string.h>
#include<vector>
#include<map>
using namespace std;

template <typename T = char>
class KMP {
public:
。。。。。。
};


int main()
{
	ios_base::sync_with_stdio(false);
	int n;
	int casei = 0;
	int dif;
	while (1)
	{
		cin >> n;
		casei++;
		if (n == 0)break;
		string c;
		cin >> c;
		cout << "Test case #" << casei << endl;
		auto next = KMP<char>::getNext(c.data(), c.length());
		for (int i = 2; i <= n; i++)
		{
			int ans = i / KMP<char>::minCircle(c.data(), next, i);
			if (ans > 1)cout << i << " " << ans << endl;
		}
		cout << endl;
	}
	return 0;
}

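LeetCode 459. Repeated Substring Pattern

Given a non-empty string s, check whether it can be constructed by repeating one of its substrings several times.

Example 1:

Input: s = "abab"
Output: true
Explanation: It can be built by repeating the substring "ab" twice.

Example 2:

Input: s = "aba"
Output: false

Example 3:

Input: s = "abcabcabcabc"
Output: true
Explanation: It can be built by repeating the substring "abc" four times (or the substring "abcabc" twice).

Constraints:

1 <= s.length <= 10^4
s consists of lowercase English letters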

class Solution {
public:
	bool repeatedSubstringPattern(string s) {
		int c = KMP<char>::minCircle(s.data(), s.length());
		return c < s.length();
	}
};

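2. String matching problems

(1) Maximum number of matches (matched occurrences in the text must not overlap)

HDU 2087 剪花布条 (Cutting Cloth Strips)

Problem:
Description
There is a strip of patterned cloth with some designs on it, and a small ready-to-use decorative strip that also has some designs. For a given cloth strip and decorative strip, compute the largest number of decorative strips that can be cut out of the cloth strip.

Input
The input contains several data sets, each a pair consisting of a cloth strip and a decorative strip. The strips are written in visible ASCII characters, and there are as many kinds of designs as there are visible ASCII characters. Neither strip is longer than 1000 characters. When the character # is encountered, processing stops.

Output
Output the maximum number of decorative strips that can be cut from the cloth strip; if there is none, simply output 0. Each result goes on its own line.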

Sample Input
abcde a3
aaaaaa aa
#
Sample Output
0
3

Code:

#include<iostream>
#include<string.h>
#include<vector>
#include<map>
using namespace std;

。。。。。。

char listn[1005];		//long string (text)
char listm[1005];		//short string (pattern)

int main()
{
	ios_base::sync_with_stdio(false);	//speeds up input and output
	int n, m;
	while (1)
	{
		cin >> listn;
		n = strlen(listn);
		if (n == 1 && listn[0] == '#')break;
		cin >> listm;
		m = strlen(listm);
		cout << KMP<char>::maxMatchNum(listn,n,listm,m) << endl;
	}
	return 0;
}

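CSU 1598 最长公共前缀 (Longest Common Prefix)

Problem:

Description

Given two strings s and t, a scanner scans s from its left end towards the right. Each time it scans a t it deletes that segment. Output how many times t is found.

Input

The first line contains an integer T (T<=50), the number of data sets.
Each data set consists of a string s on the first line and a string t on the second line; string lengths do not exceed 1000000.

Output

For each data set, output the answer.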

Sample Input

2
ababab
ab
ababab
ba

Sample Output

3
2

Code:

#include<iostream>
#include<stdio.h>
#include<string.h>
using namespace std;

char s[1000005], t[1000005];
int nextt[1000005];

void getnext(char *p, int m, int *next)	
{
	int i = 1, j = 0;
	next[0] = next[1] = 0;
	while (i < m)
	{
		if (j == 0 || p[i] == p[j])
		{
			i++;
			j++;
			if (p[i] != p[j])next[i] = j;
			else next[i] = next[j];
		}
		else j = next[j];
	}
}

int main()
{
	int n;
	cin >> n;
	while (n--)
	{
		scanf("%s%s", s + 1, t + 1);
		int ls = strlen(s + 1), lt = strlen(t + 1);
		getnext(t, lt, nextt);
		int sum = 0;
		int i = 1, j = 1;
		while (i <= ls)
		{
			if (j == 0)i++, j++;
			else if (s[i] == t[j])
			{
				i++, j++;
				if (j > lt)sum++, i--, j = 0;
			}
			else j = nextt[j];
		}
		printf("%d\n", sum);
	}
	return 0;
}

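LeetCode Interview Question 01.09. String Rotation

String rotation. Given two strings s1 and s2, write code to check whether s2 is a rotation of s1 (for example, waterbottle is a rotation of erbottlewat).

Example 1:

 Input: s1 = "waterbottle", s2 = "erbottlewat"
 Output: True

Example 2:

 Input: s1 = "aa", s2 = "aba"
 Output: False

Constraints:

String lengths are in the range [0, 100000].

Note:

Can you do it with only one call to the substring-check method?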

class Solution {
public:
    bool isFlipedString(string s1, string s2) {
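        if(s1==s2)return true;
        // a rotation always has the same length; without this check s1 = "aa", s2 = "a" would return true
        if(s1.length()!=s2.length())return false;
        if(s1.empty()||s2.empty())return false;
        s1+=s1;
        return KMP<>::maxMatchNum(s1.data(),s1.length(),s2.data(),s2.length()) > 0;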
    }
};

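(2) Maximum number of matches (matched occurrences in the text may overlap)

This is much like counting non-overlapping matches, except that after each full match j does not go back to the start of the pattern but falls back through the next array, as it would after a mismatch at the last character. Only the next array can be used here, not the nextval array.

CodeForces 471D MUH and Cube Walls

Problem: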

Description

Polar bears Menshykov and Uslada from the zoo of St. Petersburg and elephant Horace from the zoo of Kiev got hold of lots of wooden cubes somewhere. They started making cube towers by placing the cubes one on top of the other. They defined multiple towers standing in a line as a wall. A wall can consist of towers of different heights.

Horace was the first to finish making his wall. He called his wall an elephant. The wall consists of w towers. The bears also finished making their wall but they didn't give it a name. Their wall consists of n towers. Horace looked at the bears' tower and wondered: in how many parts of the wall can he "see an elephant"? He can "see an elephant" on a segment of w contiguous towers if the heights of the towers on the segment match as a sequence the heights of the towers in Horace's wall. In order to see as many elephants as possible, Horace can raise and lower his wall. He even can lower the wall below the ground level (see the pictures to the samples for clarification).

Your task is to count the number of segments where Horace can "see an elephant".

Input

The first line contains two integers n and w (1 ≤ n, w ≤ 2·10^5), the number of towers in the bears' and the elephant's walls correspondingly. The second line contains n integers ai (1 ≤ ai ≤ 10^9), the heights of the towers in the bears' wall. The third line contains w integers bi (1 ≤ bi ≤ 10^9), the heights of the towers in the elephant's wall.

Output

Print the number of segments in the bears' wall where Horace can "see an elephant".

Sample Input

Input

13 5
2 4 5 5 4 3 2 2 2 3 3 2 1
3 4 4 3 2

Output

Hint

The picture to the left shows Horace's wall from the sample, the picture to the right shows the bears' wall. The segments where Horace can "see an elephant" are in gray.

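First, turn the picture into a numerical definition:

The input is two arrays a and b. In a, find contiguous segments of the same length as b such that the differences between all corresponding pairs of numbers are the same.

Output how many segments (that is, positions) satisfy the condition

For example, for the input

13 5
2 4 5 5 4 3 2 2 2 3 3 2 1
3 4 4 3 2

take the segment 4 5 5 4 3 of a: its differences from b are all 1. Take the segment 2 3 3 2 1: the differences are all -1

With the statement understood, it is time to think.

The pattern is quite obvious, and the statement above can be transformed further:

Two arrays of equal length match when a[i]-a[i-1]==b[i]-b[i-1] holds for every i.

So when reading the input, it is enough to record the differences between adjacent elements.

The value of the first number need not be recorded, so what we actually get is an array a of length n-1 and an array b of length w-1,

and the question is how many positions of a match b.

That is exactly KMP.

One small caveat: the cases w=1 and n=1 need special handling.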
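The reduction itself can be sketched as follows. To keep the example short, the windows are compared directly in O(n*w); the actual solution replaces countWindows with KMP. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Turns heights h[0..n-1] into the n-1 adjacent differences h[i] - h[i-1].
std::vector<int> adjacentDiff(const std::vector<int>& h) {
    std::vector<int> d;
    for (std::size_t i = 1; i < h.size(); ++i) d.push_back(h[i] - h[i - 1]);
    return d;
}

// Counts the windows of a that equal b by direct comparison.
// An empty b (the case w = 1) matches at every position of the original wall.
int countWindows(const std::vector<int>& a, const std::vector<int>& b) {
    if (b.empty()) return (int)a.size() + 1;
    int count = 0;
    for (std::size_t s = 0; s + b.size() <= a.size(); ++s)
        if (std::equal(b.begin(), b.end(), a.begin() + s)) ++count;
    return count;
}
```

On the sample, the differences of b are 1 0 -1 -1, and they occur twice in the differences of a, matching the two segments identified above.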

Code:

#include<iostream>
#include<string.h>
#include<vector>
#include<map>
using namespace std;
 
int a[200001];
int b[200001];
 
。。。。。。
 
int main()
{
	int n, w, x, y;
	scanf("%d%d%d", &n, &w, &x);
	for (int i = 1; i < n; i++)
	{
		scanf("%d", &y);
		a[i] = y - x;
		x = y;
	}
	scanf("%d", &x);
	for (int i = 1; i < w; i++)
	{
		scanf("%d", &y);
		b[i] = y - x;
		x = y;
	}
	int sum = 0;
	if (n == 1)sum += (w == 1);
	else if (w == 1)sum += n;
	else if (n >= w) sum = KMP<int>::maxMatchNum2(a + 1, n - 1, b + 1, w - 1);
	cout << sum << endl;
	return 0;
}

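(3) Maximum number of consecutive matches

LeetCode 1668. Maximum Repeating Substring

For a string sequence, if the string word repeated k times in a row is a substring of sequence, then word has a repeating value of k. The maximum repeating value of word is the largest repeating value of word in sequence. If word is not a substring of sequence, the repeating value k is 0.

Given a string sequence and a string word, return the maximum repeating value k.

Example 1:

Input: sequence = "ababc", word = "ab"
Output: 2
Explanation: "abab" is a substring of "ababc".

Example 2:

Input: sequence = "ababc", word = "ba"
Output: 1
Explanation: "ba" is a substring of "ababc", but "baba" is not a substring of "ababc".

Example 3:

Input: sequence = "ababc", word = "ac"
Output: 0
Explanation: "ac" is not a substring of "ababc".

Constraints:

1 <= sequence.length <= 100
1 <= word.length <= 100
sequence and word contain only lowercase English letters.

Approach 1:

A plain method; the data size of this problem is small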

class Solution {
public:
	int maxRepeating(string sequence, string word) {
		vector<int>v;
		for (int i = 0; i + word.size() <= sequence.size(); i++) {
			if (sequence.substr(i, word.size()) == word)v.push_back(1);
			else v.push_back(-1234567);
		}
		auto v2 = LoopSplit(v, word.size());
		int ans = 0;
		for (auto v : v2) {
			if(v.size())ans = max(ans, MaxSubArray(v));
		}
		return ans;
	}
};

Approach 2:

KMP

class Solution {
public:
	int maxRepeating(string s, string t) {
		return KMP<char>::maxMatchNum3(s.data(),s.length(), t.data(), t.length());
	}
};

(4) Position of the first successful match

HDU 1711 Number Sequence

Problem:

Given two sequences of numbers : a[1], a[2], ...... , a[N], and b[1], b[2], ...... , b[M] (1 <= M <= 10000, 1 <= N <= 1000000). Your task is to find a number K which make a[K] = b[1], a[K + 1] = b[2], ...... , a[K + M - 1] = b[M]. If there are more than one K exist, output the smallest one.

Input

The first line of input is a number T which indicate the number of cases. Each case contains three lines. The first line is two numbers N and M (1 <= M <= 10000, 1 <= N <= 1000000). The second line contains N integers which indicate a[1], a[2], ...... , a[N]. The third line contains M integers which indicate b[1], b[2], ...... , b[M]. All integers are in the range of [-1000000, 1000000].

Output

For each test case, you should output one line which only contain K described above. If no such K exists, output -1 instead.

Sample Input

2
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 1 3
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 2 1

Sample Output

6
-1

Code:

#include<iostream>
#include<string.h>
#include<vector>
#include<map>
using namespace std;

......

int listn[1000001];		//long sequence (text)
int listm[10001];		//short sequence (pattern)

int main()
{
	ios_base::sync_with_stdio(false);	//speeds up input and output
	int t, n, m;
	cin >> t;
	while (t--)
	{
		cin >> n >> m;
		for (int i = 0; i < n; i++)cin >> listn[i];
		for (int i = 0; i < m; i++)cin >> listm[i];
		int ans = KMP<int>::firstMatchId(listn, n, listm, m);
		cout << ans << endl;	// firstMatchId already returns the 1-based K (or -1), so no offset is needed
	}
	return 0;
}

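LeetCode 28. Find the Index of the First Occurrence in a String

Given two strings haystack and needle, return the index of the first occurrence of needle in haystack (indices start at 0). If needle is not part of haystack, return -1.

Example 1:

Input: haystack = "sadbutsad", needle = "sad"
Output: 0
Explanation: "sad" occurs at index 0 and 6.
The first occurrence is at index 0, so we return 0.

Example 2:

Input: haystack = "leetcode", needle = "leeto"
Output: -1
Explanation: "leeto" does not occur in "leetcode", so we return -1.

Constraints:

1 <= haystack.length, needle.length <= 10^4
haystack and needle consist of only lowercase English characters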

class Solution {
public:
	int strStr(string haystack, string needle) {
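		// firstMatchId is 1-based, while the problem expects a 0-based index
		int id = KMP<char>::firstMatchId(haystack.data(), haystack.length(), needle.data(), needle.length());
		return id > 0 ? id - 1 : -1;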
	}
};

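LeetCode 686. Repeated String Match

Given two strings a and b, find the minimum number of times a must be repeated so that b is a substring of the repeated a. If that is impossible, return -1.

Note: the string "abc" repeated 0 times is "", repeated 1 time is "abc", and repeated 2 times is "abcabc".

Example 1:

Input: a = "abcd", b = "cdabcdab"
Output: 3
Explanation: a repeated three times is "abcdabcdabcd", and b is a substring of it.

Example 2:

Input: a = "a", b = "aa"
Output: 2

Example 3:

Input: a = "a", b = "a"
Output: 1

Example 4:

Input: a = "abc", b = "wxyz"
Output: -1

Constraints:

1 <= a.length <= 10^4
1 <= b.length <= 10^4
a and b consist of lowercase English letters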

class Solution {
public:
	int repeatedStringMatch(string a, string b) {
		int lena = a.length(), lenb = b.length();
		int num = (lena + lenb - 1) / lena;
		string s = "";
		for (int i = 0; i < num; i++)s += a;
		if (KMP<char>::firstMatchId(s.data(), s.length(), b.data(), lenb) > -1)return num;
		s += a, num++;
		if (KMP<char>::firstMatchId(s.data(), s.length(), b.data(), lenb) > -1)return num;
		return -1;
	}
};

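LeetCode 3037. Find Pattern in Infinite Stream II

You are given a binary array pattern and an object stream of class InfiniteStream representing a 0-indexed infinite stream of bits.

The class InfiniteStream contains the following function:

int next(): reads a single bit (either 0 or 1) from the stream and returns it.

Return the first starting index at which the pattern matches the bits read from the stream. For example, if the pattern is [1, 0], the first match is the highlighted part of the stream [0, 1, 0, 1, ...].

Example 1:

Input: stream = [1,1,1,0,1,1,1,...], pattern = [0,1]
Output: 3
Explanation: The first occurrence of the pattern [0,1] is highlighted in the stream [1,1,1,0,1,...] and starts at index 3.

Example 2:

Input: stream = [0,0,0,0,...], pattern = [0]
Output: 0
Explanation: The first occurrence of the pattern [0] is highlighted in the stream [0,...] and starts at index 0.

Example 3:

Input: stream = [1,0,1,1,0,1,1,0,1,...], pattern = [1,1,0,1]
Output: 2
Explanation: The first occurrence of the pattern [1,1,0,1] is highlighted in the stream [1,0,1,1,0,1,...] and starts at index 2.

Constraints:

1 <= pattern.length <= 10^4
pattern contains only 0 or 1.
stream contains only 0 or 1.
The input is generated so that the start index of the pattern lies within the first 10^5 bits of the stream.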


class Solution {
public:
	int findPattern(InfiniteStream* stream, vector<int>& pattern) {
		string s, p;
		for (auto x : pattern) {
			if (x)p += '1';
			else p += '0';
		}
		vector<int>lens{ 1000, 19000, 30000, 60000 };
		for (auto len : lens) {
			while(len--) {
				if (stream->next())s += '1';
				else s += '0';
			}
			auto ans = KMP<>::firstMatchId(s.data(), s.length(), p.data(), p.length());
			if (ans > 0)return ans-1;
		}
		return 0;
	}
};

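LeetCode Interview Question 17.17. Multi Search

Given a longer string big and an array smalls of shorter strings, design a method that searches big for each of the shorter strings in smalls. Output all positions at which the strings of smalls occur in big, where positions[i] is the list of all positions at which smalls[i] occurs.

Example:

Input:
big = "mississippi"
smalls = ["is","ppi","hi","sis","i","ssippi"]
Output: [[1,4],[8],[],[3],[1,4,7,10],[5]]

Constraints:

0 <= len(big) <= 1000
0 <= len(smalls[i]) <= 1000
The total number of characters in smalls does not exceed 100000.
You may assume that smalls contains no duplicate strings.
All characters that appear are lowercase English letters.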

class Solution {
public:
	vector<vector<int>> multiSearch(string big, vector<string>& smalls) {
		vector<vector<int>>ans;
		if (big.empty())return vector<vector<int>>(smalls.size());
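		for (auto& t : smalls) {
			if (t.empty())ans.push_back(vector<int>{});
			else {
				// allMatchId reports 1-based positions; the expected output is 0-based
				vector<int> ids = KMP<char>::allMatchId(big.data(), big.length(), t.data(), t.length());
				for (auto& id : ids)id--;
				ans.push_back(ids);
			}
		}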
		return ans;
	}
};

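3. Partial-order matching problems

Given two sequences, find which contiguous substrings of the text match the pattern in the sense that their relative order is the same.

SPOJ - CPATTERN Cow Patterns, POJ 3167 Cow Patterns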

A particular subgroup of K (1 <= K <= 25,000) of Farmer John's cows likes to make trouble. When placed in a line, these troublemakers stand together in a particular order. In order to locate these troublemakers, FJ has lined up his N (1 <= N <= 100,000) cows. The cows will file past FJ into the barn, staying in order. FJ needs your help to locate suspicious blocks of K cows within this line that might potentially be the troublemaking cows.

FJ distinguishes his cows by the number of spots 1..S on each cow's coat (1 <= S <= 25). While not a perfect method, it serves his purposes. FJ does not remember the exact number of spots on each cow in the subgroup of troublemakers. He can, however, remember which cows in the group have the same number of spots, and which of any pair of cows has more spots (if the spot counts differ). He describes such a pattern with a sequence of K ranks in the range 1..S. For example, consider this sequence:

1 4 4 3 2 1

In this example, FJ is seeking a consecutive sequence of 6 cows from among his N cows in a line. Cows #1 and #6 in this sequence have the same number of spots (although this number is not necessarily 1) and they have the smallest number of spots of cows #1..#6 (since they are labeled as '1'). Cow #5 has the second-smallest number of spots, different from all the other cows #1..#6. Cows #2 and #3 have the same number of spots, and this number is the largest of all cows #1..#6.

If the true count of spots for some sequence of cows is:

5 6 2 10 10 7 3 2 9

then only the subsequence 2 10 10 7 3 2 matches FJ's pattern above.

Please help FJ locate all the length-K subsequences in his line of cows that match his specified pattern.

Input

Line 1: Three space-separated integers: N, K, and S

Lines 2..N+1: Line i+1 describes the number of spots on cow i.

Lines N+2..N+K+1: Line i+N+1 describes pattern-rank slot i.

Output

Line 1: The number of indices, B, at which the pattern matches

Lines 2..B+1: An index (in the range 1..N) of the starting location where the pattern matches.

Sample Input

Sample Output

1
3

Hint

Explanation of the sample:

The sample input corresponds to the example given in the problem statement.

There is only one match, at position 3 within FJ's sequence of N cows.

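Summary:

Given two sequences, find which contiguous substrings of the text match the pattern in the sense that their relative order is the same.

Two sequences of equal length have the same relative order if, for the two numbers at each position, the number of earlier elements smaller than them is the same and the number of earlier elements equal to them is also the same.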
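The condition can be checked directly, as in the O(k^2) sketch below. It is only meant to make the definition concrete; my solution evaluates the same two counts in O(1) per position from prefix tables (the num array) so that the test can be plugged into KMP. The name sameOrderPattern is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// True when a and b have the same length and, at every position i, the number
// of earlier elements smaller than a[i] equals the number of earlier elements
// smaller than b[i], and likewise for the number of earlier equal elements.
bool sameOrderPattern(const std::vector<int>& a, const std::vector<int>& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        int lessA = 0, eqA = 0, lessB = 0, eqB = 0;
        for (std::size_t j = 0; j < i; ++j) {
            if (a[j] < a[i]) ++lessA;
            if (a[j] == a[i]) ++eqA;
            if (b[j] < b[i]) ++lessB;
            if (b[j] == b[i]) ++eqB;
        }
        if (lessA != lessB || eqA != eqB) return false;
    }
    return true;
}
```

With the statement's example, the window 2 10 10 7 3 2 has the same order pattern as 1 4 4 3 2 1, while the window 5 6 2 10 10 7 does not.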

So KMP matching is still the tool, and it is the overlapping kind of matching

Code:

#include<iostream>
#include<vector>
using namespace std;

struct Node {
	int p[100005];
	int num[100005][26]; //num[i][j] is the number of values not exceeding j among the first i elements of p
};
Node nodes, nodet;

class OrderMatch {
......
};

int main()
{
	ios::sync_with_stdio(false);
	int n, k, s;
	cin >> n >> k >> s;
	for (int i = 0; i <= s; i++)nodes.num[0][i] = nodet.num[0][i] = 0;
	for (int i = 1; i <= n; i++)
	{
		for (int j = 0; j <= s; j++)nodes.num[i][j] = nodes.num[i - 1][j];
		cin >> nodes.p[i];
		for (int j = nodes.p[i]; j <= s; j++)nodes.num[i][j]++;
	}
	for (int i = 1; i <= k; i++)
	{
		for (int j = 0; j <= s; j++)nodet.num[i][j] = nodet.num[i - 1][j];
		cin >> nodet.p[i];
		for (int j = nodet.p[i]; j <= s; j++)nodet.num[i][j]++;
	}
	vector<int> ans = OrderMatch(nodes, nodet).match(n,k);
	cout << ans.size() << endl;
	for (int i = 0; i < ans.size(); i++)cout << ans[i] << endl;
	return 0;
}

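4. Combined period and matching problems

CSU 1800 小X的标题 (Little X's Title)

Problem:

Description

Little X received a mysterious string in a dream and wrote it down as soon as he woke up.

The string is probably an obscure article. Leaving its meaning aside for now, Little X wants to give it a title to stand for this arcane text.

But Little X is bad at naming things, so he asked Little Y and Little Z to discuss it with him.

Little X thinks the title only needs to be both a beginning segment and an ending segment of the string: simple and direct. Little Y thinks a good title should also appear somewhere in the middle of the article, apart from the beginning and the end. Little Z agrees with Little Y, but wants the place where he finds the title in the article to be different from the place Little Y finds.

If several titles meet the requirements, they all agree to use the longest one.

Do you know which one they finally chose?

Input

No more than 20 data sets.

For each data set, one line containing a string, the content of the article. The string consists only of lowercase English letters, and each string is no longer than 500000.

Output

For each data set, output one line containing the chosen title; if no suitable title can be found, output one line containing the string "Cannot find it".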

Sample Input

aaa
aaaaa
abacadaeafa
fixprefixsuffixsowhatisfix

Sample Output

Cannot find it
aa
a
fix

Hint

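aa is both a prefix and a suffix of aaaaa. In addition, numbering the characters of the string from 1 to 5, Little Y can find an aa at positions 2~3, and Little Z can find an aa at another place, positions 3~4.

Code: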

#include<iostream>
#include<stdio.h>
#include<string.h>
using namespace std;

char c[500005];
int nextt[500005];

void getnext(char *t, int m)
{
	int i = 1, j = 0;
	nextt[0] = nextt[1] = 0;
	while (i<m)
	{
		if (j == 0 || t[i] == t[j])nextt[++i] = ++j;
		else j = nextt[j];
	}
}

int main()
{
	while (scanf("%s", c + 1) != EOF)
	{
		int l = strlen(c + 1);
		getnext(c, l);
		int d = nextt[l], sum = 0;
		while (d>0 && c[d] != c[l])d = nextt[d];
		while (d)
		{			
			sum = 0;
			int i = 2, j = 1;
			while (i <= l)
			{
				if (j == 0)i++, j++;
				else if (c[i] == c[j])
				{
					i++, j++;
					if (j > d)
					{
						sum++, i--, j--;
						j = nextt[j];
					}
				}
				else j = nextt[j];
			}			
			if (sum >= 3)break;
			d = nextt[d];
			while (d>0 && c[d] != c[l])d = nextt[d];
		}
		if (sum >= 3)for (int i = 1; i <= d; i++)printf("%c", c[i]);
		else printf("Cannot find it");
		printf("\n");
	}
	return 0;
}

Here are the test cases I used while debugging:

a
aa
aaa
aba
ababab
abababa
abababab
abababcab
ababacbab
aaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaab
aaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaa
aaaabaaaabaaaabaaaa
aaaabaaaabaaaabaaaabaaaa
aaaabaaaabaaaabaa