Suffix Automaton + Cyclic Isomorphism [Cyclical Quest]

Little_Match_Boy

Published 2021-08-18 17:12:46

Categories: ACM; Strings (Aho-Corasick automaton, KMP, trie, suffix array, suffix automaton)

Article link: https://blog.csdn.net/Little_Match_Boy/article/details/119783182

Cyclical Quest

Problem statement

Some days ago, WJMZBMR learned how to answer the query "how many times does a string x occur in a string s" quickly by preprocessing the string s. But now he wants to make it harder.

So he wants to ask "how many consecutive substrings of s are cyclical isomorphic to a given string x". You are given string s and n strings x_i, for each string x_i find, how many consecutive substrings of s are cyclical isomorphic to x_i.

Two strings are called cyclical isomorphic if one can rotate one string to get the other one. 'Rotate' here means 'to take some consecutive chars (maybe none) from the beginning of a string and put them back at the end of the string in the same order'. For example, string "abcde" can be rotated to string "deabc". We can take characters "abc" from the beginning and put them at the end of "de".

Input format

The first line contains a non-empty string s. The length of string s is not greater than 10^6 characters.

The second line contains an integer n (1 <= n <= 10^5), the number of queries. Then n lines follow: the i-th line contains the string x_i, the string for the i-th query. The total length of x_i is less than or equal to 10^6 characters.

In this problem, strings only consist of lowercase English letters.

Output format

For each query x_i print a single integer that shows how many consecutive substrings of s are cyclical isomorphic to x_i. Print the answers to the queries in the order they are given in the input.

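Problem summary

Given a main string S and n query strings, find for each query string the total number of occurrences in S of all of its cyclic rotations.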

Sample input and output

Input #1

baabaabaaa
5
a
ba
baa
aabaa
aaba

Output #1

7
5
7
3
5

Input #2

aabbaa
3
aa
aabb
abba

Output #2

2
3
3

Source: CF235C

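Approach:

Start with brute force. If a query word has length x, it has x rotations of length x each, and matching each rotation against the automaton costs O(x), so a single query costs O(x^2). Building the automaton costs O(textlen), and textlen and the total query length are of the same order of magnitude (both up to 10^6), so with q queries the total is O(q*x^2) in the worst case. That is a guaranteed TLE.

Now for my method (really one I picked up from stronger contestants, since I had not studied cyclic isomorphism before). You may have heard this saying: "Walking along a transition edge of the SAM appends a character to the end of the string; walking along an edge of the link tree adds characters to the front of the string."

The idea is this: for each word, append a copy of the word to itself, forming word+word. Call this string of length 2*wordlen W. The substrings of W of length wordlen are exactly the rotations of the original word. Neat.

From here it is straightforward. Build the suffix automaton of text; the question becomes how many substrings of W of length wordlen can be matched. Run W through the SAM and, for each prefix W[1..i], look at the longest suffix of that prefix that occurs in text. Say that after reading W[1..i] we are at node p of the SAM with matched length l. If l >= wordlen, the window ending at i is matched. We know that siz[p] stores the size of p's endpos class, that is, the number of positions in text where its strings end. So what is the contribution to the answer? Do we simply add siz[p]?

Of course not. For convenience, let |t| = wordlen. siz[p] is the number of occurrences of W[i-l+1..i], but what we need is the number of occurrences of W[i-|t|+1..i]. That is easy to fix: in the link tree, the strings of an ancestor are suffixes of the strings of its descendants, so keep moving p up along link until the maxlen of p's parent is less than |t|. At that point the class of p must contain the substring W[i-|t|+1..i], so add siz[p]. One caveat: rotations can coincide. Mark the visited nodes so that each node contributes only once per query. (For example, with abab, moving the leading ab to the end gives the same string again, and it must be counted only once.)

And here is the code: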
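The doubling trick and the need to count equal rotations only once can be checked with a small brute-force reference. The sketch below is not part of the original solution: the function names countCyclicBrute and distinctRotations are made up for illustration, and the code is only meant for tiny inputs such as the samples.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Collect the distinct rotations of x by sliding a window of length |x| over x+x.
// std::set removes equal rotations (hypothetical helper, illustration only).
std::set<std::string> rotationSet(const std::string& x) {
    const std::size_t n = x.size();
    const std::string w = x + x;  // W = word + word
    std::set<std::string> rotations;
    for (std::size_t i = 0; i < n; ++i) rotations.insert(w.substr(i, n));
    return rotations;
}

// Number of distinct rotations of x.
std::size_t distinctRotations(const std::string& x) {
    return rotationSet(x).size();
}

// Brute force: count the substrings of s that equal some rotation of x.
// Far too slow for the real limits; useful only as a reference on small inputs.
long long countCyclicBrute(const std::string& s, const std::string& x) {
    const std::size_t n = x.size();
    if (n == 0 || n > s.size()) return 0;
    const std::set<std::string> rotations = rotationSet(x);
    long long total = 0;
    for (std::size_t i = 0; i + n <= s.size(); ++i)
        if (rotations.count(s.substr(i, n))) ++total;
    return total;
}
```

For abab the doubled string is abababab, and its four windows of length 4 are abab, baba, abab, baba, so only two distinct rotations exist. This is the situation the per-query mark on SAM nodes protects against.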

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include<iostream>
#include<algorithm>
using namespace std;
typedef long long ll;
const int maxt=1e6+10;
const int maxn=2e6+10;
int sam[maxn][26],link[maxn],visit[maxn],siz[maxn],len[maxn],temp[maxn],sa[maxn],last=1,cur,nodecnt=1;
char text[maxt],word[2*maxt];
void insert(int c)
{
	int k=last;
	cur=++nodecnt;
	last=cur;
	len[cur]=len[k]+1;
	siz[cur]=1;
	while(k&&!sam[k][c])
	{
		sam[k][c]=cur;
		k=link[k];
	}    
	if(!k)
	    link[cur]=1;
	else if(len[sam[k][c]]==len[k]+1)
	    link[cur]=sam[k][c];
	else
	{
		int x=sam[k][c],clone=++nodecnt;
		len[clone]=len[k]+1;
		link[clone]=link[x];
		memcpy(sam[clone],sam[x],sizeof(sam[clone]));
		link[x]=link[cur]=clone;
		while(k&&sam[k][c]==x)
		{
			sam[k][c]=clone;
			k=link[k];
		}
	}
}
int main()
{
	scanf("%s",text);
    int textlen;
    textlen=strlen(text);
    for(int i=0;i<textlen;i++)
        insert(text[i]-'a');//build the SAM
    for(int i=1;i<=nodecnt;i++)//counting sort by len, then accumulate siz up the link tree
        temp[len[i]]++;
    for(int i=1;i<=nodecnt;i++)
        temp[i]+=temp[i-1];
    for(int i=nodecnt;i>=1;i--)
        sa[temp[len[i]]--]=i;
    for(int i=nodecnt;i>=1;i--)
        siz[link[sa[i]]]+=siz[sa[i]];
    int n;
	scanf("%d",&n);
	for(int j=1;j<=n;j++)
	{
	    int ans=0,p=1,leng=0,c;//reset for every query; leng is the length matched so far
		scanf("%s",word);
		int wordlen;
		wordlen=strlen(word);
		for(int i=0;i<wordlen;i++)//append a copy of word to itself
			word[i+wordlen]=word[i];
		for(int i=0;i<2*wordlen;i++)
		{
			c=word[i]-'a';
			if(sam[p][c])//the current character extends the match directly
			{
				leng++;
				p=sam[p][c];
			}
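			else//the current character cannot extend the match directly
			{
				while(p&&!sam[p][c])//every link ancestor of a node holds suffixes of that node's longest string, so following link strips characters from the front
					p=link[p];//we accept a shorter match in exchange for a state that may have a transition on c
				if(!p)//even the root has no transition on c
				    p=1,leng=0;//restart from the root; p is 0 here, so reset it to 1
				else//some shorter suffix can be extended by c: update leng, then move to the target node
				{
					leng=len[p]+1;//do not swap these two lines and use leng=len[p]: after p=sam[p][c], len[p] need not equal the old len[p]+1
					p=sam[p][c];//e.g. we have matched abcd and need e; link leads to cd, which can be extended by e, but every cde in text occurs as fgkcde, so cde and fgkcde share one endpos class and len[p] would be 6 rather than 3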
				}
			}
			if(leng>=wordlen)
			{
				while(len[link[p]]>=wordlen)
				    p=link[p];
				if(visit[p]!=j)
				{
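					ans+=siz[p];//the stamp guards against equal rotations; without duplicates it would be unnecessary
					visit[p]=j;	//stamp with the query index rather than 0/1: queries are independent, and "count once" only applies within a single query
				}
				leng=wordlen;//clamp the matched length back to wordlen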
			}
		}
		printf("%d\n",ans);
	}
    return 0;
}
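To test the query loop in isolation, the same idea can be wrapped in a function. The following is a sketch, not the submission above: the struct SuffixAutomaton, its member names, and the use of std::vector and std::sort (instead of fixed arrays and counting sort) are choices made here for brevity, and it assumes lowercase input only.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>
#include <vector>

struct SuffixAutomaton {
    struct Node {
        std::array<int, 26> next{};  // 0 means "no transition"; node 0 is a dummy
        int link = 0, len = 0, stamp = 0;
        long long cnt = 0;           // endpos size once the constructor has finished
    };
    std::vector<Node> t;
    int last = 1, queryId = 0;

    explicit SuffixAutomaton(const std::string& s) : t(2) {  // node 1 is the root
        t.reserve(2 * s.size() + 2);
        for (char ch : s) extend(ch - 'a');
        // Push counts up the link tree, longest states first.
        std::vector<int> order;
        for (int v = 2; v < (int)t.size(); ++v) order.push_back(v);
        std::sort(order.begin(), order.end(),
                  [&](int a, int b) { return t[a].len > t[b].len; });
        for (int v : order) t[t[v].link].cnt += t[v].cnt;
    }

    void extend(int c) {
        int cur = (int)t.size();
        t.emplace_back();
        t[cur].len = t[last].len + 1;
        t[cur].cnt = 1;
        int k = last;
        last = cur;
        while (k && !t[k].next[c]) { t[k].next[c] = cur; k = t[k].link; }
        if (!k) { t[cur].link = 1; return; }
        int x = t[k].next[c];
        if (t[x].len == t[k].len + 1) { t[cur].link = x; return; }
        int clone = (int)t.size();
        Node copy = t[x];            // the clone keeps x's transitions and link
        copy.len = t[k].len + 1;
        copy.cnt = 0;
        t.push_back(copy);
        t[x].link = t[cur].link = clone;
        while (k && t[k].next[c] == x) { t[k].next[c] = clone; k = t[k].link; }
    }

    // Count substrings of the text that equal some rotation of x.
    long long countCyclic(const std::string& x) {
        const int n = (int)x.size();
        if (n == 0) return 0;
        ++queryId;                   // fresh stamp value for this query
        long long ans = 0;
        int p = 1, matched = 0;
        for (int i = 0; i < 2 * n; ++i) {
            int c = x[i % n] - 'a';  // reads x+x without building it
            while (p != 1 && !t[p].next[c]) { p = t[p].link; matched = t[p].len; }
            if (t[p].next[c]) { p = t[p].next[c]; ++matched; }
            else matched = 0;        // at the root with no transition on c
            if (matched >= n) {
                // Climb until p is the state that contains the length-n suffix.
                while (t[t[p].link].len >= n) p = t[p].link;
                matched = n;
                if (t[p].stamp != queryId) { t[p].stamp = queryId; ans += t[p].cnt; }
            }
        }
        return ans;
    }
};
```

Constructing SuffixAutomaton once for the text and calling countCyclic for each query mirrors the structure of main above; the stamp field plays the role of visit[p]=j.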

2021.8.18
