[CF 316G3]Good Substrings解题报告

最新推荐文章于 2019-11-02 16:25:00 发布

wmdcstdio

最新推荐文章于 2019-11-02 16:25:00 发布

阅读量1.1k

点赞数

文章标签： codeforces 解题报告后缀自动机

本文链接：https://blog.csdn.net/wmdcstdio/article/details/45025787

版权

题目

Smart Beaver recently got interested in a new word game. The point is as follows: count the number of distinct good substrings of some string s. To determine if a string is good or not the game uses rules. Overall there are n rules. Each rule is described by a group of three (p, l, r), where p is a string and l and r (l ≤ r) are integers. We’ll say that string t complies with rule (p, l, r), if the number of occurrences of string t in string p lies between l and r, inclusive. For example, string "ab", complies with rules ("ab", 1, 2) and ("aab", 0, 1), but does not comply with rules ("cd", 1, 2) and ("abab", 0, 1).

A substring s[l... r] (1 ≤ l ≤ r ≤ |s|) of string s = s₁s₂... s_|s| (|s| is a length of s) is string s_ls_l + 1... s_r.

Consider a number of occurrences of string t in string p as a number of pairs of integers l, r (1 ≤ l ≤ r ≤ |p|) such that p[l... r] = t.

We’ll say that string t is good if it complies with all n rules. Smart Beaver asks you to help him to write a program that can calculate the number of distinct good substrings of string s. Two substrings s[x... y] and s[z... w] are cosidered to be distinct iff s[x... y] ≠ s[z... w].

Input

The first line contains string s. The second line contains integer n. Next n lines contain the rules, one per line. Each of these lines contains a string and two integers p_i, l_i, r_i, separated by single spaces (0 ≤ l_i ≤ r_i ≤ |p_i|). It is guaranteed that all the given strings are non-empty and only contain lowercase English letters.

The input limits for scoring 30 points are (subproblem G1):

0 ≤ n ≤ 10.
The length of string s and the maximum length of string p is ≤ 200.

The input limits for scoring 70 points are (subproblems G1+G2):

0 ≤ n ≤ 10.
The length of string s and the maximum length of string p is ≤ 2000.

The input limits for scoring 100 points are (subproblems G1+G2+G3):

0 ≤ n ≤ 10.
The length of string s and the maximum length of string p is ≤ 50000.

Output

Print a single integer — the number of good substrings of string s.

大意就是说：我们给定n个文本串T[1]~T[n]。“好”字符串的定义为：它在第i个文本串中的出现次数在l[i]~r[i]次之间。给一个字符串S，求S有多少个互不相同的好子串。（互不相同是指绝对不同，例如S="aaa"中，"a"只能算一次）。

题解

官方解法用了后缀数组，不过我们可以用后缀自动机解决。

首先将S和T[1]~T[n]顺序接起来（中间放上分隔符）形成一个新的子串SS：

例如，若S="aaab",T[1]="aa",T[2]="aab"，那么SS="aaab#aa$ab%"（当然实际操作中分隔符不是这样的。有一个小坑：分隔符不能从'z'+1向后顺延，因为这会超出127的范围）。

然后将字符串SS建成后缀自动机MT。

我们可以对MT的每个节点v维护sz[0]~sz[n]，其中sz[i]表示v的终点集合中有几个在T[i]中（i=0则为S），如果从初态t0到v的路径上没有分隔符，那sz[i]也就表示节点v所对应的串（们）在T[i]中出现了几次。后缀自动机每次添加非拷贝节点时，其sz数组中只有一个1（取决于当前加的字符在哪个串里）。在建成后缀自动机后，将所有节点按照len排序，然后倒着将每个节点的sz数组加到其后缀链接指向节点的sz数组上。这是后缀自动机中常用的一种DP方式。

有了sz数组后，我们就可以判断每个节点是否对应着“好”字符串。（由于同一节点所对应的字符串们的终点集合相同，所以它们要么都好要么都不好）。

感性地想，我们最后应该是把串S拿到自动机MT上匹配，每一步在到达的节点v处都加上一个什么值。什么值呢？其实是两个：①若v是好节点，加上当前匹配v中串的数目（显然，有可能并未匹配v对应的所有字符串，可能只匹配了长度较短的一部分）。②加上v沿后缀链接能走到的所有好节点的对应字符串数量之和（我们称这个值为suf_sz）。

①便于计算，②可以DP出来，但是！有一个问题：题目让求的是所有不同子串的个数（没错，即使位置不同，只要串一样就算相同）。

怎么办呢？

对自动机中的每个节点记一个值last，含义是它最后一次在S中出现的位置（last=-1代表未在S中出现过）。last值可以像sz数组一样DP出来。最后在MT上跑匹配时，仅当第i次到达的节点v满足v.last=i时，才把v节点的①和②加到答案上。这样实际上就保证了相同的串只在最后一次才会计算。

最后就如前所说，在MT上跑串S的匹配，第i次循环到达节点v时，若v.last=i，则将v节点的①和②加到答案上。

重点关注一下DP，一共有三个：

1.sz数组。初始化为全零。在向MT中添加T[k]中的某个值时，新建节点cur的sz[k]=1.在MT构建完毕后，按照len值倒序遍历所有节点，把它的sz值加到其后缀链接指向的节点的sz上。

2.last值。初始化为-1.在向MT中添加SS[i]时，新建节点cur的last=i。MT构建完毕后，按照len值倒序遍历所有节点，用它的last值更新其后缀链接指向节点的last值。

3.suf_sz。在MT构建完毕后，按照len值顺序遍历所有节点v，将v的后缀链接指向节点的suf_sz加到v的suf_sz上，若后缀链接指向好节点，将其对应的字符串数量加到v的suf_sz上。

代码

#include<iostream>
#include<cstdio>
#include<algorithm>
#include<cstring>
using namespace std;
#define Nil NULL
typedef long long LL;
const int INF=0x7ffffff/2;
const int SIZEP=50010;
const int LNUM=11;
int M;//限制的数量
int lim_l[LNUM],lim_r[LNUM];
int match[256]={0};
class Suffix_Automaton{
public:
	class Node{
	public:
		int len;
		Node *suflink;//后缀链接
		Node *ch[26+LNUM];//转移
		int sz[LNUM];//在每个模式串中出现次数
		bool good;//是否对应"好"字符串
		int range;//对应多少个字符串
		int last;//首次出现位置
		LL suf_sz;//后缀链接能到达好节点的range之和
		void calc_range(void){
			range=len;
			if(suflink!=Nil) range-=suflink->len;
		}
		void calc_good(void){
			for(int i=1;i<=M;i++){
				if(!(lim_l[i]<=sz[i]&&sz[i]<=lim_r[i])){
					good=false;
					return;
				}
			}
			good=true;
		}
		void clear(void){
			len=0;
			suflink=Nil;
			memset(ch,0,sizeof(ch));
			memset(sz,0,sizeof(sz));
			good=false;
			range=0;
			last=-1;
			suf_sz=0;
		}
		Node(){clear();}
	};
	Node *root,*last;
	int n;
	Node *lis[1100050];
	int timer;
	int findpos(Node *a){
		if(a==Nil) return -1;
		for(int i=0;i<n;i++){
			if(lis[i]==a) return i;
		}
		return -2;
	}
	void print(Node *a){
		cout<<"addr: "<<findpos(a)<<endl;
		cout<<"len: "<<a->len<<endl;
		cout<<"suflink: "<<findpos(a->suflink)<<endl;
		cout<<"good: "<<a->good<<endl;
		cout<<"last: "<<a->last<<endl;
		cout<<"suf_sz: "<<a->suf_sz<<endl;
		cout<<"sz: ";for(int i=0;i<=M;i++){cout<<a->sz[i]<<" ";}cout<<endl;
		for(int i=0;i<26+LNUM;i++){if(a->ch[i]!=Nil){cout<<(char)('a'+i)<<" : "<<findpos(a->ch

[i])<<endl;}}
		cout<<endl;
	}
	void clear(void){
		root=new Node;
		last=root;
		lis[0]=root;
		n=1;
		timer=0;
	}
	Suffix_Automaton(void){clear();}
	void insert(char c,int id){
		int k=match[c];
		Node *cur=new Node;
		if(id!=-1) cur->sz[id]=1;
		lis[n++]=cur;
		cur->len=last->len+1;
		if(id==0) cur->last=timer++;
		Node *p=last;
		while(p!=Nil&&p->ch[k]==Nil){
			p->ch[k]=cur;
			p=p->suflink;
		}
		if(p==Nil) cur->suflink=root;
		else{
			Node *q=p->ch[k];
			if(p->len+1==q->len) cur->suflink=q;
			else{
				Node *clone=new Node;
				lis[n++]=clone;
				*clone=*q;
				memset(clone->sz,0,sizeof(clone->sz));
				clone->len=p->len+1;
				cur->suflink=clone;
				q->suflink=clone;
				while(p!=Nil&&p->ch[k]==q){
					p->ch[k]=clone;
					p=p->suflink;
				}
			}
		}
		last=cur;
	}
	class cmp_len{
	public:
		bool operator () (Node *a,Node *b){
			return a->len<b->len;
		}
	};
	void prepare(void){
		sort(lis,lis+n,cmp_len());
		for(int i=n-1;i>=0;i--){
			Node *p=lis[i];
			if(p->suflink!=Nil){
				for(int j=0;j<=M;j++) p->suflink->sz[j]+=p->sz[j];
				p->suflink->last=max(p->suflink->last,p->last);
			}
		}
		for(int i=0;i<n;i++) lis[i]->calc_good(),lis[i]->calc_range();
		for(int i=0;i<n;i++){
			Node *p=lis[i];
			if(p->suflink!=Nil&&p->last==p->suflink->last){
				Node *q=p->suflink;
				p->suf_sz+=q->suf_sz;
				if(q->good){
					p->suf_sz+=q->range;
				}
			}
		}
	}
	LL calc(char *s,int L){
		LL ans=0;
		Node *p=root;
		int now=0;
		for(int i=0;i<L;i++){
			int k=match[s[i]];
			while(p!=root&&p->ch[k]==Nil){
				p=p->suflink;
				now=p->len;
			}
			if(p->ch[k]!=Nil) p=p->ch[k],now++;
			if(p->last==i){//保证不重
				ans+=p->suf_sz;
				if(p->good){
					ans+=now;
					if(p->suflink!=Nil) ans-=p->suflink->len;
				}
			}
		}
		return ans;
	}
};
Suffix_Automaton MT;
char S[SIZEP];
int L;
char str[SIZEP];
void build_MT(void){
	memset(match,-1,sizeof(match));
	for(int i=0;i<26;i++) match['a'+i]=i;
	for(int i=0;i<20;i++) match[i]=26+i;
	scanf("%s",S);
	scanf("%d",&M);
	char dlm='\0';
	L=strlen(S);
	for(int i=0;i<L;i++){
		MT.insert(S[i],0);
	}
	MT.insert(dlm++,-1);
	for(int i=1;i<=M;i++){
		scanf("%s",str);
		scanf("%d%d",&lim_l[i],&lim_r[i]);
		int len=strlen(str);
		for(int k=0;k<len;k++){
			MT.insert(str[k],i);
		}
		MT.insert(dlm++,-1);//分隔符
	}
}
int main(){
	//freopen("t1.in","r",stdin);
	build_MT();
	MT.prepare();
	cout<<MT.calc(S,L)<<endl;
	return 0;
}