Problem link: http://acm.split.hdu.edu.cn/showproblem.php?pid=4416
In middle school, teachers used to encourage us to pick up pretty sentences so that we could apply those sentences in our own articles. One of my classmates, ZengXiao Xian, wanted to get sentences which are different from those of others, because he thought the distinct pretty sentences might help him get a high score on his article.
Assume that all of the sentences came from some articles. ZengXiao Xian intended to pick from Article A. The number of his classmates is n. The i-th classmate picked from Article Bi. Now ZengXiao Xian wants to know how many different sentences he could pick from Article A which don't belong to any of his classmates' articles. To simplify the problem, ZengXiao Xian wants to know how many different strings are substrings of string A but are not substrings of any string Bi. Of course, you will help him, won't you?
The input starts with an integer T, the number of test cases. For each test case:
The first line contains an integer n, the number of classmates.
The second line is the string A. Each of the next n lines contains one string; the i-th of them is Bi.
The length of string A does not exceed 100,000 characters, the total length of all strings Bi does not exceed 100,000, and all strings consist only of lowercase characters 'a' to 'z'.
Sample input:
3
2
abab
ab
ba
1
aaa
bbb
2
aaaa
aa
aaa
Sample output:
Case 1: 3
Case 2: 3
Case 3: 1
Problem summary:
Given one string A and n strings Bi, count the distinct substrings of A that are not a substring of any Bi.
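Before looking at the fast solution, it can help to have a slow reference to check answers against. The following is only a brute-force sketch and is not part of the original solution; the function name countBrute is made up for illustration. It stores every substring in a std::set, so it is only usable on tiny inputs such as the samples.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Brute-force reference (hypothetical helper, not from the original post):
// collect every substring of every B_i, then count the distinct substrings
// of A that are missing from that collection.
// Roughly cubic in the input length, so only suitable for short strings.
long long countBrute(const std::string& a, const std::vector<std::string>& bs) {
    std::set<std::string> banned;  // all substrings of all B_i
    for (const std::string& b : bs)
        for (std::size_t i = 0; i < b.size(); ++i)
            for (std::size_t l = 1; i + l <= b.size(); ++l)
                banned.insert(b.substr(i, l));

    std::set<std::string> kept;    // distinct substrings of A not in any B_i
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t l = 1; i + l <= a.size(); ++l) {
            std::string sub = a.substr(i, l);
            if (banned.count(sub) == 0) kept.insert(sub);
        }
    return static_cast<long long>(kept.size());
}
```

On the first sample, calling countBrute("abab", {"ab", "ba"}) leaves only "aba", "bab" and "abab", which matches the expected answer 3.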
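Solution:
Substring problems naturally suggest a suffix array. Concatenate A and the n strings Bi, joining them with distinct special characters, then build the suffix array and the height array. If sa[i] < len(A), the suffix ranked i is a suffix of A; otherwise it belongs to the B part. Two quantities have to be removed: 1) the substrings that occur both in A and in some Bi, and 2) the substrings that are repeated inside A. For the suffix of A starting at position i, how many of its prefixes also occur in B? That is its LCP with the nearest B suffix in rank order. Scan the ranks once forward and once backward, keeping the running minimum of height since the last B suffix, and take the larger of the two results for each A suffix. The repeats inside A are given by the LCP of two A suffixes that are adjacent in rank. Finally, start from the number of all substrings of A, len*(len+1)/2, and subtract the substrings that are in both A and B and the ones repeated in A; the code keeps the per-suffix maximum of these values in pos, so nothing is subtracted twice. Watch the details, which are marked in the code.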
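To make the counting step easier to follow on small inputs, here is a minimal sketch of the same idea in which a plain comparison sort stands in for the doubling suffix array. That substitution, the function name countWithSuffixOrder, and the absence of a trailing 0 terminator are my own assumptions for illustration; the sort is far too slow for the real limits, and the full solution further down is what should be submitted.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Illustrative sketch (hypothetical helper): same counting as the solution,
// but the suffix order comes from a naive sort instead of the doubling method.
long long countWithSuffixOrder(const std::string& a, const std::vector<std::string>& bs) {
    // Concatenate A and every B_i; each B_i is preceded by a unique separator >= 30.
    std::vector<int> r;
    for (char c : a) r.push_back(c - 'a' + 1);
    int sep = 30;
    for (const std::string& b : bs) {
        r.push_back(sep++);
        for (char c : b) r.push_back(c - 'a' + 1);
    }
    const int n = static_cast<int>(r.size());
    const int len = static_cast<int>(a.size());
    const int INF = 0x3f3f3f3f;

    // Naive suffix array: sort the start positions by lexicographic suffix order.
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int x, int y) {
        return std::lexicographical_compare(r.begin() + x, r.end(), r.begin() + y, r.end());
    });

    // height[i] = LCP of the suffixes ranked i-1 and i; height[0] stays 0.
    std::vector<int> height(n, 0);
    for (int i = 1; i < n; ++i) {
        int k = 0;
        while (sa[i - 1] + k < n && sa[i] + k < n && r[sa[i - 1] + k] == r[sa[i] + k]) ++k;
        height[i] = k;
    }

    std::vector<int> pos(len, 0);  // pos[p]: prefixes of A's suffix p to discard
    int cur = INF;
    for (int i = 0; i < n; ++i) {            // LCP with the nearest B suffix before
        if (sa[i] < len) { cur = std::min(cur, height[i]); pos[sa[i]] = std::max(pos[sa[i]], cur); }
        else cur = INF;
    }
    cur = INF;
    for (int i = n - 1; i >= 0; --i) {       // LCP with the nearest B suffix after
        if (sa[i] < len) {
            cur = (i + 1 < n) ? std::min(cur, height[i + 1]) : 0;
            pos[sa[i]] = std::max(pos[sa[i]], cur);
        } else cur = INF;
    }
    for (int i = 1; i < n; ++i)              // repeats inside A, charged to the lower rank
        if (sa[i] < len && sa[i - 1] < len)
            pos[sa[i - 1]] = std::max(pos[sa[i - 1]], height[i]);

    long long ans = static_cast<long long>(len) * (len + 1) / 2;
    for (int p = 0; p < len; ++p) ans -= pos[p];
    return ans;
}
```

For the first sample the concatenation is abab, separator, ab, separator, ba; the four suffixes of A get pos values 2, 2, 2, 1, and 10 - 7 = 3 as expected.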
#include <iostream>
#include <cstdio>
#include <algorithm>
#include <cmath>
#include <cstring>
#define INF 0x3f3f3f3f
using namespace std;
const int MAXN = 3e5+100;
int wa[MAXN],wb[MAXN],wv[MAXN];
int Ws[MAXN];
int sa[MAXN],Rank[MAXN],height[MAXN];
// Two suffixes have equal 2l-length keys if both halves have equal ranks.
int cmp(int *r,int a,int b,int l){
    return r[a]==r[b] && r[a+l]==r[b+l];
}
// Doubling suffix array with radix sort.
// r: input values with r[n-1] == 0 as the smallest terminator, m: alphabet size bound.
void SA(int *r,int n,int m){
    int *x = wa,*y=wb;
    // Initial counting sort by the first character.
    for(int i = 0;i<m;i++) Ws[i] = 0;
    for(int i = 0;i<n;i++)++Ws[x[i]=r[i]];
    for(int i = 1;i<m;i++)Ws[i]+=Ws[i-1];
    for(int i = n-1;i>=0;i--)sa[--Ws[x[i]]] = i;
    int p = 1;
    for(int j = 1;p<n;j<<=1,m=p){
        // Sort by the second key (rank of the suffix j positions later).
        p = 0;
        for(int i = n-j;i<n;i++)y[p++] = i;
        for(int i = 0;i<n;i++)if(sa[i]>=j)y[p++]=sa[i]-j;
        // Stable counting sort by the first key.
        for(int i = 0;i<n;i++)wv[i] = x[y[i]];
        for(int i = 0;i<m;++i)Ws[i] = 0;
        for(int i = 0;i<n;i++)++Ws[wv[i]];
        for(int i = 1;i<m;i++)Ws[i] +=Ws[i-1];
        for(int i = n-1;i>=0;--i)sa[--Ws[wv[i]]] = y[i];
        // Recompute ranks for the doubled length.
        swap(x,y);
        x[sa[0]] = 0;
        p = 1;
        for(int i =1;i<n;i++){
            x[sa[i]] = cmp(y,sa[i-1],sa[i],j)?p-1:p++;
        }
    }
    // height[i] = LCP of the suffixes ranked i-1 and i.
    for(int i = 1;i<n;i++)Rank[sa[i]] = i;
    int k = 0;
    for(int i = 0;i<n-1;height[Rank[i++]] = k){
        if(k)--k;
        for(int j =sa[Rank[i]-1];r[i+k]==r[j+k];++k);
    }
}
char s[100005];
int r[MAXN],pos[MAXN];
int main(){
    int T,w = 0;
    scanf("%d",&T);
    while(T--){
        int n;
        int top = 0,z = 30;   // z: next unused separator value
        scanf("%d",&n);
        scanf("%s",s);
        int len = strlen(s);
        for(int i = 0;i<len;i++){
            r[top++] = s[i]-'a'+1;
        }
        // Append each B string, preceded by its own separator.
        for(int i = 0;i<n;i++){
            r[top++] = z++;
            scanf("%s",s);
            int le = strlen(s);
            for(int j = 0;j<le;j++){
                r[top++] = s[j]-'a'+1;
            }
        }
        r[top] = 0;
        SA(r,top+1,z+10);
        // Forward pass: LCP of each A suffix with the nearest B suffix ranked before it.
        int temp = INF;
        memset(pos,0,sizeof(pos));
        for(int i = 0;i<=top;i++){
            if(sa[i]<len){
                temp = min(temp,height[i]);
                pos[sa[i]] = max(temp,pos[sa[i]]);
            }
            else temp = INF;
        }
        // Backward pass: LCP with the nearest B suffix ranked after it.
        temp = INF;
        for(int i = top-1;i>=0;i--){
            if(sa[i]<len){
                temp = min(temp,height[i+1]);
                pos[sa[i]] = max(temp,pos[sa[i]]);
            }
            else temp = INF;
        }
        // Repeats inside A: two A suffixes adjacent in rank.
        for(int i = 1;i<=top;i++){
            if(sa[i]<len&&sa[i-1]<len){
                pos[sa[i-1]] = max(pos[sa[i-1]],height[i]);
                /// A detail here: pos must be updated for sa[i-1], not for sa[i],
                /// because after the backward pass pos[sa[i]] may be larger than height[i], e.g. in rank order (A: marks a suffix of A) (B:ab)(A:abc)(A:abcabc)(B:abcad),
                /// whereas pos[sa[i-1]] was updated from min(temp,height[i]), so it is at most height[i].
            }
        }
        // All substrings of A minus the discarded prefixes of every A suffix.
        long long ans = (long long)len*(len+1)>>1;
        for(int i = 0;i<=top;i++){
            ans-=pos[i];
        }
        printf("Case %d: %lld\n",++w,ans);
    }
    return 0;
}