Distinct Substrings 给出一个字符串，问它的不重复子串有多少个后缀数组

最新推荐文章于 2024-04-22 22:08:26 发布

kongming_acm

最新推荐文章于 2024-04-22 22:08:26 发布

阅读量762

点赞数

分类专栏： acm_数据结构文章标签：算法 ini string output each input

本文链接：https://blog.csdn.net/kongming_acm/article/details/6232344

版权

acm_数据结构专栏收录该内容

112 篇文章 1 订阅

订阅专栏

694. Distinct Substrings

Problem code: DISUBSTR

Given a string, we need to find the total number of its distinct substrings.

Input

T- number of test cases. T<=20;
Each test case consists of one string, whose length is <= 1000

Output

For each test case output one number saying the number of distinct substrings.

Example

Sample Input: 2 CCCCC ABABA

Sample Output: 5 9

Explanation for the testcase with string ABABA:
len=1 : A,B
len=2 : AB,BA
len=3 : ABA,BAB
len=4 : ABAB,BABA
len=5 : ABABA
Thus, total number of distinct substrings is 9.

#include<iostream>
#include<cstdio>
#include<cstring>
using namespace std;
///后缀数组倍增算法
const int maxn=500000;
char str[maxn];
int wa[maxn],wb[maxn],wv[maxn],wn[maxn],a[maxn],sa[maxn];
int cmp(int*r,int a,int b,int l)
{return r[a]==r[b]&&r[a+l]==r[b+l];}
/**n为字符串长度，m为字符的取值范围，r为字符串。后面的j为每次排
序时子串的长度*/
void DA(int* r,int* sa,int n,int m)
{
    int i,j,p,*x=wa,*y=wb,*t;
    ///对R中长度为1的子串进行基数排序
    for(i=0;i<m;i++)wn[i]=0;
    for(i=0;i<n;i++)wn[x[i]=r[i]]++;
    for(i=1;i<m;i++)wn[i]+=wn[i-1];
    for(i=n-1;i>=0;i--)sa[--wn[x[i]]]=i;

    for(j=1,p=1;p<n;j*=2,m=p)
    {
        /**利用了上一次基数排序的结果，对待排序的子串的第二关键字进行
        了一次高效地基数排序*/
        for(p=0,i=n-j;i<n;i++)y[p++]=i;
        for(i=0;i<n;i++)if(sa[i]>=j)y[p++]=sa[i]-j;
        ///基数排序
        for(i=0;i<n;i++)wv[i]=x[y[i]];
        for(i=0;i<m;i++)wn[i]=0;
        for(i=0;i<n;i++)wn[wv[i]]++;
        for(i=1;i<m;i++)wn[i]+=wn[i-1];
        for(i=n-1;i>=0;i--)sa[--wn[wv[i]]]=y[i];
        ///当p=n的时候，说明所有串都已经排好序了
        ///在第一次排序以后，rank数组中的最大值小于p，所以让m=p
        for(t=x,x=y,y=t,p=1,x[sa[0]]=0,i=1;i<n;i++)
            x[sa[i]]=cmp(y,sa[i-1],sa[i],j)?p-1:p++;
    }
    return;
}
///后缀数组计算height数组
/**
height数组的值应该是从height[1]开始的，而且height[1]应该是等于0的。
原因是，因为我们在字符串后面添加了一个0号字符，所以它必然是最小的
一个后缀。而字符串中的其他字符都应该是大于0的（前面有提到，使用倍
增算法前需要确保这点），所以排名第二的字符串和0号字符的公共前缀
（即height[1]）应当为0.在调用calheight函数时，要注意height数组的范
围应该是[1..n]。所以调用时应该是calheight(r,sa,n)
而不是calheight(r,sa,n+1)。*/
int rank[maxn],height[maxn];
void calheight(int* r,int* sa,int n)
{
    int i,j,k=0;
    for(i=1;i<=n;i++)rank[sa[i]]=i;
    for(i=0;i<n;height[rank[i++]]=k)
    for(k?k--:0,j=sa[rank[i]-1];r[i+k]==r[j+k];k++);
    return;
}
int main()
{
    int ci;scanf("%d",&ci);
    while(ci--)
    {
        scanf("%s",str);//待处理字符串
        //后缀数组倍增算法使用方法
        /**
        在使用倍增算法前，需要保证r数组的值均大于0。然后要在原字
        符串后添加一个0号字符，具体原因参见罗穗骞的论文。这时候，
        若原串的长度为n，则实际要进行后缀数组构建的r数组的长度应
        该为n+1.所以调用DA函数时，对应的n应为n+1.*/
        int n=strlen(str);
        for(int i=0;i<n;i++) a[i]=(int)str[i];
        a[n]=0;
        DA(a,sa,n+1,256);
        calheight(a,sa,n);
        //....................................
        /**
        题目大意：给出一个字符串，问它的不重复子串有多少个。

        分析：用后缀数组可以轻松解决。因为这个字符串的每个子
        串必然是某个后缀的前缀，先用后缀数组求出sa和height，
        那么对于sa[k]，它有n-sa[k]个子串，其中有height[k]个是
        和上一个后缀重复的，所以要减去。所以用后缀数组求解的
        时间复杂度是O(n)，后缀数组要是用倍增算法是O(nlog2n)，
        效率很高。*/
        int sum=0;
        for(int i=1;i<=n;i++) sum+=n-sa[i]-height[i];
        printf("%d/n",sum);
    }
    return 0;
}

kongming_acm

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Distinct Substrings 给出一个字符串，问它的不重复子串有多少个后缀数组

694. Distinct SubstringsProblem code: DISUBSTR<br /> <br />Given a string, we need to find the total number of its distinct substrings.Input<br />T- number of test cases. T<=20;<br />Each test case consists of one string, whose length is <= 1000Output<br /
复制链接

扫一扫