POJ 3450: Suffix Array (Longest Common Substring of n Strings)

POJ—3450

Corporate Identity

Time Limit: 3000MS Memory Limit: 65536K
Total Submissions: 9079 Accepted: 3026

Description

Beside other services, ACM helps companies to clearly state their “corporate identity”, which includes company logo but also other signs, like trademarks. One of such companies is Internet Building Masters (IBM), which has recently asked ACM for a help with their new identity. IBM do not want to change their existing logos and trademarks completely, because their customers are used to the old ones. Therefore, ACM will only change existing trademarks instead of creating new ones.

After several other proposals, it was decided to take all existing trademarks and find the longest common sequence of letters that is contained in all of them. This sequence will be graphically emphasized to form a new logo. Then, the old trademarks may still be used while showing the new identity.

Your task is to find such a sequence.

Input

The input contains several tasks. Each task begins with a line containing a positive integer N, the number of trademarks (2 ≤ N ≤ 4000). The number is followed by N lines, each containing one trademark. Trademarks will be composed only from lowercase letters, the length of each trademark will be at least 1 and at most 200 characters.

After the last trademark, the next task begins. The last task is followed by a line containing zero.

Output

For each task, output a single line containing the longest string contained as a substring in all trademarks. If there are several strings of the same length, print the one that is lexicographically smallest. If there is no such non-empty string, output the words “IDENTITY LOST” instead.

Sample Input

3
aabbaabb
abbababb
bbbbbabb
2
xyz
abc
0

Sample Output

abb
IDENTITY LOST

Source

CTU Open 2007

 

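Problem: find the longest substring that occurs in all n strings; on ties, output the lexicographically smallest one.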

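Analysis: concatenate the strings with distinct separators, build one suffix array over the result, and binary search on the answer length. In the check function, if a run of consecutive height[i] values are all at least mid, the suffixes in that run share a common prefix of length at least mid. If such a run contains suffixes from all n strings, a common substring of length mid exists.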
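The check described above can be tried on a small scale. What follows is a minimal sketch, not the solution from this post: it swaps the doubling algorithm for a naive suffix sort, so it only suits tiny inputs. The names `commonOfLength` and `longestCommon` and the separator values `256 + i` are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the lexicographically smallest substring of
// length `len` that occurs in every string, or "" if there is none.
std::string commonOfLength(const std::vector<std::string>& strs, int len) {
    assert(strs.size() >= 2 && len >= 1);
    // Concatenate all strings; each one is followed by a unique separator
    // whose value is larger than any lowercase letter.
    std::vector<int> s, id;
    for (int i = 0; i < (int)strs.size(); i++) {
        for (char ch : strs[i]) { s.push_back(ch); id.push_back(i); }
        s.push_back(256 + i);  // assumed separator encoding
        id.push_back(-1);      // separators belong to no string
    }
    int n = (int)s.size();
    // Naive suffix array: sort suffix start positions by comparing suffixes.
    std::vector<int> sa(n);
    for (int i = 0; i < n; i++) sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return std::lexicographical_compare(s.begin() + a, s.end(),
                                            s.begin() + b, s.end());
    });
    // height[i] = longest common prefix of the suffixes ranked i-1 and i.
    std::vector<int> height(n, 0);
    for (int i = 1; i < n; i++) {
        int a = sa[i - 1], b = sa[i], k = 0;
        while (a + k < n && b + k < n && s[a + k] == s[b + k]) k++;
        height[i] = k;
    }
    // Scan runs of height >= len and count the distinct strings in each run.
    std::vector<bool> seen(strs.size(), false);
    int count = 0;
    auto mark = [&](int pos) {
        // pos never starts at a separator here: separators are unique,
        // so they cannot share a prefix of length >= 1 with a neighbour.
        if (!seen[id[pos]]) { seen[id[pos]] = true; count++; }
    };
    for (int i = 1; i < n; i++) {
        if (height[i] < len) {  // run broken: start a new group
            std::fill(seen.begin(), seen.end(), false);
            count = 0;
            continue;
        }
        mark(sa[i - 1]);
        mark(sa[i]);
        if (count == (int)strs.size()) {
            // The first complete group in suffix-array order is the smallest.
            std::string res;
            for (int j = 0; j < len; j++) res += (char)s[sa[i] + j];
            return res;
        }
    }
    return "";
}

// Binary search on the length: a common substring of length L implies
// common substrings of every shorter length.
std::string longestCommon(const std::vector<std::string>& strs) {
    int lo = 1, hi = (int)strs[0].size();
    for (const std::string& t : strs) hi = std::min(hi, (int)t.size());
    std::string best;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        std::string cand = commonOfLength(strs, mid);
        if (!cand.empty()) { best = cand; lo = mid + 1; }
        else hi = mid - 1;
    }
    return best;  // empty means "IDENTITY LOST"
}
```

For simplicity this sketch rebuilds the suffix array on every check; a real solution builds it once and reuses it across the binary search.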

Reference code: link

#include<iostream>
#include<stdio.h>
#include<cstring>
#include<algorithm>
using namespace std;
const int maxn=1e6+5;
int Rank[maxn],sa[maxn],height[maxn];
int t1[maxn],t2[maxn],c[maxn];
int s[maxn];
char str[210];
int id[maxn];
char ans[210];
int vis[5000];
bool cmp(int *r,int a,int b,int l)
{
    return r[a]==r[b]&&r[a+l]==r[b+l];
}
void da(int *r,int *sa,int n,int m)
{
    int i,j,p,*x=t1,*y=t2;
    for(i=0;i<m;i++) c[i]=0;
    for(i=0;i<n;i++)c[x[i]=r[i]]++;
    for(i=1;i<m;i++) c[i]+=c[i-1];
    for(i=n-1;i>=0;i--) sa[--c[x[i]]]=i;
    for(j=1;j<=n;j<<=1)
    {
        p=0;
        for(i=n-j;i<n;i++) y[p++]=i;
        for(i=0;i<n;i++) if(sa[i]>=j) y[p++]=sa[i]-j;

        for(i=0;i<m;i++) c[i]=0;
        for(i=0;i<n;i++) c[x[y[i]]]++;
        for(i=1;i<m;i++) c[i]+=c[i-1];
        for(i=n-1;i>=0;i--) sa[--c[x[y[i]]]]=y[i];
        swap(x,y);
        p=1;
        x[sa[0]]=0;
        for(i=1;i<n;i++)
            x[sa[i]]=cmp(y,sa[i-1],sa[i],j)?p-1:p++;
        if(p>=n) break;
        m=p;
    }
    int k=0;
    n--;
    for(i=0;i<=n;i++)
        Rank[sa[i]]=i;
    for(i=0;i<n;i++)
    {
        if(k) k--;
        j=sa[Rank[i]-1];
        while(r[i+k]==r[j+k])
            k++;
        height[Rank[i]]=k;
    }
}
bool check(int mid,int n,int N)
{
    memset(vis,0,sizeof(vis));
    int kase=0;
    for(int i=2;i<=n;i++)
    {
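        ///A height value below mid breaks the run, so the count restarts.
        ///height[i] is the longest common prefix of the suffixes ranked i-1 and i,
        ///so if any value inside a group is below mid, the suffixes in it do not all share a prefix of length mid.
        ///Every height[i] in the group must therefore be at least mid.
        if(height[i]<mid)
        {
            kase=0;
            memset(vis,0,sizeof(vis));
            continue;
        }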
        if(!vis[id[sa[i]]])///has string id[sa[i]] already been counted in this group?
        {
            kase++;
            vis[id[sa[i]]]=1;
        }
        if(!vis[id[sa[i-1]]])
        {
            kase++;
            vis[id[sa[i-1]]]=1;
        }
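        if(kase==N)///all N strings appear in this group, so they share a substring of length mid; the first such group in suffix-array order is the lexicographically smallest
        {
            for(int j=0;j<mid;j++)
                ans[j]=s[sa[i]+j]+'a'-1;///the mid symbols starting at sa[i] form the common substring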
            ans[mid]='\0';
            return true;
        }
    }
    return false;
}
int main()
{
//    freopen("in.txt","r",stdin);
    ios::sync_with_stdio(false);
    int N;
    while(cin>>N&&N)//N is the number of strings
    {
        memset(ans,0,sizeof(ans));
        int n=0;
        int len;
        int tmp=30;
        for(int i=0;i<N;i++)
        {
            cin>>str;
            len=strlen(str);
            for(int j=0;j<len;j++)
            {
                s[n]=str[j]-'a'+1;
                id[n]=i;///id[] records which string each position belongs to
                n++;
            }
            id[n]=tmp;///separator: a unique symbol above the letter codes 1..26
            s[n]=tmp++;
            n++;
        }
        s[n]=0;
        da(s,sa,n+1,tmp);
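        ///Binary search for the longest common substring. At first I did not see why binary search applies;
        ///it works because a common substring of length mid implies one of every shorter length, so we search for the largest feasible length.
        int l=1,r=len,mid;///len is the length of the last string read, which is a valid upper bound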
        int flag=0;
        while(l<=r)
        {
            mid=(l+r)/2;
            if(check(mid,n,N))///does a common substring of length mid exist?
            {
                l=mid+1;//if so, keep looking for a longer one
                flag=1;
            }
            else
                r=mid-1;
        }
        if(flag)
            cout<<ans<<endl;
        else
            cout<<"IDENTITY LOST"<<endl;

    }
    return 0;
}
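
When testing a solution like the one above, a brute-force reference can help on small inputs. The sketch below is not part of the original post: the function name `bruteForceCommon` is hypothetical, and it simply tries every substring of the first string from longest to shortest, keeping the lexicographically smallest one that occurs in all strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical reference implementation for cross-checking on small inputs.
// Returns "" when the strings share no non-empty substring.
std::string bruteForceCommon(const std::vector<std::string>& strs) {
    assert(!strs.empty());
    const std::string& base = strs[0];
    // Try lengths from longest to shortest; the first length with any
    // common candidate is the answer length.
    for (int len = (int)base.size(); len >= 1; len--) {
        std::string best;
        bool found = false;
        for (int start = 0; start + len <= (int)base.size(); start++) {
            std::string cand = base.substr(start, len);
            bool inAll = true;
            for (const std::string& t : strs) {
                if (t.find(cand) == std::string::npos) { inAll = false; break; }
            }
            // Keep the lexicographically smallest candidate of this length.
            if (inAll && (!found || cand < best)) { best = cand; found = true; }
        }
        if (found) return best;
    }
    return "";
}
```

This is far too slow for the full limits (up to 4000 strings of 200 characters), but comparing its output with the suffix-array solution on random short strings is a cheap way to catch mistakes in the tie-breaking or in the group counting.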

 
