POJ 3294 Life Forms (Suffix Array)

Description

You may have wondered why most extraterrestrial life forms resemble humans, differing by superficial traits such as height, colour, wrinkles, ears, eyebrows and the like. A few bear no human resemblance; these typically have geometric or amorphous shapes like cubes, oil slicks or clouds of dust.

The answer is given in the 146th episode of Star Trek - The Next Generation, titled The Chase. It turns out that the vast majority of the quadrant's life forms ended up with a large fragment of common DNA.

Given the DNA sequences of several life forms represented as strings of letters, you are to find the longest substring that is shared by more than half of them.

Input

Standard input contains several test cases. Each test case begins with 1 ≤ n ≤ 100, the number of life forms. n lines follow; each contains a string of lower case letters representing the DNA sequence of a life form. Each DNA sequence contains at least one and not more than 1000 letters. A line containing 0 follows the last test case.

Output

For each test case, output the longest string or strings shared by more than half of the life forms. If there are many, output all of them in alphabetical order. If there is no solution with at least one letter, output "?". Leave an empty line between test cases.

Sample Input

3
abcdefg
bcdefgh
cdefghi
3
xxx
yyy
zzz
0

Sample Output

bcdefg
cdefgh

?

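Given n strings, find the longest contiguous substring that occurs in more than half of them. If there are several answers, output them in lexicographic order.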

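As in similar problems, start by concatenating all the strings, joining adjacent strings with distinct separator symbols that never occur in the input. (There can be 100 strings, so a char does not leave enough spare values for the separators; the joined string is stored as an int array instead.)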
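The concatenation step can be sketched in isolation as follows. This is only an illustration: the names `Joined` and `joinWithSeparators` are hypothetical and do not appear in the accepted code, although the encoding (letters shifted by 100, separators 1..100, final separator replaced by 0) mirrors what `main` does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the accepted code): joins the input strings
// into one int sequence. Letters are shifted by +100 so that the values 1..100
// stay free for the per-string separators.
struct Joined {
    std::vector<int> text;   // concatenated symbols
    std::vector<int> owner;  // owner[p] = 1-based index of the source string, 0 for a separator
};

Joined joinWithSeparators(const std::vector<std::string>& strs) {
    assert(strs.size() <= 100);  // separators 1..100 must stay below 'a' + 100
    Joined out;
    for (std::size_t i = 0; i < strs.size(); ++i) {
        for (char ch : strs[i]) {
            out.text.push_back(static_cast<int>(ch) + 100);
            out.owner.push_back(static_cast<int>(i) + 1);
        }
        out.text.push_back(static_cast<int>(i) + 1);  // unique separator for string i+1
        out.owner.push_back(0);
    }
    if (!out.text.empty()) out.text.back() = 0;  // final separator becomes the smallest symbol
    return out;
}
```

Because every separator value occurs exactly once, the common prefix of two different suffixes can never extend across a string boundary.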

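Then compute the height array of the joined string and binary search on the answer: decide whether some string of length limit occurs in more than half of the strings. To do so, split the height array into groups of consecutive suffixes; for each maximal group whose LCP is >= limit, check whether the suffixes in the group come from more than half of the original strings.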
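The grouping test and the binary search can be tried on small inputs with the sketch below. It is a simplified illustration, not the accepted solution: the suffix array is built by plain sorting (far too slow for the real limits), and `SharedFinder`, `rem`, `need`, and `longest` are hypothetical names introduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Illustrative sketch: naive suffix array plus the "group by height" check.
struct SharedFinder {
    std::vector<int> text, owner, rem, sa, height;
    int need = 0;  // smallest count strictly greater than half the strings

    explicit SharedFinder(const std::vector<std::string>& strs) {
        for (std::size_t i = 0; i < strs.size(); ++i) {
            for (char ch : strs[i]) {
                text.push_back(ch + 100);
                owner.push_back(static_cast<int>(i) + 1);
            }
            text.push_back(static_cast<int>(i) + 1);  // distinct separator
            owner.push_back(0);
        }
        need = static_cast<int>(strs.size()) / 2 + 1;
        int n = static_cast<int>(text.size());
        rem.assign(n, 0);  // rem[p] = letters left in p's string, starting at p
        for (int p = n - 2; p >= 0; --p)
            rem[p] = owner[p] == 0 ? 0 : rem[p + 1] + 1;
        sa.resize(n);
        for (int i = 0; i < n; ++i) sa[i] = i;
        std::sort(sa.begin(), sa.end(), [&](int a, int b) {
            return std::lexicographical_compare(text.begin() + a, text.end(),
                                                text.begin() + b, text.end());
        });
        height.assign(n, 0);  // height[i] = LCP of suffixes sa[i-1] and sa[i]
        for (int i = 1; i < n; ++i) {
            int a = sa[i - 1], b = sa[i], k = 0;
            while (a + k < n && b + k < n && text[a + k] == text[b + k]) ++k;
            height[i] = k;
        }
    }

    // True if some substring of length `limit` occurs in at least `need` strings.
    bool check(int limit) const {
        assert(limit >= 1);
        int n = static_cast<int>(sa.size());
        for (int i = 0; i < n;) {
            int j = i + 1;
            while (j < n && height[j] >= limit) ++j;  // group is sa[i..j-1]
            std::set<int> seen;  // distinct source strings in this group
            for (int k = i; k < j; ++k)
                if (owner[sa[k]] != 0 && rem[sa[k]] >= limit)
                    seen.insert(owner[sa[k]]);
            if (static_cast<int>(seen.size()) >= need) return true;
            i = j;
        }
        return false;
    }

    // Binary search for the largest feasible length; 0 means no answer ("?").
    int longest() const {
        int lo = 0, hi = static_cast<int>(text.size());
        while (lo < hi) {
            int mid = lo + (hi - lo + 1) / 2;
            if (check(mid)) lo = mid; else hi = mid - 1;
        }
        return lo;
    }
};
```

The binary search is valid because feasibility is monotone: if a substring of length L is shared by enough strings, so is each of its shorter prefixes. On the first sample (abcdefg, bcdefgh, cdefghi) `longest()` should give 6, and on the second sample it should give 0.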

Accepted code:

#include<cstdio>
#include<cstring>
#include<iostream>
#include<algorithm>
#include<vector>
#include<stdlib.h>
#include<queue>
#include<map>
#include<iomanip>
#include<math.h>
using namespace std;
typedef long long ll;
typedef double ld;

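const int maxn=100000+1000;
int nn;//smallest integer strictly greater than half the number of strings
int who[maxn];//who[i] is the index of the original string that suffix i starts in; 0 marks an inserted separator
int ans[maxn];//ans[i]=x means the i-th answer starts at suffix x of the joined string s
int cnt;//number of answers
int turn=0;//current round number, used together with vis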
struct SuffixArray
{
    int s[maxn];
    int sa[maxn],rank[maxn],height[maxn];
    int t1[maxn],t2[maxn],c[maxn],n;
    int vis[maxn];//must start at 0; vis[t]=3 means string t was last counted in round 3
    void build_sa(int m)
    {
        int i,*x=t1,*y=t2;
        for(i=0;i<m;i++) c[i]=0;
        for(i=0;i<n;i++) c[x[i]=s[i]]++;
        for(i=1;i<m;i++) c[i]+=c[i-1];
        for(i=n-1;i>=0;i--) sa[--c[x[i]]]=i;
        for(int k=1;k<=n;k<<=1)
        {
            int p=0;
            for(i=n-k;i<n;i++) y[p++]=i;
            for(i=0;i<n;i++)if(sa[i]>=k) y[p++]=sa[i]-k;
            for(i=0;i<m;i++) c[i]=0;
            for(i=0;i<n;i++) c[x[y[i]]]++;
            for(i=1;i<m;i++) c[i]+=c[i-1];
            for(i=n-1;i>=0;i--) sa[--c[x[y[i]]]]=y[i];
            swap(x,y);
            p=1,x[sa[0]]=0;
            for(i=1;i<n;i++)
                x[sa[i]]=y[sa[i]]==y[sa[i-1]]&&y[sa[i]+k]==y[sa[i-1]+k]?p-1:p++;
            if(p>=n) break;
            m=p;
        }
    }
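    void build_height()
    {
        int i,j,k=0;
        for(i=0;i<n;i++) rank[sa[i]]=i;
        for(i=0;i<n;i++)
        {
            if(rank[i]==0){ height[0]=0;k=0;continue; }//the smallest suffix has no predecessor; avoids reading sa[-1]
            if(k)k--;
            j=sa[rank[i]-1];
            while(s[i+k]==s[j+k])k++;
            height[rank[i]]=k;
        }
    }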
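    bool check(int limit)
    {
        int i,j,k,t;
        int ss,flag=0;
        for(i=2;i<n;i=j+1)
        {
            for(;i<n&&height[i]<limit;i++);//skip to the next height>=limit; the group starts at index i-1
            for(j=i;j<n&&height[j]>=limit;j++);//the group ends at index j-1
            if(j-i+1<nn) continue;//fewer suffixes in the group than strings required
            turn++; //start a new round so that old vis marks are ignored
            ss=0;//number of distinct strings seen in this group so far
            for(k=i-1;k<j;k++)
                if( (t=who[sa[k]])!=0 )
                    if(vis[t]!=turn){ ss++;vis[t]=turn; }

            if(ss>=nn)
            {
                if(flag) ans[cnt++]=sa[i-1];
                else {cnt=1;ans[0]=sa[i-1];flag=1;}//first hit for this limit: discard older answers
            }
        }
        return flag;
    }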
    void solve()
    {
        memset(vis,0,sizeof(vis));
        int min=1,max=n;
        if(!check(min))
        {
            printf("?\n");
            return ;
        }
        while(min<=max)
        {
            int mid=min+(max-min)/2;
            if(check(mid))min=mid+1;
            else max=mid-1;
        }
        for(int i=0;i<cnt;i++)
        {
            int j=ans[i];
            for(int k=0;k<max;k++)printf("%c",s[j+k]-100);
            printf("\n");
        }
    }
}sa;
int main()
{
    char str[1000+100];
    int n,flag=0;
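    while(scanf("%d",&n)==1&&n)
    {
        if(flag==1) printf("\n");
        else flag=1;
        if(n==1)//a single life form: its whole sequence is trivially shared by more than half
        {
            scanf("%s",str);
            printf("%s\n",str);
            continue;
        }
        nn=n/2+1;//smallest integer strictly greater than n/2
        sa.n=0;
        for(int i=1;i<=n;i++)
        {
            scanf("%s",str);
            int len=strlen(str);
            for(int j=0;j<len;j++)
            {
                sa.s[sa.n+j]=str[j]+100;
                who[sa.n+j]=i;//index of the string this position belongs to
            }
            sa.s[sa.n+len]=i;//separator
            who[sa.n+len]=0;//separators belong to string 0
            sa.n=sa.n+len+1;//extend the joined length
        }
        sa.s[sa.n-1]=0;//the final separator becomes 0, the unique smallest symbol
        sa.build_sa(228);
        sa.build_height();
        sa.solve();
    }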
}

 
