Ural1713 Key Substrings (Suffix Array)

Latest recommended article published 2013-06-18 01:32:01

Albafica


Categories: Suffix Array, Data Structures

Article link: https://blog.csdn.net/lenleaves/article/details/8946621


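With this many substrings, brute force times out. The discussion board says the problem can be solved with hashing. I couldn't work out how to do that, but a suffix array came to mind.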

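The method relies on a property of the sa array. sa lists all suffixes in lexicographic order, so for the suffix sa[i], the suffix sharing the longest common prefix (LCP) with it is always one of its neighbours sa[i-1] or sa[i+1]. This holds because the LCP of two suffixes equals the minimum of the height values between their ranks, which can only shrink as you move farther away in sa. The problem asks us to find, for each string, a substring that does not appear in any other string.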
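To see the neighbour property concretely, here is a small sketch that is not from the original post (`naiveSuffixArray`, `directLcp` and `neighboursGiveMaxLcp` are hypothetical helpers). It builds a suffix array by plain sorting, which is only reasonable for short strings, and checks by brute force that for every rank the best LCP against any other suffix equals the best LCP against its two sorted neighbours.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: naive suffix array, sorting start positions by
// plain string comparison of the suffixes (short strings only).
std::vector<int> naiveSuffixArray(const std::string& s) {
    std::vector<int> sa(s.size());
    for (int i = 0; i < (int)s.size(); ++i) sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}

// Length of the common prefix of the suffixes starting at a and b.
int directLcp(const std::string& s, int a, int b) {
    int k = 0, n = s.size();
    while (a + k < n && b + k < n && s[a + k] == s[b + k]) ++k;
    return k;
}

// For every rank i: max LCP over all other suffixes == max LCP over the
// neighbours at ranks i-1 and i+1.
bool neighboursGiveMaxLcp(const std::string& s) {
    std::vector<int> sa = naiveSuffixArray(s);
    int n = sa.size();
    for (int i = 0; i < n; ++i) {
        int best = 0;
        for (int j = 0; j < n; ++j)
            if (j != i) best = std::max(best, directLcp(s, sa[i], sa[j]));
        int nb = 0;
        if (i > 0) nb = std::max(nb, directLcp(s, sa[i], sa[i - 1]));
        if (i + 1 < n) nb = std::max(nb, directLcp(s, sa[i], sa[i + 1]));
        if (best != nb) return false;
    }
    return true;
}
```

For example, `naiveSuffixArray("banana")` gives the order a, ana, anana, banana, na, nana, i.e. positions 5, 3, 1, 0, 4, 2.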

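So we first concatenate all strings into a single string, joined by special values (note: the separators must all be different), and record where each string starts and ends. Then, for each string, we enumerate every starting position of its substrings, use the rank array to locate the corresponding suffix in sa, and scan backward and forward to the first suffix that does not start inside the current string's range. We compute the LCP with each of these two suffixes, take the larger value and add 1: that is the length of the shortest substring starting at this position that occurs in no other string. After all positions are enumerated, the minimum of these values gives the key substring we are looking for. Every string is processed this way.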
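A compact sketch of this procedure, assuming small inputs: instead of the doubling algorithm and sparse table used in the full code below, it sorts suffixes with `std::lexicographical_compare` and computes each LCP by direct comparison. The function name `keySubstrings` and the `owner` array are illustrative choices, not from the post; the separators follow the post (150 + i, then a trailing 0).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns one shortest key substring per command (small-input sketch).
std::vector<std::string> keySubstrings(const std::vector<std::string>& cmds) {
    const int INF = 1 << 30;
    std::vector<int> text, owner, start;   // owner[p]: command index, or -1
    for (int i = 0; i < (int)cmds.size(); ++i) {
        start.push_back((int)text.size());
        for (char c : cmds[i]) { text.push_back(c); owner.push_back(i); }
        text.push_back(150 + i); owner.push_back(-1);   // distinct separator
    }
    text.push_back(0); owner.push_back(-1);             // sentinel
    int n = text.size();

    // Naive suffix array: sort start positions lexicographically.
    std::vector<int> sa(n), rnk(n);
    for (int p = 0; p < n; ++p) sa[p] = p;
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return std::lexicographical_compare(text.begin() + a, text.end(),
                                            text.begin() + b, text.end());
    });
    for (int r = 0; r < n; ++r) rnk[sa[r]] = r;

    // Direct LCP; it never runs past a separator because they are distinct.
    auto lcp = [&](int a, int b) {
        int k = 0;
        while (a + k < n && b + k < n && text[a + k] == text[b + k]) ++k;
        return k;
    };

    std::vector<std::string> ans;
    for (int i = 0; i < (int)cmds.size(); ++i) {
        int end = start[i] + (int)cmds[i].size();
        std::string best = cmds[i];   // the whole command is always a key substring
        for (int j = start[i]; j < end; ++j) {
            // Nearest suffix on each side that does not start inside command i.
            int pre = rnk[j] - 1, suc = rnk[j] + 1;
            while (pre >= 0 && owner[sa[pre]] == i) --pre;
            while (suc < n && owner[sa[suc]] == i) ++suc;
            int need = 1;             // every candidate length is at least 1
            for (int q : {pre, suc}) {
                if (q < 0 || q >= n) continue;   // no neighbour on this side
                int l = lcp(sa[q], j);
                // Match reaches the end of command i: nothing unique starts at j.
                need = std::max(need, j + l >= end ? INF : l + 1);
            }
            if (need < (int)best.size())
                best = cmds[i].substr(j - start[i], need);
        }
        ans.push_back(best);
    }
    return ans;
}
```

On the sample, `keySubstrings({"abcm", "acm", "bcd"})` returns ab, ac, d. Because the LCP is computed directly here, a neighbour that starts with a separator or the sentinel just yields LCP 0 and length 1, so those special cases need no extra branch in this sketch.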

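Also note that when scanning backward or forward, we may land on the final 0 (the sentinel) or on one of the separators we inserted. In that case the LCP needs special handling: such a suffix shares no characters with the current one, so the candidate length is set to 1.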
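A tiny illustration of why the separators must be distinct and why a separator or sentinel neighbour gives length 1. The helpers `joinCommands` and `suffixLcp` are hypothetical; the values 150 + i and the trailing 0 mirror the post's encoding.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: with distinct == true command i is followed by
// 150 + i; otherwise every separator is 150. A trailing 0 is the sentinel.
std::vector<int> joinCommands(const std::vector<std::string>& cmds, bool distinct) {
    std::vector<int> text;
    for (int i = 0; i < (int)cmds.size(); ++i) {
        for (char c : cmds[i]) text.push_back(c);
        text.push_back(distinct ? 150 + i : 150);
    }
    text.push_back(0);
    return text;
}

// Direct LCP of the suffixes of text starting at positions a and b.
int suffixLcp(const std::vector<int>& text, int a, int b) {
    int k = 0, n = text.size();
    while (a + k < n && b + k < n && text[a + k] == text[b + k]) ++k;
    return k;
}
```

For the commands "xa" and "ya", the suffixes at positions 1 and 4 both start with "a". With distinct separators their LCP is 1. With one shared separator it becomes 2, because the match runs across the boundary and "a" plus separator looks like a common substring. A letter suffix compared with a separator or sentinel suffix has LCP 0, so its candidate length is 0 + 1 = 1.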

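Another point: after finding the predecessor and successor and computing the LCP, if ss[i][begin+lcp]=='\0', then everything from the current starting position to the end of the current string also appears in another string, so no key substring starts here. In that case set the value to INF. (The code tests this as str[j+lcp]>=150, i.e. the character right after the match is the string's own separator.)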
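For cross-checking on small cases, a brute-force version of the per-position value can help. This hypothetical helper is not part of the original solution; it tries increasing lengths from start j of command i and returns -1 where the post uses INF.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Brute-force reference (small inputs only): length of the shortest
// substring of cmds[i] starting at j that occurs in no other command,
// or -1 (the post's INF) if even cmds[i].substr(j) occurs elsewhere.
int shortestKeyFrom(const std::vector<std::string>& cmds, int i, int j) {
    int len = cmds[i].size();
    for (int L = 1; j + L <= len; ++L) {
        std::string t = cmds[i].substr(j, L);
        bool elsewhere = false;
        for (int k = 0; k < (int)cmds.size(); ++k)
            if (k != i && cmds[k].find(t) != std::string::npos) elsewhere = true;
        if (!elsewhere) return L;
    }
    return -1;
}
```

On the sample, "cm" (start 2 of "abcm") also appears in "acm", so that position gets -1, while start 1 needs "bcm" of length 3.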

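For background on suffix arrays, see the 2009 paper at http://wenku.baidu.com/view/228caa45b307e87101f696a8.html. It is a very strong paper: once you have read it and worked through its exercises, you should have a solid grasp of suffix array properties. After that, it comes down to summarizing what you learned yourself.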

1713. Key Substrings

Time limit: 2.0 second
Memory limit: 64 MB

Although the program committee works as one team, heated debates arise frequently enough. For example, there is no agreement upon which client of the version control system is more convenient to use: a graphic interface program or a console client.

Let us consider some command of a console client. A substring of this command that is not a substring of any other command of this client can be called a key substring because it uniquely identifies the command. In the latest versions of the client, it is not necessary to type the whole command; it is sufficient to type any of its key substrings.

A supporter of the console client wants to convince the program committee to use it. In order to show how fast and convenient the work with this client is, he wants to find a key substring of minimal length for each command. Help him do it.

Input

The first line contains the number n of commands in the console client (2 ≤ n ≤ 1000). Each of the following n lines contains one command of the client. Each command is a nonempty string consisting of lowercase Latin letters and its length is at most 100. No command is a substring of another command.

Output

Output n lines. The i-th line should contain any of the shortest key substrings of the i-th command (the commands are numbered in the order they are given in the input).

Sample

input
3
abcm
acm
bcd

output
ab
ac
d
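To check an answer against the definition of a key substring (a substring of the command that is not a substring of any other command), a direct checker is enough for small inputs. `isKeySubstring` is a hypothetical helper, not part of the original solution.

```cpp
#include <cassert>
#include <string>
#include <vector>

// True if t is a substring of cmds[i] and of no other command.
bool isKeySubstring(const std::vector<std::string>& cmds, int i, const std::string& t) {
    if (cmds[i].find(t) == std::string::npos) return false;
    for (int k = 0; k < (int)cmds.size(); ++k)
        if (k != i && cmds[k].find(t) != std::string::npos) return false;
    return true;
}
```

Each sample output line passes this check, while the single letters of "abcm" do not, which is why its answer has length 2.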

#include<iostream>
#include<cstdio>
#include<algorithm>
#include<cstring>

using namespace std;

#define MAXN 220000
#define INF 0xffffff

// The rank array is named rnk: with "using namespace std", a global called
// rank is ambiguous with std::rank (<type_traits>, C++11) and fails to compile.
int rnk[MAXN],sa[MAXN],wa[MAXN],wb[MAXN],high[MAXN],str[MAXN],wss[MAXN],wv[MAXN];
int logs[MAXN],best[20][MAXN];

// Height array: high[i] = LCP of suffixes sa[i-1] and sa[i], for i = 1..n.
// sa[0] is the sentinel suffix; only ranks 1..n are used below.
void calhigh(int *str,int *sa,int n)
{
	int i,j,k=0;
	for(i=1;i<=n;i++) rnk[sa[i]]=i;
	for(i=0;i<n;high[rnk[i++]]=k)
		for(k?k--:0,j=sa[rnk[i]-1];str[i+k]==str[j+k];k++);
}

// Doubling step: two suffixes tie if both half-ranks are equal.
bool cmp(int *r,int a,int b,int l)
{
	return (r[a]==r[b]&&r[a+l]==r[b+l]);
}

// Prefix-doubling suffix array with radix sort. str[n-1] must be 0, the
// unique smallest value, and every value must be below m
// (separators go up to 150+999).
void suffix(int *str,int *sa,int n,int m=180+1000)
{
	int i,j,p,*x=wa,*y=wb;
	for(i=0;i<m;i++) wss[i]=0;
	for(i=0;i<n;i++) wss[x[i]=str[i]]++;
	for(i=1;i<m;i++) wss[i]+=wss[i-1];
	for(i=n-1;i>=0;i--) sa[--wss[x[i]]]=i;
	for(j=1,p=1;p<n;m=p,j*=2)
	{
		p=0;
		for(i=n-j;i<n;i++) y[p++]=i;
		for(i=0;i<n;i++) if(sa[i]>=j) y[p++]=sa[i]-j;
		for(i=0;i<n;i++) wv[i]=x[y[i]];
		for(i=0;i<m;i++) wss[i]=0;
		for(i=0;i<n;i++) wss[wv[i]]++;
		for(i=1;i<m;i++) wss[i]+=wss[i-1];
		for(i=n-1;i>=0;i--) sa[--wss[wv[i]]]=y[i];
		for(swap(x,y),x[sa[0]]=0,i=1,p=1;i<n;i++)
			x[sa[i]]=cmp(y,sa[i-1],sa[i],j)?p-1:p++;
	}
	calhigh(str,sa,n-1);
}

// LCP of the suffixes starting at positions a and b (a != b):
// minimum of high[] over ranks (min, max], answered by the sparse table.
int lcp(int a,int b)
{
	a=rnk[a];
	b=rnk[b];
	if(a>b) swap(a,b);
	a++;
	int t=logs[b-a+1];
	return min(best[t][a],best[t][b-(1<<t)+1]);
}

// Sparse table over high[1..n]: best[i][j] = min of high[j .. j+2^i-1].
void initRmq(int n)
{
	for(int i=1;i<=n;i++) best[0][i]=high[i];
	for(int i=1;i<=logs[n];i++)
	{
		int limit=n-(1<<i)+1;
		for(int j=1;j<=limit;j++)
			best[i][j]=min(best[i-1][j],best[i-1][j+(1<<i>>1)]);
	}
}

// logs[i] = floor(log2(i)). The loop stops at MAXN-1 so it stays inside
// the array (the original i<=MAXN wrote one element past the end).
void initLog()
{
	logs[0]=-1;
	for(int i=1;i<MAXN;i++)
		logs[i]=(i&(i-1))?logs[i-1]:logs[i-1]+1;
}

int n;
char ss[1100][110];
char ans[1100][110];
int pos[1100][110];   // pos[i][j] = index of ss[i][j] in str[]

int main()
{
	initLog();
	while(~scanf("%d",&n))
	{
		// Build s0 #0 s1 #1 ... s(n-1) #(n-1) 0, where #i = 150+i is a
		// distinct separator larger than any letter and 0 is the sentinel.
		int p=0;
		for(int i=0;i<n;i++)
		{
			scanf("%s",ss[i]);
			int len=strlen(ss[i]);
			for(int j=0;j<len;j++)
			{
				pos[i][j]=p;
				str[p++]=ss[i][j];
			}
			str[p++]=150+i;
		}
		str[p]=0;
		suffix(str,sa,p+1);
		initRmq(p);
		for(int i=0;i<n;i++)
		{
			int len=strlen(ss[i]);
			int l=pos[i][0],r=pos[i][len-1];   // range of string i in str[]
			// The whole command is always a key substring, since no command
			// is a substring of another.
			int mins=len;
			char tmp[110];
			strcpy(tmp,ss[i]);
			for(int j=l;j<=r;j++)
			{
				// Nearest suffixes in sorted order that start outside string i.
				int pre=rnk[j],suc=rnk[j];
				while(sa[pre]<=r&&sa[pre]>=l) pre--;
				pre=sa[pre];
				while(sa[suc]<=r&&sa[suc]>=l) suc++;
				suc=sa[suc];
				int len1,len2;
				// Predecessor: the sentinel suffix (value 0) shares nothing with j.
				if(str[pre]!=0)
				{
					len1=lcp(pre,j);
					// Match runs to the end of string i: no key substring starts at j.
					if(str[j+len1]>=150) len1=INF;
					else len1++;
				}
				else
					len1=1;
				// Successor: a separator suffix (value >= 150) shares nothing with j.
				if(str[suc]<150)
				{
					len2=lcp(suc,j);
					if(str[j+len2]>=150) len2=INF;
					else len2++;
				}
				else
					len2=1;
				// Shortest substring starting at j that occurs in no other command.
				int maxs=max(len1,len2);
				if(maxs<mins && maxs!=INF)
				{
					for(int it=0;it<maxs;it++)
						tmp[it]=str[it+j];
					tmp[maxs]='\0';
					mins=maxs;
				}
			}
			strcpy(ans[i],tmp);
		}
		for(int i=0;i<n;i++)
			printf("%s\n",ans[i]);
	}
	return 0;
}
