HDU2328(Corporate Identity)

Latest recommended article published on 2020-03-25 20:25:41

Shao_sen

Category: ACM. Tags: KMP, extended KMP

Article link: https://blog.csdn.net/Shaosenmonitor/article/details/103207716

Corporate Identity
Time Limit: 9000/3000 MS (Java/Others) Memory Limit: 65536/32768 K (Java/Others)
Total Submission(s): 4328 Accepted Submission(s): 1598

Problem Description
Beside other services, ACM helps companies to clearly state their “corporate identity”, which includes company logo but also other signs, like trademarks. One of such companies is Internet Building Masters (IBM), which has recently asked ACM for a help with their new identity. IBM do not want to change their existing logos and trademarks completely, because their customers are used to the old ones. Therefore, ACM will only change existing trademarks instead of creating new ones.

After several other proposals, it was decided to take all existing trademarks and find the longest common sequence of letters that is contained in all of them. This sequence will be graphically emphasized to form a new logo. Then, the old trademarks may still be used while showing the new identity.

Your task is to find such a sequence.

Input
The input contains several tasks. Each task begins with a line containing a positive integer N, the number of trademarks (2 ≤ N ≤ 4000). The number is followed by N lines, each containing one trademark. Trademarks will be composed only from lowercase letters, the length of each trademark will be at least 1 and at most 200 characters.

After the last trademark, the next task begins. The last task is followed by a line containing zero.

Output
For each task, output a single line containing the longest string contained as a substring in all trademarks. If there are several strings of the same length, print the one that is lexicographically smallest. If there is no such non-empty string, output the words “IDENTITY LOST” instead.

Sample Input
3
aabbaabb
abbababb
bbbbbabb
2
xyz
abc
0

Sample Output
abb
IDENTITY LOST

Solution: this problem is more or less a template exercise for extended KMP, so I am writing it up here.

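A while ago I realized I had never studied string algorithms properly and had always relied on brute force, so I am now learning this area from scratch. Honestly, KMP is fairly simple, but extended KMP took me several days of spare time to understand, so it deserves careful notes.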

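If you know the KMP algorithm, you know that it keeps a next array recording prefix lengths of the pattern (s2): for each position, the length of the longest proper prefix that is also a suffix there. During matching, the text (s1) therefore never has to back up the way brute force does; only the unmatched part of the pattern is retried, which gives a running time of O(strlen(s1)+strlen(s2)).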
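To make the paragraph above concrete, here is a minimal sketch of classic KMP. The names `buildFailure` and `kmpFind` are my own for illustration and do not appear in the solution; the `fail` array plays the role of the next array described above.

```cpp
#include <string>
#include <vector>

// fail[i] = length of the longest proper prefix of p[0..i]
// that is also a suffix of p[0..i].
std::vector<int> buildFailure(const std::string& p) {
    const int m = static_cast<int>(p.size());
    std::vector<int> fail(m, 0);
    int k = 0;  // length of the currently matched prefix
    for (int i = 1; i < m; ++i) {
        while (k > 0 && p[i] != p[k]) k = fail[k - 1];  // fall back in the pattern
        if (p[i] == p[k]) ++k;
        fail[i] = k;
    }
    return fail;
}

// Returns the index of the first occurrence of p in t, or -1 if there is none.
int kmpFind(const std::string& t, const std::string& p) {
    if (p.empty()) return 0;
    const std::vector<int> fail = buildFailure(p);
    const int n = static_cast<int>(t.size());
    const int m = static_cast<int>(p.size());
    int k = 0;  // number of pattern characters matched so far
    for (int i = 0; i < n; ++i) {  // i only moves forward: the text is never rescanned
        while (k > 0 && t[i] != p[k]) k = fail[k - 1];
        if (t[i] == p[k]) ++k;
        if (k == m) return i + 1 - m;
    }
    return -1;
}
```

The text index only ever increases, and the pattern index can fall back at most as often as it has advanced, which is where the linear bound comes from.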

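So what is extended KMP? The usual explanation online is that, for every suffix of the text (s1), extended KMP computes the length of its longest common prefix with the pattern (s2).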

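Plain KMP only computes prefix lengths for the pattern (s2), whereas extended KMP additionally computes a length for every position of the text (s1). These lengths are stored in the extend array.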
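As a reference for what the extend array should contain, the definition can be computed directly in quadratic time. This is only a sketch of the definition, not the linear algorithm; the function name `naiveExtend` is hypothetical.

```cpp
#include <string>
#include <vector>

// extend[i] = length of the longest common prefix of s1[i..] and s2,
// computed by comparing character by character from every start position.
std::vector<int> naiveExtend(const std::string& s1, const std::string& s2) {
    const int l1 = static_cast<int>(s1.size());
    const int l2 = static_cast<int>(s2.size());
    std::vector<int> extend(l1, 0);
    for (int i = 0; i < l1; ++i) {
        int j = 0;
        while (i + j < l1 && j < l2 && s1[i + j] == s2[j]) ++j;
        extend[i] = j;
    }
    return extend;
}
```

For s1 = "aaaabaa" and s2 = "aaaaa" this gives extend = {4, 3, 2, 1, 0, 2, 1}. The pattern s2 occurs in s1 exactly when some extend[i] equals the length of s2, which is the test the solution relies on.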

The blog post I followed (the easiest for me to understand, and also fairly detailed):
https://blog.csdn.net/qq_40160605/article/details/80407554

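Suppose extend[0..k] has already been computed, and that the furthest position reached so far during matching is p = p0+extend[p0]-1, attained by the match that starts at p0.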

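Case 1: position k+1 of s1 has not been matched yet (k+1 > p). Compare s1 from position k+1 against s2 from position 0, character by character, until a mismatch occurs; extend[k+1] is the number of matched characters. (In short, nothing is known here yet, so matching starts from scratch.)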

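Case 2: position k+1 of s1 has already been matched (k+1 <= p). From p = p0+extend[p0]-1 we know s1[p0,p]=s2[0,p-p0]. Moving the left end of both ranges forward by k+1-p0 gives s1[k+1,p]=s2[k+1-p0,p-p0]. Let len=next[k+1-p0], where next[i] is the length of the longest common prefix of s2[i..] and s2 itself.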

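1. If k+1+len-1=k+len<p: because len=next[k+1-p0], we have s2[0,len-1]=s2[k+1-p0,k-p0+len],
so s1[k+1,k+len]=s2[k+1-p0,k-p0+len]=s2[0,len-1]. The next character cannot match, because s1[k+len+1]=s2[k+len+1-p0], which differs from s2[len] by the definition of next,
so extend[k+1]=len=next[k+1-p0].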

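2. If k+1+len-1=k+len>=p: because s1[p0,p]=s2[0,p-p0], we have s1[k+1,p]=s2[k+1-p0,p-p0]. Because len=next[k+1-p0],
s2[0,len-1]=s2[k+1-p0,k-p0+len], and since k+len>=p this range covers s2[k+1-p0,p-p0], so s1[k+1,p]=s2[0,p-k-1].
Positions beyond p have not been matched yet, so continue comparing s1 from position p+1 against s2 from position p-k, character by character, until a mismatch occurs. Then update p0 and the furthest matched position p; extend[k+1] = p-k plus the number of characters matched afterwards.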

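Core code:
The two routines are almost identical: both handle the first indices directly (0 and 1 for next, 0 for extend) and then process the remaining positions with the case analysis above.
KMP (computes the next array of the pattern):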

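void kmp(const string& s){
	int i=0,j,po,len=s.size();
	nxt[0]=len;// at index 0 the string matches itself completely
	while(i+1<len&&s[i]==s[i+1])// compute nxt[1] by direct comparison (bounds checked first)
		i++;
	nxt[1]=i;
	po=1;
	for(int i=2;i<len;i++){// compute nxt[2..len-1]
		if(nxt[i-po]+i<nxt[po]+po)// case 1: the known match ends before the furthest matched position
			nxt[i]=nxt[i-po];
		else{// case 2: extend the match by direct comparison
			j=nxt[po]+po-i;// number of characters already known to match
			if(j<0)
				j=0;
			while(j+i<len&&s[j]==s[j+i])
				j++;
			nxt[i]=j;
			po=i;// index whose match reaches furthest to the right
		}
	}
}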

Extended KMP:

bool exkmp(const string& s1,const string& s2){
	int i=0,j,po,l1=s1.size(),l2=s2.size();
	kmp(s2);// compute nxt for the pattern
	while(i<l1&&i<l2&&s1[i]==s2[i])// match s1 and s2 from index 0 (bounds checked first)
		i++;
	extend[0]=i;
	po=0;
	if(extend[0]==l2)// the whole pattern matched at index 0, so s2 occurs in s1
		return true;
	for(i=1;i<l1;i++){
		if(nxt[i-po]+i<extend[po]+po)// case 1
			extend[i]=nxt[i-po];
		else{// case 2
			j=extend[po]+po-i;// number of characters already known to match
			if(j<0)
				j=0;
			while(i+j<l1&&j<l2&&s1[i+j]==s2[j])
				j++;
			extend[i]=j;
			po=i;
		}//else
		if(extend[i]==l2)// s2 occurs in s1 starting at index i
			return true;
	}//for
	return false;
}

Accepted (AC) code:

#include<iostream>
#include<string>
#include<algorithm>
#define maxn 100010
#define INF 0x3f3f3f3f
using namespace std;
int nxt[maxn],extend[maxn];
string s[4100];
string s1;
void kmp(const string& s){
	int i=0,j,po,len=s.size();
	nxt[0]=len;// at index 0 the string matches itself completely
	while(i+1<len&&s[i]==s[i+1])// compute nxt[1] by direct comparison (bounds checked first)
		i++;
	nxt[1]=i;
	po=1;
	for(int i=2;i<len;i++){// compute nxt[2..len-1]
		if(nxt[i-po]+i<nxt[po]+po)// case 1
			nxt[i]=nxt[i-po];
		else{// case 2
			j=nxt[po]+po-i;// number of characters already known to match
			if(j<0)
				j=0;
			while(j+i<len&&s[j]==s[j+i])
				j++;
			nxt[i]=j;
			po=i;// index whose match reaches furthest to the right
		}
	}
}
bool exkmp(const string& s1,const string& s2){
	int i=0,j,po,l1=s1.size(),l2=s2.size();
	kmp(s2);// compute nxt for the pattern
	while(i<l1&&i<l2&&s1[i]==s2[i])// match s1 and s2 from index 0 (bounds checked first)
		i++;
	extend[0]=i;
	po=0;
	if(extend[0]==l2)// the whole pattern matched at index 0, so s2 occurs in s1
		return true;
	for(i=1;i<l1;i++){
		if(nxt[i-po]+i<extend[po]+po)// case 1
			extend[i]=nxt[i-po];
		else{// case 2
			j=extend[po]+po-i;
			if(j<0)
				j=0;
			while(i+j<l1&&j<l2&&s1[i+j]==s2[j])
				j++;
			extend[i]=j;
			po=i;
		}//else
		if(extend[i]==l2)// s2 occurs in s1 starting at index i
			return true;
	}//for
	return false;
}
string copystr(const string& s2,int l,int r){
	return s2.substr(l,r-l+1);// characters l..r inclusive
}
int main(){
	int n;
	while(cin>>n){
		if(n==0)
			break;
		int MIN=INF;// must start large; starting at 0 never selects the shortest string
		int po=0;
		for(int i=0;i<n;i++){
			cin>>s[i];
			if((int)s[i].size()<MIN){// find the shortest string
				MIN=s[i].size();
				po=i;// remember its index
			}
		}
		int MAX=0;
		int l=s[po].size();
		int left=0,right=-1;// bounds of the best substring found so far
		for(int i=0;i<l;i++){// enumerate every substring s[po][i..j] of the shortest string
			for(int j=i;j<l;j++){
				if(j-i+1<MAX)// shorter than the current best, skip
					continue;
				s1=copystr(s[po],i,j);// extract the candidate
				bool flag=true;
				for(int k=0;k<n;k++){// check every other string; stop at the first one that lacks the candidate
					if(k==po)
						continue;
					if(exkmp(s[k],s1))
						continue;
					flag=false;
					break;
				}
				if(flag){// the candidate occurs in every string
					if(j-i+1==MAX){// same length: keep the lexicographically smaller one
						for(int k=0;k+left<=right;k++){
							if(s[po][k+left]<s[po][k+i])
								break;
							else{
								if(s[po][k+left]>s[po][k+i]){
									left=i;
									right=j;
									break;
								}
							}
						}
					}
					if(j-i+1>MAX){// longer: update the best length
						left=i;
						right=j;
						MAX=max(MAX,j-i+1);
					}
				}
			}
		}
		if(MAX==0)
			cout<<"IDENTITY LOST"<<endl;
		else{
			for(int k=left;k<=right;k++){
				cout<<s[po][k];
			}
			cout<<endl;
		}
	}
	return 0;
}
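
To sanity-check the program above on small inputs, a brute-force reference can be sketched that swaps exkmp for `std::string::find`. This is a different, simpler technique than the one in the solution, and it is not guaranteed to fit the time limit; the function name `longestCommonSubstring` is hypothetical, and an empty return value stands for "IDENTITY LOST".

```cpp
#include <string>
#include <vector>

// Returns the longest string contained in every trademark, choosing the
// lexicographically smallest among ties, or "" if none exists.
std::string longestCommonSubstring(const std::vector<std::string>& marks) {
    if (marks.empty()) return "";
    // Use the shortest trademark as the source of candidates.
    std::size_t base = 0;
    for (std::size_t i = 1; i < marks.size(); ++i)
        if (marks[i].size() < marks[base].size()) base = i;
    const std::string& b = marks[base];

    // Try lengths from longest to shortest; the first length with a hit wins.
    for (std::size_t len = b.size(); len >= 1; --len) {
        std::string best;
        bool found = false;
        for (std::size_t i = 0; i + len <= b.size(); ++i) {
            const std::string cand = b.substr(i, len);
            if (found && cand >= best) continue;  // cannot improve on the current best
            bool ok = true;
            for (const std::string& m : marks) {
                if (m.find(cand) == std::string::npos) { ok = false; break; }
            }
            if (ok) { best = cand; found = true; }
        }
        if (found) return best;
    }
    return "";
}
```

On the first sample, both "abb" and "bba" occur in all three trademarks, and the tie-break selects "abb", which matches the expected output.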