hdu4644 BWT

BWT

Time Limit: 12000/6000 MS (Java/Others)    Memory Limit: 65535/32768 K (Java/Others)
Total Submission(s): 114    Accepted Submission(s): 38


Problem Description
When it comes to finding a string S inside a string T, people usually think of KMP, Aho-Corasick and suffix arrays. But Mr Liu tells Canoe that there is an algorithm called the Burrows–Wheeler Transform (BWT) which solves the problem remarkably efficiently.
But how does BWT solve the S-in-T matching problem? Mr Liu tells Canoe its first three steps.
First, we append '$' to the end of T; for convenience, we still call the new string T. Then, for every suffix of T starting at position i, we append the prefix of T ending at position (i - 1) to its end. Second, we sort these new strings in dictionary order, and we call the matrix formed by the sorted strings the Burrows-Wheeler Matrix. Third, we read off the characters of the last column to get a new string, which we call BWT(T). You can get more information from the example below.
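The three steps above can be sketched directly in code. This is only an illustration of the definition, assuming T consists of lowercase letters so that '$' sorts before every other character; the function name bwt_naive is made up for this sketch, and sorting whole rotations is far too slow for the real input sizes.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Build BWT(T) straight from the definition (illustration only).
std::string bwt_naive(std::string t) {
    t += '$';                                  // step 1: append the terminator
    const size_t n = t.size();
    std::vector<std::string> rows;
    rows.reserve(n);
    for (size_t i = 0; i < n; ++i)             // suffix starting at i, then the prefix ending at i-1
        rows.push_back(t.substr(i) + t.substr(0, i));
    std::sort(rows.begin(), rows.end());       // step 2: the Burrows-Wheeler Matrix
    std::string last;
    for (const std::string& r : rows)          // step 3: read off the last column
        last += r.back();
    return last;
}
```

For T = "acaacg" this produces "gc$aaac", the string used in the sample input.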



Then Mr Liu tells Canoe that we only need to store BWT(T) to solve the matching problem. But can that really work? Mr Liu smiles and says yes. We can tell whether strings S such as “aac” are substrings of a string T such as “acaacg” knowing only BWT(T)! What an amazing algorithm BWT is! But Canoe is puzzled by the tricky method of matching S in T. Would you please help Canoe find it? Given BWT(T) and a string S, can you help Canoe figure out whether S is a substring of T or not?
 

Input
There are multiple test cases.
First Line: the BWT(T) string (1 <= length(BWT(T)) <= 100086). 
Second Line: an integer n ( 1 <=n <= 10086) which is the number of S strings. 
Then n lines follow.
Each line contains one string S (n * length(S) is less than 2000000, and all characters of S are lowercase).
 

Output
For every S, if S is a substring of T, print “YES” on a line. If S is not a substring of T, print “NO” on a line.
 

Sample Input
gc$aaac
2
aac
gc
 

Sample Output
YES
NO

Hint
A naive method will not be accepted.
 

Source
 

Recommend
zhuyuanchen520

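Approach:

The key is to find the inverse of the BWT. The inverse process works like this (the table is copied directly from the multi-university contest editorial):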

Step     Row 1     Row 2     Row 3     Row 4     Row 5     Row 6     Row 7
ADD 1    g         c         $         a         a         a         c
SORT 1   $         a         a         a         c         c         g
ADD 2    g$        ca        $a        aa        ac        ac        cg
SORT 2   $a        aa        ac        ac        ca        cg        g$
ADD 3    g$a       caa       $ac       aac       aca       acg       cg$
SORT 3   $ac       aac       aca       acg       caa       cg$       g$a
ADD 4    g$ac      caac      $aca      aacg      acaa      acg$      cg$a
SORT 4   $aca      aacg      acaa      acg$      caac      cg$a      g$ac
ADD 5    g$aca     caacg     $acaa     aacg$     acaac     acg$a     cg$ac
SORT 5   $acaa     aacg$     acaac     acg$a     caacg     cg$ac     g$aca
ADD 6    g$acaa    caacg$    $acaac    aacg$a    acaacg    acg$ac    cg$aca
SORT 6   $acaac    aacg$a    acaacg    acg$ac    caacg$    cg$aca    g$acaa
ADD 7    g$acaac   caacg$a   $acaacg   aacg$ac   acaacg$   acg$aca   cg$acaa
SORT 7   $acaacg   aacg$ac   acaacg$   acg$aca   caacg$a   cg$acaa   g$acaac
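The ADD/SORT rounds in the table can be reproduced with a few lines of code. This is a sketch of the naive inverse only, assuming the input contains exactly one '$' and that '$' sorts before every other character; the name inverse_bwt_naive is made up here. It re-sorts full strings in every round, so it is useful for checking small cases and not for the actual problem.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Rebuild the sorted matrix from its last column by repeating
// "prepend the BWT column, then sort" n times (illustration only).
std::string inverse_bwt_naive(const std::string& bwt) {
    const size_t n = bwt.size();
    std::vector<std::string> rows(n);            // n empty rows to start with
    for (size_t round = 0; round < n; ++round) {
        for (size_t i = 0; i < n; ++i)
            rows[i] = bwt[i] + rows[i];          // ADD: put the BWT column in front
        std::sort(rows.begin(), rows.end());     // SORT: restore dictionary order
    }
    for (const std::string& r : rows)
        if (r.back() == '$') return r;           // the row ending in '$' is T with its terminator
    return "";                                   // no terminator found
}
```

For "gc$aaac" the rows after the last round are exactly the SORT 7 rows, and the function returns "acaacg$".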

 

 

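A detailed introduction can be found here: https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform
Recovering the original string through this inverse process can be done in O(n) once the characters have been sorted. The method is as follows:
We only need to pay attention to the first row of the matrix. Every sorting round permutes the rows in the same way, so the whole process is equivalent to sorting s over and over. In other words, we only need to know which element ends up last in the first row after each sort, and we can track its position. Thinking this through carefully leads to the following method:
First sort s with a stable sort, and record the original position of each element after sorting.
For example:
gc$aaac
1234567
After sorting this becomes
$aaaccg
3456271
Record this as Tran[] = {3,4,5,6,2,7,1}.
Then, starting from index 1, simply follow the path Tran[i] to recover the original string.
3->5->2->4->6->7->1
$->a->c->a->a->c->g
The second line lists the characters of s at the visited positions: the terminator '$' first, followed by T = acaacg.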
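The Tran[] walk above can also be written with a counting sort in place of a comparison sort, which makes the whole recovery linear in the length of the string. The sketch below is one possible way to do it and not the code from this post; it uses 0-based indices, assumes exactly one '$' that is smaller than every other character, and the name inverse_bwt is made up here.

```cpp
#include <string>
#include <vector>

// Recover T (without '$') from BWT(T) using a stable counting sort and the Tran[] walk.
std::string inverse_bwt(const std::string& bwt) {
    const int n = static_cast<int>(bwt.size());
    std::vector<int> start(257, 0);
    for (unsigned char c : bwt) ++start[c + 1];
    for (int c = 0; c < 256; ++c) start[c + 1] += start[c];   // start[c] = number of characters < c
    std::vector<int> tran(n);                                 // tran[k] = original position of the k-th sorted character
    for (int i = 0; i < n; ++i)
        tran[start[static_cast<unsigned char>(bwt[i])]++] = i; // scanning left to right keeps the sort stable
    std::string t;
    int now = n > 0 ? tran[0] : 0;                            // rank 0 is '$'
    for (int i = 0; i + 1 < n; ++i) {
        now = tran[now];                                      // follow the path
        t += bwt[now];                                        // bwt[tran[k]] is the k-th sorted character
    }
    return t;
}
```

For "gc$aaac" the 0-based table is tran = {2,3,4,5,1,6,0}, which is the Tran[] above minus one, and the walk yields "acaacg".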

Code:

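#include<cstdio>
#include<cstring>
#include<algorithm>
#include<utility>
using namespace std;
char str[2000100];          // holds BWT(T) first, then each pattern S
char t[100100];             // recovered text T (without '$')
pair<int,int>s[100100];     // (character, original position) pairs of BWT(T)
int origin[100100];         // origin[i]: original position of the i-th sorted character
                            // (renamed from index, which clashes with index() from <strings.h>)
int n;                      // length of T
int fail[100100];           // KMP failure function (renamed from next, which clashes with std::next)
// fail[i] = length of the longest proper border of str[0..i]
void get_fail(char* str,int len){
	fail[0]=0;
	for(int i=1;i<len;i++){
		int j=fail[i-1];
		while(j&&str[i]!=str[j])j=fail[j-1];
		if(str[i]==str[j])j++;
		fail[i]=j;
	}
}
// KMP search: does the pattern in str (length len) occur in s (length n)?
bool match(char *s,int len){
	int k=0;
	for(int i=0;i<n;i++){
		while(k&&str[k]!=s[i])k=fail[k-1];
		if(str[k]==s[i])k++;
		if(k==len)return true;
	}
	return false;
}
int main(){
	while(~scanf("%s",str)){
		n=strlen(str);
		for(int i=0;i<n;i++){
			s[i].first=str[i];
			s[i].second=i;
		}
		stable_sort(s,s+n);
		for(int i=0;i<n;i++)origin[i]=s[i].second;
		int now=origin[0];      // s[0] is '$', the smallest character
		n--;                    // T itself does not include '$'
		for(int i=0;i<n;i++){
			t[i]=s[now].first;
			now=origin[now];
		}
		t[n]='\0';
		int q;
		scanf("%d",&q);
		while(q--){
			scanf("%s",str);
			int len=strlen(str);
			// a pattern longer than T cannot occur in it; this also keeps fail[] in bounds
			if(len>n){printf("NO\n");continue;}
			get_fail(str,len);
			if(match(t,len))printf("YES\n");
			else printf("NO\n");
		}
	}
	return 0;
}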
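The problem statement hints that matching is possible from BWT(T) alone, without rebuilding T. A common way to do that is backward search over the last column. The sketch below is an alternative to the inversion plus KMP approach used above, not the method from this post. It assumes BWT(T) contains exactly one '$' and otherwise only lowercase letters, and that patterns are lowercase; BwtIndex and its members are names made up for this illustration.

```cpp
#include <array>
#include <string>
#include <vector>

// Hypothetical helper: answers "is S a substring of T?" using only BWT(T).
struct BwtIndex {
    int n;
    std::array<int, 26> C{};                 // C[c] = number of rows starting with something smaller than letter c
    std::vector<std::array<int, 26>> occ;    // occ[i][c] = occurrences of letter c in bwt[0..i)

    explicit BwtIndex(const std::string& bwt)
        : n(static_cast<int>(bwt.size())), occ(bwt.size() + 1) {
        std::array<int, 26> cnt{};
        for (int i = 0; i < n; ++i) {
            occ[i] = cnt;
            if (bwt[i] != '$') ++cnt[bwt[i] - 'a'];
        }
        occ[n] = cnt;
        int sum = 1;                         // row 0 is the one starting with '$'
        for (int c = 0; c < 26; ++c) { C[c] = sum; sum += cnt[c]; }
    }

    // Process the pattern from its last character to its first, narrowing the
    // half-open range [lo, hi) of sorted rows that start with the current suffix.
    bool contains(const std::string& s) const {
        int lo = 0, hi = n;
        for (int k = static_cast<int>(s.size()) - 1; k >= 0 && lo < hi; --k) {
            const int c = s[k] - 'a';
            lo = C[c] + occ[lo][c];
            hi = C[c] + occ[hi][c];
        }
        return lo < hi;
    }
};
```

With BwtIndex idx("gc$aaac"), idx.contains("aac") is true and idx.contains("gc") is false, matching the sample output. Each query costs time proportional to the length of S. The full occ table takes 26 integers per position, about 10 MB at the maximum input length, so a sampled table would be the usual choice if memory were tighter.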


