hdu4644 BWT

IBN5100

Category: ACM-Strings

Article link: https://blog.csdn.net/wh2124335/article/details/9816247

BWT

Time Limit: 12000/6000 MS (Java/Others) Memory Limit: 65535/32768 K (Java/Others)
Total Submission(s): 114 Accepted Submission(s): 38

Problem Description

When the problem of matching a string S in a string T comes up, people always bring up KMP, Aho-Corasick and suffix arrays. But Mr Liu tells Canoe that there is an algorithm called the Burrows–Wheeler Transform (BWT) which is quite amazing and highly efficient for this problem.
But how does BWT solve the S-in-T matching problem? Mr Liu tells Canoe its first three steps.
First, we append '$' to the end of T; for convenience, we still call the new string T. Then, for every suffix of T starting at position i, we append to its end the prefix of T that ends at position i - 1. Second, we sort these new strings in dictionary order; the matrix formed by the sorted strings is called the Burrows Wheeler Matrix. Third, we take the characters of the last column to form a new string, which we call BWT(T). You can get more information from the example below.
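The three steps can be sketched directly in C++ as a naive construction. This is only an illustration: the function name `bwt_naive` is our own, it assumes T itself contains no '$', and building and sorting all rotations costs O(n² log n), far too slow for the real constraints.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Naive BWT following the three steps: append '$', build every rotation
// (suffix starting at i + prefix ending at i-1), sort, read the last column.
std::string bwt_naive(std::string t) {
    t += '$';                                       // step 1: append the sentinel
    const std::size_t n = t.size();
    std::vector<std::string> rows;
    rows.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        rows.push_back(t.substr(i) + t.substr(0, i));  // rotation starting at i
    std::sort(rows.begin(), rows.end());            // step 2: dictionary order
    std::string last;
    for (const std::string& r : rows)
        last += r.back();                           // step 3: last column
    return last;
}
```

For instance, `bwt_naive("acaacg")` gives `gc$aaac`, which is the BWT(T) in the sample input.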

Then Mr Liu tells Canoe that we only need to save BWT(T) to solve the matching problem. Can that really work, and how? Mr Liu smiles and says yes: knowing only BWT(T), we can determine whether a string S such as “aac” is a substring of a string T such as “acaacg”! What an amazing algorithm BWT is! But Canoe is puzzled by the tricky method of matching S in T. Could you help Canoe work it out? Given BWT(T) and a string S, determine whether S is a substring of T.

Input

There are multiple test cases.
First line: the string BWT(T) (1 <= length(BWT(T)) <= 100086).
Second line: an integer n (1 <= n <= 10086), the number of S strings.
Then n lines follow, each containing one string S (n * length(S) is less than 2000000, and all characters of S are lowercase letters).

Output

For each S, print “YES” on its own line if S is a substring of T, and “NO” otherwise.

Sample Input

gc$aaac
2
aac
gc

Sample Output

YES
NO

Hint

A naive method will not be accepted.

Source

2013 Multi-University Training Contest 5

Recommend

zhuyuanchen520

Solution approach:

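The key is to find the inverse of BWT. The inverse process works as follows (the table is copied directly from the Multi-University Training Contest solution report):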

(Each line lists the seven rows of the matrix after that step, from top to bottom.)

ADD 1    g        c        $        a        a        a        c
SORT 1   $        a        a        a        c        c        g
ADD 2    g$       ca       $a       aa       ac       ac       cg
SORT 2   $a       aa       ac       ac       ca       cg       g$
ADD 3    g$a      caa      $ac      aac      aca      acg      cg$
SORT 3   $ac      aac      aca      acg      caa      cg$      g$a
ADD 4    g$ac     caac     $aca     aacg     acaa     acg$     cg$a
SORT 4   $aca     aacg     acaa     acg$     caac     cg$a     g$ac
ADD 5    g$aca    caacg    $acaa    aacg$    acaac    acg$a    cg$ac
SORT 5   $acaa    aacg$    acaac    acg$a    caacg    cg$ac    g$aca
ADD 6    g$acaa   caacg$   $acaac   aacg$a   acaacg   acg$ac   cg$aca
SORT 6   $acaac   aacg$a   acaacg   acg$ac   caacg$   cg$aca   g$acaa
ADD 7    g$acaac  caacg$a  $acaacg  aacg$ac  acaacg$  acg$aca  cg$acaa
SORT 7   $acaacg  aacg$ac  acaacg$  acg$aca  caacg$a  cg$acaa  g$acaac
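The table can be replayed literally in a few lines. In this sketch the name `inverse_bwt_naive` is our own: start with n empty rows, then n times prepend the BWT column (ADD) and sort (SORT). After the last SORT the rows are the sorted rotations of T, and the row ending in '$' is T itself. Each round sorts n strings of length up to n, so with n up to 100086 this is hopeless in practice, which is why a faster method is needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Naive inverse BWT that replays the ADD/SORT table step by step.
std::string inverse_bwt_naive(const std::string& last) {
    const std::size_t n = last.size();
    std::vector<std::string> rows(n);              // n empty rows to start with
    for (std::size_t step = 0; step < n; ++step) {
        for (std::size_t i = 0; i < n; ++i)
            rows[i] = last[i] + rows[i];           // ADD: prepend BWT character i to row i
        std::sort(rows.begin(), rows.end());       // SORT
    }
    for (const std::string& r : rows)
        if (r.back() == '$')
            return r.substr(0, n - 1);             // the row ending in '$' is T; drop '$'
    return std::string();                          // no '$' found: not a valid BWT
}
```

On the sample, `inverse_bwt_naive("gc$aaac")` ends with exactly the SORT 7 column above and returns `acaacg`.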

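For a detailed introduction, see: https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform
The inverse process that recovers the original string does not need the repeated sorting: after a single stable sort of the BWT string, T is read off in O(n) (the sort itself is O(n log n) with stable_sort, or O(n) with a counting sort over the small alphabet). The method is as follows: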

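In fact, we only need to watch the first row of this matrix: after the k-th SORT it holds the first k characters of $acaacg, so each step reveals one new character at its end. Moreover, every SORT reorders the rows in the same way, namely by the permutation that stably sorts s (the BWT string), so the whole process is equivalent to sorting s once and reapplying that permutation. We therefore only need to know which character ends up last in the first row after each sort, and we can track its position directly. Thinking it through carefully gives the following method: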

First, stably sort s and record, for each sorted element, its original position.

For example:

gc$aaac
1234567

After sorting:

$aaaccg

3456271

We record these original positions as Tran[] = {3,4,5,6,2,7,1}.

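Then simply start from position 1 and follow the path i -> Tran[i]. Reading the characters of s at the visited positions recovers the original string; the leading '$' is the sentinel and is dropped, leaving acaacg.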

3->5->2->4->6->7->1

$->a->c->a->a->c->g
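The Tran walk can also be written as a standalone function. In this sketch the name `inverse_bwt` and the vector `order` (which plays the role of Tran[], but 0-based) are our own. std::stable_sort with a character-only comparator keeps equal characters in their original order, which is what makes the walk valid.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Inverse BWT by one stable sort plus a linear walk.
// order[i] = original position in BWT(T) of the i-th character in sorted order.
std::string inverse_bwt(const std::string& last) {
    const int n = static_cast<int>(last.size());
    std::string t;
    if (n == 0) return t;
    std::vector<int> order(n);
    for (int i = 0; i < n; ++i) order[i] = i;
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return last[a] < last[b]; });
    int row = order[0];                      // sorted row 0 is the one starting with '$'
    for (int k = 0; k + 1 < n; ++k) {        // T has n-1 characters besides '$'
        t += last[order[row]];               // first character of sorted row `row`
        row = order[row];                    // follow the Tran path
    }
    return t;
}
```

`inverse_bwt("gc$aaac")` visits the same positions as the path 3->5->2->4->6->7->1 above (shifted to 0-based) and returns `acaacg`.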

Code:

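#include<cstdio>
#include<cstring>
#include<algorithm>
#include<utility>
using namespace std;
// Arrays renamed from index/next: with using namespace std, a global "next"
// is ambiguous with std::next, and glibc's <cstring> may declare index().
char str[2000100];          // BWT(T) on input, then each query string S
char t[100100];             // recovered T without the trailing '$'
pair<int,int> s[100100];    // (character, original position in BWT(T))
int idx[100100];            // idx[i]: original position of the i-th sorted character (Tran[], 0-based)
int n;                      // length of t
int nxt[2000100];           // KMP failure function; S may be longer than T
// nxt[i] = length of the longest proper prefix of p[0..i] that is also its suffix
void get_next(const char* p,int len){
	nxt[0]=0;
	for(int i=1;i<len;i++){
		int j=nxt[i-1];
		while(j&&p[i]!=p[j])j=nxt[j-1];
		if(p[i]==p[j])j++;
		nxt[i]=j;
	}
}
// KMP: does pattern p (length len) occur in text (length tlen)?
bool match(const char* text,int tlen,const char* p,int len){
	int k=0;
	for(int i=0;i<tlen;i++){
		while(k&&p[k]!=text[i])k=nxt[k-1];
		if(p[k]==text[i])k++;
		if(k==len)return true;
	}
	return false;
}
int main(){
	while(~scanf("%s",str)){
		n=strlen(str);
		for(int i=0;i<n;i++){
			s[i].first=str[i];
			s[i].second=i;
		}
		stable_sort(s,s+n);          // s[i].first is now the first column of the matrix
		for(int i=0;i<n;i++)idx[i]=s[i].second;
		int now=idx[0];              // sorted row 0 begins with '$'
		n--;                         // T has one character fewer once '$' is dropped
		for(int i=0;i<n;i++){
			t[i]=s[now].first;       // first character of the current row
			now=idx[now];
		}
		t[n]='\0';
		int q;
		scanf("%d",&q);
		while(q--){
			scanf("%s",str);
			int len=strlen(str);
			bool found=false;
			if(len<=n){              // a longer S cannot be a substring of T
				get_next(str,len);
				found=match(t,n,str,len);
			}
			puts(found?"YES":"NO");
		}
	}
	return 0;
}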
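The solution above rebuilds T and then runs KMP once per query, so each query costs O(|T| + |S|). An alternative that works on BWT(T) directly, and is not the approach taken in this post, is the standard FM-index backward search: with C[c] (the number of characters smaller than c) and prefix occurrence counts, each query takes O(|S|) after O(27·|BWT(T)|) preprocessing. The struct `BwtIndex` and its members are hypothetical names, and the sketch assumes the alphabet is '$' plus lowercase letters, as in this problem.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): FM-index backward search.
// C[c]      = number of characters in BWT(T) smaller than c
// occ[i][c] = occurrences of c in BWT(T)[0, i)
struct BwtIndex {
    std::array<int, 27> C{};
    std::vector<std::array<int, 27>> occ;

    // Assumed alphabet: '$' -> 0, 'a'..'z' -> 1..26.
    static int code(char ch) { return ch == '$' ? 0 : ch - 'a' + 1; }

    explicit BwtIndex(const std::string& last) : occ(last.size() + 1) {
        for (std::size_t i = 0; i < last.size(); ++i) {
            occ[i + 1] = occ[i];
            ++occ[i + 1][code(last[i])];
        }
        const std::array<int, 27>& total = occ[last.size()];
        for (int c = 1; c < 27; ++c) C[c] = C[c - 1] + total[c - 1];
    }

    // Keep the half-open range [lo, hi) of sorted rows that start with the
    // current suffix of s, extending it one character to the left each step.
    bool contains(const std::string& s) const {
        int lo = 0, hi = static_cast<int>(occ.size()) - 1;
        for (auto it = s.rbegin(); it != s.rend(); ++it) {
            const int c = code(*it);
            lo = C[c] + occ[lo][c];
            hi = C[c] + occ[hi][c];
            if (lo >= hi) return false;          // range empty: s does not occur
        }
        return true;
    }
};
```

With the sample, `BwtIndex("gc$aaac").contains("aac")` is true and `contains("gc")` is false, matching the expected YES/NO.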
