DNA sequence
Description
The twenty-first century is the century of biotechnology. We know that a gene is made of DNA. The nucleotide bases from which DNA is built are A (adenine), C (cytosine), G (guanine), and T (thymine). Finding the longest common subsequence between DNA/protein sequences is one of the basic problems in modern computational molecular biology. But this problem is a little different. Given several DNA sequences, you are asked to construct a shortest sequence such that each of the given sequences is a subsequence of it.
For example, given “ACGT”, “ATGC”, “CGTT” and “CAGT”, you can make a sequence in the following way. It is the shortest, but it may not be the only one.
Input
The first line is the number of test cases t. Then t test cases follow. In each case, the first line is an integer n (1<=n<=8), the number of DNA sequences. The following n lines contain the n sequences, one per line. The length of each sequence is between 1 and 5.
Output
For each test case, print a line containing the length of the shortest sequence that can be made from these sequences.
Sample Input
1
4
ACGT
ATGC
CGTT
CAGT
Sample Output
8
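Problem summary:
Given n DNA sequences, find the length of the shortest sequence that contains all n of them. (Each sequence may appear non-contiguously, i.e. as a subsequence.)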
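To make "contains as a subsequence" concrete, here is a minimal sketch of a checker. The function names `isSubsequence` and `coversAll` are my own and are not part of the solution; they only verify a candidate answer, they do not search for one.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True if `small` can be obtained from `big` by deleting characters
// (the matched characters do not have to be adjacent in `big`).
bool isSubsequence(const std::string& small, const std::string& big) {
    std::size_t matched = 0;  // how many characters of `small` are matched
    for (char c : big) {
        if (matched < small.size() && small[matched] == c) ++matched;
    }
    return matched == small.size();
}

// True if every sequence in `seqs` is a subsequence of `candidate`.
bool coversAll(const std::vector<std::string>& seqs,
               const std::string& candidate) {
    for (const std::string& s : seqs) {
        if (!isSubsequence(s, candidate)) return false;
    }
    return true;
}
```

For the sample input, calling `coversAll` with the four sequences and the candidate "ACAGTGCT" (a length-8 string I worked out by hand, not taken from the problem statement) returns true, which agrees with the sample output of 8.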
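At first I tried enumerating the different orders in which the sequences could appear, but then there is no point in iterating at all; that is far too brute-force...
Instead, enumerate the letter at each position of the sequence being built, which gives a search space of about 4^ans. Keep an array pos[i] recording how many characters of the i-th sequence have been matched by the constructed sequence so far. Once every sequence is fully matched, we have a valid answer.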
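The pos bookkeeping can be sketched in isolation as follows. This is only an illustration: `placeLetter` is a hypothetical helper name, and it uses std::vector and std::string rather than global arrays.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Appends letter `c` to the sequence under construction.
// pos[i] is the number of leading characters of seqs[i] already matched.
// Every sequence whose next unmatched character equals `c` advances by one.
// Returns how many sequences advanced.
int placeLetter(const std::vector<std::string>& seqs,
                std::vector<int>& pos, char c) {
    int advanced = 0;
    for (std::size_t i = 0; i < seqs.size(); ++i) {
        bool unfinished = pos[i] < static_cast<int>(seqs[i].size());
        if (unfinished && seqs[i][pos[i]] == c) {
            ++pos[i];
            ++advanced;
        }
    }
    return advanced;
}
```

With the sample sequences and pos starting at all zeros, placing 'A' advances ACGT and ATGC, so pos becomes {1, 1, 0, 0}. Placing 'C' next advances ACGT, CGTT and CAGT, giving {2, 1, 1, 1}.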
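Two prunings:
1. If the number of letters placed so far plus the longest still-unmatched remainder exceeds the answer length currently allowed, return.
2. If the letter tried at this position advances no entry of pos, that letter is never placed here.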
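The two prunings, together with the loop that raises the allowed length step by step, can be sketched as one self-contained function. This is a restatement under my own assumptions: the names `remainingLowerBound`, `search` and `shortestLength` are hypothetical, and std::vector replaces the fixed-size arrays.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Longest unmatched remainder: at least this many letters are still needed.
int remainingLowerBound(const std::vector<std::string>& seqs,
                        const std::vector<int>& pos) {
    int bound = 0;
    for (std::size_t i = 0; i < seqs.size(); ++i) {
        bound = std::max(bound, static_cast<int>(seqs[i].size()) - pos[i]);
    }
    return bound;
}

// Depth-limited search: `depth` letters are placed, at most `limit` allowed.
bool search(const std::vector<std::string>& seqs, std::vector<int>& pos,
            int depth, int limit) {
    int bound = remainingLowerBound(seqs, pos);
    if (bound == 0) return true;              // everything is matched
    if (depth + bound > limit) return false;  // pruning 1
    const std::string letters = "ACGT";
    for (char c : letters) {
        std::vector<int> saved = pos;         // local copy for backtracking
        bool advanced = false;
        for (std::size_t i = 0; i < seqs.size(); ++i) {
            if (pos[i] < static_cast<int>(seqs[i].size()) &&
                seqs[i][pos[i]] == c) {
                ++pos[i];
                advanced = true;
            }
        }
        // Pruning 2: a letter that advances nothing is never placed.
        if (advanced && search(seqs, pos, depth + 1, limit)) return true;
        pos = saved;
    }
    return false;
}

// Raises the length limit until the depth-limited search succeeds.
int shortestLength(const std::vector<std::string>& seqs) {
    std::vector<int> pos(seqs.size(), 0);
    int limit = remainingLowerBound(seqs, pos);  // longest input length
    while (!search(seqs, pos, 0, limit)) ++limit;
    return limit;
}
```

On the sample input, `shortestLength` returns 8, matching the sample output. The saved copy of pos is local to each call for the same reason tmp must be local in the recursive solution: every recursion level needs its own snapshot to restore.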
#include<cstdio>
#include<cstring>
#include<algorithm>
#include<iostream>
using namespace std;
const int N = 10;
const int M = 1000 + 10;
int t,n;
int num[N];
char s[N][N],ss[M];
bool vis[N];
/*
// Abandoned first attempt: enumerate the order in which the sequences are appended.
bool dfs(int x,int len,int limit){
    if(len>limit) return false;
    if(x==n+1&&len==limit) return true;
    if(x==n+1) return false;
    for(int i=0;i<n;++i){
        if(vis[i]) continue;
        vis[i]=true;
        int d=0;
        for(int j=0;j<len;++j){
            if(ss[j]==s[i][d]) ++d;
            if(d==num[i]) break;
        }
        int cnt=len;
        for(;d<num[i];++d) ss[len++]=s[i][d];
        if(dfs(x+1,len,limit)) return true;
        len=cnt,vis[i]=false;
    }
    return false;
}*/
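inline int Max(int a,int b) {return a>b?a:b;}
inline int Min(int a,int b) {return a<b?a:b;}
char e[4]={'A','G','C','T'};
int pos[N]; // pos[i]: number of characters of s[i] matched so far
// Lower bound on the letters still needed: the longest unmatched remainder.
int calc(){
    int mmax=0;
    for(int i=0;i<n;++i) mmax=Max(mmax,num[i]-pos[i]);
    return mmax;
}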
bool dfs(int len,int limit){
    int cnt=calc();
    if(cnt+len>limit) return false; // pruning 1
    if(!cnt) return true;
    int tmp[N]; // tmp changes at every recursion level, so it must not be a global variable!
    for(int i=0;i<4;++i){
        memcpy(tmp,pos,sizeof(tmp));
        bool flag=false;
        for(int j=0;j<n;++j){
            if(s[j][pos[j]]==e[i]) ++pos[j],flag=true;
        }
        if(!flag) continue; // pruning 2
        if(dfs(len+1,limit)) return true;
        memcpy(pos,tmp,sizeof(pos));
    }
    return false;
}
#define ms(x,y) memset(x,y,sizeof(x))
void update(){
    ms(pos,0);
}
int main(){
    scanf("%d",&t);
    while(t--){
        update();
        scanf("%d",&n);
        int ans=0; // start from the longest input length, a lower bound on the answer
        for(int i=0;i<n;++i){
            scanf("%s",s[i]);
            num[i]=strlen(s[i]);
            ans=max(ans,num[i]);
        }
        // iterative deepening: raise the length limit until a solution exists
        for(;;++ans) {
            if(dfs(0,ans)) break;
        }
        printf("%d\n",ans);
    }
    return 0;
}