Description
The twenty-first century is a century of biology and technology. We know that a gene is made of DNA. The nucleotide bases from which DNA is built are A (adenine), C (cytosine), G (guanine), and T (thymine). Finding the longest common subsequence between DNA/protein sequences is one of the basic problems in modern computational molecular biology. But this problem is a little different. Given several DNA sequences, you are asked to make a shortest sequence from them so that each of the given sequences is a subsequence of it.
For example, given "ACGT", "ATGC", "CGTT" and "CAGT", a shortest such sequence has length 8 (see the sample output below). It may not be the only shortest one.
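To make the requirement concrete, here is a small sketch that checks whether a candidate string contains every given sequence as a subsequence. The helper names `isSubsequence` and `coversAll` are my own, and the candidate "ACAGTGCT" used in the note below was worked out by hand; neither comes from the problem statement.

```cpp
#include <string>
#include <vector>

// True if `pattern` can be obtained from `text` by deleting zero or more characters.
bool isSubsequence(const std::string& pattern, const std::string& text) {
    std::size_t matched = 0;  // how many characters of pattern are matched so far
    for (char c : text) {
        if (matched < pattern.size() && pattern[matched] == c) {
            ++matched;
        }
    }
    return matched == pattern.size();
}

// True if every string in `seqs` is a subsequence of `candidate`.
// (Hypothetical helper, for illustration only.)
bool coversAll(const std::string& candidate, const std::vector<std::string>& seqs) {
    for (const std::string& s : seqs) {
        if (!isSubsequence(s, candidate)) {
            return false;
        }
    }
    return true;
}
```

With the four example sequences, `coversAll("ACAGTGCT", ...)` holds, so that string is one valid answer of length 8, while a shorter string such as "ACGT" fails because "ATGC" is not a subsequence of it.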
Input
The first line is the number of test cases t. Then t test cases follow. In each case, the first line is an integer n (1<=n<=8), the number of DNA sequences. The following n lines contain the n sequences, one per line. The length of each sequence is between 1 and 5.
Output
For each test case, print a line containing the length of the shortest sequence that can be made from these sequences.
Sample Input
1
4
ACGT
ATGC
CGTT
CAGT
Sample Output
8
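Problem summary: given N strings, each of length at most 5, construct a string such that all N strings are subsequences of it, and output its minimum length.
Approach: each position of the constructed string has 4 possible characters, so run a BFS over states and prune every state that has already been visited.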
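The BFS state is the number of characters already matched in each input string, packed into one base-6 integer (each count is 0 to 5). Below is a minimal sketch of that encoding and of a single transition. The function names `encodeState`, `decodeState` and `advanceState` are hypothetical and used only for illustration; the solution further down does the same arithmetic inline.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Pack per-string match counts (each in 0..5) into one base-6 integer.
// Digit j of the result is the count for string j.
int encodeState(const std::vector<int>& matched) {
    int state = 0;
    int weight = 1;  // 6^j
    for (int m : matched) {
        assert(0 <= m && m <= 5);
        state += m * weight;
        weight *= 6;
    }
    return state;
}

// Inverse of encodeState for n strings.
std::vector<int> decodeState(int state, int n) {
    std::vector<int> matched(n);
    for (int j = 0; j < n; ++j) {
        matched[j] = state % 6;
        state /= 6;
    }
    return matched;
}

// Append character c to the string being built: every input string whose
// next unmatched character equals c advances by one.
int advanceState(int state, char c, const std::vector<std::string>& seqs) {
    std::vector<int> matched = decodeState(state, static_cast<int>(seqs.size()));
    for (std::size_t j = 0; j < seqs.size(); ++j) {
        if (matched[j] < static_cast<int>(seqs[j].size()) && seqs[j][matched[j]] == c) {
            ++matched[j];
        }
    }
    return encodeState(matched);
}
```

For the sample, the goal state is `encodeState({4, 4, 4, 4})`, which is 4 * (1 + 6 + 36 + 216) = 1036. With 8 strings there are at most 6^8 = 1679616 states, which is why a plain visited array is enough.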
#include <iostream>
#include <stdio.h>
#include <cmath>
#include <algorithm>
#include <iomanip>
#include <cstdlib>
#include <string.h>
#include <vector>
#include <queue>
#include <stack>
#include <ctype.h>
using namespace std;
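int vis[2000005]; // marks states already seen, so they can be pruned (at most 6^8 = 1679616 states)
char s[10][10]; // the input strings, stored 1-based
int len[10]; // length of each string
int p[10]; // p[i] = 6^i
char temp[4]={'A','C','G','T'};
void init() // precompute the powers of 6
{
    p[0]=1;
    for(int i=1;i<10;i++)
    {
        p[i]=p[i-1]*6;
    }
}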
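struct node
{
    int step; // number of steps, i.e. the length of the string built so far
    int zt; // state: for each input string, how many of its leading characters are already matched by the built string
    // at most 8 strings of length at most 5, so each digit is 0-5 and the base-6 number is below 6^8
    node() {}
    node(int Step, int Zt)
    {
        step=Step;
        zt=Zt;
    }
};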
int main()
{
    init();
    int t;
    scanf("%d",&t);
    while(t--)
    {
        int n;
        scanf("%d",&n);
        int res=0; // the goal state: every string fully matched
        for(int i=1;i<=n;i++)
        {
            scanf("%s",s[i]+1);
            len[i]=(int)strlen(s[i]+1);
            res+=len[i]*p[i-1];
        }
        memset(vis,0,sizeof(vis));
        queue<node> q;
        q.push(node(0,0));
        vis[0]=1;
        int ans=0;
        while(!q.empty())
        {
            node k=q.front();
            q.pop();
            if(k.zt==res)
            {
                ans=k.step;
                break;
            }
            node l;
            for(int i=0;i<4;i++) // try appending each of A, C, G, T
            {
                l.zt=0;
                l.step=k.step+1;
                int kk=k.zt;
                for(int j=1;j<=n;j++)
                {
                    int x=kk%6; // characters of string j matched so far
                    kk/=6;
                    if(x==len[j]||s[j][x+1]!=temp[i])
                    {
                        l.zt+=x*p[j-1]; // string j does not advance
                    }
                    else
                    {
                        l.zt+=(x+1)*p[j-1]; // string j matches one more character
                    }
                }
                if(vis[l.zt]==0) // prune: a state seen before is not enqueued again, otherwise we get MLE
                {
                    q.push(l);
                    vis[l.zt]=1;
                }
            }
        }
        cout<<ans<<endl;
    }
    return 0;
}
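For testing without stdin, the same BFS can be wrapped in a function. This is only a sketch of the approach above, not a replacement for the submitted code: the name `shortestSupersequenceLength`, the 0-based indexing, and the `std::vector` visited array sized to 6^n are my own choices, and it assumes at most 8 strings of length 1 to 5 over the alphabet A, C, G, T.

```cpp
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: BFS over base-6 packed match counts.
// Returns the length of a shortest string containing every element of
// `seqs` as a subsequence.
int shortestSupersequenceLength(const std::vector<std::string>& seqs) {
    const int n = static_cast<int>(seqs.size());
    std::vector<int> pow6(n + 1, 1);  // pow6[i] = 6^i
    for (int i = 1; i <= n; ++i) {
        pow6[i] = pow6[i - 1] * 6;
    }

    int goal = 0;  // state in which every string is fully matched
    for (int j = 0; j < n; ++j) {
        goal += static_cast<int>(seqs[j].size()) * pow6[j];
    }

    std::vector<char> seen(pow6[n], 0);
    std::queue<std::pair<int, int> > q;  // (state, length built so far)
    q.push(std::make_pair(0, 0));
    seen[0] = 1;

    const char alphabet[4] = {'A', 'C', 'G', 'T'};
    while (!q.empty()) {
        const int state = q.front().first;
        const int steps = q.front().second;
        q.pop();
        if (state == goal) {
            return steps;  // BFS reaches the goal first along a shortest path
        }
        for (char c : alphabet) {
            int next = 0;
            int rest = state;
            for (int j = 0; j < n; ++j) {
                int matched = rest % 6;
                rest /= 6;
                if (matched < static_cast<int>(seqs[j].size()) && seqs[j][matched] == c) {
                    ++matched;
                }
                next += matched * pow6[j];
            }
            if (!seen[next]) {  // skip states that were already enqueued
                seen[next] = 1;
                q.push(std::make_pair(next, steps + 1));
            }
        }
    }
    return -1;  // not reached for input within the stated limits
}
```

Called on the sample strings "ACGT", "ATGC", "CGTT", "CAGT", it returns 8, matching the sample output.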