The twenty-first century is a biology-technology developing century. We know that a gene is made of DNA. The nucleotide bases from which DNA is built are A(adenine), C(cytosine), G(guanine), and T(thymine). Finding the longest common subsequence between DNA/Protein sequences is one of the basic problems in modern computational molecular biology. But this problem is a little different. Given several DNA sequences, you are asked to make a shortest sequence from them so that each of the given sequence is the subsequence of it.
For example, given "ACGT","ATGC","CGTT" and "CAGT", you can make a sequence in the following way. It is the shortest but may be not the only one.
For example, given "ACGT","ATGC","CGTT" and "CAGT", you can make a sequence in the following way. It is the shortest but may be not the only one.
这题的大意是给你N个串,让你找出一个最短的串,使得这n个串都是这个串的子序列(注意是子序列,是可以不连续的)。
因为要求的是最小值,所以在一般dfs里都会加一句剪枝的条件,就是当前的长度 len 如果已经大于 前面找到过的最小值minn,那么就不继续往下搜了,即if(len > minn) return;
IDA*就是在这个的基础上的改进版搜索,把寻找最小值变成判定最小值成不成立,第一次找到成立的值就是要求的最小值。
#include<cstdio>
#include<cstring>
#include<iostream>
#include<algorithm>
using namespace std;
const int inf = 1e9;
int n,minn;
struct str
{
char s[10];
int len;
}a[10];
char g[5] = {'A','T','G','C'};
int pos[10];
int judge()
{
int maxn = 0;
for(int i=0;i<n;i++)
maxn = max(maxn,a[i].len-pos[i]);
return maxn;
}
int dfs(int len)
{
if(len + judge() > minn) //如果当前长度加上最少还需要的长度大于最小值,就不搜了
return 0;
if(judge() == 0)
return 1;
int i,j;
int tmp[10];
for(i=0;i<4;i++)
{
for(j=0;j<n;j++)
tmp[j] = pos[j];
int flag = 0;
for(j=0;j<n;j++)
{
if(a[j].s[pos[j]] == g[i])
{
flag = 1;
pos[j]++;
}
}
if(flag == 1)//如果至少有一个串的下一位和这个字符相等,才往下搜
{
if(dfs(len+1))
return 1;
for(j=0;j<n;j++)
pos[j] = tmp[j];
}
}
return 0;
}
int main(void)
{
int T,i,j;
scanf("%d",&T);
while(T--)
{
scanf("%d",&n);
minn = 0;
for(i=0;i<n;i++)
{
scanf("%s",a[i].s);
a[i].len = strlen(a[i].s);
minn = max(minn,a[i].len);
pos[i] = 0;
}
while(!dfs(0))//不断增加最小值,判定成不成立
{
minn++;
}
printf("%d\n",minn);
}
return 0;
}