DNA sequence
Time Limit: 15000/5000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Problem Description
The twenty-first century is a century of developing biotechnology. We know that a gene is made of DNA. The nucleotide bases from which DNA is built are A (adenine), C (cytosine), G (guanine), and T (thymine). Finding the longest common subsequence between DNA/protein sequences is one of the basic problems in modern computational molecular biology. But this problem is a little different. Given several DNA sequences, you are asked to make the shortest sequence such that each of the given sequences is a subsequence of it.
For example, given "ACGT", "ATGC", "CGTT" and "CAGT", you can make a sequence in the following way. It is the shortest, but it may not be the only one.
Input
The first line is the number of test cases t. Then t test cases follow. In each case, the first line is an integer n ( 1<=n<=8 ), the number of DNA sequences. The following n lines contain the n sequences, one per line. The length of each sequence is between 1 and 5.
Output
For each test case, print a line containing the length of the shortest sequence that can be made from these sequences.
Sample Input
1
4
ACGT
ATGC
CGTT
CAGT
Sample Output
8
Author
LL
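The IDA* algorithm combines the A* algorithm with iterative deepening. (Explanation taken from a blog post.)
Iterative deepening builds on DFS by raising the search depth step by step. It avoids the large memory footprint of breadth-first search and reduces the blindness of plain depth-first search. At the start of the recursive search function it checks whether the current depth exceeds the predefined maximum depth: if it does, the search abandons that level, and if not, it keeps searching. Because the limit grows one level at a time, the first solution found is guaranteed to be optimal.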
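The deepening loop described above can be sketched on this very problem, without any heuristic yet. This is only an illustrative sketch: the names `depthLimited`, `shortestByIterativeDeepening`, and `progress` are hypothetical and do not appear in the solution further down, and the code uses `std::string` and `std::vector` instead of the fixed arrays used there.

```cpp
#include <string>
#include <vector>

// Depth-limited DFS: can every string be completed with at most `limit`
// appended characters? progress[i] = characters of seqs[i] matched so far.
// (Hypothetical helper for illustration only.)
static bool depthLimited(const std::vector<std::string>& seqs,
                         std::vector<int> progress, int depth, int limit) {
    bool done = true;
    for (size_t i = 0; i < seqs.size(); ++i)
        if (progress[i] < (int)seqs[i].size()) done = false;
    if (done) return true;
    if (depth >= limit) return false;  // depth budget used up: give up here

    for (char c : std::string("ACGT")) {
        std::vector<int> next = progress;
        bool advanced = false;
        // Appending c advances every string whose next character is c.
        for (size_t i = 0; i < seqs.size(); ++i) {
            if (next[i] < (int)seqs[i].size() && seqs[i][next[i]] == c) {
                ++next[i];
                advanced = true;
            }
        }
        if (advanced && depthLimited(seqs, next, depth + 1, limit)) return true;
    }
    return false;
}

// Raise the limit one level at a time; the first limit that succeeds
// is the minimal length.
int shortestByIterativeDeepening(const std::vector<std::string>& seqs) {
    std::vector<int> start(seqs.size(), 0);
    for (int limit = 0;; ++limit)
        if (depthLimited(seqs, start, 0, limit)) return limit;
}
```

The loop always terminates, because concatenating all inputs is a valid answer, so some limit no larger than the total input length must succeed.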
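Problem summary: given N DNA strings, find the length of the shortest sequence that contains all N strings as subsequences.
Approach: use IDA* to bound the length of the sequence. There are two prunings: 1. if the current search depth exceeds the depth limit, prune; 2. if the current search depth plus the estimated remaining length (the largest number of still-unmatched characters in any string) exceeds the depth limit, prune.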
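The second pruning relies on a lower bound: whatever is appended next, the string with the most unmatched characters still needs at least that many more characters. A minimal sketch of that bound, assuming the hypothetical names `remainingLowerBound` and `shouldPrune` (neither is part of the solution below):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Lower bound on the characters still to be appended: the longest
// unmatched suffix among all strings. progress[i] = matched prefix length.
int remainingLowerBound(const std::vector<std::string>& seqs,
                        const std::vector<int>& progress) {
    int bound = 0;
    for (size_t i = 0; i < seqs.size(); ++i)
        bound = std::max(bound, (int)seqs[i].size() - progress[i]);
    return bound;
}

// True when this branch cannot finish within `limit` characters.
bool shouldPrune(const std::vector<std::string>& seqs,
                 const std::vector<int>& progress, int depth, int limit) {
    return depth + remainingLowerBound(seqs, progress) > limit;
}
```

Since the bound is never negative, this test also covers the first pruning: whenever the depth alone exceeds the limit, `shouldPrune` returns true as well.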
#include <bits/stdc++.h>
using namespace std;
int t, n;
int ans; // final answer, -1 while not found
int dep; // current depth limit of the search
char str[10][10];
char DNA[4] = {'A','T','C','G'};
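// len[i] records how many characters of the i-th string have been matched so far
/*
Suppose the search has built the string ACGTCA at some step.
For the four given strings:
ACGT has matched 4 characters, so len[0]=4 (indices start at zero)
ATGC has matched 2 characters, so len[1]=2
CGTT has matched 3 characters, so len[2]=3
CAGT has matched 2 characters, so len[3]=2
Matching is complete only when every len[i] equals the length of the i-th string.
https://blog.csdn.net/qq_37405320/article/details/82940066
*/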
void dfs (int index, int len[]){
    if (index > dep) return ; // deeper than the limit: prune
    int maxx = 0; // largest number of still-unmatched characters in any string
    for (int i = 0; i < n; i++){
        maxx = max(int(strlen(str[i])-len[i]),maxx);
    }
    // every string is fully matched: record the answer
    if (maxx == 0){
        ans = index;
        return ;
    }
    // cannot finish within this depth limit: prune
    if (index + maxx > dep) {
        return ;
    }
    for (int i = 0; i < 4; i++){
        int flag = 0;
        int pos[10];
        // descend only if this character advances at least one string
        for (int j = 0;j < n; j++){
            // a string whose next character matches advances by one
            if(str[j][len[j]] == DNA[i])
            {
                flag = 1;
                pos[j] = len[j] + 1;
            }else pos[j] = len[j]; // otherwise its progress stays the same
        }
        if(flag) dfs(index+1, pos);
        if(ans != -1) return;
    }
}
int main(){
    ios::sync_with_stdio(false);
    cin >> t;
    while (t--) {
        cin >> n;
        dep = 0;
        for (int i = 0; i < n; i++){
            cin >> str[i];
            // the answer is at least as long as the longest input string
            dep = max (dep, int(strlen(str[i])));
        }
        ans = -1;
        int pos[10] = {0};
        // raise the depth limit by one until a solution is found
        while(1) {
            dfs(0,pos);
            if(ans != -1){
                break;
            }
            dep++;
        }
        cout << ans << endl;
    }
    return 0;
}
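The `len` bookkeeping described in the comment above `dfs` can be checked in isolation. The helper below is a hypothetical addition (the name `matchProgress` is not part of the solution): it replays a built string character by character and applies the same rule as `dfs`, advancing every input string whose next character matches.

```cpp
#include <string>
#include <vector>

// For each input string, return how many of its characters are matched
// after appending the characters of `built` one by one.
std::vector<int> matchProgress(const std::vector<std::string>& seqs,
                               const std::string& built) {
    std::vector<int> progress(seqs.size(), 0);
    for (char c : built) {
        for (size_t i = 0; i < seqs.size(); ++i) {
            if (progress[i] < (int)seqs[i].size() && seqs[i][progress[i]] == c)
                ++progress[i];
        }
    }
    return progress;
}
```

For the four sample strings and the built string ACGTCA, this returns 4, 2, 3, 2, matching the values listed in that comment.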