The IDA* Algorithm

Latest recommended article published on 2023-04-17 21:29:28

Liang-梁


Category: IDA* algorithm. Tags: IDA*, iterative deepening search, advanced search algorithms, A* algorithm, heuristic search

Article link: https://blog.csdn.net/qq_37555704/article/details/80049418


IDA*
- A Brief Introduction to IDA*
- Applying It to a Problem
  - DNA sequence
Thanks for Reading!!!


IDA*

A Brief Introduction to IDA*

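IDA* is the A* algorithm built on top of iterative deepening; in other words, IDA* = IDA + A* (here IDA stands for plain iterative-deepening DFS).
So what does it offer compared with IDA and with A*? Here is my brief take.
Differences from A*:
A* has to evaluate the heuristic function for a large number of states to decide their priority, and it needs a priority queue plus duplicate detection and sorting. For problems that ask for the actual solution path, that path is awkward to store. IDA*, being a DFS, uses the heuristic function only for pruning, so the path is easy to record along the recursion and the memory requirement is much lower.
Differences from IDA:
IDA* simply takes IDA and adds A*-style heuristic pruning to it.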

Applying It to a Problem

DNA sequence

The twenty-first century is a biology-technology developing century. We know that a gene is made of DNA. The nucleotide bases from which DNA is built are A(adenine), C(cytosine), G(guanine), and T(thymine). Finding the longest common subsequence between DNA/Protein sequences is one of the basic problems in modern computational molecular biology. But this problem is a little different. Given several DNA sequences, you are asked to make a shortest sequence from them so that each of the given sequence is the subsequence of it.

For example, given "ACGT", "ATGC", "CGTT" and "CAGT", you can make a sequence in the following way. It is the shortest but may be not the only one.
(Figure: sample illustration)
Input
The first line is the test case number t. Then t test cases follow. In each case, the first line is an integer n ( 1<=n<=8 ) representing the number of DNA sequences. The following n lines contain the n sequences, one per line. Assume that the length of any sequence is between 1 and 5.
Output
For each test case, print a line containing the length of the shortest sequence that can be made from these sequences.
Sample Input
1
4
ACGT
ATGC
CGTT
CAGT
Sample Output
8

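Problem summary

There are multiple test cases. Each one starts with n (1<=n<=8), the number of DNA sequences, followed by n lines with one sequence per line (each consists only of 'A', 'C', 'G', 'T' and has length between 1 and 5). Find the length of the shortest sequence that contains every input sequence as a subsequence, meaning the characters may be separated but must keep their order.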

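Analysis

You might first think of the many string-matching techniques out there.
However, this is actually an iterative deepening search problem.
The length of the answer is not known in advance, so a plain DFS has no natural depth to stop at. We therefore use iterative deepening: fix a depth limit, which is the maximum answer length we are willing to accept, and prune with a heuristic function along the way.
How exactly does each search step work? Let's take a look: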
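For each input string $i$, suppose the sequence built so far has matched it up to position $pos[i]-1$, so the next character to match is $str[i][pos[i]]$. Every step into the next level must pick a character that occurs among these $str[i][pos[i]]$; any other choice is wasted. Take the sample:
$str[1]="ACGT"$
$str[2]="ATGC"$
$str[3]="CGTT"$
$str[4]="CAGT"$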
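For the first position we can only pick 'A' or 'C', the characters at the heads of the strings. If we pick 'A', then $pos[1]$++ and $pos[2]$++, and the next step can choose from 'C' and 'T'. If we pick 'C', then $pos[3]$++ and $pos[4]$++, and the next step can choose from 'A' and 'G'.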
If we only check for success once the depth reaches $maxd$, the search will obviously exceed the time limit (TLE), so heuristic pruning is a must:
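Option 1: let h be the length of the longest remaining (unmatched) suffix among all strings. If h plus the current depth exceeds the maximum depth, i.e. $h+d>maxd$, return immediately.
This is very easy to understand, but the bound is weak and prunes little.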
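Option 2: for each of the 4 characters, find the largest number of times it occurs in any single remaining suffix, and let h be the sum of these four maxima. If h plus the current depth exceeds the maximum depth, i.e. $h+d>maxd$, return immediately.
For the sample, if the first character chosen is 'A', the remaining suffixes are "CGT", "TGC", "CGTT" and "CAGT", so h=1+1+1+2=5 (at most one 'A', one 'C', one 'G' and two 'T').
This bound is never smaller than Option 1: the length of any one suffix is the sum of its four character counts, which is at most the sum of the four per-character maxima. It is still a valid lower bound, because the answer must contain each character at least as often as the suffix that needs it most.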

Code

#include<cstdio>
#include<cstring>
#include<vector>
#include<algorithm>
using namespace std;
//fast integer input: skips non-digits, handles an optional minus sign
int read(){
    int f=1,x=0;
    char c=getchar();
    while(c<'0'||'9'<c){if(c=='-')f=-1;c=getchar();}
    while('0'<=c&&c<='9') x=x*10+c-'0',c=getchar();
    return f*x;
}
//n: number of strings; maxd: current depth limit
//Map[c-'A']: index 0..3 of the base letter c ('A','C','G','T')
int n,maxd,Map[36];
struct node{
    int len,pos;//length, and index of the next character to match
    char str[6];//the string itself (at most 5 characters plus '\0')
}st[10];
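int h(){
    int ret=0;
    for(int j=0;j<4;j++){//enumerate the four characters 'A','C','G','T'
        int Max=0;//must start at 0 for every character (it was uninitialized on the first pass)
        for(int i=0;i<n;i++){
            int tmp=0;//occurrences of character j in the unmatched part of string i
            for(int k=st[i].pos;k<st[i].len;k++)
                if(Map[st[i].str[k]-'A']==j)tmp++;
            Max=max(Max,tmp);//largest count over all strings
        }
        ret+=Max;//add up the four maxima
    }
    return ret;
}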
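bool DFS(int d){
    int tmp=h();
    if(d+tmp>maxd) return 0;//heuristic pruning
    if(!tmp) return 1;//nothing left to match: every string is covered
    vector<int> pan[4];//pan[c]: indices of strings whose next character is c
    for(int i=0;i<n;i++){
        if(st[i].pos>=st[i].len)continue;//string i is fully matched; skip it to avoid indexing Map with '\0'-'A'
        int t=Map[st[i].str[st[i].pos]-'A'];//kind of the next character of string i
        pan[t].push_back(i);
    }
    for(int i=0;i<4;i++){//which character to place at depth d
        int siz=int(pan[i].size());
        if(!siz)continue;//no string is waiting for this character
        for(int j=0;j<siz;j++)//choose it: advance every string waiting for it
            st[pan[i][j]].pos++;
        if(DFS(d+1)) return 1;//go one level deeper
        for(int j=0;j<siz;j++)//backtrack (undo the choice)
            st[pan[i][j]].pos--;
    }
    return 0;
}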
int main()
{
    Map['A'-'A']=0,Map['C'-'A']=1,Map['G'-'A']=2,Map['T'-'A']=3;
    int T=read();
    while(T--){
        maxd=0;
        n=read();
        int St=0;//length of the longest input string, a lower bound on the answer
        memset(st,0,sizeof st);
        for(int i=0;i<n;i++){
            scanf("%s",st[i].str);
            st[i].len=strlen(st[i].str);
            St=max(St,st[i].len);
        }
        for(maxd=St;;maxd++)//iterative deepening: raise the depth limit until a solution is found
            if(DFS(0)) break;
        printf("%d\n",maxd);
    }
    return 0;
}