hdu2457 poj3691 DNA Repair: AC automaton + DP

黑码

Category: AC automaton

Article link: https://blog.csdn.net/Littlewhite520/article/details/72469473

Biologists finally invent techniques of repairing DNA that contains segments causing kinds of inherited diseases. For the sake of simplicity, a DNA is represented as a string containing characters ‘A’, ‘G’ , ‘C’ and ‘T’. The repairing techniques are simply to change some characters to eliminate all segments causing diseases. For example, we can repair a DNA “AAGCAG” to “AGGCAC” to eliminate the initial causing disease segments “AAG”, “AGC” and “CAG” by changing two characters. Note that the repaired DNA can still contain only characters ‘A’, ‘G’, ‘C’ and ‘T’.

You are to help the biologists repair a DNA by changing the least number of characters.

Input
The input consists of multiple test cases. Each test case starts with a line containing one integer N (1 ≤ N ≤ 50), which is the number of DNA segments causing inherited diseases.
The following N lines give N non-empty strings of length not greater than 20 containing only characters in “AGCT”, which are the DNA segments causing inherited disease.
The last line of the test case is a non-empty string of length not greater than 1000 containing only characters in “AGCT”, which is the DNA to be repaired.
The last test case is followed by a line containing one zero.

Output
For each test case, print a line containing the test case number (beginning with 1) followed by the number of characters which need to be changed. If it’s impossible to repair the given DNA, print -1.

Sample Input
2
AAA
AAG
AAAG
2
A
TG
TGAATG
4
A
G
C
T
AGT
0
Sample Output
Case 1: 1
Case 2: 4
Case 3: -1

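Approach: AC automaton + DP. The idea is a little unusual: build a Trie from the given n
dangerous DNA sequences, and treat every node of the tree as a state of the DP.
A state may be used as long as its node is not a dangerous node, that is, no dangerous
DNA sequence ends there. Once the failure pointers are built, a node also counts as
dangerous if a proper suffix of its string is a dangerous sequence.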

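The state transition is dp[i][j->next[k]] = min(dp[i][j->next[k]], dp[i-1][j] + (S[i] != k)),
applied only when j->next[k] is not a dangerous node.
(dp[i][j] is the minimum number of changes needed to construct a string of length i that
ends at node j; S[i] is the i-th character of the DNA to repair, k is the character
we write at that position, and unreachable states hold inf.)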
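To make the recurrence concrete, here is a minimal sketch of one DP step in isolation. It assumes the automaton's complete transition table already exists; the names relax_layer, go, bad and INF are made up for this illustration and do not appear in the solution further down.

```c
#include <assert.h>

#define INF 0x3f3f3f3f  /* "unreachable" marker; an assumption of this sketch */

/* Relax one DP layer.
 * prev[j] : minimum changes for a prefix of length i-1 ending in state j
 * cur[j]  : filled in for prefix length i
 * go[j][k]: complete transition table of the automaton (assumed prebuilt)
 * bad[j]  : nonzero if reaching state j means a disease segment was matched
 * ch      : index (0..3) of the original character at this position */
void relax_layer(const int *prev, int *cur, int nstates,
                 int go[][4], const int *bad, int ch)
{
    int j, k;
    assert(ch >= 0 && ch < 4);
    for (j = 0; j < nstates; j++) cur[j] = INF;
    for (j = 0; j < nstates; j++) {
        if (prev[j] >= INF) continue;        /* state j was never reached */
        for (k = 0; k < 4; k++) {
            int nxt = go[j][k];
            int cost;
            if (bad[nxt]) continue;          /* writing k here would create a disease segment */
            cost = prev[j] + (k != ch);      /* pay 1 only if we write a different letter */
            if (cost < cur[nxt]) cur[nxt] = cost;
        }
    }
}
```

With a single disease segment "A" the automaton has two states (root and the dangerous node), so any position holding 'A' is forced to cost one change, while any other letter can be kept for free.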

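Another way to look at it: walk the trie built from the dangerous DNA sequences and
construct a valid string one character at a time, starting from the first character.
While constructing it, compare it with the input sequence S. If the character chosen
at position i equals s[i], that position needs no change; otherwise s[i] has to be
changed to the chosen character, which costs one change.
For this reason, when the failure pointers are built from the trie, every next slot has
to be filled in, including the ones that do not exist. When temp->next[i] == NULL,
assign temp->next[i] the node that following the failure pointer would lead to, so that
a failed match falls straight back to the correct node.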
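The construction described above can be sketched on its own as follows. This is a simplified illustration rather than the code used below: the names ac_reset, ac_insert, ac_build, go, flink, bad and the capacity MAXN are assumptions chosen for the example.

```c
#include <assert.h>
#include <string.h>

#define MAXN 1024      /* node capacity; an assumption sized for 50 segments x 20 letters */

int go[MAXN][4];       /* go[p][c]: next state from p on letter c (-1 while missing) */
int flink[MAXN];       /* failure link of each node */
int bad[MAXN];         /* 1 if a disease segment ends at, or is a suffix of, this node */
int nodes;             /* number of nodes in use */

int code(char c)       /* map A, G, C, T to 0..3 */
{
    switch (c) { case 'A': return 0; case 'G': return 1;
                 case 'C': return 2; default:  return 3; }
}

void ac_reset(void)    /* start over with just the root */
{
    nodes = 1;
    memset(go[0], -1, sizeof go[0]);
    flink[0] = 0; bad[0] = 0;
}

int ac_insert(const char *s)   /* returns the id of the node where s ends */
{
    int p = 0;
    for (; *s; s++) {
        int c = code(*s);
        if (go[p][c] < 0) {    /* allocate a fresh node */
            assert(nodes < MAXN);
            memset(go[nodes], -1, sizeof go[nodes]);
            flink[nodes] = 0; bad[nodes] = 0;
            go[p][c] = nodes++;
        }
        p = go[p][c];
    }
    bad[p] = 1;
    return p;
}

void ac_build(void)    /* BFS: set failure links and fill every missing transition */
{
    int queue[MAXN], head = 0, tail = 0, c;
    for (c = 0; c < 4; c++) {
        if (go[0][c] < 0) go[0][c] = 0;          /* missing at the root: stay at the root */
        else { flink[go[0][c]] = 0; queue[tail++] = go[0][c]; }
    }
    while (head < tail) {
        int p = queue[head++];
        for (c = 0; c < 4; c++) {
            int x = go[p][c];
            if (x < 0) {
                go[p][c] = go[flink[p]][c];      /* borrow the failure node's transition */
            } else {
                flink[x] = go[flink[p]][c];
                bad[x] |= bad[flink[x]];         /* a dangerous suffix makes x dangerous */
                queue[tail++] = x;
            }
        }
    }
}
```

After ac_build, every go[p][c] is a valid node id, so the DP can follow transitions without ever walking failure links at run time.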

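Note that in the DP every simulated character moves from state node j to state node j->next[k],
which is why dp[i][j->next[k]] can be derived from dp[i-1][j] + (S[i] != k).
The final answer is therefore the minimum of dp[len][j] over 0 <= j <= L, where L is the index
of the last trie node; if every entry is still inf, the DNA cannot be repaired and the answer is -1.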

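#include<stdio.h>
#define M 1010 //enough for 50*20 trie nodes plus the root, and for a DNA of length 1000
#define K 0x7ffffff //"infinity": marks an unreachable dp entry
struct trie{
    int fail;//failure pointer
    int next[4];//transitions for A, G, C, T
    int sign;//set if this node's string has a disease segment as a suffix
}t[M];//static array of trie nodes, standing in for a linked tree
int q[M],head,tail,L;//array-based queue with its head/tail indices; L is the index of the last allocated node
int dp[M][M],turn[90];//dp table and character-to-index map
char str[22],s[M];//one disease segment and the DNA sequence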
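void Insert(char *a)//insert a disease segment into the trie and mark its last node
{
    int p=0,i=0,x,j;
    while(a[i]){
        x=t[p].next[turn[a[i]]];
        if(x<0){//child does not exist yet: allocate it from the static array and initialize it
            t[p].next[turn[a[i]]]=x=++L;//++L performs the static allocation
            t[x].fail=-1;t[x].sign=0;
            for(j=0;j<4;j++)t[x].next[j]=-1;
        }
        p=x;i++;//descend one level until the end of the string
    }
    t[p].sign=1;//mark the end of the segment
}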
void build_ACauto()
{
    int i,p,x;
    q[tail++]=0;
    while(head<tail){//BFS, updating failure pointers level by level
        p=q[head++];
        for(i=0;i<4;i++){
            x=t[p].next[i];
            if(x<0){//this nucleotide has no child here
                if(p)t[p].next[i]=t[t[p].fail].next[i];//p is not the root: reuse the same transition of p's failure node
                else t[p].next[i]=0;//p is the root: stay at the root
            }
            else{//this nucleotide has a child x
                if(p){//p is not the root: x fails to the same transition of p's failure node
                    t[x].fail=t[t[p].fail].next[i];
                    if(t[t[x].fail].sign)t[x].sign=1;//a suffix of this string is a disease segment, so mark x too
                }
                else t[x].fail=0;//p is the root: x fails directly to the root
                q[tail++]=x;//enqueue
            }
        }
    }
}
void solve()
{
    int i,j,k,n,p,x;
    for(i=0;!i||s[i-1];i++)
        for(j=0;j<=L;j++)dp[i][j]=K;//initialize rows 0..n to infinity
    n=i-1;dp[0][0]=0;//n is the length of s; the empty prefix at the root costs nothing
    for(i=1;i<=n;i++)
        for(j=0;j<=L;j++)
            if(dp[i-1][j]<K)//only expand states that were reached in the previous row
                for(k=0;k<4;k++)//try each of the four nucleotides
                    if(!t[t[j].next[k]].sign){//the target must not complete a disease segment
                        p=t[j].next[k];//index of the target node in the trie array
                        x=dp[i-1][j]+(turn[s[i-1]]!=k);//one change if the DNA has a different nucleotide here
                        if(x<dp[i][p])dp[i][p]=x;//keep the smaller number of changes
                    }
    x=K;
    for(i=0;i<=L;i++)//minimum over the last dp row
        if(!t[i].sign&&dp[n][i]<x)x=dp[n][i];
    if(x==K)x=-1;//nothing in the last row was reached: repair is impossible
    printf("%d\n",x);
}
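int main()
{
    int n,i,C=0;
    turn['A']=0;turn['G']=1;//map AGCT to 0123
    turn['C']=2;turn['T']=3;
    while(scanf("%d",&n)==1&&n){//also stop at end of input, not only at the terminating 0
        head=tail=L=0;//empty the queue and mark the static trie array as unused
        t[0].fail=-1;t[0].sign=0;//initialize the trie root
        for(i=0;i<4;i++)t[0].next[i]=-1;
        while(n--){
            scanf("%s",str);//read a disease segment
            Insert(str);//insert it into the trie
        }
        build_ACauto();//set the failure pointers of every node, turning the trie into an AC automaton
        scanf("%s",s);
        printf("Case %d: ",++C);
        solve();//use dp to find the minimum number of nucleotides to change
    }
    return 0;
}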
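
A handy way to gain confidence in the DP is to compare it against exhaustive search on tiny inputs. The checker below is only a sketch for that purpose and is not part of the solution above; the name brute_force and the length limit of 10 are assumptions made for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Exhaustive reference: try every string over {A,G,C,T} of the same length as
 * dna, keep those containing no disease segment, and return the smallest
 * number of differing positions, or -1 if no valid string exists. */
int brute_force(const char **pats, int npat, const char *dna)
{
    static const char letters[] = "AGCT";
    char cand[16];
    int len = (int)strlen(dna), best = -1;
    long total = 1, id;
    int i, p;
    assert(len > 0 && len <= 10);               /* at most 4^10 candidates */
    for (i = 0; i < len; i++) total *= 4;
    for (id = 0; id < total; id++) {
        long c = id;
        int diff = 0, ok = 1;
        for (i = 0; i < len; i++) {             /* decode id as a base-4 number */
            cand[i] = letters[c % 4];
            c /= 4;
            diff += (cand[i] != dna[i]);
        }
        cand[len] = '\0';
        for (p = 0; p < npat && ok; p++)
            if (strstr(cand, pats[p])) ok = 0;  /* candidate contains a disease segment */
        if (ok && (best < 0 || diff < best)) best = diff;
    }
    return best;
}
```

On the three sample cases from the problem statement it should agree with the expected answers 1, 4 and -1, and random short inputs can then be used to compare it with the output of solve().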
