hdu 2757 DNA repair: AC automaton + DP (I don't fully understand it)

cyendra

Categories: Strings, Problem Reports. Tags: dynamic programming

Article link: https://blog.csdn.net/cyendra/article/details/8831545

DNA repair

Time Limit: 5000/2000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 872 Accepted Submission(s): 473

Problem Description

Biologists finally invent techniques of repairing DNA that contains segments causing kinds of inherited diseases. For the sake of simplicity, a DNA is represented as a string containing characters 'A', 'G' , 'C' and 'T'. The repairing techniques are simply to change some characters to eliminate all segments causing diseases. For example, we can repair a DNA "AAGCAG" to "AGGCAC" to eliminate the initial causing disease segments "AAG", "AGC" and "CAG" by changing two characters. Note that the repaired DNA can still contain only characters 'A', 'G', 'C' and 'T'.

You are to help the biologists to repair a DNA by changing least number of characters.

Input

The input consists of multiple test cases. Each test case starts with a line containing one integer N (1 ≤ N ≤ 50), which is the number of DNA segments causing inherited diseases.
The following N lines give N non-empty strings of length not greater than 20 containing only characters in "AGCT", which are the DNA segments causing inherited diseases.
The last line of the test case is a non-empty string of length not greater than 1000 containing only characters in "AGCT", which is the DNA to be repaired.

The last test case is followed by a line containing a single zero.

Output

For each test case, print a line containing the test case number (beginning with 1) followed by the number of characters which need to be changed. If it's impossible to repair the given DNA, print -1.

Sample Input

2
AAA
AAG
AAAG
2
A
TG
TGAATG
4
A
G
C
T
AGT
0

Sample Output

Case 1: 1
Case 2: 4
Case 3: -1

--------------------------------

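Problem summary: we are given n dangerous DNA segments and a DNA string S of length L.
S may contain some of the dangerous segments. We may change characters of S, and each
changed character counts as one operation. Output the minimum number of operations after
which S contains no dangerous segment, or -1 if every possible result still contains one.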
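
To make the task concrete, here is a brute-force reference that simply tries every candidate string of the same length. It is only a sketch for checking small cases by hand, not the approach used in this post; the names bruteForceRepairs and containsAny are made up for illustration, and the length cap of 10 is an arbitrary assumption to keep the 4^L enumeration small.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: true if s contains any of the dangerous segments.
bool containsAny(const std::string& s, const std::vector<std::string>& bad) {
    for (const std::string& b : bad)
        if (s.find(b) != std::string::npos) return true;
    return false;
}

// Tries all 4^L strings over {A,G,C,T} and returns the fewest positions
// that differ from dna among the strings free of dangerous segments,
// or -1 if no such string exists. Only usable for very short inputs.
int bruteForceRepairs(const std::vector<std::string>& bad, const std::string& dna) {
    const std::string alphabet = "AGCT";
    const std::size_t len = dna.size();
    assert(len <= 10);  // assumed cap: 4^10 candidates is still cheap

    long long total = 1;
    for (std::size_t i = 0; i < len; ++i) total *= 4;

    int best = -1;
    std::string cand(len, 'A');
    for (long long code = 0; code < total; ++code) {
        // Decode the counter as a base-4 number, one digit per position.
        long long c = code;
        int changes = 0;
        for (std::size_t i = 0; i < len; ++i) {
            cand[i] = alphabet[c % 4];
            c /= 4;
            if (cand[i] != dna[i]) ++changes;
        }
        if (!containsAny(cand, bad) && (best == -1 || changes < best))
            best = changes;
    }
    return best;
}
```

Calling bruteForceRepairs({"AAA", "AAG"}, "AAAG") reproduces the first sample. The enumeration grows as 4^L, which is why the real solution needs the automaton-based DP for L up to 1000.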

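Approach: AC automaton + DP. The idea is fairly creative: build a trie from the n
dangerous DNA segments and treat every trie node as one state of the DP. A state is
usable as long as its node is not a dangerous node, i.e. not the end position of a
dangerous segment (once the fail pointers are built, this also covers nodes whose
fail pointer leads to a dangerous node).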

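The transition is dp[i][j->next[k]] = min(dp[i][j->next[k]], dp[i-1][j] + (S[i] != k)),
where dp[i][j] is the minimum number of operations needed to build a string of length i
that ends at node j (inf if the state is unreachable), S[i] is the i-th character of S
(1-based), and k ranges over the four letters.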

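Put another way: on the trie built from the dangerous segments we simulate constructing
a string one character at a time, starting from the first character, looking for a string
that meets the requirement. While constructing it we compare it with the input sequence S.
If the character placed at position i equals s[i], that position needs no change;
otherwise s[i] has to be changed to that character, which costs one operation.
For this reason, when the fail pointers are built from the trie, every next entry has to
be handled, including the missing ones: when temp->next[i] == NULL, temp->next[i] is
pointed at the node the fail pointer would lead to (temp->fail->next[i], or root when
temp is the root), so that a failed match falls back to the right node directly.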
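
The completion of missing next entries described above can be sketched with an array-based automaton. This is a minimal illustration under my own assumptions, not the pointer-based code of this post: the Automaton struct and containsDangerous are hypothetical names, and states are plain integers with 0 as the root. After build(), every state has all four transitions, so scanning a text never needs to follow fail pointers explicitly.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

struct Automaton {
    std::vector<std::vector<int>> next;  // next[state][letter], -1 = missing before build()
    std::vector<int> fail;               // fail pointer of each state
    std::vector<bool> bad;               // true if some dangerous segment ends here

    Automaton() { addNode(); }           // state 0 is the root

    static int code(char c) {
        const std::string alphabet = "AGCT";
        std::size_t pos = alphabet.find(c);
        assert(pos != std::string::npos);
        return static_cast<int>(pos);
    }

    int addNode() {
        next.push_back(std::vector<int>(4, -1));
        fail.push_back(0);
        bad.push_back(false);
        return static_cast<int>(next.size()) - 1;
    }

    void insert(const std::string& s) {
        int cur = 0;
        for (char ch : s) {
            int k = code(ch);
            if (next[cur][k] == -1) {
                int id = addNode();      // create first: addNode may reallocate next
                next[cur][k] = id;
            }
            cur = next[cur][k];
        }
        bad[cur] = true;
    }

    // BFS over the trie: set fail pointers and fill every missing transition.
    void build() {
        std::queue<int> q;
        for (int k = 0; k < 4; ++k) {
            if (next[0][k] == -1) next[0][k] = 0;   // missing at root: stay at root
            else q.push(next[0][k]);                // depth-1 nodes fail to the root
        }
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            if (bad[fail[u]]) bad[u] = true;        // a dangerous suffix ends here
            for (int k = 0; k < 4; ++k) {
                int v = next[u][k];
                if (v == -1) next[u][k] = next[fail[u]][k];   // borrow the fallback
                else { fail[v] = next[fail[u]][k]; q.push(v); }
            }
        }
    }
};

// True if text contains at least one of the patterns as a substring.
bool containsDangerous(const std::vector<std::string>& patterns, const std::string& text) {
    Automaton ac;
    for (const std::string& p : patterns) ac.insert(p);
    ac.build();
    int state = 0;
    for (char ch : text) {
        state = ac.next[state][Automaton::code(ch)];
        if (ac.bad[state]) return true;
    }
    return false;
}
```

With the segments from the problem statement, containsDangerous({"AAG", "AGC", "CAG"}, "AAGCAG") is true while the repaired string "AGGCAC" gives false. Because shallower states are processed first in the BFS, next[fail[u]][k] is always filled in before it is read.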

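Note that in the DP every appended character moves us from state node j to state node
j->next[k], which is why dp[i][j->next[k]] can be obtained from dp[i-1][j] + (S[i] != k).
The final answer is therefore the minimum of dp[len][j] over 0 <= j < count; if that
minimum is still inf, the answer is -1.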
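
Since dp[i] only depends on dp[i-1], the table can be reduced to two rows. The following is a compact sketch of the whole method under that assumption; it is an array-based variant written for illustration, and minRepairs, best and nextBest are hypothetical names that do not appear in the listing further down.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Minimum number of changed characters so that dna contains no pattern, or -1.
int minRepairs(const std::vector<std::string>& patterns, const std::string& dna) {
    const std::string alphabet = "AGCT";
    std::vector<std::vector<int>> next(1, std::vector<int>(4, -1));  // state 0 = root
    std::vector<int> fail(1, 0);
    std::vector<char> bad(1, 0);

    // 1. Trie of the dangerous segments.
    for (const std::string& p : patterns) {
        int cur = 0;
        for (char ch : p) {
            std::size_t pos = alphabet.find(ch);
            assert(pos != std::string::npos);
            int k = static_cast<int>(pos);
            if (next[cur][k] == -1) {
                int id = static_cast<int>(next.size());
                next.push_back(std::vector<int>(4, -1));
                fail.push_back(0);
                bad.push_back(0);
                next[cur][k] = id;
            }
            cur = next[cur][k];
        }
        bad[cur] = 1;
    }

    // 2. Fail pointers, completed transitions, propagated danger flags.
    std::queue<int> q;
    for (int k = 0; k < 4; ++k) {
        if (next[0][k] == -1) next[0][k] = 0;
        else q.push(next[0][k]);
    }
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        if (bad[fail[u]]) bad[u] = 1;
        for (int k = 0; k < 4; ++k) {
            int v = next[u][k];
            if (v == -1) next[u][k] = next[fail[u]][k];
            else { fail[v] = next[fail[u]][k]; q.push(v); }
        }
    }

    // 3. DP over (prefix length, state), keeping only the previous row.
    const int INF = 1000000000;
    const int states = static_cast<int>(next.size());
    std::vector<int> best(states, INF), nextBest(states, INF);
    best[0] = 0;  // empty prefix sits at the root with zero changes
    for (char ch : dna) {
        std::size_t pos = alphabet.find(ch);
        assert(pos != std::string::npos);
        int want = static_cast<int>(pos);
        std::fill(nextBest.begin(), nextBest.end(), INF);
        for (int j = 0; j < states; ++j) {
            if (best[j] == INF) continue;          // unreachable state
            for (int k = 0; k < 4; ++k) {
                int to = next[j][k];
                if (bad[to]) continue;             // would complete a dangerous segment
                nextBest[to] = std::min(nextBest[to], best[j] + (k != want ? 1 : 0));
            }
        }
        best.swap(nextBest);
    }
    int ans = *std::min_element(best.begin(), best.end());
    return ans == INF ? -1 : ans;
}
```

For the three sample cases this returns 1, 4 and -1. With at most 50 * 20 + 1 = 1001 states and a DNA of length 1000, the loop does roughly 4 million transitions per test case, and the two rows keep memory far below the 32768 K limit.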

--------------------------------

#include <iostream>
#include <cstring>
#include <cstdio>
#include <queue>

using namespace std;

const int kind = 4;
const int OO=1e9;

int cnt_data;

struct node
{
    node *fail;
    node *next[kind];
    int num;
    bool visit;
    bool flag;
    node()
    {
        fail = NULL;
        visit = false;
        flag = false;
        num = cnt_data++;
        memset(next,0,sizeof(next)); // zero the child pointers (memset takes an int, not NULL)
    }
};
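// data[i] is the node whose num is i, so the DP can index states by number.
node* data[111111];

node* query_temp_que[1111];

// f[i][j]: minimum number of changes for a prefix of length i ending in state j.
// At most 50*20+1 = 1001 states exist, so 1111 columns are enough.
int f[1111][1111];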

int get_dna(char c)
{
    if (c=='A') return 0;
    if (c=='G') return 1;
    if (c=='C') return 2;
    if (c=='T') return 3;
    return -1;
}

void insert(node *root,char *str)
{
    node *p=root;
    int i,index;
    int len=strlen(str);
    for (i=0; i<len; i++)
    {
        index=get_dna(str[i]);
        if(p->next[index]==NULL)
        {
            p->next[index]=new node();
            data[cnt_data-1]=p->next[index];
        }
        p=p->next[index];
        if (p->flag) break; // a shorter dangerous segment is a prefix of this one, so stop here
    }
    p->flag=true;
}

// Build the fail pointers and fill in the missing transitions
void build_ac_automation(node *root)
{
    int i;
    queue<node *>Q;
    root->fail = NULL;
    Q.push(root);
    while(!Q.empty())
    {
        node *temp=Q.front(); // take the front of the queue
        Q.pop();
        for(i=0; i<kind; i++)
        {
            if(temp->next[i]!=NULL) // a real child: compute its fail pointer
            {
                if(temp==root) temp->next[i]->fail=root;
                else
                {
                    temp->next[i]->fail=temp->fail->next[i];
                    // The string from root to temp->next[i] has, as a suffix, the
                    // dangerous segment spelled from root to temp->fail->next[i],
                    // so this node is marked dangerous too and the DP must never
                    // enter it.
                    if (temp->fail->next[i]->flag == true)
                        temp->next[i]->flag = true;
                }
                Q.push(temp->next[i]);
            }
            else
            {
                // A missing child: point it at the node reached through the fail
                // pointer, so every state has all four transitions.
                if(temp==root) temp->next[i]=root;
                else temp->next[i]=temp->fail->next[i];
            }
        }
    }
}

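// Returns the minimum number of changes, or -1 if no valid string exists.
int query(node *root,char *str)
{
    int index;
    int len=strlen(str);
    node *p = root;

    // Every (length, state) pair starts out unreachable.
    for (int i=0;i<=len;i++)
    {
        for (int j=0;j<=cnt_data;j++)
        {
            f[i][j]=OO;
        }
    }
    f[0][0]=0; // the empty prefix sits at the root, whose num is 0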
    for (int i=1;i<=len;i++)
    {
        index = get_dna(str[i-1]);
        for (int j=0;j<cnt_data;j++)
        {
            if (f[i-1][j]<OO)
            {
                for (int k=0;k<4;k++)
                {
                    if (!data[j]->next[k]->flag)
                    {
                        p=data[j]->next[k];
                        f[i][p->num]=min(f[i][p->num],f[i-1][j]+(index!=k));
                    }
                }
            }
        }
    }
    int ans=OO;
    for (int j=0;j<cnt_data;j++)
    {
        if (f[len][j]<ans) ans=f[len][j];
    }
    if (ans==OO) return -1;
    return ans;
}


int main()
{
    int n;
    char dna[1111];
    char key[111];
    node* root;
    int cnt=1;
    while (~scanf("%d",&n))
    {
        if (n==0) break;
        cnt_data=0;
        root=new node();
        data[cnt_data-1]=root;
        for (int i=0; i<n; i++)
        {
            scanf("%s",key);
            insert(root,key);
        }
        build_ac_automation(root);
        scanf("%s",dna);
        int ans=query(root,dna);
        printf("Case %d: %d\n",cnt++,ans);
    }
    return 0;
}
