The DNA Repair Problem

Problem: Biologists have finally invented techniques for repairing DNA that contains segments causing various inherited diseases. For the sake of simplicity, a DNA is represented as a string containing the characters 'A', 'G', 'C' and 'T'. The repairing techniques simply change some characters to eliminate all segments causing diseases. For example, we can repair a DNA "AAGCAG" to "AGGCAC" to eliminate the initial disease-causing segments "AAG", "AGC" and "CAG" by changing two characters. Note that the repaired DNA can still contain only the characters 'A', 'G', 'C' and 'T'.

You are to help the biologists repair a DNA by changing the least number of characters.
 
Input
The input consists of multiple test cases. Each test case starts with a line containing one integer N (1 ≤ N ≤ 50), which is the number of DNA segments causing inherited diseases.
The following N lines give N non-empty strings of length not greater than 20 containing only characters in "AGCT", which are the DNA segments causing inherited diseases.
The last line of the test case is a non-empty string of length not greater than 1000 containing only characters in "AGCT", which is the DNA to be repaired.

The last test case is followed by a line containing a single zero.
 
Output
For each test case, print a line containing the test case number (beginning with 1) followed by the
number of characters which need to be changed. If it is impossible to repair the given DNA, print -1.
 

Sample Input
2
AAA
AAG
AAAG    
2
A
TG
TGAATG
4
A
G
C
T
AGT
0

 
Sample Output
Case 1: 1
Case 2: 4
Case 3: -1

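Answer: We are given a set of forbidden DNA patterns and an original string, and must find the minimum number of characters to change so that the string no longer contains any forbidden pattern.
Matching against many patterns at once suggests an Aho-Corasick automaton, and minimizing the number of changed characters suggests DP.
The solution combines the two.
At every step we walk down the trie, with the restriction that we may never enter a node that ends a forbidden pattern (in the code, any node with isword set, which includes nodes whose fail link leads to such a node). Children that are missing from the trie but still legal must be handled as well: they are redirected through the fail links so that every node has a transition for each of the four characters.
For the DP, dp[i][j] is the minimum number of characters that must be changed in the first i characters so that the automaton ends in state j.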
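
The DP can also be written as a recurrence. This is a sketch in notation that does not appear in the original post: $\delta(j,k)$ is assumed to denote the automaton transition from state $j$ on character $k$, $s_i$ the $i$-th character of the original DNA (1-indexed), $n$ its length, and $[\cdot]$ equals 1 when the condition holds and 0 otherwise. Only states not marked as forbidden take part.

```latex
\[
dp[0][j] =
\begin{cases}
0 & j = 0 \ (\text{root}) \\
\infty & j \neq 0
\end{cases}
\qquad
dp[i][r] = \min_{\substack{j,\,k \\ \delta(j,k) = r}}
\bigl( dp[i-1][j] + [\, s_i \neq k \,] \bigr)
\]
\[
\text{answer} = \min_{j} dp[n][j], \quad \text{reported as } -1 \text{ if this minimum is } \infty .
\]
```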
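From each state reachable at step i-1, we try all four characters to find the successor state; if the chosen character equals the original character at that position the cost is unchanged, otherwise it increases by 1. The code is as follows: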

#include<iostream>
#include<cstdio>
#include<cstring>
#include<cmath>
#include<algorithm>
#define N 100005
#define MOD 100000
#define inf (1<<29)  // parenthesized so the macro expands safely inside larger expressions
#define LL long long
using namespace std;
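struct Trie{
    Trie *next[4];   // transitions on A, T, C, G (indices given by id())
    Trie *fail;      // failure link of the Aho-Corasick automaton
    int kind,isword; // kind: index of this node in s[]; isword: 1 if entering this node completes a forbidden segment
};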
Trie *que[N],s[N];
int idx;
int id(char ch){
    if(ch=='A') return 0;
    else if(ch=='T') return 1;
    else if(ch=='C') return 2;
    return 3;
}
Trie *NewNode(){
    Trie *tmp=&s[idx];
    for(int i=0;i<4;i++) tmp->next[i]=NULL;
    tmp->isword=0;
    tmp->kind=idx++;
    tmp->fail=NULL;
    return tmp;
}
void Insert(Trie *root,char *s,int len){
    Trie *p=root;
    for(int i=0;i<len;i++){
        if(p->next[id(s[i])]==NULL)
            p->next[id(s[i])]=NewNode();
        p=p->next[id(s[i])];
    }
    p->isword=1;
}
void Bulid_Fail(Trie *root){
    int head=0,tail=0;
    que[tail++]=root;
    root->fail=NULL;
    while(head<tail){
        Trie *tmp=que[head++];
        for(int i=0;i<4;i++){
            if(tmp->next[i]){
                if(tmp==root) tmp->next[i]->fail=root;
                else{
                    Trie *p=tmp->fail;
                    while(p!=NULL){
                        if(p->next[i]){
                           tmp->next[i]->fail=p->next[i];
                           break;
                        }
                        p=p->fail;
                    }
                    if(p==NULL) tmp->next[i]->fail=root;
                }
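                // A state is also forbidden when its fail link leads to a forbidden state (a forbidden segment is a suffix).
                if(tmp->next[i]->fail->isword) tmp->next[i]->isword=1;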
                que[tail++]=tmp->next[i];
            }
            // Missing child: redirect through the fail link so every state has all four transitions.
            else if(tmp==root) tmp->next[i]=root;
            else tmp->next[i]=tmp->fail->next[i];
        }
    }
}
int dp[1005][2005]; // dp[i][j]: minimum changes for the first i characters, ending in automaton state j
int slove(char *str,int len){
    for(int i=0;i<=len;i++) for(int j=0;j<idx;j++) dp[i][j]=inf;
    dp[0][0]=0;
    for(int i=1;i<=len;i++){
        for(int j=0;j<idx;j++){
            if(s[j].isword) continue;
            if(dp[i-1][j]==inf) continue;
            for(int k=0;k<4;k++){
                int r=s[j].next[k]->kind;
                if(s[r].isword) continue;
                dp[i][r]=min(dp[i][r],dp[i-1][j]+(id(str[i-1])!=k));
            }
        }
    }
    int ans=inf;
    for(int i=0;i<idx;i++) ans=min(ans,dp[len][i]);
    return ans==inf?-1:ans;
}
char str[1005];
int main(){
    int n,cas=0;
    while(scanf("%d",&n)!=EOF&&n){
        idx=0;
        Trie *root=NewNode();
        for(int i=0;i<n;i++){
            scanf("%s",str);
            Insert(root,str,strlen(str));
        }
        Bulid_Fail(root);
        scanf("%s",str);
        printf("Case %d: %d\n",++cas,slove(str,strlen(str)));
    }
    return 0;
}
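
For short inputs, the automaton DP above can be cross-checked against exhaustive search. The sketch below is not part of the original solution: `bruteForceRepair` is a hypothetical helper name, and it assumes the DNA has at most 10 characters, since it enumerates all 4^n candidate strings of the same length.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical cross-check helper (an assumption, not from the original post).
// Tries every string over "AGCT" with the same length as `dna`, keeps those
// containing no forbidden pattern, and returns the smallest number of
// positions that differ from `dna`, or -1 if no valid string exists.
int bruteForceRepair(const std::vector<std::string>& patterns,
                     const std::string& dna) {
    const std::string alphabet = "AGCT";
    const int n = static_cast<int>(dna.size());
    assert(n <= 10);  // 4^n candidates: only feasible for short strings

    long long total = 1;
    for (int i = 0; i < n; ++i) total *= 4;

    int best = -1;
    std::string cand(n, 'A');
    for (long long code = 0; code < total; ++code) {
        // Decode `code` as a base-4 number into a candidate string,
        // counting the positions where it differs from the original.
        long long c = code;
        int changes = 0;
        for (int i = 0; i < n; ++i) {
            cand[i] = alphabet[c % 4];
            c /= 4;
            if (cand[i] != dna[i]) ++changes;
        }
        // Reject the candidate if any forbidden pattern occurs in it.
        bool clean = true;
        for (const std::string& p : patterns) {
            if (cand.find(p) != std::string::npos) { clean = false; break; }
        }
        if (clean && (best == -1 || changes < best)) best = changes;
    }
    return best;
}
```

On the three sample cases this returns 1, 4 and -1, matching the sample output, so it can serve as a reference when testing the DP on small random inputs.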

Reposted from: https://www.cnblogs.com/benchao/p/4537927.html

