[AC Automaton] DNA Repair

DNA repair
Time Limit: 2000MS Memory Limit: 65536K
Total Submissions: 4955 Accepted: 2301

Description

Biologists have finally invented techniques for repairing DNA that contains segments causing various inherited diseases. For the sake of simplicity, a DNA is represented as a string containing the characters 'A', 'G', 'C' and 'T'. The repairing technique is simply to change some characters to eliminate all segments causing diseases. For example, we can repair the DNA "AAGCAG" to "AGGCAC" by changing two characters, which eliminates the disease-causing segments "AAG", "AGC" and "CAG" it initially contains. Note that the repaired DNA can still contain only the characters 'A', 'G', 'C' and 'T'.

You are to help the biologists repair a DNA by changing the least number of characters.

Input

The input consists of multiple test cases. Each test case starts with a line containing one integer N (1 ≤ N ≤ 50), which is the number of DNA segments causing inherited diseases.
The following N lines give N non-empty strings of length not greater than 20 containing only characters in "AGCT", which are the DNA segments causing inherited diseases.
The last line of the test case is a non-empty string of length not greater than 1000 containing only characters in "AGCT", which is the DNA to be repaired.

The last test case is followed by a line containing a single zero.

Output

For each test case, print a line containing the test case number (beginning with 1) followed by the number of characters which need to be changed. If it's impossible to repair the given DNA, print -1.

Sample Input

2
AAA
AAG
AAAG
2
A
TG
TGAATG
4
A
G
C
T
AGT
0

Sample Output

Case 1: 1
Case 2: 4
Case 3: -1
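For checking a real solution against small inputs, a brute-force reference can help. The sketch below is not from the original post: the names `contains_any` and `brute_force_repair` are hypothetical, and it assumes tiny DNA strings because it enumerates all 4^L candidate strings of the same length. On the three sample cases above it should return 1, 4 and -1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True if s contains any of the patterns as a substring.
bool contains_any(const std::string& s, const std::vector<std::string>& patterns) {
    for (const std::string& p : patterns)
        if (s.find(p) != std::string::npos) return true;
    return false;
}

// Hypothetical brute-force reference: try every string over AGCT with the
// same length as dna and return the minimum number of changed positions
// among strings that avoid all patterns, or -1 if no such string exists.
int brute_force_repair(const std::vector<std::string>& patterns, const std::string& dna) {
    assert(dna.size() <= 10 && "exponential search: tiny inputs only");
    const std::string alphabet = "AGCT";
    const int len = static_cast<int>(dna.size());
    long long total = 1;
    for (int i = 0; i < len; ++i) total *= 4;  // 4^len candidates

    int best = -1;
    std::string cand(len, 'A');
    for (long long code = 0; code < total; ++code) {
        long long x = code;
        int changes = 0;
        for (int i = 0; i < len; ++i) {  // decode code as base-4 digits
            cand[i] = alphabet[static_cast<std::size_t>(x % 4)];
            x /= 4;
            if (cand[i] != dna[i]) ++changes;
        }
        if (!contains_any(cand, patterns) && (best == -1 || changes < best))
            best = changes;
    }
    return best;
}
```

Because it is exponential in L, it is only practical for strings of about ten characters, far below the real limit of 1000.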

Source


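This problem deepened my understanding of the AC automaton (Aho-Corasick automaton), especially of where the fail pointers point. It takes full advantage of fail pointers and saves a lot of space.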


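A natural first idea is to build a tree with 4^maxl nodes, since it can represent every state (whether each position is changed or not), and run the DP on that tree. This would work, but the memory cost is unaffordable for a simple reason: the approach degenerates into an exhaustive search.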
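To put rough numbers on this, take the statement's bounds maxl = 20 and N = 50. A full 4-ary tree over strings of length maxl has 4^maxl leaves, while the trie used in the code has at most one node per pattern character plus the root:

```latex
4^{20} = 2^{40} = 1\,099\,511\,627\,776 \approx 1.1 \times 10^{12}
\qquad\text{vs.}\qquad
N \cdot \mathit{maxl} + 1 = 50 \cdot 20 + 1 = 1001
```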

Two key ideas:

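1. If a parent node matches a pattern, then its children must match as well (the parent's string is a prefix of each child's string), so the parent and all of its children are invalid states; this relationship is passed down during the top-down traversal. By the same argument, a node whose fail pointer leads to a dangerous node is dangerous too, because the fail node's string is a suffix of its own. These observations prune a large share of useless enumeration.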

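2. The fail pointer lets matching continue on a suffix when the current match fails. So the state reached by following a fail pointer is a valid state, and it is exactly where matching needs to resume. Using fail pointers, we can therefore fill in every missing next edge; each filled edge, like a real trie edge, advances exactly one character, so all states are represented. Because nodes are reused, the space is very small: the number of nodes is bounded by O(N*maxl).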
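A minimal sketch of the two ideas, assuming 0-based indices, `std::vector` storage and the hypothetical names `RepairAutomaton` and `letter_index` (none of these come from the original): danger is inherited from the trie parent (idea 1) and from the fail target, and every missing edge is filled from the fail node (idea 2). Unlike the original code, it also expands dangerous nodes; that costs a little extra work but leaves every transition defined.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical helper: maps 'A','G','C','T' to 0..3.
int letter_index(char ch) {
    switch (ch) {
        case 'A': return 0;
        case 'G': return 1;
        case 'C': return 2;
        case 'T': return 3;
    }
    assert(false && "character outside AGCT");
    return -1;
}

struct RepairAutomaton {
    std::vector<std::array<int, 4>> go;  // trie edges, later completed via fail
    std::vector<int> fail;               // node of the longest proper suffix in the trie
    std::vector<bool> danger;            // node's string contains some pattern

    RepairAutomaton() { new_node(); }    // node 0 is the root

    int new_node() {
        go.push_back({-1, -1, -1, -1});
        fail.push_back(0);
        danger.push_back(false);
        return static_cast<int>(go.size()) - 1;
    }

    void insert(const std::string& pattern) {
        int u = 0;
        for (char ch : pattern) {
            int c = letter_index(ch);
            if (go[u][c] == -1) {
                int v = new_node();  // create first: push_back may reallocate go
                go[u][c] = v;
            }
            u = go[u][c];
        }
        danger[u] = true;
    }

    void build() {
        std::queue<int> bfs;
        for (int c = 0; c < 4; ++c) {
            int v = go[0][c];
            if (v == -1) go[0][c] = 0;       // missing root edge loops back to root
            else { fail[v] = 0; bfs.push(v); }
        }
        while (!bfs.empty()) {
            int u = bfs.front();
            bfs.pop();
            if (danger[fail[u]]) danger[u] = true;  // dangerous suffix
            for (int c = 0; c < 4; ++c) {
                int v = go[u][c];
                if (v == -1) {
                    go[u][c] = go[fail[u]][c];       // idea 2: fill the missing edge
                } else {
                    fail[v] = go[fail[u]][c];
                    if (danger[u]) danger[v] = true; // idea 1: dangerous prefix
                    bfs.push(v);
                }
            }
        }
    }
};
```

A DP on top of this only needs `go` and `danger`: start at node 0 and never extend a state whose `danger` flag is set.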


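#include <cstdio>
#include <cstring>
// Map a base to a child index 1..4 (A=1, T=2, C=3, G=4)
#define TOIND(a) ((a)=='A'?1:(a)=='T'?2:(a)=='C'?3:4)

// At most N*maxl = 50*20 = 1000 trie nodes plus the root, so 1010 is enough
bool danger[1010];      // danger[u]: the string of node u contains a pattern
int f[1010][1010];      // f[p][u]: min changes for the first p bases, ending in state u
int next[1010][10];     // goto function: trie edges, completed via fail pointers
int fail[1010];         // fail pointer of each node
char str[1010];         // DNA to repair, as indices in str[1..lens]
char pattern[1010];     // current pattern, as indices in pattern[1..len]
int poolsize = 1;       // number of allocated nodes (node 1 is the root)
int que[200000];        // BFS queue
const int root = 1;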

void init()
{
	memset(danger,0,sizeof danger);
	memset(f,0x3f,sizeof f);
	memset(fail,0,sizeof fail);
	memset(next,0,sizeof next);
	poolsize = 1;
}

// Read a string into ss[1..ll], converting each base to its index 1..4
void reads(char* ss,int &ll)
{
	scanf("%s",ss+1);
	ll = 0;
	while (ss[++ll])
		ss[ll] = TOIND(ss[ll]);
	ll --;
}

// Insert one pattern into the trie; if a prefix is already dangerous, the rest is redundant
void insert()
{
	int len = 0;
	reads(pattern,len);

	int u = root;
	for (int i=1;i<=len;i++)
	{
		if (next[u][pattern[i]])
			u = next[u][pattern[i]];
		else
		{
			poolsize ++;
			u = next[u][pattern[i]] = poolsize;
		}
		if (danger[u])
			break;
	}
	danger[u] = true;
}

// BFS: set fail pointers, propagate danger along fail links, fill missing edges
void build_ac_automation()
{
	int l = 0;
	int r = 0;

	for (int i=1;i<=4;i++)
	{
		if (next[root][i])
		{
			fail[next[root][i]] = root;
			r ++;
			que[r] = next[root][i];
		}
		else
			next[root][i] = root;
	}

	while (l < r)
	{
		l ++;
		int u = que[l];
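		danger[u] |= danger[fail[u]];  // a dangerous suffix makes u dangerous too
		if (danger[u]) continue;        // dangerous states are never expanded by the DP
		for (int i=1;i<=4;i++)
		{
			if (!next[u][i])
				next[u][i] = next[fail[u]][i];  // missing edge: reuse the fail node's transition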
			else
			{
				fail[next[u][i]] = next[fail[u]][i];
				r ++;
				que[r] = next[u][i];
			}
		}
	}
}

void updatamin(int& a,int b)
{
	if (b < a)
		a = b;
}

int main()
{
	// Local file redirection used by the author; remove it before submitting to an online judge
	// freopen("dna.in","r",stdin);
	// freopen("dna.out","w",stdout);
	int n=0,T=0;
	while (1)
	{
		init();
		T ++;
		scanf("%d",&n);
		if (n == 0) break;
		
		// Build the trie from the N patterns, then the automaton
		for (int i=1;i<n+1;i++)
			insert();
		build_ac_automation();

		int lens = 0;
		reads(str,lens);
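		f[0][root] = 0;  // no bases read yet: start at the root with 0 changes
		for (int p=0;p<lens;p++)
		{
			for (int q=1;q<=poolsize;q++)
			{
				if (danger[q]) continue;  // never extend a state that already contains a pattern
				for (int i=1;i<=4;i++)
				{
					// Put base i at position p+1: free if it equals the original base, otherwise 1 change
					if (str[p+1] == i)
						updatamin(f[p+1][next[q][i]],f[p][q]);
					else
						updatamin(f[p+1][next[q][i]],f[p][q]+1);
				}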
			}
		}
		int ans = 0x3f3f3f3f;
		for (int q=0;q<=poolsize;q++)
			if (!danger[q])
				updatamin(ans,f[lens][q]);
		if (ans == 0x3f3f3f3f)
			printf("Case %d: -1\n",T);
		else
			printf("Case %d: %d\n",T,ans);
	}
	return 0;
}
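The DP in `main` can be summarized as a recurrence. The notation is introduced here for illustration and is not the author's: s_1, ..., s_L is the DNA, δ(q, c) is `next[q][c]`, a state is safe when `danger` is false, and [·] is 1 if the condition holds and 0 otherwise.

```latex
f_0(\mathrm{root}) = 0, \qquad f_0(q) = \infty \quad (q \neq \mathrm{root})

f_{p+1}(q') = \min_{\substack{q \text{ safe},\; c \in \{A,G,C,T\} \\ \delta(q,c) = q'}} \Bigl( f_p(q) + [\, c \neq s_{p+1} \,] \Bigr), \qquad 0 \le p < L

\text{answer} = \min_{q \text{ safe}} f_L(q) \quad (\text{print } -1 \text{ if this minimum is } \infty)
```

With at most 1001 states, 4 bases and L ≤ 1000, one test case needs roughly 4 × 10^6 transitions.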

