upc-Assessing Genomes (finding the minimal period with KMP)

Cosmic_Tree

Article link: https://blog.csdn.net/Cosmic_Tree/article/details/108921330

Problem Description

The world is at the brink of extinction. A mutated virus threatens to destroy all living organisms.
As a last hope, a team of super-smart scientists, including – of course – you, is currently working on an antivirus. Unfortunately, your team is unable to analyse the DNA in time. They sequenced n parts of the virus’ DNA and need to match them with n available strands for antiviruses. As the algorithms expert, you need to implement a specialised procedure to solve this problem.
Your approach needs to be fast – there is not much time left!
You first need to determine the repetition score of each DNA sequence. The repetition score of a sequence s is equal to the length of the shortest sequence u such that s is equal to the k-fold repetition of u, for some positive integer k. For instance, ATGATG has a repetition score of 3, since it can be produced by repeating ATG two times. On the other hand, ATATA has a repetition score of 5, as it cannot be produced from any proper substring.
Once you obtained the scores of all sequences, you need to match the n antivirus sequences with the n virus sequences in a way that minimises the damage caused by the virus. When two sequences are matched, the damage caused by the virus is equal to the squared difference between the two repetition scores. For instance, matching the antivirus sequence ATGATG with the virus sequence ATATA causes (3 − 5)² = 4 units of damage.
If you match the DNA sequences optimally, what is the minimal total damage caused by the virus, taken as a sum over all matched pairs?
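
To make the definition of the repetition score concrete, here is a minimal brute-force sketch that follows it literally: try every candidate length p that divides the string length and return the first one for which the string repeats with period p. The function name naiveRepetitionScore is a hypothetical helper, not part of the original problem or solution, and with strings of at most 250 characters this quadratic check would also be fast enough.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): computes the repetition
// score straight from its definition, without KMP.
int naiveRepetitionScore(const std::string& s) {
    const std::size_t n = s.size();
    for (std::size_t p = 1; p <= n; ++p) {
        if (n % p != 0) continue;            // u must fit a whole number of times
        bool ok = true;
        for (std::size_t i = p; i < n && ok; ++i)
            ok = (s[i] == s[i - p]);         // every character repeats after p steps
        if (ok) return static_cast<int>(p);  // smallest valid p is the score
    }
    return 0;                                // only reached for the empty string
}
```

For the two examples in the statement, naiveRepetitionScore("ATGATG") yields 3 and naiveRepetitionScore("ATATA") yields 5.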

Input

The input consists of:
• A line with an integer n (1 ≤ n ≤ 50), the number of DNA sequences of the virus and antivirus each.
• n lines, each with a virus DNA sequence.
• n lines, each with an antivirus DNA sequence.
Each DNA sequence is a non-empty string with a length of at most 250 and consists of lowercase letters a-z and uppercase letters A-Z.

Output

Output one integer, the minimal total damage.

Sample Input

2
TTTTTT
TATG
TATATA
AAAGAAAG

Sample Output

1

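Problem Summary

You are given 2n strings: the first n form one group and the last n form another. Each string has a value, namely the length of its minimal period (the shortest block whose repetition yields the whole string).
Pairing a string from the first group with a string from the second group incurs a "damage" equal to the squared difference of their two values. Find the minimum total damage, where each string takes part in exactly one pair.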
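
The matching part on its own can be sketched as below, assuming the values of both groups have already been computed. Sorting both lists and pairing equal positions is optimal by an exchange argument: for a1 ≤ a2 and b1 ≤ b2, (a1 − b1)² + (a2 − b2)² never exceeds (a1 − b2)² + (a2 − b1)², because the difference of the two sums is −2(a2 − a1)(b2 − b1) ≤ 0. The name minTotalDamage is a hypothetical helper, and the sketch assumes both vectors have the same length.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): minimal total damage for
// two equally sized lists of repetition scores.
long long minTotalDamage(std::vector<int> virus, std::vector<int> antivirus) {
    // Taken by value so the caller's vectors keep their original order.
    std::sort(virus.begin(), virus.end());
    std::sort(antivirus.begin(), antivirus.end());
    long long total = 0;
    for (std::size_t i = 0; i < virus.size(); ++i) {
        const long long d = virus[i] - antivirus[i];
        total += d * d;  // squared difference of the i-th smallest scores
    }
    return total;
}
```

With the sample data the virus scores are {1, 4} and the antivirus scores are {2, 4}, so the call returns (1 − 2)² + (4 − 4)² = 1.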

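Approach

Use KMP to compute the minimal period length of every string, sort the two lists of values, and sum the damage of the pairs at matching positions.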
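
The period computation can be sketched in isolation as follows. This is a compact variant that uses the plain (unoptimized) prefix function on std::string, not the template used in this post; the name repetitionScore is a hypothetical helper. The idea is the same: if the longest proper border of the whole string has length b, then len − b is the smallest period, and it counts as a repetition score only when it divides len.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): repetition score of s,
// i.e. the length of the shortest u such that s is u repeated k times.
int repetitionScore(const std::string& s) {
    const int len = static_cast<int>(s.size());
    if (len == 0) return 0;
    // pi[i] = length of the longest proper prefix of s[0..i]
    // that is also a suffix of s[0..i].
    std::vector<int> pi(len, 0);
    for (int i = 1; i < len; ++i) {
        int j = pi[i - 1];
        while (j > 0 && s[i] != s[j]) j = pi[j - 1];  // fall back to shorter borders
        if (s[i] == s[j]) ++j;
        pi[i] = j;
    }
    const int period = len - pi[len - 1];
    // A period that does not divide len leaves an incomplete last block,
    // so the string is only a 1-fold repetition of itself.
    return (len % period == 0) ? period : len;
}
```

For the sample input this gives 1 for TTTTTT, 4 for TATG, 2 for TATATA and 4 for AAAGAAAG.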

Code

#include<iostream>
#include<string>
#include<map>
#include<set>
//#include<unordered_map>
#include<queue>
#include<cstdio>
#include<vector>
#include<cstring>
#include<algorithm>
#include<iomanip> 
#include<cmath>
#include<fstream>
#define X first
#define Y second
#define INF 0x3f3f3f3f
using namespace std;
typedef long long ll;
typedef unsigned long long llu;
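int n,a[260],b[260],Next[260],suffix[260];//Next[] is the KMP failure table filled by getNext(); Next[len] is the length of the longest proper prefix of the whole string that is also its suffix.
char s[260];
//Compute the Next array for str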
void getNext(char const* str, int len)
{
    int i = 0;
    Next[i] = -1;
    int j = -1;
    while( i < len )
    {
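        if( j == -1 || str[i] == str[j] )   //"if" branch of the loop
        {
            ++i;
            ++j;
            //The optimization lives in the next 4 lines
            if( str[i] != str[j] ) //after ++i, ++j, compare str[i] and str[j] again
                Next[i] = j;      //the earlier, unoptimized version had only this assignment
            else
                Next[i] = Next[j];  //optimized; plain Next[i] = j would also work
            //When str[i]==str[j], a mismatch at str[i] implies a mismatch at str[j] as well,
            //so we store Next[j] instead of j, skipping position j
            //and saving a needless comparison.
            //Without the optimization, Next[i] is the length of the longest prefix of the first i characters that is also their suffix.
            //When i == len, str[i] is the terminating '\0', which never equals str[j],
            //so Next[len] = j is still the true border length of the whole string.
        }
        else                                 //"else" branch of the loop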
            j = Next[j];
    }
}
 
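//Count the occurrences of str in the target string
//n is the length of target, m is the length of str
//getNext(str, m) must have been called first so that Next describes str
int kmp_match(char *target,int n,char *str,int m)
{
    int i=0,j=0;  //i indexes target, j indexes str
    int cnt=0;   //number of occurrences of str found in target
    while(i<=n-1)
    {
        if(j<0||target[i]==str[j])
        {
            i++;
            j++;
        }
        else{
            j=Next[j]; //when j=0, Next[0]=-1 makes j negative, which is why the condition above checks j<0
        }

        //found an occurrence of str in target
        if(j==m)
        {
            cnt++;
            j=Next[j];
        }
    }
    return cnt;
}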
//If str occurs in target, return the index of the first character of the first match; otherwise return -1
//getNext(str, m) must have been called first so that Next describes str
int kmp_search(char *target,int n,char *str,int m)
{
    int i=0,j=0;  //i indexes target, j indexes str
    while(i<n && j<m)
    {
        if(j<0||target[i]==str[j])
        {
            i++;
            j++;
        }
        else
        {
            j=Next[j]; //use Next here: the suffix array is never filled. When j=0, Next[0]=-1 makes j negative, hence the j<0 check above
        }
    }
    if(j>=m)
        return i-m;
    else
        return -1;
}
int main()
{
     
    scanf("%d",&n);
    for(int i=1;i<=n;i++)
    {
        scanf("%s",s);
        int len=strlen(s);
        getNext(s,len);
        a[i]=len-Next[len];
        if(len%a[i]!=0) a[i]=len; 
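        //Let len be the length of S; the candidate minimal period has length L = len - Next[len].
        //If len is divisible by L, S consists entirely of repetitions of that period, with T = len/L repetitions.
        //Otherwise the last repetition is incomplete, so the score is len itself (the check above); completing it would take L-len%L=L-(len-L)%L=L-Next[len]%L more characters, where L=len-Next[len].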
    }
    for(int i=1;i<=n;i++)
    {
        scanf("%s",s);
        int len=strlen(s);
        getNext(s,len);
        b[i]=len-Next[len];
        if(len%b[i]!=0) b[i]=len; 
    }
    //a[i] and b[i] hold the minimal period length (repetition score) of each string
    sort(a+1,a+n+1);
    sort(b+1,b+n+1);
    ll ans=0;
    for(int i=1;i<=n;i++) ans=ans+(a[i]-b[i])*(a[i]-b[i]);
    printf("%lld",ans);
    return 0;
}

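The KMP code here is a template I took from the web; credit goes to its original author, whose page was linked from this post.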
