Summer Training - Session 8.5: KMP

Katapeltes

Article link: https://blog.csdn.net/katapeltes/article/details/52134216

A - String Basics
Time Limit:1000MS Memory Limit:65536KB 64bit IO Format:%lld & %llu
Description
The Genographic Project is a research partnership between IBM and The National Geographic Society that is analyzing DNA from hundreds of thousands of contributors to map how the Earth was populated.

As an IBM researcher, you have been tasked with writing a program that will find commonalities amongst given snippets of DNA that can be correlated with individual survey information to identify new genetic markers.

A DNA base sequence is noted by listing the nitrogen bases in the order in which they are found in the molecule. There are four bases: adenine (A), thymine (T), guanine (G), and cytosine (C). A 6-base DNA sequence could be represented as TAGACC.

Given a set of DNA base sequences, determine the longest series of bases that occurs in all of the sequences.
Input
Input to this problem will begin with a line containing a single integer n indicating the number of datasets. Each dataset consists of the following components:
A single positive integer m (2 <= m <= 10) indicating the number of base sequences in this dataset.
m lines each containing a single base sequence consisting of 60 bases.
Output
For each dataset in the input, output the longest base subsequence common to all of the given base sequences. If the longest common subsequence is less than three bases in length, display the string “no significant commonalities” instead. If multiple subsequences of the same longest length exist, output only the subsequence that comes first in alphabetical order.
Sample Input
3
2
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
3
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
GATACTAGATACTAGATACTAGATACTAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
GATACCAGATACCAGATACCAGATACCAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
3
CATCATCATCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
ACATCATCATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AACATCATCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
Sample Output
no significant commonalities
AGATAC
CATCATCAT

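A: Brute force is sufficient: enumerate every substring of the first sequence with length at least 3 and check that it occurs in all the other sequences.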


#include<stdio.h>
#include<string>
#include<cstring>
#include<queue>
#include<algorithm>
#include<functional>
#include<vector>
#include<iomanip>
#include<math.h>
#include<iostream>
#include<sstream>
#include<stack>
#include<set>
using namespace std;
const int MAX=65;
string Str[15],Ans;
int T,N,Len;
int main()
{
    cin.sync_with_stdio(false);
    cin>>T;
    while (T--)
    {
        cin>>N;
        for (int i=0; i<N; i++) cin>>Str[i];
        Len=-1;
        for (int l=3; l<=60; l++)
            for (int i=0; i<=60-l; i++)
            {
                bool flag=true;
                string temp=Str[0].substr(i,l);
                for (int x=1; x<N; x++)
                {
                    if (Str[x].find(temp)!=string::npos)
                        continue;
                    else
                    {
                        flag=false;
                        break;
                    }
                }
                if (flag)
                {
                    if (l>Len)
                    {
                        Ans=temp;
                        Len=l;
                    }
                    else if (l==Len) Ans=min(Ans,temp);
                }
            }
        if (Len==-1) cout<<"no significant commonalities"<<endl;
        else cout<<Ans<<endl;
    }
    return 0;
}

B - KMP Essentials: Matching
Time Limit:5000MS Memory Limit:32768KB 64bit IO Format:%I64d & %I64u
Description
Given two sequences of numbers : a[1], a[2], …… , a[N], and b[1], b[2], …… , b[M] (1 <= M <= 10000, 1 <= N <= 1000000). Your task is to find a number K which make a[K] = b[1], a[K + 1] = b[2], …… , a[K + M - 1] = b[M]. If there are more than one K exist, output the smallest one.
Input
The first line of input is a number T which indicate the number of cases. Each case contains three lines. The first line is two numbers N and M (1 <= M <= 10000, 1 <= N <= 1000000). The second line contains N integers which indicate a[1], a[2], …… , a[N]. The third line contains M integers which indicate b[1], b[2], …… , b[M]. All integers are in the range of [-1000000, 1000000].
Output
For each test case, you should output one line which only contain K described above. If no such K exists, output -1 instead.
Sample Input
2
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 1 3
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 2 1
Sample Output
6
-1


#include<stdio.h>
#include<string>
#include<cstring>
#include<queue>
#include<algorithm>
#include<functional>
#include<vector>
#include<iomanip>
#include<math.h>
#include<iostream>
#include<sstream>
#include<stack>
#include<set>
using namespace std;
const int MAX=1000006;
int A[MAX],B[10005];
int T,N,M,Next[10005];
int main()
{
    cin.sync_with_stdio(false);
    cin>>T;
    while (T--)
    {
        cin>>N>>M;
        for (int i=1; i<=N; i++) cin>>A[i];
        for (int i=1; i<=M; i++) cin>>B[i];
        memset(Next,0,sizeof(Next));
        Next[0]=-1;
        for (int i=1; i<=M; i++)
        {
            int p=Next[i-1];
            while (p>=0&&B[p+1]!=B[i]) p=Next[p];
            Next[i]=p+1;
        }
        int k=-1;
        for (int i=1,p=0; i<=N; i++)
        {
            while (p>=0&&B[p+1]!=A[i]) p=Next[p];
            if (++p==M)
            {
                k=i-p+1;
                p=Next[p];
                break;
            }
        }
        cout<<k<<endl;
    }
    return 0;
}

C - KMP Essentials: Period
Time Limit:3000MS Memory Limit:65536KB 64bit IO Format:%lld & %llu
Description
Given two strings a and b we define a*b to be their concatenation. For example, if a = “abc” and b = “def” then a*b = “abcdef”. If we think of concatenation as multiplication, exponentiation by a non-negative integer is defined in the normal way: a^0 = “” (the empty string) and a^(n+1) = a*(a^n).
Input
Each test case is a line of input representing s, a string of printable characters. The length of s will be at least 1 and will not exceed 1 million characters. A line containing a period follows the last test case.
Output
For each s you should print the largest n such that s = a^n for some string a.
Sample Input
abcd
aaaa
ababab
.
Sample Output
1
4
3
Hint
This problem has huge input, use scanf instead of cin to avoid time limit exceed.


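C: next[i] tells the pattern where to resume when a mismatch occurs after its first i characters have matched: fall back to next[i] and compare again against the same text character. Equivalently, next[i] is the length of the longest proper prefix of the first i characters that is also a suffix of them. For the whole string of length n, characters 1..next[n] therefore equal characters n-next[n]+1..n.
If n%(n-next[n])==0, the string is a block of length n-next[n] repeated n/(n-next[n]) times, and that count is the answer; otherwise the answer is 1.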
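The border/period relation above can be tried on small inputs with the sketch below. It is not the author's program: it uses a 0-indexed prefix function over `std::string`, and the names `prefixFunction` and `maxPower` are made up for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// pi[i] = length of the longest proper prefix of s[0..i] that is also its suffix.
std::vector<int> prefixFunction(const std::string& s) {
    std::vector<int> pi(s.size(), 0);
    for (std::size_t i = 1; i < s.size(); ++i) {
        int k = pi[i - 1];
        while (k > 0 && s[i] != s[k]) k = pi[k - 1];  // fall back to a shorter border
        if (s[i] == s[k]) ++k;
        pi[i] = k;
    }
    return pi;
}

// Largest n such that s == a^n for some string a. Assumes s is non-empty.
int maxPower(const std::string& s) {
    int n = static_cast<int>(s.size());
    int period = n - prefixFunction(s).back();  // candidate block length
    return n % period == 0 ? n / period : 1;
}
```

With these definitions, `maxPower("ababab")` evaluates to 3 because the longest border has length 4, leaving a block of length 2 that divides 6.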

#include<stdio.h>
#include<string>
#include<cstring>
#include<queue>
#include<algorithm>
#include<functional>
#include<vector>
#include<iomanip>
#include<math.h>
#include<iostream>
#include<sstream>
#include<stack>
#include<set>
using namespace std;
const int MAX=1000005;
char S[MAX],P[MAX];
int Next[MAX];
int main()
{
    cin.sync_with_stdio(false);
    // Stop only on a line that is exactly "."; a test string may itself begin with '.'.
    while (cin>>(S+1)&&strcmp(S+1,".")!=0)
    {
        memset(Next,0,sizeof(Next));Next[0]=-1;
        int len=strlen(S+1);
        for (int i=1;i<=len;i++)
        {
            int p=Next[i-1];
            while (p>=0&&S[p+1]!=S[i]) p=Next[p];
            Next[i]=p+1;
        }
        int ans=(len%(len-Next[len])==0)?(len/(len-Next[len])):1;
        cout<<ans<<endl;
    }
    return 0;
}

D - Matching
Time Limit:1000MS Memory Limit:32768KB 64bit IO Format:%I64d & %I64u
Description
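You have a strip of patterned cloth and a small, ready-to-use decorative strip that also carries a pattern. Given both, compute the maximum number of decorative strips that can be cut from the cloth.
Input
The input contains several data sets, each a pair consisting of a cloth strip and a decorative strip. Both are written in visible ASCII characters; there are as many possible patterns as there are visible ASCII characters. Neither strip is longer than 1000 characters. Processing stops when a # character is encountered.
Output
Output the maximum number of decorative strips that can be cut from the cloth, or 0 if none can be cut. Print each result on its own line.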
Sample Input
abcde a3
aaaaaa aa
#
Sample Output
0
3


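D: Run KMP over the cloth and, after each full match, reset the pattern position to 0 instead of falling back through Next, so that counted occurrences never overlap.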
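As a cross-check for D on small inputs, the greedy non-overlapping count can also be sketched with `std::string::find` in place of KMP. This is a different technique from the author's solution, and `countNonOverlapping` is a name invented for the example.

```cpp
#include <cstddef>
#include <string>

// Counts non-overlapping occurrences of piece in cloth, scanning left to right
// and always taking the earliest available occurrence.
int countNonOverlapping(const std::string& cloth, const std::string& piece) {
    if (piece.empty()) return 0;  // guard: an empty pattern would loop forever
    int count = 0;
    std::size_t pos = 0;
    while ((pos = cloth.find(piece, pos)) != std::string::npos) {
        ++count;
        pos += piece.size();  // skip past this occurrence so the next cannot overlap it
    }
    return count;
}
```

Taking the earliest occurrence each time is safe here because cutting earlier never reduces how much cloth remains for later pieces.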

#include<stdio.h>
#include<string>
#include<cstring>
#include<queue>
#include<algorithm>
#include<functional>
#include<vector>
#include<iomanip>
#include<math.h>
#include<iostream>
#include<sstream>
#include<stack>
#include<set>
using namespace std;
const int MAX=1005;
char S[MAX],P[MAX];
int Next[MAX];
int main()
{
    cin.sync_with_stdio(false);
    while (cin>>(S+1)&&S[1]!='#')
    {
        cin>>(P+1);
        int n=strlen(S+1);
        int m=strlen(P+1);
        memset(Next,0,sizeof(Next));Next[0]=-1;
        for (int i=1;i<=m;i++)
        {
            int p=Next[i-1];
            while (p>=0&&P[p+1]!=P[i]) p=Next[p];
            Next[i]=p+1;
        }
        int ans=0;
        for (int i=1,p=0;i<=n;i++)
        {
            while (p>=0&&P[p+1]!=S[i]) p=Next[p];
            if (++p==m) {ans++;p=0;}
        }
        cout<<ans<<endl;
    }
    return 0;
}

E - Period and Repetition
Time Limit:1000MS Memory Limit:32768KB 64bit IO Format:%I64d & %I64u
Description
For each prefix of a given string S with N characters (each character has an ASCII code between 97 and 126, inclusive), we want to know whether the prefix is a periodic string. That is, for each i (2 <= i <= N) we want to know the largest K > 1 (if there is one) such that the prefix of S with length i can be written as A^K, that is A concatenated K times, for some string A. Of course, we also want to know the period K.
Input
The input file consists of several test cases. Each test case consists of two lines. The first one contains N (2 <= N <= 1 000 000) – the size of the string S. The second line contains the string S. The input file ends with a line, having the number zero on it.
Output
For each test case, output “Test case #” and the consecutive test case number on a single line; then, for each prefix with length i that has a period K > 1, output the prefix size i and the period K separated by a single space; the prefix sizes must be in increasing order. Print a blank line after each test case.
Sample Input
3
aaa
12
aabaabaabaab
0
Sample Output
Test case #1
2 2
3 3

Test case #2
2 2
6 2
9 3
12 4


E: Same as problem C, applied to every prefix length i from 1 to N.
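The per-prefix version of the same test can be sketched as follows. This is an illustration with 0-indexed strings, not the author's code; `prefixPeriods` is a hypothetical helper that returns the (prefix length, K) pairs the problem asks for.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// For every prefix length i whose prefix is some block repeated K > 1 times,
// records the pair (i, K), in increasing order of i.
std::vector<std::pair<int, int>> prefixPeriods(const std::string& s) {
    std::vector<int> pi(s.size(), 0);  // prefix function, as in problem C
    for (std::size_t i = 1; i < s.size(); ++i) {
        int k = pi[i - 1];
        while (k > 0 && s[i] != s[k]) k = pi[k - 1];
        if (s[i] == s[k]) ++k;
        pi[i] = k;
    }
    std::vector<std::pair<int, int>> result;
    for (std::size_t idx = 0; idx < s.size(); ++idx) {
        int len = static_cast<int>(idx) + 1;  // prefix length
        int period = len - pi[idx];           // candidate block length
        if (pi[idx] > 0 && len % period == 0) result.push_back({len, len / period});
    }
    return result;
}
```

The check `pi[idx] > 0` excludes K = 1, since a prefix with no border has period equal to its own length.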

#include<stdio.h>
#include<string>
#include<cstring>
#include<queue>
#include<algorithm>
#include<functional>
#include<vector>
#include<iomanip>
#include<math.h>
#include<iostream>
#include<sstream>
#include<stack>
#include<set>
using namespace std;
const int MAX=1000005;
char S[MAX];
int N,Next[MAX];
int main()
{
    cin.sync_with_stdio(false);
    int cases=0;
    while (cin>>N&&N)
    {
        cin>>(S+1);
        memset(Next,0,sizeof(Next));Next[0]=-1;
        for (int i=1;i<=N;i++)
        {
            int p=Next[i-1];
            while (p>=0&&S[p+1]!=S[i]) p=Next[p];
            Next[i]=p+1;
        }
        cout<<"Test case #"<<++cases<<endl;
        for (int i=1;i<=N;i++)
        {
            int K=(i%(i-Next[i])==0)?(i/(i-Next[i])):1;
            if (K!=1)
                cout<<i<<' '<<K<<endl;
        }
        cout<<endl;
    }
    return 0;
}

F - Interesting
Time Limit:1000MS Memory Limit:65535KB 64bit IO Format:%I64d & %I64u
Description
The shortest common superstring of 2 strings S1 and S2 is a string S with the minimum number of characters which contains both S1 and S2 as a sequence of consecutive characters. For instance, the shortest common superstring of “alba” and “bacau” is “albacau”.
Given two strings composed of lowercase English characters, find the length of their shortest common superstring.
Input
The first line of input contains an integer number T, representing the number of test cases to follow. Each test case consists of 2 lines. The first of these lines contains the string S1 and the second line contains the string S2. Both of these strings contain at least 1 and at most 1,000,000 characters.
Output
For each of the T test cases, in the order given in the input, print one line containing the length of the shortest common superstring.
Sample Input
2
alba
bacau
resita
mures
Sample Output
7
8


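F: Run KMP in both directions. Matching one string as the pattern against the other as the text yields either a full occurrence or, at the end of the text, the longest prefix of the pattern that is a suffix of the text. The answer is the sum of both lengths minus the larger of the two overlaps.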
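The overlap idea for F can be sketched on its own as below. This is an illustration, not the author's program: it runs the prefix function over `p + '#' + s`, which assumes `'#'` never appears in the input (the problem guarantees lowercase letters), and the names `overlap` and `superstringLength` are invented here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns |p| if p occurs inside s; otherwise the length of the longest
// prefix of p that is also a suffix of s. Assumes '#' occurs in neither string.
int overlap(const std::string& s, const std::string& p) {
    std::string t = p + '#' + s;
    std::vector<int> pi(t.size(), 0);
    for (std::size_t i = 1; i < t.size(); ++i) {
        int k = pi[i - 1];
        while (k > 0 && t[i] != t[k]) k = pi[k - 1];
        if (t[i] == t[k]) ++k;
        pi[i] = k;
    }
    int m = static_cast<int>(p.size());
    for (std::size_t i = p.size() + 1; i < t.size(); ++i)
        if (pi[i] == m) return m;  // p is fully contained in s
    return pi.back();              // border that ends at the end of s
}

// Length of the shortest string containing both a and b as substrings.
int superstringLength(const std::string& a, const std::string& b) {
    int best = std::max(overlap(a, b), overlap(b, a));
    return static_cast<int>(a.size() + b.size()) - best;
}
```

For the first sample, `overlap("alba", "bacau")` finds the shared "ba", so the result is 4 + 5 - 2 = 7.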


#include<stdio.h>
#include<string>
#include<cstring>
#include<queue>
#include<algorithm>
#include<functional>
#include<vector>
#include<iomanip>
#include<math.h>
#include<iostream>
#include<sstream>
#include<stack>
#include<set>
using namespace std;
const int MAX=1000006;
char A[MAX],B[MAX];
int Next[MAX],T;
void getNext(char *P)
{
    int lenP=strlen(P+1);
    memset(Next,0,sizeof(Next));
    Next[0]=-1;
    for (int i=1; i<=lenP; i++)
    {
        int p=Next[i-1];
        while (p>=0&&P[p+1]!=P[i]) p=Next[p];
        Next[i]=p+1;
    }
}
int kmp(char *P,char *S)
{
    getNext(P);
    int lenP=strlen(P+1);
    int lenS=strlen(S+1);
    int i=1,p=0;
    while (i<=lenS&&p<lenP)
    {
        if (p==-1||P[p+1]==S[i]) i++,p++;
        else p=Next[p];
    }
    return p;
}
int main()
{
    cin.sync_with_stdio(false);
    cin>>T;
    while (T--)
    {
        cin>>(A+1)>>(B+1);
        int lenA=strlen(A+1);
        int lenB=strlen(B+1);
        int x1=kmp(A,B);
        int x2=kmp(B,A);
        cout<<lenA+lenB-max(x1,x2)<<endl;
    }
    return 0;
}

G - Try Combining It with DP (orz)
Time Limit:1000MS Memory Limit:65536KB 64bit IO Format:%I64d & %I64u
Description
As is known to all, in many cases, a word has two meanings. Such as “hehe”, which not only means “hehe”, but also means “excuse me”.
Today, ?? is chatting with MeiZi online, MeiZi sends a sentence A to ??. ?? is so smart that he knows the word B in the sentence has two meanings. He wants to know how many kinds of meanings MeiZi can express.
Input
The first line of the input gives the number of test cases T; T test cases follow.
Each test case contains two strings A and B. A is the sentence MeiZi sends to ??, and B is the word that has two meanings. The strings contain only lowercase letters.

Limits
T <= 30
|A| <= 100000
|B| <= |A|

Output
For each test case, output one line containing “Case #x: y” (without quotes) , where x is the test case number (starting from 1) and y is the number of the different meaning of this sentence may be. Since this number may be quite large, you should output the answer modulo 1000000007.
Sample Input
4
hehehe
hehe
woquxizaolehehe
woquxizaole
hehehehe
hehe
owoadiuhzgneninougur
iehiehieh
Sample Output
Case #1: 3
Case #2: 2
Case #3: 5
Case #4: 1

Hint

In the first case, “hehehe” can have 3 meanings: “*he”, “he*”, “hehehe”.
In the third case, “hehehehe” can have 5 meanings: “*hehe”, “he*he”, “hehe*”, “**”, “hehehehe”.



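G: Let F[i] be the number of different meanings that the first i characters of A can express, with F[0] = 1, and let m = |B|.
First, F[i] = F[i-1]: whether or not a match ends at position i, it inherits the count from the previous position.
Then, if the m characters ending at position i (positions i-m+1 to i) match B exactly, F[i] += F[i-m].
The reason is that once this occurrence of length m takes its second meaning, those m positions are fixed, so only the F[i-m] ways of reading the text before it contribute. The code below stores the same counts in shifted form: it pushes each contribution forward from the start of a match and adds the final 1 when printing.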
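The recurrence can be written down directly as a short sketch. It is not the author's program: matches are detected with a prefix function over `b + '#' + a`, which assumes `'#'` does not occur in the input (the problem guarantees lowercase letters), and `countMeanings` is a name chosen for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

const int MOD = 1000000007;

// Number of meanings of sentence a when each occurrence of word b may
// independently take its second meaning, provided chosen occurrences do not overlap.
int countMeanings(const std::string& a, const std::string& b) {
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());
    std::string t = b + '#' + a;
    std::vector<int> pi(t.size(), 0);
    for (std::size_t i = 1; i < t.size(); ++i) {
        int k = pi[i - 1];
        while (k > 0 && t[i] != t[k]) k = pi[k - 1];
        if (t[i] == t[k]) ++k;
        pi[i] = k;
    }
    std::vector<long long> f(n + 1, 0);
    f[0] = 1;  // the empty prefix has exactly one reading
    for (int i = 1; i <= n; ++i) {
        f[i] = f[i - 1];  // character i keeps its ordinary meaning
        // a's prefix of length i ends at index m + i of t; pi == m means b ends there
        if (m > 0 && i >= m && pi[m + i] == m)
            f[i] = (f[i] + f[i - m]) % MOD;
    }
    return static_cast<int>(f[n]);
}
```

On the third sample, matches end at positions 4, 6 and 8, giving f[4] = 2, f[6] = 3 and f[8] = 5.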


#include<stdio.h>
#include<string>
#include<cstring>
#include<queue>
#include<algorithm>
#include<functional>
#include<vector>
#include<iomanip>
#include<math.h>
#include<iostream>
#include<sstream>
#include<stack>
#include<set>
using namespace std;
const int MAX=100006;
const int MOD=1000000007;
char S[MAX],P[MAX];
int Next[MAX],T,lenS,lenP,Mark[MAX],F[MAX];
int main()
{
    cin.sync_with_stdio(false);
    cin>>T;
    for (int cases=1; cases<=T; cases++)
    {
        cin>>(S+1)>>(P+1);
        lenS=strlen(S+1);
        lenP=strlen(P+1);
        memset(Next,0,sizeof(Next));
        memset(F,0,sizeof(F));
        Next[0]=-1;
        for (int i=1; i<=lenP; i++)
        {
            int p=Next[i-1];
            while (p>=0&&P[p+1]!=P[i]) p=Next[p];
            Next[i]=p+1;
        }
        memset(Mark,0,sizeof(Mark));
        for (int i=1,p=0; i<=lenS; i++)
        {
            while (p>=0&&P[p+1]!=S[i]) p=Next[p];
            if (++p==lenP)
            {
                Mark[i-lenP+1]=1;
                p=Next[p];
            }
        }
        for (int i=1;i<=lenS+1;i++)
        {
            F[i]=(F[i-1]+F[i])%MOD;
            if (Mark[i])
                F[i+lenP]=(F[i+lenP]+F[i]+1)%MOD;
        }
        // Reduce after the final +1 as well, so the printed value is always below MOD.
        cout<<"Case #"<<cases<<": "<<(F[lenS+1]+1)%MOD<<endl;
    }
    return 0;
}
