KMP Summary

进击的木头

Article link: https://blog.csdn.net/qq_40510246/article/details/81120638

I had already written this post once, but the browser crashed and everything I had typed was lost, so this is a rewrite.

First, a recommendation for a very good introductory post on KMP: https://blog.csdn.net/v_july_v/article/details/7041827

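When I was learning KMP last week, I was confused for a while because I had mixed up two different versions of the next array. The post linked above explains the next array in great detail, so have a look if you are interested. I'll go through this post by problem type. I hope it helps someone, and corrections are welcome. Thanks.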
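The mix-up is easy to reproduce, because at least two indexing conventions for the next array are in common use. The following is a minimal sketch of both, not code from the linked post; the function names `prefixFunction` and `shiftedNext` are made up for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Convention A (prefix function): pi[i] is the length of the longest proper
// prefix of p[0..i] that is also a suffix of p[0..i].
std::vector<int> prefixFunction(const std::string& p) {
    std::vector<int> pi(p.size(), 0);
    for (std::size_t i = 1; i < p.size(); ++i) {
        int k = pi[i - 1];
        while (k > 0 && p[i] != p[k]) k = pi[k - 1];
        if (p[i] == p[k]) ++k;
        pi[i] = k;
    }
    return pi;
}

// Convention B (shifted by one, starting at -1): next[0] = -1 and next[i] is
// the border length of the first i characters, so the array has size + 1 entries.
std::vector<int> shiftedNext(const std::string& p) {
    const int n = static_cast<int>(p.size());
    std::vector<int> next(n + 1, 0);
    next[0] = -1;
    int i = 0, j = -1;
    while (i < n) {
        if (j == -1 || p[i] == p[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
    return next;
}
```

For the pattern "ababa", convention A gives 0 0 1 2 3 and convention B gives -1 0 0 1 2 3, so `shiftedNext(p)[i + 1]` equals `prefixFunction(p)[i]`. Combining the loop of one convention with the comparison of the other is exactly the kind of mistake that produces a wrong table.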

1. Plain KMP

Number Sequence

Time Limit: 10000/5000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 37702 Accepted Submission(s): 15618

Problem Description

Given two sequences of numbers : a[1], a[2], ...... , a[N], and b[1], b[2], ...... , b[M] (1 <= M <= 10000, 1 <= N <= 1000000). Your task is to find a number K which make a[K] = b[1], a[K + 1] = b[2], ...... , a[K + M - 1] = b[M]. If there are more than one K exist, output the smallest one.

Input

The first line of input is a number T which indicate the number of cases. Each case contains three lines. The first line is two numbers N and M (1 <= M <= 10000, 1 <= N <= 1000000). The second line contains N integers which indicate a[1], a[2], ...... , a[N]. The third line contains M integers which indicate b[1], b[2], ...... , b[M]. All integers are in the range of [-1000000, 1000000].

Output

For each test case, you should output one line which only contain K described above. If no such K exists, output -1 instead.

Sample Input

2
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 1 3
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 2 1

Sample Output

6
-1

Source

HDU 2007-Spring Programming Contest

This is a plain KMP problem: apply the template directly and there is not much else to say. The code is as follows:


#include <iostream>
#include <cstring>
#include<algorithm>
#include<cstdio>
using namespace std;
const int N=1e6+10;
int nn,m,n;
int Next[N];
int a[N],b[N];
void getnext()
{
    int j=-1;
    Next[0]=-1;
    for(int i=1; i<m; i++)
    {
        while(j!=-1&&b[i]!=b[j+1])
        {
            j=Next[j];
        }
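        if(b[i]==b[j+1])
            j++; /* j is now the unoptimized next value for position i */
        /* the i+1>=m check avoids comparing against b[m], which lies past the pattern */
        if(j==-1||i+1>=m||b[i+1]!=b[j+1])
            Next[i]=j;
        else
            Next[i]=Next[j];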
    }
}
void KMP()
{
    getnext();
    int j=-1;
    int kkk=0;
    for(int i=0; i<n; i++)
    {
        while(j!=-1&&a[i]!=b[j+1])
        {
            j=Next[j];
        }
        if(a[i]==b[j+1])
            j++;
        if(m-1==j)
        {
            kkk=i-m+2;
            break;
        }
    }
    if(kkk)
        cout<<kkk<<endl;
    else
        cout<<-1<<endl;
}
int main()
{
    cin>>nn;
    while(nn--)
    {
        memset(Next,0,sizeof(Next));
        scanf("%d%d",&n,&m);
        for(int i=0; i<n; i++)
            scanf("%d",&a[i]);
        for(int i=0; i<m; i++)
            scanf("%d",&b[i]);
        KMP();
    }
}
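For comparison, here is a sketch of the same search written with the plain (unoptimized) prefix function instead of the optimized next array used above. The name `firstMatch` and the `std::vector` interface are assumptions made for the example; it returns the 1-based position that the problem asks for.

```cpp
#include <cassert>
#include <vector>

// Returns the smallest 1-based K such that a[K..K+M-1] equals b, or -1 if
// there is no such K. Runs in O(N + M).
int firstMatch(const std::vector<int>& a, const std::vector<int>& b) {
    assert(!b.empty());
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());

    // pi[i]: length of the longest proper prefix of b[0..i] that is also its suffix.
    std::vector<int> pi(m, 0);
    for (int i = 1, k = 0; i < m; ++i) {
        while (k > 0 && b[i] != b[k]) k = pi[k - 1];
        if (b[i] == b[k]) ++k;
        pi[i] = k;
    }

    // k is the number of pattern elements currently matched.
    for (int i = 0, k = 0; i < n; ++i) {
        while (k > 0 && a[i] != b[k]) k = pi[k - 1];
        if (a[i] == b[k]) ++k;
        if (k == m) return i - m + 2;  // 0-based start i-m+1, shifted to 1-based
    }
    return -1;
}
```

On the two sample cases this yields 6 and -1.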

2. Counting pattern occurrences with KMP

剪花布条 (Cutting Cloth Strips)

Time Limit: 1000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 3083 Accepted Submission(s): 2079

Problem Description

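A strip of patterned cloth carries some patterns, and there is also a small, ready-to-use decorative strip that carries patterns of its own. Given the cloth strip and the small decorative strip, compute the maximum number of decorative strips that can be cut out of the cloth strip.

Input

The input contains several data sets, each a pair consisting of a cloth strip and a small decorative strip. The strips are written in visible ASCII characters, so there are as many distinct pattern symbols as there are visible ASCII characters. Neither strip is longer than 1000 characters. Processing stops when the # character is encountered.

Output

Output the maximum number of small decorative strips that can be cut from the cloth strip. If none can be cut, output 0. Each result goes on its own line.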

Sample Input

abcde a3
aaaaaa aa
#

Sample Output

0
3

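Compared with plain KMP, this problem only adds a few lines of counting code. The one thing to watch is that after every complete match the pattern pointer j must be reset to 0 (not to a value from the next array), because the pieces that are cut out may not overlap. The code is as follows: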

#include <iostream>
#include <cstring>
#include<algorithm>
#include<cstdio>
using namespace std;
const int N=1000+10;
int nn,m,n;
int Next[N];
char a[N],b[N];
void getnext(char s[],int l)
{
    int j=-1,i=0;
    Next[0]=-1;
    while(i<l)
    {
        if(j==-1||s[i]==s[j]) // compare with s[j], not s[j+1]: s[j+1] makes every test true and yields Next[i]=i-1
        {
            i++;
            j++;
            Next[i]=j;
        }
        else
            j=Next[j];
    }
}
int kmp(char a[],char b[])
{
    int la=strlen(a),lb=strlen(b);
    getnext(b,lb);
    int j=0,ans=0,i=0;
   while(i<la)
   {
       if(j==-1||a[i]==b[j])
       {
           ++i;
           ++j;
       }
       else
       {
           j=Next[j];
       }
       if(j==lb)
       {
        ans++;
        j=0;
       }
   }
    return ans;
}
int main()
{
    while(scanf("%s",a)==1) // also stop at end of input, where scanf returns EOF (nonzero)
    {
        if(a[0]=='#')
            break;
        scanf("%s",b);
        cout<<kmp(a,b)<<endl;
    }
}
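A self-contained sketch of the non-overlapping count, assuming the prefix-function convention and a hypothetical helper name `countNonOverlapping`. The only difference from a plain search is that the number of matched characters is reset to 0 after each full match, so the next occurrence has to start after the previous one ends.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Counts how many non-overlapping copies of pat can be cut from text,
// scanning left to right and taking every match as early as possible.
int countNonOverlapping(const std::string& text, const std::string& pat) {
    assert(!pat.empty());
    const int m = static_cast<int>(pat.size());

    // pi[i]: length of the longest proper prefix of pat[0..i] that is also its suffix.
    std::vector<int> pi(m, 0);
    for (int i = 1, k = 0; i < m; ++i) {
        while (k > 0 && pat[i] != pat[k]) k = pi[k - 1];
        if (pat[i] == pat[k]) ++k;
        pi[i] = k;
    }

    int count = 0;
    int k = 0;  // number of pattern characters currently matched
    for (char c : text) {
        while (k > 0 && c != pat[k]) k = pi[k - 1];
        if (c == pat[k]) ++k;
        if (k == m) {
            ++count;
            k = 0;  // restart from scratch: matches must not overlap
        }
    }
    return count;
}
```

With `k = pi[m - 1]` in place of `k = 0`, the same loop would count overlapping occurrences instead. The input "abccb" with pattern "abcb" is a useful check for a broken next table: the correct answer is 0.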

3. KMP with repeating periods

Cyclic Nacklace

Time Limit: 2000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 7375 Accepted Submission(s): 3210

Problem Description

CC always becomes very depressed at the end of this month, he has checked his credit card yesterday, without any surprise, there are only 99.9 yuan left. he is too distressed and thinking about how to tide over the last days. Being inspired by the entrepreneurial spirit of "HDU CakeMan", he wants to sell some little things to make money. Of course, this is not an easy task.

As Christmas is around the corner, Boys are busy in choosing christmas presents to send to their girlfriends. It is believed that chain bracelet is a good choice. However, Things are not always so simple, as is known to everyone, girl's fond of the colorful decoration to make bracelet appears vivid and lively, meanwhile they want to display their mature side as college students. after CC understands the girls demands, he intends to sell the chain bracelet called CharmBracelet. The CharmBracelet is made up with colorful pearls to show girls' lively, and the most important thing is that it must be connected by a cyclic chain which means the color of pearls are cyclic connected from the left to right. And the cyclic count must be more than one. If you connect the leftmost pearl and the rightmost pearl of such chain, you can make a CharmBracelet. Just like the pictrue below, this CharmBracelet's cycle is 9 and its cyclic count is 2:

Now CC has brought in some ordinary bracelet chains, he wants to buy minimum number of pearls to make CharmBracelets so that he can save more money. but when remaking the bracelet, he can only add color pearls to the left end and right end of the chain, that is to say, adding to the middle is forbidden.
CC is satisfied with his ideas and ask you for help.

Input

The first line of the input is a single integer T ( 0 < T <= 100 ) which means the number of test cases.
Each test case contains only one line describe the original ordinary chain to be remade. Each character in the string stands for one pearl and there are 26 kinds of pearls being described by 'a' ~'z' characters. The length of the string Len: ( 3 <= Len <= 100000 ).

Output

For each case, you are required to output the minimum count of pearls added to make a CharmBracelet.

Sample Input

3
aaa
abca
abcde

Sample Output

0
2
5

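Before attempting this kind of problem you need one formula: the length of the smallest period is L = len - next[len], where len is the length of the string. Here next[len] is the length of the longest proper prefix of the whole string that is also its suffix. If that border has length k, the string coincides with itself shifted by len - k positions, so s[i] = s[i + (len - k)] wherever both indices are valid, and the longest border gives the smallest such shift. In this problem we have to find how many characters must be added so that the string consists of complete periods repeated at least twice. If L equals len (that is, next[len] is 0), the string has no repetition at all and len characters must be added. Otherwise, if L divides len, nothing needs to be added. Otherwise, L - len % L characters are needed to complete the last period. The code is as follows: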

#include<cstdio>
#include<cstring>
#include<algorithm>
using namespace std;
char s[100010];
int len,t,nxt[100010];
void kmp(int len)
{
    int i=0,j=-1;
    nxt[0]=-1;
    while(i<len)
    {
       if(j==-1||s[i]==s[j])
       {
           i++;
           j++;
           nxt[i]=j;
       }
       else
        j=nxt[j];
    }
}
int main()
{
    int i,j;
    scanf("%d",&t);
    for (i=1; i<=t; ++i)
    {
        int sum;
        scanf("%s",s);
        len=strlen(s);
        kmp(len);
        sum=len-nxt[len]; // length of the smallest period
        if (sum==len) printf("%d\n",len);
        else if (!(len%sum)) printf("0\n");
        else printf("%d\n",sum-len%sum);
    }
    return 0;
}
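The period rule can be condensed into one small function and checked against the samples. This is a sketch under the assumption that the prefix function is used in place of the shifted next array (the last prefix-function value plays the role of next[len]); the name `pearlsToAdd` is invented for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimum number of characters to append so that s becomes a whole number
// (at least two) of repetitions of its smallest period.
int pearlsToAdd(const std::string& s) {
    assert(!s.empty());
    const int len = static_cast<int>(s.size());

    // pi[i]: length of the longest proper prefix of s[0..i] that is also its suffix.
    std::vector<int> pi(len, 0);
    for (int i = 1, k = 0; i < len; ++i) {
        while (k > 0 && s[i] != s[k]) k = pi[k - 1];
        if (s[i] == s[k]) ++k;
        pi[i] = k;
    }

    const int border = pi[len - 1];         // same value as next[len]
    const int period = len - border;        // smallest period L
    if (border == 0) return len;            // no repetition: add a full second copy
    if (len % period == 0) return 0;        // already made of complete periods
    return period - len % period;           // finish the partial last period
}
```

For "abca" the border is 1, so the period is 3 and 3 - 4 % 3 = 2 characters are missing, which matches the sample output.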

4. Brute-force enumeration + KMP

Blue Jeans

Time Limit: 1000MS		Memory Limit: 65536K
Total Submissions: 17146		Accepted: 7602

Description

The Genographic Project is a research partnership between IBM and The National Geographic Society that is analyzing DNA from hundreds of thousands of contributors to map how the Earth was populated.

As an IBM researcher, you have been tasked with writing a program that will find commonalities amongst given snippets of DNA that can be correlated with individual survey information to identify new genetic markers.

A DNA base sequence is noted by listing the nitrogen bases in the order in which they are found in the molecule. There are four bases: adenine (A), thymine (T), guanine (G), and cytosine (C). A 6-base DNA sequence could be represented as TAGACC.

Given a set of DNA base sequences, determine the longest series of bases that occurs in all of the sequences.

Input

Input to this problem will begin with a line containing a single integer n indicating the number of datasets. Each dataset consists of the following components:

A single positive integer m (2 <= m <= 10) indicating the number of base sequences in this dataset.
m lines each containing a single base sequence consisting of 60 bases.

Output

For each dataset in the input, output the longest base subsequence common to all of the given base sequences. If the longest common subsequence is less than three bases in length, display the string "no significant commonalities" instead. If multiple subsequences of the same longest length exist, output only the subsequence that comes first in alphabetical order.

Sample Input

3
2
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
3
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
GATACTAGATACTAGATACTAGATACTAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
GATACCAGATACCAGATACCAGATACCAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
3
CATCATCATCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
ACATCATCATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AACATCATCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

Sample Output

no significant commonalities
AGATAC
CATCATCAT

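The task is to find the longest substring common to all of the given strings. Take the first string as the source of patterns: since the input is small (at most 10 strings of 60 characters), enumerate every substring of the first string and match it against each of the other strings with KMP. See the code for details.

The code is as follows: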

#include<iostream>
#include<cstdio>
#include<algorithm>
#include<cstring>
using namespace std;
const int N=205;
int Next[N];
char s[4005][205];
char ans[205];
char zichuan[205];
void getnext(char s[])
{
    int len=strlen(s);
    int i=0,j=-1;
    Next[i]=-1;
    while(i<len)
    {
        if(j==-1||s[i]==s[j])
        {
            i++;
            j++;
            Next[i]=j;
        }
        else
            j=Next[j];
    }
}
bool kmp(char a[],char b[])
{
    int l1=strlen(a),l2=strlen(b);
    getnext(a);
    int j=0,i=0;
    while(i<l2)
    {
        if(j==-1||a[j]==b[i])
        {
            i++;
            j++;
        }
        else j=Next[j];
        if(j==l1)
            return true;
    }
    return false;
}
int main()
{
    int m,n;
    scanf("%d",&m);
    while(m--)
    {
        scanf("%d",&n);
        for(int i=0;i<n;i++)
            scanf("%s",s[i]);
        int len=strlen(s[0]);
        memset(ans,'\0',sizeof(ans));
        for(int i=0;i<len;i++)
        {
            int sum=0;
            for(int j=i;j<len;j++)
            {
                zichuan[sum]=s[0][j];
                sum++;
                zichuan[sum]='\0';
                int flag=1;
                for(int k=1;k<n;k++)
                {
                    if(!kmp(zichuan,s[k]))
                    {
                        flag=0;
                        break;
                    }
                }
                if(flag)
                {
                    if(strlen(zichuan)>strlen(ans))
                        strcpy(ans,zichuan);
                    else if(strlen(zichuan)==strlen(ans)&&strcmp(zichuan,ans)<0)
                        strcpy(ans,zichuan);
                }
            }
        }
        if(strlen(ans)>=3) // the problem treats anything shorter than three bases as not significant
            printf("%s\n",ans);
        else
            printf("no significant commonalities\n");
    }
}
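The enumeration itself does not depend on which matcher is used. The sketch below deliberately swaps KMP for `std::string::find` to keep the example short, and tries longer candidates first so it can stop at the first length that works. The function name `longestCommon` and the `minLen` parameter are assumptions made for this illustration.

```cpp
#include <string>
#include <vector>

// Longest substring of seqs[0] that occurs in every sequence, with ties
// broken alphabetically. Returns "" if no common substring of at least
// minLen characters exists.
std::string longestCommon(const std::vector<std::string>& seqs,
                          std::size_t minLen = 3) {
    if (seqs.empty()) return "";
    const std::string& base = seqs[0];

    // Try candidate lengths from longest to shortest.
    for (std::size_t len = base.size(); len >= minLen && len > 0; --len) {
        std::string best;
        for (std::size_t start = 0; start + len <= base.size(); ++start) {
            const std::string cand = base.substr(start, len);
            bool inAll = true;
            for (std::size_t k = 1; k < seqs.size() && inAll; ++k) {
                // std::string::find stands in for the KMP matcher here.
                inAll = seqs[k].find(cand) != std::string::npos;
            }
            if (inAll && (best.empty() || cand < best)) best = cand;
        }
        if (!best.empty()) return best;  // first length with a hit is the longest
    }
    return "";
}
```

An empty result corresponds to printing "no significant commonalities". On the three sample data sets the function returns "", "AGATAC" and "CATCATCAT".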

PS: The above covers only a small part of KMP. My knowledge is limited and what I have written here is fairly basic, so please bear with me. Thanks.