[HDU1686] Oulipo: approach + solution report + code + my personal understanding of KMP [0.5% achieved]

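This article explains how the KMP algorithm works and how to apply it. It walks through an example of solving a string matching problem efficiently, which matters most when the inputs are large.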

The HDU problem: given a string A and a string B, count how many times A occurs in B. Note that overlapping occurrences also count.

For example:

str1 = "ABA"

str2 = "ABABABA"

Here str1 occurs in str2 three times (starting at positions 0, 2 and 4).

Of course, in true devious HDU fashion, the naive algorithm is guaranteed to time out.

Thanks to senior lpp for that killer test case...

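---------- Note: everything below is purely my personal understanding. If anything is wrong, criticism and corrections are [very welcome, desperately wanted] ----------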

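The naive algorithm runs in O(mn). Here m (the pattern length) is up to 10^4 and n (the text length) up to 10^6, so m*n = 10^10, which is bound to time out. For a proof of the naive algorithm's running time, see CLRS (Introduction to Algorithms, in the lingo); it is short and easy to follow.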

With KMP the running time is O(m+n) = 10^6 + 10^4, roughly 10^6, which is fine.

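Now for KMP itself. So as not to waste the time and bandwidth you would rather spend downloading (bleep) movies, I'll skip how KMP was derived. KMP fixes the weakness of the naive algorithm: after a mismatch, the naive algorithm shifts the pattern by one position and throws away everything it learned from the characters that already matched, while KMP reuses that information so that the text pointer never moves backwards.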

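First, what is next[i] for? It is the thing that the major algorithm books, for reasons best known to themselves (maybe to confuse people?), call the "failure function".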

Computing the failure function.

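Suppose we want to match a pattern against a text. From here on the pattern is called str2 (it is word in the code). Note that the roles are swapped relative to the example above, where str1 was the string being searched for.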

Suppose the pattern is ABCABDABCABE.

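For each position i we then want a value next[i] such that

str2[0..next[i]] is the same string as str2[i-next[i]..i] (str2[0..N] denotes the substring of str2 from position 0 to position N, inclusive).

next[i] must be smaller than i (otherwise the whole prefix would trivially match itself), and among all values that work we take the largest. If no such value exists, next[i] = -1.

In standard terminology: str2[0..next[i]] is the longest proper prefix of str2[0..i] that is also a suffix of str2[0..i], so next[i]+1 is the length of that prefix (0 if there is none).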

str2[]:

A B C A B D A B C A B E

next[]:

-1 -1 -1 0 1 -1 0 1 2 3 4 -1

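Pay attention to where next[] is not 3, 4 even though you might expect it to be. [Why isn't it 3, 4?] Think it over yourself~ (Remember: str2[0..next[i]] must be identical to str2[i-next[i]..i], and next[i] is the largest such value.)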

This gives the following algorithm for computing next[]:

void initNext()
{
    int len = strlen(word);
    next[0] = -1;
    for(int j = 1 ; j < len ; j++)
    {
        int i = next[j-1];

        while(  (i>=0) && ( word[i+1] != word[j] ) )
        {
            i = next[i];
        }
        if( word[i+1] == word[j] )
        {
            next[j] = i+1;
        }
        else
            next[j] = -1;
    }
    /*
    for(int j = 1 ; j < len ; j++)
    {
        printf("%d ",next[j]);
    } */
}

In the code above:

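        int i = next[j-1];  // next[] value of the character just before str2[j]

        while(  (i>=0) && ( word[i+1] != word[j] ) )
        {
            i = next[i]; // while i >= 0 and word[i+1] != word[j], keep falling back through next[] until i == -1 or word[i+1] == word[j]; try ABCDABCEABCF by hand, there's a surprise~
        }// always remember: next[i] is what makes str2[0..next[i]] equal to str2[i-next[i]..i]!
        if( word[i+1] == word[j] )  // the matched prefix str2[0..i] extends by one character to str2[0..i+1]
        {
            next[j] = i+1;
        }
        else
            next[j] = -1;  // here i == -1 and word[0] != word[j], so nothing can be extended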

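That computes next[]. The for loop runs m-1 times (m is the length of str2), and although the inner while loop can run several times for a single j, its total over the whole computation is bounded: each pass of the for loop raises i by at most 1 (next[j] <= next[j-1]+1), each fallback i = next[i] lowers it by at least 1, and i never drops below -1. So there are fewer than m fallbacks in total, and computing next[] takes O(m) time.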

For this problem, the matching step of KMP then looks like this:

int solve()
{
    int cnt = 0;
    int i = 0 ;
    int j = 0 ;
    int lenp = strlen(word);
    int lens = strlen(text);

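    while( i <= lens  )
    {
        if( i == lens )
        {
            if( j == lenp )
                cnt++;  // safeguard only: a complete match is already counted (and j reset) by the j >= lenp check below
            break;      // end of the text, so text[lens] is never compared
        }
        if( text[i] == word[j] )
        {
            i++;j++;
        }
        else
        {
            if( j == 0 )
                i++;  // mismatch on the first pattern character: next[j-1] would be next[-1] (out of bounds), so just advance i
            else
                j = next[j-1]+1;  // mismatch after j matched characters: keep i and compare text[i] with str2[next[j-1]+1],
                                  // because str2[0..next[j-1]] is identical to str2[j-1-next[j-1]..j-1]
        }

        if( j >= lenp )  // the key to this problem: j >= lenp (lenp = pattern length) means a full match was found
        {
            cnt++;
            j = next[lenp-1]+1;  // to count overlapping matches too, continue from next[lenp-1]+1, because
                                 // str2[0..next[lenp-1]] is identical to str2[lenp-1-next[lenp-1]..lenp-1]
        }
    }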
    return cnt;
}

Complete AC code:

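#include <cstdio>
#include <cstring>
// <iostream> and "using namespace std;" are left out on purpose: together they can make
// the global array name next ambiguous with std::next on modern compilers.
#define ONLINE_JUDGE  // remove this line to use the local input/output files in main()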

const int MAX_SIZE_1 = 10010;
const int MAX_SIZE_2 = 1000010;
char word[MAX_SIZE_1];
char text[MAX_SIZE_2];

int next[MAX_SIZE_1];

void initNext()
{
    int len = strlen(word);
    next[0] = -1;
    for(int j = 1 ; j < len ; j++)
    {
        int i = next[j-1];

        while(  (i>=0) && ( word[i+1] != word[j] ) )
        {
            i = next[i];
        }
        if( word[i+1] == word[j] )
        {
            next[j] = i+1;
        }
        else
            next[j] = -1;
    }
    /*
    for(int j = 1 ; j < len ; j++)
    {
        printf("%d ",next[j]);
    } */
}
int solve()
{
    int cnt = 0;
    int i = 0 ;
    int j = 0 ;
    int lenp = strlen(word);
    int lens = strlen(text);

    while( i <= lens  )
    {
        if( i == lens )
        {
            if( j == lenp )
                cnt++;
            break;
        }
        if( text[i] == word[j] )
        {
            i++;j++;
        }
        else
        {
            if( j == 0 )
                i++;
            else
                j = next[j-1]+1;
        }
        if( j >= lenp )
        {
            cnt++;
            j = next[lenp-1]+1;
        }
    }
    return cnt;
}
int main()
{
#ifndef ONLINE_JUDGE
    freopen("B:\\acm\\SummerVacation\\String-I\\A.in","r",stdin);
    freopen("B:\\acm\\SummerVacation\\String-I\\A.out","w",stdout);
#endif
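    int T;
    while(scanf("%d",&T)==1)
    {
        for(int t = 0 ; t < T ; t++)
        {
            // gets() was removed in C11/C++14 and cannot limit its input length.
            // The strings in this problem contain no whitespace, so "%s" with a field
            // width (buffer size minus one for the '\0') reads them safely.
            // scanf null-terminates the strings and initNext() overwrites every next[]
            // entry that solve() reads, so the per-case memsets are not needed.
            scanf("%10009s",word);
            scanf("%1000009s",text);

            initNext();
            int ans = solve();

            printf("%d\n",ans);
        }
    }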
#ifndef ONLINE_JUDGE
    fclose(stdin);
    fclose(stdout);
#endif
    return 0;
}