Extended KMP

逃夭丶

Category: Strings

Article link: https://blog.csdn.net/qq_43724031/article/details/102826298

This article is based on: https://blog.csdn.net/Floraqiu/article/details/81558865

Extended KMP

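Extended KMP is an extension of the KMP algorithm. It solves the following problem:
Given a text string S and a pattern string T,
let n=|S| and m=|T|, and define extend[i] as the length of the longest common prefix of S[i…n-1] and T. Compute all of extend[0…n-1] in linear time, O(n+m).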

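It is easy to see that if some position i satisfies extend[i] = m, then T occurs in S, and that occurrence starts at position i. This is exactly the classic KMP problem.
The extended KMP problem is therefore a generalization, and a harder version, of the classic KMP problem.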
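To make the definition concrete, here is a minimal brute-force sketch that computes extend directly from its definition in O(n*m) time and then reads off the occurrences of T. The helper names `naive_extend` and `occurrences` are illustrative and do not come from the original article. A reference like this is mainly useful for checking a linear-time implementation on small inputs.

```cpp
#include <string>
#include <vector>

// Illustrative helper (not from the article): extend[i] is the length of the
// longest common prefix of S[i..n-1] and T, computed straight from the definition.
std::vector<int> naive_extend(const std::string& S, const std::string& T) {
    std::vector<int> extend(S.size(), 0);
    for (std::size_t i = 0; i < S.size(); ++i) {
        std::size_t j = 0;
        while (i + j < S.size() && j < T.size() && S[i + j] == T[j]) ++j;
        extend[i] = static_cast<int>(j);
    }
    return extend;
}

// Illustrative helper: every position i with extend[i] == |T| starts an
// occurrence of T in S (assumes T is non-empty).
std::vector<int> occurrences(const std::string& S, const std::string& T) {
    std::vector<int> pos;
    const std::vector<int> extend = naive_extend(S, T);
    for (std::size_t i = 0; i < extend.size(); ++i)
        if (extend[i] == static_cast<int>(T.size()))
            pos.push_back(static_cast<int>(i));
    return pos;
}
```

For S = "aaaabaa" and T = "aaaaa" this gives extend = {4, 3, 2, 1, 0, 2, 1}. No entry equals 5, so T does not occur in S.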

【Algorithm】

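We compute the extend array from left to right. Suppose that at some moment extend[0…k] has already been computed, and that the farthest position reached by the matching so far is P. Strictly speaking, P is the maximum of i+extend[i]-1 over 0<=i<=k, and P₀ denotes the position i that attains this maximum.
We now want extend[k+1]. Note that the formulas here are derived for k+1, whereas the code is written in terms of k, so the two differ by 1. By the definition of the extend array, S[P₀,P] = T[0,P-P₀], and therefore S[k+1,P] = T[k-P₀+1,P-P₀]. Let len = nxt[k-P₀+1] (len is the number of characters of T, starting at position k-P₀+1, that match T from its beginning). There are two cases: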

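Case 1: k+len < P
The length len that T can match from its own start at position k-P₀+1 does not reach P. Since the characters of S between P₀ and P have already been verified to match, S[k+1] through S[k+len] are guaranteed to match the first len characters of T.

In this case S[k+1,k+len] = T[0,len-1], and S[k+len+1] cannot equal T[len]. Because k+len+1 <= P, the character S[k+len+1] still lies inside the verified range, so if the two were equal we would have S[k+1,k+len+1] = T[k-P₀+1,k-P₀+len+1] = T[0,len], hence nxt[k-P₀+1] >= len+1. That contradicts the definition of the nxt array (nxt[i] is the length of the longest common prefix of T[i,m-1] and T). So in this case extend[k+1] = len is known without any character comparison.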
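Case 2: k+len >= P
Here the length len that T can match from its own start at position k-P₀+1 reaches P or goes beyond it. The characters of S between P₀ and P have been verified, but nothing beyond P has been compared yet, so we cannot be sure that S[k+1] through S[k+len] match the first len characters of T. All we know is that S[k+1] through S[P] match; the part from P+1 through k+len still has to be compared.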

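Every character of S after position P is still unknown, that is, it has not taken part in any comparison yet. So in this case we compare character by character, starting from S[P+1] against T[P-k] (or from S[k+1] against T[0] if P < k), until a mismatch occurs. Once the matching stops, P and P₀ have to be updated. (The new farthest position k+extend[k+1] is never less than P, so after computing extend[k+1] we can simply set P₀ = k+1 and P = k+extend[k+1].)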
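The two cases can be sketched in code using the same symbols as the derivation (k, P₀, P, len). This is an illustrative sketch rather than the article's implementation: the function name `extend_with_cases` is made up here, and it assumes the caller supplies a correct nxt array for T with nxt.size() == |T|.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Illustrative sketch of the case analysis (names are assumptions).
// Precondition: nxt[i] is the LCP length of T[i..m-1] and T, nxt.size() == m.
std::vector<int> extend_with_cases(const std::string& S, const std::string& T,
                                   const std::vector<int>& nxt) {
    const int n = static_cast<int>(S.size());
    const int m = static_cast<int>(T.size());
    std::vector<int> extend(n, 0);
    if (n == 0) return extend;

    // extend[0] is found by direct comparison.
    int first = 0;
    while (first < n && first < m && S[first] == T[first]) ++first;
    extend[0] = first;

    int P0 = 0;  // position whose match reaches farthest
    for (int k = 0; k + 1 < n; ++k) {
        const int P = P0 + extend[P0] - 1;  // farthest matched index of S
        const int idx = k + 1 - P0;         // S[k+1] lines up with T[idx]
        const int len = idx < m ? nxt[idx] : 0;
        if (k + len < P) {
            // Case 1: the answer lies strictly inside the verified range.
            extend[k + 1] = len;
        } else {
            // Case 2: S[k+1..P] is known to match T[0..P-k-1];
            // continue comparing S[P+1] against T[P-k].
            int j = std::max(P - k, 0);
            while (k + 1 + j < n && j < m && S[k + 1 + j] == T[j]) ++j;
            extend[k + 1] = j;
            P0 = k + 1;  // the new match reaches at least as far as P
        }
    }
    return extend;
}
```

For S = "aaaab" and T = "aab" (nxt = {3, 1, 0}), positions 1 and 2 fall into case 2 and positions 3 and 4 into case 1, giving extend = {2, 2, 3, 1, 0}.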

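This completes the description of the extended KMP algorithm. Computing the nxt array works exactly like computing extend[i]: treat it as a special extended KMP match in which T is both the text and the pattern. Every nxt value that the computation needs has already been computed by the time it is used, so the nxt array can be built with the same procedure described above.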
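As a cross-check on the idea of matching T against itself, here is a compact sketch of the nxt computation. It uses a common formulation (often called the Z-function) that merges the two cases with a single `min`, so it is a variant of the procedure above, not the article's code. The name `compute_nxt` is illustrative.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Illustrative sketch: nxt[i] is the LCP length of T[i..m-1] and T.
// [P0, P] is the match window that reaches farthest to the right so far.
std::vector<int> compute_nxt(const std::string& T) {
    const int m = static_cast<int>(T.size());
    std::vector<int> nxt(m, 0);
    if (m == 0) return nxt;
    nxt[0] = m;           // T matches itself completely
    int P0 = 0, P = -1;   // empty window until a match at i >= 1 is found
    for (int i = 1; i < m; ++i) {
        int len = 0;
        // Inside the window, reuse nxt[i - P0], capped at the window's end.
        if (i <= P) len = std::min(nxt[i - P0], P - i + 1);
        // Extend by direct comparison; a no-op when the reused value stops short.
        while (i + len < m && T[i + len] == T[len]) ++len;
        nxt[i] = len;
        if (i + len - 1 > P) { P0 = i; P = i + len - 1; }
    }
    return nxt;
}
```

For T = "aabaab" this yields nxt = {6, 1, 0, 3, 1, 0}. Each character of T is compared successfully at most once past the window, which is where the linear bound comes from.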

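// nxt[i] = length of the longest common prefix of P[i..pl-1] and P.
void get_nxt(int pl){
    nxt[0] = pl;
    int i;
    for(i=0;i<pl-1&&P[i]==P[i+1];i++);
    nxt[1] = i;                               // nxt[1] by direct comparison
    int a = 1;                                // position whose match reaches farthest (P0 in the text)
    for(int k = 2;k<pl;k++){
        int p = a + nxt[a]-1;                 // farthest matched index so far (P in the text)
        int l = nxt[k-a];
        if((k-1)+l>=p){                       // case 2: must compare beyond p
            int j = (p-k+1)>0 ? (p-k+1) : 0;  // characters already known to match
            while(k+j<pl&&P[k+j]==P[j]) j++;
            nxt[k] = j, a = k;
        }
        else nxt[k] = l;                      // case 1: answer lies inside [a, p]
    }
}
// extend[i] = length of the longest common prefix of S[i..sl-1] and P.
void getextend(int sl,int pl){
    get_nxt(pl);
    int a = 0;
    int minl = min(sl,pl);
    while(a<minl&&S[a]==P[a]) a++;
    extend[0] = a,a = 0;                      // extend[0] by direct comparison
    for(int k = 1;k<sl;k++){
        int p = a + extend[a]-1;
        // nxt is only defined below pl. Filling it with -1 would produce
        // extend[k] = -1 when k-a == pl (e.g. S="abab", P="ab"), so use 0 there.
        int l = (k-a<pl) ? nxt[k-a] : 0;
        if((k-1)+l>=p){                       // case 2: must compare beyond p
            int j = (p-k+1)>0 ? (p-k+1) : 0;
            while(k+j<sl&&j<pl&&S[k+j]==P[j]) j++;
            extend[k] = j;
            a = k;
        }
        else extend[k] = l;                   // case 1: answer lies inside [a, p]
    }
}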

HDU Simpsons’ Hidden Talents http://acm.hdu.edu.cn/showproblem.php?pid=2594
Problem Description

Homer: Marge, I just figured out a way to discover some of the talents we weren’t aware we had.
Marge: Yeah, what is it?
Homer: Take me for example. I want to find out if I have a talent in politics, OK?
Marge: OK.
Homer: So I take some politician’s name, say Clinton, and try to find the length of the longest prefix
in Clinton’s name that is a suffix in my name. That’s how close I am to being a politician like Clinton
Marge: Why on earth choose the longest prefix that is a suffix???
Homer: Well, our talents are deeply hidden within ourselves, Marge.
Marge: So how close are you?
Homer: 0!
Marge: I’m not surprised.
Homer: But you know, you must have some real math talent hidden deep in you.
Marge: How come?
Homer: Riemann and Marjorie gives 3!!!
Marge: Who the heck is Riemann?
Homer: Never mind.
Write a program that, when given strings s1 and s2, finds the longest prefix of s1 that is a suffix of s2.

Input

Input consists of two lines. The first line contains s1 and the second line contains s2. You may assume all letters are in lowercase.

Output

Output consists of a single line that contains the longest string that is a prefix of s1 and a suffix of s2, followed by the length of that prefix. If the longest such string is the empty string, then the output should be 0.
The lengths of s1 and s2 will be at most 50000.

Sample Input

clinton
homer
riemann
marjorie

Sample Output

0
rie 3

Problem summary

Given two strings, find the longest prefix of the first string that is also a suffix of the second string.
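The link to extended KMP is as follows: take s2 as the text S and s1 as the pattern T. A suffix of s2 starting at position i is a prefix of s1 exactly when i + extend[i] == |s2|, and the smallest such i gives the longest one. The sketch below isolates that selection rule. The function name `longest_prefix_suffix` is an illustrative assumption, and the extend array is assumed to have been computed already.

```cpp
#include <vector>

// Illustrative helper: given extend[] of S = s2 matched against T = s1,
// return the length of the longest prefix of s1 that is also a suffix of s2.
int longest_prefix_suffix(const std::vector<int>& extend) {
    const int n = static_cast<int>(extend.size());
    for (int i = 0; i < n; ++i) {
        // The match starting at i must run all the way to the end of s2.
        if (extend[i] > 0 && i + extend[i] == n) return extend[i];
    }
    return 0;  // only the empty string qualifies
}
```

For s1 = "riemann" and s2 = "marjorie", extend = {0, 0, 1, 0, 0, 3, 0, 0}; position 5 satisfies 5 + 3 == 8, so the answer is 3 ("rie"). Scanning from the left and returning at the first hit is equivalent to taking the maximum over all qualifying positions, which is what the full program does.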

#include<cstdio>
#include<iostream>
#include<cmath>
#include<algorithm>
#include<cstring>
#include<cstdlib>
#include<cctype>
#include<vector>
#include<stack>
#include<queue>
#include<ctime>
#include<map>
#define ll long long
#define ld long double
#define ull unsigned long long
using namespace std;
const int maxn = 1000101;
int nxt[maxn],extend[maxn];
char P[maxn],S[maxn];
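// nxt[i] = length of the longest common prefix of P[i..pl-1] and P.
void get_nxt(int pl){
    nxt[0] = pl;
    int i;
    for(i=0;i<pl-1&&P[i]==P[i+1];i++);
    nxt[1] = i;                               // nxt[1] by direct comparison
    int a = 1;                                // position whose match reaches farthest (P0 in the text)
    for(int k = 2;k<pl;k++){
        int p = a + nxt[a]-1;                 // farthest matched index so far (P in the text)
        int l = nxt[k-a];
        if((k-1)+l>=p){                       // case 2: must compare beyond p
            int j = (p-k+1)>0 ? (p-k+1) : 0;  // characters already known to match
            while(k+j<pl&&P[k+j]==P[j]) j++;
            nxt[k] = j, a = k;
        }
        else nxt[k] = l;                      // case 1: answer lies inside [a, p]
    }
}
// extend[i] = length of the longest common prefix of S[i..sl-1] and P.
void getextend(int sl,int pl){
    get_nxt(pl);
    int a = 0;
    int minl = min(sl,pl);
    while(a<minl&&S[a]==P[a]) a++;
    extend[0] = a,a = 0;                      // extend[0] by direct comparison
    for(int k = 1;k<sl;k++){
        int p = a + extend[a]-1;
        // nxt is only defined below pl. Filling it with -1 would produce
        // extend[k] = -1 when k-a == pl (e.g. S="abab", P="ab"), so use 0 there.
        int l = (k-a<pl) ? nxt[k-a] : 0;
        if((k-1)+l>=p){                       // case 2: must compare beyond p
            int j = (p-k+1)>0 ? (p-k+1) : 0;
            while(k+j<sl&&j<pl&&S[k+j]==P[j]) j++;
            extend[k] = j;
            a = k;
        }
        else extend[k] = l;                   // case 1: answer lies inside [a, p]
    }
}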
int main(void)
{
    while(~scanf("%s%s",P,S)){
        int pl = strlen(P);
        int sl = strlen(S);
        getextend(sl,pl);
        int ret = 0;
        for(int i=0;i<sl;i++)
            if(extend[i]>ret&&extend[i]+i==sl)
                ret = extend[i];
        if(!ret) puts("0");
        else{
            strncpy(P,S+(sl-ret),ret);
            P[ret] = '\0';
            printf("%s %d\n",P,ret);
        }
    }
    return 0;
}