KMP

KMP Substring Search

KMP is used to find whether a string "A" is contained in another string "B". We call the first string "A" the "pattern" and the second string "B" the "text".

Generally speaking, the easy way to find a substring in the main string is this:

  • First, we start from index 0 of the text and index 0 of the pattern, and we compare the first chars of the two strings.
  • Then, if the two chars match, we compare the next char of each string.
  • If two chars do not match, we go back to the first char of the pattern and start again from the text position one after the previous starting position.
  • And then we repeat these operations until the whole pattern has matched or the text runs out.

But this algorithm takes O(mn) time in the worst case, where "n" is the length of the pattern and "m" is the length of the text.
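The easy way described above can be sketched roughly as follows. This is only a minimal illustration, and the function name `naiveSearch` and its parameter names are my own assumptions, not part of any library.

```cpp
#include <string>

// Hypothetical helper: returns the index of the first occurrence of
// pattern in text, or -1 if there is none.
int naiveSearch(const std::string& text, const std::string& pattern) {
    const int m = static_cast<int>(text.size());     // length of the text
    const int n = static_cast<int>(pattern.size());  // length of the pattern
    for (int start = 0; start + n <= m; ++start) {
        int k = 0;
        // Compare the pattern against text[start .. start+n-1].
        while (k < n && text[start + k] == pattern[k]) {
            ++k;
        }
        if (k == n) {
            return start;  // every char of the pattern matched
        }
        // Otherwise restart the pattern at the next starting position.
    }
    return -1;
}
```

For instance, `naiveSearch("ABCABD", "ABD")` returns 3. The inner loop can run up to n times for each of the roughly m starting positions, which is where the O(mn) bound comes from.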

KMP search is better than this algorithm: it takes O(m+n) time. Next I will introduce it.

KMP finds a substring in another string and returns the index of the substring in the main string. If the substring is not found, it returns -1 or throws an exception.

Now, let's learn how KMP works.

  • The first step is the same as in the common way: we start from index 0 of the text and index 0 of the pattern, and we compare the first chars of the two strings.
  • If we find a char that does not match, we look at the part of the pattern that has already matched and find its longest prefix that is also its suffix. The position in the main string stays where it is, and we compare the char right after that prefix in the pattern with the mismatched char in the main string. Take this case:

Suppose the char at index 7 of the pattern and the char at index 7 of the text do not match, but the matched part of the pattern has the same prefix and suffix, the substring "ABC". Then index 0~2 of the pattern is the same as index 4~6 of the main string, so we only need to compare index 3 of the pattern with index 7 of the main string. In this way we save unnecessary steps.

  • And then, repeat these operations.
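To make the "same prefix and suffix" idea concrete, here is a small brute-force sketch. The helper name `longestBorder` and the sample string "ABCDABC" are my own assumptions, chosen to match the "ABC" case described above. It is only for illustration, since real KMP computes these lengths much faster.

```cpp
#include <string>

// Hypothetical helper: length of the longest proper prefix of s
// that is also a suffix of s, found by brute force.
int longestBorder(const std::string& s) {
    const int len = static_cast<int>(s.size());
    for (int k = len - 1; k >= 1; --k) {
        // Compare the first k chars with the last k chars.
        if (s.compare(0, k, s, len - k, k) == 0) {
            return k;
        }
    }
    return 0;  // no prefix is also a suffix
}
```

If the matched part of the pattern is "ABCDABC", `longestBorder("ABCDABC")` returns 3, so the comparison can continue at index 3 of the pattern without moving back in the main string.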

Next I will introduce how to implement it in C/C++.

  • First, we need to build an array that records, for each position of the pattern string, the length of the longest prefix that is also a suffix. We create two variables "i" and "j", put "j" at index 0 of the pattern and "i" at index 1, and set a[0] = 0. Then we compare the chars at index "i" and index "j" in turn. If they are the same, we fill a[i] with j+1 and add 1 to both "i" and "j". If they are not the same and j > 0, we let j = a[j-1] and compare again without moving "i". If they are not the same and j = 0, we fill a[i] with 0 and let i = i+1. Building this array takes O(n) time.
  • Now we have the array that records the same prefix and suffix of the pattern string, and we can begin to compare the two strings, with "i" on the text and "j" on the pattern. If the two chars match, both "i" and "j" move forward. If they do not match and j > 0, we let j = a[j-1] and compare the char at that index of the pattern with the same char of the text. If they do not match and j = 0, we move on to the next char of the text. The comparing work takes O(m) time.
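The first step, building the array, can be sketched on its own like this. The name `buildPrefixTable` is a hypothetical one of mine, and the sketch assumes the array stores the length of the longest prefix that is also a suffix for each position.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: table[i] is the length of the longest proper prefix
// of pattern[0..i] that is also a suffix of pattern[0..i].
std::vector<int> buildPrefixTable(const std::string& pattern) {
    const int n = static_cast<int>(pattern.size());
    std::vector<int> table(n, 0);
    int j = 0;  // length of the prefix matched so far
    for (int i = 1; i < n; ++i) {
        // On a mismatch, fall back to the next shorter candidate prefix.
        while (j > 0 && pattern[i] != pattern[j]) {
            j = table[j - 1];
        }
        if (pattern[i] == pattern[j]) {
            ++j;  // the matched prefix grows by one char
        }
        table[i] = j;
    }
    return table;
}
```

For the pattern "aabaaab" this gives 0 1 0 1 2 2 3. The value at index 5 is 2 rather than 0, which shows why "j" has to fall back to a[j-1] instead of jumping straight to 0. "j" only grows by 1 per step and every fallback shrinks it, so the whole loop stays within O(n).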

This is KMP.

Let's take a look at the code of KMP.

This is mine:

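#include<iostream>
#include<string>
#include<vector>
using namespace std;
// Returns the index of the first occurrence of the pattern a (length m)
// in the text b (length n), or -1 if there is none.
int KMP(string a,int m,string b,int n)
{
    if(m == 0)
        return 0;
    // patt[i] = length of the longest proper prefix of a[0..i]
    // that is also a suffix of a[0..i]
    vector<int> patt(m,0);
    int j = 0;
    for(int i=1;i<m;i++)
    {
        // fall back until a[i] can extend the current prefix
        while(j!=0 && a[i] != a[j])
        {
            j = patt[j-1];
        }
        if(a[i] == a[j])
        {
            j++;
        }
        patt[i] = j;
    }
    j = 0;
    for(int i=0;i<n;i++)
    {
        // on a mismatch, reuse the matched prefix instead of moving i back
        while(j!=0 && a[j] != b[i])
        {
            j = patt[j-1];
        }
        if(a[j] == b[i])
        {
            j++;
            if(j==m)
                return i-m+1; // start index of the match in the text
        }
    }
    return -1;
}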

This is a copy from here.

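#include <iostream>
#include <string>
#include <vector>
using namespace std;

// next[j] = index of the pattern to resume from after a mismatch at T[j];
// -1 means "move on to the next char of the text".
void get_next(string T,vector<int> &next){
	next.resize(T.size());
	if(T.empty())	return;
	int i=0,j=-1;
	int n=T.length();
	next[0]=-1;
	while (i<n-1){
		if(j==-1 || T[i]==T[j])	{ ++i; ++j; next[i]=j; }
		else	j=next[j];// fall back using the part of next[] that is already computed
	}
}
// Returns the index of the first occurrence of T in S, or -1 if there is none.
// next must have been filled by get_next(T,next) first.
int Index(string S,string T,vector<int> &next){
	int i=0,j=0;
	int n=S.length(),m=T.length();
	while(i<n && j<m){
		if(j==-1 || S[i]==T[j])	{ i++; j++;}
		else j=next[j];
	}
	if (j==m)	{return i-m;}
	else	return -1;
}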

And this is the best one:

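#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Positions are 1-based here: next[j] is the position in T to resume from
// after a mismatch at position j. next must hold at least strlen(T)+1 ints.
void Next(const char*T,int *next){
    int len=(int)strlen(T);
    int i=1;
    next[1]=0;
    int j=0;
    while (i<len) {
        if (j==0||T[i-1]==T[j-1]) {
            i++;
            j++;
            if (T[i-1]!=T[j-1]) {
               next[i]=j;
            }
            else{
                next[i]=next[j];// the same char would mismatch again, so reuse next[j]
            }
        }else{
            j=next[j];
        }
    }
}
// Returns the 1-based position of the first occurrence of T in S, or -1 if there is none.
int KMP(const char * S,const char * T){
    int lenS=(int)strlen(S);
    int lenT=(int)strlen(T);
    int *next=(int*)malloc((lenT+2)*sizeof(int));// sized from the pattern instead of a fixed 10
    if (next==NULL) {
        return -1;
    }
    Next(T,next);// initialize the next array from the pattern T
    int i=1;
    int j=1;
    while (i<=lenS&&j<=lenT) {
        // j==0: the first char of the pattern did not match the current char of the text;
        // S[i-1]==T[j-1]: the chars at the current positions match.
        // In both cases i and j move forward.
        if (j==0 || S[i-1]==T[j-1]) {
            i++;
            j++;
        }
        else{
            j=next[j];// on a mismatch i stays and j becomes its next value
        }
    }
    free(next);
    if (j>lenT) {// if this is true, the match succeeded
        return i-lenT;
    }
    return -1;
}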

 
