For a long time I only had a hazy understanding of KMP. Today I read a few blog posts and it finally clicked.
http://www.cnblogs.com/yjiyjige/p/3263858.html
https://blog.csdn.net/yutianzuijin/article/details/11954939/
https://www.cnblogs.com/tangzhengyue/p/4315393.html
Before KMP, there is the naive search.
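//Assumes neither the main string nor the substring is empty
//Returns the index of the first occurrence of sub in str, or -1 if it is not found
int searchSub(const char *str,const char *sub)
{
    int i = 0;  //position in the main string
    int j = 0;  //position in the substring
    while(str[i] != '\0' && sub[j] != '\0')
    {
        if(str[i] == sub[j])
        {
            ++i;
            ++j;
        }
        else
        {
            i = i-j+1;  //back up to one past the start of the failed attempt
            j = 0;
        }
    }
    if(sub[j] == '\0')
        return i-j;
    else
        return -1;
}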
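The KMP algorithm
KMP improves on the naive search. In the naive algorithm, when str[i] in the main string and sub[j] in the substring mismatch, the main string index has to back up to i-j+1 and matching restarts from sub[0]. KMP never backs up i; it only changes j. This works because the j characters of the main string just before position i have already been compared with the substring and are known to equal sub[0..j-1], which is exactly the information the naive algorithm ignores.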
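To make the cost of backing up i concrete, here is a small sketch that runs the same naive loop and counts character comparisons. The names NaiveStats and naiveSearchCounted are made up for this illustration and are not part of the code above.

```cpp
#include <cassert>

// Hypothetical helper type for this illustration only.
struct NaiveStats {
    int index;         // first match position, or -1 if not found
    long comparisons;  // how many times str[i] was compared with sub[j]
};

// Same loop as the naive search, with a comparison counter added.
NaiveStats naiveSearchCounted(const char *str, const char *sub)
{
    assert(str != nullptr && sub != nullptr);
    int i = 0;
    int j = 0;
    long comparisons = 0;
    while (str[i] != '\0' && sub[j] != '\0') {
        ++comparisons;
        if (str[i] == sub[j]) {
            ++i;
            ++j;
        } else {
            i = i - j + 1;  // the main string index is rewound here
            j = 0;
        }
    }
    return { sub[j] == '\0' ? i - j : -1, comparisons };
}
```

Searching for "AAAB" in "AAAAAAAB" with this helper finds the match at index 4 after 20 comparisons, because each of the four failed attempts re-reads characters that were already matched.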
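//KMP uses the next array to find the position that j falls back to after a mismatch
//next[k] is the length of the longest proper prefix of sub[0..k-1] that is also its suffix; next[0] is -1
//Assumes sub is not empty; the caller releases the array with delete[]
#include <cstring>  //strlen
int* getNext(const char *sub)
{
    int len = strlen(sub);
    int *next = new int[len];
    int i = 0;
    int j = -1;
    next[0] = -1;
    while (i < len - 1)  //stop at len-1 so that next[++i] stays inside the array
    {
        if (j == -1 || sub[i] == sub[j])
            next[++i] = ++j;
        else
            j = next[j];
    }
    return next;
}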
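As a cross-check, the same table can be sketched with std::vector so that nothing has to be freed by hand. The function name buildNext and the use of std::string are my own choices for this sketch, not something taken from the posts linked above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch: next[k] is the length of the longest proper prefix of
// sub[0..k-1] that is also a suffix of it; next[0] is the sentinel -1.
std::vector<int> buildNext(const std::string &sub)
{
    assert(!sub.empty());  // same assumption as the C-style version
    const int len = static_cast<int>(sub.size());
    std::vector<int> next(len);
    next[0] = -1;
    int i = 0;
    int j = -1;
    while (i < len - 1) {
        if (j == -1 || sub[i] == sub[j])
            next[++i] = ++j;  // extend the current border by one character
        else
            j = next[j];      // try the next shorter border
    }
    return next;
}
```

With this sketch, buildNext("ABAB") gives {-1,0,0,1} and buildNext("ABCABD") gives {-1,0,0,0,1,2}.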
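An example:
The next array of "ABAB" is {-1,0,0,1}.
Suppose str[i] and sub[j] mismatch at j = 3. j falls back to next[3] = 1, but sub[1] and sub[3] are both 'B', so comparing str[i] with sub[1] is bound to fail as well.
Falling back to 1 therefore achieves nothing, and j can fall back directly to next[1] = 0.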
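//Optimized next array
//Assumes sub is not empty; the caller releases the array with delete[]
int* getNextval(const char *sub)
{
    int len = strlen(sub);
    int *next = new int[len];
    int i = 0;
    int j = -1;
    next[0] = -1;
    while (i < len - 1)  //stop at len-1 so that next[i] stays inside the array
    {
        if (j == -1 || sub[i] == sub[j])
        {
            ++i;
            ++j;
            //e.g. "aaaaaaab": if sub[i] == sub[j], falling back to j would fail on the same character, so reuse next[j]
            if (sub[i] == sub[j])
                next[i] = next[j];
            else
                next[i] = j;
        }
        else
            j = next[j];
    }
    return next;
}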
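To see what the optimization changes, here is a sketch of the optimized table built with std::vector. The name buildNextval is hypothetical and the code is only meant to illustrate the idea.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch of the optimized table: when sub[i] equals sub[j], falling back
// to j would compare the same character again, so the entry for j is reused.
std::vector<int> buildNextval(const std::string &sub)
{
    assert(!sub.empty());
    const int len = static_cast<int>(sub.size());
    std::vector<int> nextval(len);
    nextval[0] = -1;
    int i = 0;
    int j = -1;
    while (i < len - 1) {
        if (j == -1 || sub[i] == sub[j]) {
            ++i;
            ++j;
            nextval[i] = (sub[i] == sub[j]) ? nextval[j] : j;
        } else {
            j = nextval[j];
        }
    }
    return nextval;
}
```

For "ABAB" this gives {-1,0,-1,0} instead of {-1,0,0,1}, and for "aaaaaaab" it gives {-1,-1,-1,-1,-1,-1,-1,6}, so a mismatch anywhere in the run of 'a' characters skips straight past all of them.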
//KMP
//Assumes neither the main string nor the substring is empty
//Returns the index of the first occurrence of sub in str, or -1 if it is not found
int KMP(const char *str,const char *sub)
{
    int i = 0;
    int j = 0;
    int len = strlen(sub);
    int *next = getNextval(sub);  //new throws on failure, so no NULL check is needed
    //compare j with len instead of reading sub[j], because j can be -1 here
    while(str[i] != '\0' && j < len)
    {
        if(j == -1 || str[i] == sub[j])
        {
            ++i;
            ++j;
        }
        else
            j = next[j];  //i stays where it is; only j falls back
    }
    delete []next;
    if(j == len)
        return i-j;
    else
        return -1;
}
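Putting the pieces together, a self-contained sketch of the whole search could look like the following. It uses std::string and std::vector instead of raw pointers, and the name kmpSearch is my own; treat it as one possible arrangement under the same non-empty assumption, not as the definitive version.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch: returns the index of the first occurrence of sub in str, or -1.
int kmpSearch(const std::string &str, const std::string &sub)
{
    assert(!sub.empty());
    const int n = static_cast<int>(str.size());
    const int m = static_cast<int>(sub.size());

    // Build the optimized fallback table for the substring.
    std::vector<int> next(m);
    next[0] = -1;
    for (int i = 0, j = -1; i < m - 1; ) {
        if (j == -1 || sub[i] == sub[j]) {
            ++i;
            ++j;
            next[i] = (sub[i] == sub[j]) ? next[j] : j;
        } else {
            j = next[j];
        }
    }

    // Scan the main string; i never moves backwards.
    int i = 0;
    int j = 0;
    while (i < n && j < m) {
        if (j == -1 || str[i] == sub[j]) {
            ++i;
            ++j;
        } else {
            j = next[j];
        }
    }
    return j == m ? i - j : -1;
}
```

For example, kmpSearch("ABABABC", "ABABC") returns 2: after the mismatch at str[4], j falls back from 4 to 2 and the scan continues from the same i.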