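Knuth-Morris-Pratt Algorithm
KMP: single-pattern matching, i.e. deciding whether s1 is a substring of s2
*** These are notes I put together while studying KMP from many different sources, now moved to my CSDN blog. I did not keep track of the articles I originally consulted, so I cannot provide links to them. Sorry.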
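Principle:
An auxiliary function next() lets the search skip unnecessary comparisons against the target string, which is where the speedup comes from
Time complexity: O(m+n), where m is the length of the pattern and n is the length of the target string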
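For contrast with the O(m+n) bound, here is a minimal sketch of the brute-force matcher that KMP improves on; in the worst case it needs on the order of m*n character comparisons. The names NaiveMatch and naiveIndexOf are made up for this illustration and do not come from the original notes.

```java
// Hypothetical baseline, not from the original notes: brute-force substring search.
class NaiveMatch {
    // Returns the index of the first occurrence of sub in str, or -1 if there is none.
    static int naiveIndexOf(String str, String sub) {
        int n = str.length(), m = sub.length();
        // Try every possible start position in str.
        for (int start = 0; start + m <= n; start++) {
            int k = 0;
            // Compare character by character from this start position.
            while (k < m && str.charAt(start + k) == sub.charAt(k)) {
                k++;
            }
            if (k == m) {
                return start; // all m characters matched
            }
            // On a mismatch the naive method restarts at start + 1 and
            // re-reads characters of str that it has already compared.
        }
        return -1;
    }
}
```

The re-reading noted in the last comment is exactly the work that the next array is meant to avoid.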
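Main idea:
After a mismatch, KMP does not simply start a new round of checks from the next character of the target string. Instead, it uses information computed before the search to skip the checks that are unnecessary.
* Useful information: the prefix function next
let P = length of the already matched string, e.g. for ababa, P = 5
L = len(special string), where the special string is the longest string that is both a proper suffix of the matched string (not equal to the whole string) and a prefix of it; for ababa, the special string is aba, so L = 3
then the effective shift is S = P - L = 5 - 3 = 2.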
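The arithmetic S = P - L can be checked with a small brute-force sketch. It is only meant to make the definition concrete; the real algorithm gets L from the next array instead. The names KmpShift, longestBorder and shift are hypothetical.

```java
// Hypothetical helper, not from the original notes.
class KmpShift {
    // Length L of the longest string that is both a proper prefix and a
    // proper suffix of matched (brute force, for illustration only).
    static int longestBorder(String matched) {
        for (int len = matched.length() - 1; len > 0; len--) {
            String suffix = matched.substring(matched.length() - len);
            if (matched.startsWith(suffix)) {
                return len;
            }
        }
        return 0; // no non-empty proper prefix is also a suffix
    }

    // Effective shift S = P - L, where P is the number of matched characters.
    static int shift(String matched) {
        return matched.length() - longestBorder(matched);
    }
}
```

For ababa this gives longestBorder = 3 and shift = 2, matching the numbers above; for a string with no repeated prefix, such as cdf, the shift equals the full matched length.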
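The article below walks through a KMP example in great detail. (Its next array is initialized with 0 while the one in this post is initialized with -1. That does not affect understanding; it only changes how next is used.)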
http://www.ruanyifeng.com/blog/2013/05/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm.html
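As far as I can tell, the two conventions differ by exactly one: a 0-based table stores the length of the longest common prefix/suffix, while the -1-based array in this post stores the index of the last character of that prefix. Under that assumption, converting between them is a one-line loop. The names NextConvention and toLengths are made up for this sketch.

```java
// Hypothetical helper, not from the original notes or the linked article.
class NextConvention {
    // Converts this post's -1-based next array into a table of lengths.
    // Assumption: the 0-based table holds prefix/suffix lengths.
    static int[] toLengths(int[] next) {
        int[] lengths = new int[next.length];
        for (int i = 0; i < next.length; i++) {
            lengths[i] = next[i] + 1; // index of last char -> length
        }
        return lengths;
    }
}
```

For ababa, [-1, -1, 0, 1, 2] becomes [0, 0, 1, 2, 3]. This is also why the matching code in this post resumes at next[j-1] + 1 after a mismatch.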
Code:
The core of KMP: building the next array that records where to jump after a mismatch
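// e.g. "cdf": [-1, -1, -1]
// e.g. "ababa": [-1, -1, 0, 1, 2]
// a[j] is the index at which the longest proper prefix of sub[0..j] that is
// also a suffix of sub[0..j] ends, or -1 if there is no such prefix
public int[] next(String sub){
    int[] a = new int[sub.length()];
    if(sub.isEmpty())
        return a; // guard: a[0] does not exist for an empty pattern
    char[] c = sub.toCharArray();
    int i, j;
    // initialize the first entry, then fill in from the second
    a[0] = -1;
    for(j = 1; j < sub.length(); j++){
        i = a[j-1];
        // fall back to shorter candidates until one can be extended by c[j]
        while(i >= 0 && c[j] != c[i+1])
            i = a[i];
        if(c[j] == c[i+1])
            a[j] = i+1;
        else
            a[j] = -1;
    }
    return a;
}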
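Matching method:
// returns the index of the first occurrence of sub in str, or -1 if there is none
public int pattern(String str, String sub){
    if(sub.isEmpty())
        return 0; // assumed convention: an empty pattern matches at index 0
    int[] next = next(sub);
    char[] ch1 = str.toCharArray();
    char[] ch2 = sub.toCharArray(); // the pattern (was mistakenly str)
    int i = 0, j = 0; // i->ch1, j->ch2
    for(;i < ch1.length;){
        // if there is a match
        if(ch1[i] == ch2[j]){
            if(j == ch2.length - 1){
                // the whole pattern matched; return its start index
                return (i - ch2.length + 1);
            }
            i++;
            j++;
        }else if(j == 0){
            // mismatch on the first char of the pattern: advance in the target
            i++;
        }else{
            // skip the chars that are already known to match
            // ch1[i] still needs to be checked against ch2[j_new]
            j = next[j-1] + 1;
        }
    }
    return -1;
}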
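The matcher above stops at the first hit. As a possible extension that is not part of the original notes, the same next array can be reused to report every occurrence: after a full match, j falls back to next[j] + 1 instead of returning. The sketch below is self-contained, so it repeats a compact version of next(). The names KmpAll and findAll are hypothetical, and returning no hits for an empty pattern is my own assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical extension, not from the original notes.
class KmpAll {
    // Same -1-based next array as in the notes above.
    static int[] next(String sub) {
        int[] a = new int[sub.length()];
        if (a.length == 0) return a;
        a[0] = -1;
        for (int j = 1; j < a.length; j++) {
            int i = a[j - 1];
            while (i >= 0 && sub.charAt(j) != sub.charAt(i + 1)) i = a[i];
            a[j] = (sub.charAt(j) == sub.charAt(i + 1)) ? i + 1 : -1;
        }
        return a;
    }

    // Start indices of every (possibly overlapping) occurrence of sub in str.
    static List<Integer> findAll(String str, String sub) {
        List<Integer> hits = new ArrayList<>();
        if (sub.isEmpty()) return hits; // assumption: empty pattern yields no hits
        int[] next = next(sub);
        int i = 0, j = 0; // i indexes str, j indexes sub
        while (i < str.length()) {
            if (str.charAt(i) == sub.charAt(j)) {
                i++;
                if (j == sub.length() - 1) {
                    hits.add(i - sub.length()); // full match ends just before i
                    j = next[j] + 1;            // keep the overlapping prefix
                } else {
                    j++;
                }
            } else if (j == 0) {
                i++; // mismatch on the first pattern char
            } else {
                j = next[j - 1] + 1; // reuse what already matched
            }
        }
        return hits;
    }
}
```

For example, KmpAll.findAll("ababa", "aba") yields [0, 2]. The index i never moves backwards, which is what keeps the scan linear in the length of str.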