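First, compute the KMP overlap table for the pattern. The table length is pattern length + 1 (entry 0 holds a sentinel -1). The value at index i is the overlap length of the prefix (substring) of length i. The overlap is defined as the length of the longest proper prefix that is also a proper suffix; proper means not the whole string itself.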
ex: abab ---> -1 0 0 1(a) 2(ab). Note that the first entry is always -1.
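To make the definition concrete, here is a minimal brute-force sketch that computes the table straight from the definition (longest proper prefix that is also a suffix). The class and method names (OverlapByDefinition, build) are made up for illustration, and this is not the efficient KMP construction, only a check of what the entries mean.

```java
import java.util.Arrays;

class OverlapByDefinition {
    // table[len] = length of the longest proper prefix of pattern[0..len)
    // that is also a suffix of it; table[0] is the -1 sentinel.
    static int[] build(String pattern) {
        int[] table = new int[pattern.length() + 1];
        table[0] = -1;
        for (int len = 1; len <= pattern.length(); len++) {
            String prefix = pattern.substring(0, len);
            // Try the longest candidate first; k < len keeps it "proper".
            for (int k = len - 1; k >= 0; k--) {
                // Compare the first k chars with the last k chars.
                if (prefix.startsWith(prefix.substring(len - k))) {
                    table[len] = k;
                    break;
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(build("abab"))); // [-1, 0, 0, 1, 2]
    }
}
```

This takes cubic time in the worst case, which is exactly what the runner/border construction avoids.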
original: *********|||||||************
pattern: |||||||||||||| (the matched part is shown in green)
pattern: |||||||||||||| (the red part indicates the overlap)
next check: ||||||||||||| (the next comparison starts at the blue part: the pattern index resumes at the overlap value stored for the index that did NOT MATCH, so the pattern slides right by that index minus the corresponding overlap)
This avoids unnecessary comparisons: the only alignments that can still produce a match are those where a prefix of the pattern lines up with a suffix of the part already matched.
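As a small illustration of the jump, the sketch below hard-codes the overlap table of abab from the example and reports how far the pattern slides for each mismatch position. The names ShiftOnMismatch and shift are hypothetical helpers, not part of any library.

```java
class ShiftOnMismatch {
    // Overlap table for "abab", copied from the example (assumed input).
    static final int[] OVERLAP = {-1, 0, 0, 1, 2};

    // If pattern index j mismatches, comparison resumes at OVERLAP[j],
    // which is the same as sliding the pattern right by j - OVERLAP[j].
    static int shift(int j) {
        return j - OVERLAP[j];
    }

    public static void main(String[] args) {
        for (int j = 0; j < 4; j++) {
            System.out.println("mismatch at " + j + " -> shift " + shift(j));
        }
    }
}
```

A mismatch at index 3 (after matching aba) slides the pattern by 2, because the trailing a can be reused as the pattern's leading a. A mismatch at index 0 gives a shift of 1, which is the role of the -1 sentinel.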
============
How to get the overlap table?
d b c d b d a b c
-1 * * * * * * * * *
runner -- index that iterates over each char of the original string; border -- length of the symmetric structure, i.e. the common proper prefix and suffix. (It is a length rather than an index because of the clever -1 at the starting point.) length of overlap table = length of original string + 1
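runner = 0; border = -1; overlap[0] = -1;
while (runner has not reached the end){
// if we cannot extend the common part, fall back to the next shorter border and try again
while (border >= 0 and string.charAt(runner) != string.charAt(border) ){
border = overlap[border];
// In the example above, dbcdb starts and ends with db, so the overlap entry at that d's position is 2 and border = 2 when runner reaches index 5. Since d (string.charAt(5)) and c (string.charAt(2)) differ, we cannot add 1. Look up overlap[2], which is 0, and compare string.charAt(runner), i.e. d, with string.charAt(new border), which is string.charAt(0). They are equal, so the new overlap entry = the overlap table value we fell back to + 1.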
}
// if equals, we can extend the symmetric structure
runner ++;
border ++;
overlap[runner] = border;
// because overlap[0] is initialized to -1, every later update writes to index 1 or above, for a total of (length of original string) updates. This is the offset-by-1 layout.
}
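As a worked example, the loop above can be run on the sample string d b c d b d a b c to fill in the starred entries. This is a minimal sketch; the class name OverlapTableDemo and method name buildOverlap are chosen here for illustration.

```java
import java.util.Arrays;

class OverlapTableDemo {
    static int[] buildOverlap(String s) {
        int[] overlap = new int[s.length() + 1];
        int runner = 0, border = -1;
        overlap[0] = -1;
        while (runner < s.length()) {
            // Fall back to shorter borders until one can be extended
            // or we hit the -1 sentinel.
            while (border >= 0 && s.charAt(runner) != s.charAt(border)) {
                border = overlap[border];
            }
            runner++;
            border++;
            overlap[runner] = border;
        }
        return overlap;
    }

    public static void main(String[] args) {
        // prints [-1, 0, 0, 0, 1, 2, 1, 0, 0, 0]
        System.out.println(Arrays.toString(buildOverlap("dbcdbdabc")));
    }
}
```

The entry 1 at index 6 is the fallback case described in the comment: border drops from 2 to overlap[2] = 0, the two d characters match, and the entry becomes 0 + 1.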
leetcode:
Implement strStr()
Returns a pointer to the first occurrence of needle in haystack, or null if needle is not part of haystack.
public class Solution {
    // use the KMP algorithm
    public String strStr(String haystack, String needle) {
        if (needle.length() == 0)
            return haystack;
        if (haystack.length() < needle.length())
            return null;
        int hpointer = 0, npointer = 0;
        int[] overlap = preprocess(needle);
        while (hpointer < haystack.length()) {
            // on a mismatch, jump back to the longest border of the matched part,
            // i.e. the next position in needle that can still match
            while (npointer >= 0 && haystack.charAt(hpointer) != needle.charAt(npointer)) {
                npointer = overlap[npointer];
            }
            // on a match (or after falling back to -1), both pointers advance by 1
            npointer++;
            hpointer++;
            if (npointer == needle.length()) break;
        }
        if (npointer == needle.length())
            return haystack.substring(hpointer - needle.length());
        else
            return null;
    }

    // builds the overlap table: res[i] is the overlap of the prefix of length i
    public int[] preprocess(String pattern) {
        int[] res = new int[pattern.length() + 1];
        int runner = 0, border = -1;
        res[0] = -1;
        while (runner < pattern.length()) {
            // if the symmetric structure cannot be extended, fall back to
            // shorter borders until one can be
            while (border >= 0 && pattern.charAt(runner) != pattern.charAt(border)) {
                border = res[border];
            }
            // either we can extend, or we fell back to the -1 at the start of the table
            border++;
            runner++;
            res[runner] = border;
        }
        return res;
    }
}
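For a quick check of the search loop, here is a self-contained variant that returns the index of the first match (or -1) instead of a substring. The index-returning signature and the names KmpIndexOf and indexOf are assumptions for illustration, not the solution above, but the loops are the same.

```java
class KmpIndexOf {
    static int[] preprocess(String pattern) {
        int[] res = new int[pattern.length() + 1];
        int runner = 0, border = -1;
        res[0] = -1;
        while (runner < pattern.length()) {
            while (border >= 0 && pattern.charAt(runner) != pattern.charAt(border)) {
                border = res[border];
            }
            border++;
            runner++;
            res[runner] = border;
        }
        return res;
    }

    // Returns the index of the first occurrence of needle, or -1 if absent.
    static int indexOf(String haystack, String needle) {
        if (needle.isEmpty()) return 0;
        int[] overlap = preprocess(needle);
        int h = 0, n = 0;
        while (h < haystack.length()) {
            // On a mismatch, fall back in needle only; h never moves backwards.
            while (n >= 0 && haystack.charAt(h) != needle.charAt(n)) {
                n = overlap[n];
            }
            n++;
            h++;
            if (n == needle.length()) return h - n;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("abcabcabd", "abcabd")); // 3
        System.out.println(indexOf("hello", "ll"));         // 2
        System.out.println(indexOf("aaa", "b"));            // -1
    }
}
```

In the first call the mismatch happens at needle index 5, and the search resumes at overlap[5] = 2 without re-reading any haystack character.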
ref:
1. http://tekmarathon.wordpress.com/2013/05/14/algorithm-to-find-substring-in-a-string-kmp-algorithm/
2. http://www.youtube.com/watch?v=1k2KDhcO_uo