KMP notes

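First, compute KMP's overlap table for the pattern. The table has length pattern.length + 1, because index 0 holds a sentinel value of -1. The value at index i is the overlap length of the pattern's prefix of length i. The overlap is defined as the length of the longest proper prefix that is also a proper suffix; "proper" means the prefix or suffix is not the whole string itself.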

ex: abab ---> -1 0 0 1 2 (for lengths 3 and 4 the overlaps are "a" and "ab"). Note that the first entry is always -1.
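
To make the definition concrete, here is a minimal sketch that builds the table directly from the definition by brute force, with no KMP trick involved. The class and method names (OverlapByDefinition, overlapTable) are made up for illustration. It is far slower than the real preprocessing and is only meant as a cross-check.

```java
import java.util.Arrays;

class OverlapByDefinition {
    // Hypothetical helper: overlap table computed straight from the definition.
    // Cubic time in the worst case, so use it only to check small examples.
    static int[] overlapTable(String pattern) {
        int[] table = new int[pattern.length() + 1];
        table[0] = -1; // sentinel for the empty prefix
        for (int len = 1; len <= pattern.length(); len++) {
            String prefix = pattern.substring(0, len);
            // try the longest proper candidate first; k = 0 always succeeds
            for (int k = len - 1; k >= 0; k--) {
                String suffix = prefix.substring(len - k); // last k characters
                if (prefix.startsWith(suffix)) {
                    table[len] = k;
                    break;
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // prints [-1, 0, 0, 1, 2]
        System.out.println(Arrays.toString(overlapTable("abab")));
    }
}
```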

original: *********|||||||************

pattern:             |||||||||||||| (the part that matched before the mismatch)

pattern:             |||||||||||||| (the overlap: the prefix of the matched part that equals its suffix)

next check:             ||||||||||||| (after a mismatch at pattern index j, resume comparing at pattern index overlap[j]; the pattern shifts right by j - overlap[j])

This avoids unnecessary comparisons: the only alignments that can still produce a match are those where a prefix of the pattern lines up with a suffix of the text already matched, and the overlap table records exactly that.
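
As a small illustration of the jump, the sketch below prints, for each possible mismatch position j in "abab", where the comparison resumes and how far the pattern slides. The helper name shift is hypothetical; the table is the one from the abab example.

```java
class ShiftAfterMismatch {
    // Hypothetical helper: how far the pattern slides when pattern index j mismatches.
    static int shift(int[] overlap, int j) {
        return j - overlap[j];
    }

    public static void main(String[] args) {
        // overlap table for "abab"
        int[] overlap = {-1, 0, 0, 1, 2};
        for (int j = 0; j < 4; j++) {
            System.out.println("mismatch at j=" + j
                    + " -> resume at " + overlap[j]
                    + ", shift by " + shift(overlap, j));
        }
    }
}
```

A resume position of -1 means no prefix can be reused: both pointers then advance by one, so the pattern restarts at index 0 on the next text character.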

============

How do we build the overlap table?

d  b  c  d  b  d  a  b  c

-1 *  *   *  *   *  *   *  *   *

runner: iterates over each character of the string (the pattern). border: the length of the current symmetric structure, i.e. the longest proper prefix that is also a suffix. (It can be a length rather than an index because the table starts from the sentinel -1.) Length of the overlap table = length of the string + 1.

runner = 0; border = -1; overlap[0] = -1;
while (runner < string.length()) {
	// if the border cannot be extended, fall back to the next shorter border and retry
	while (border >= 0 && string.charAt(runner) != string.charAt(border)) {
		border = overlap[border];
		// In the example above, "db" is both a prefix and a suffix of "dbcdb", so overlap[5] = 2.
		// At runner = 5, 'd' != string.charAt(2) = 'c', so the border cannot grow to 3.
		// Fall back to overlap[2] = 0 and compare string.charAt(runner) = 'd' with string.charAt(0) = 'd'.
		// They are equal, so overlap[6] = 0 + 1 = 1.
	}
	// the characters are equal (or border fell back to -1): extend the symmetric structure
	runner++;
	border++;
	overlap[runner] = border;
	// overlap[0] is initialized to -1, so every update writes to index 1 or later:
	// one update per character, string.length() updates in total (the offset-by-1 layout).
}
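
To see the fallback happen on the example string, here is a sketch of the same loop with one log line per fallback. The class name OverlapTableTrace and the logging are additions for illustration only.

```java
import java.util.Arrays;

class OverlapTableTrace {
    static int[] buildOverlap(String s) {
        int[] overlap = new int[s.length() + 1];
        overlap[0] = -1;
        int runner = 0, border = -1;
        while (runner < s.length()) {
            while (border >= 0 && s.charAt(runner) != s.charAt(border)) {
                // log each fallback so the reuse of shorter borders is visible
                System.out.println("runner=" + runner + " '" + s.charAt(runner)
                        + "': border " + border + " -> " + overlap[border]);
                border = overlap[border];
            }
            runner++;
            border++;
            overlap[runner] = border;
        }
        return overlap;
    }

    public static void main(String[] args) {
        // the last line printed is [-1, 0, 0, 0, 1, 2, 1, 0, 0, 0]
        System.out.println(Arrays.toString(buildOverlap("dbcdbdabc")));
    }
}
```

The log line for runner=5 shows the step described in the comments: border 2 falls back to 0, after which 'd' matches string.charAt(0) and overlap[6] becomes 1.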


LeetCode:

Implement strStr()

Return a pointer to the first occurrence of needle in haystack, or null if needle is not part of haystack.

public class Solution {
    // use KMP algorithm
    public String strStr(String haystack, String needle) {
        if (needle.length() == 0 )
            return haystack;   
        if (haystack.length() < needle.length())
            return null;
        int hpointer = 0, npointer =0;
        int[] overlap = preprocess(needle);
        while (hpointer < haystack.length()){
            // on a mismatch, fall back in the needle to overlap[npointer], the longest border that can still match
            while(npointer >= 0 && haystack.charAt(hpointer)!= needle.charAt(npointer)){
                npointer = overlap[npointer];
            }
            // the characters match (or npointer fell back to -1): advance both pointers by one step
            npointer ++;
            hpointer ++;
            
            if(npointer == needle.length()) break;
        }
        if(npointer == needle.length()) return haystack.substring(hpointer-needle.length());
        else return null;
    }
    
    public int[] preprocess (String pattern){
        int[] res = new int[pattern.length()+1];
        int runner = 0, border = -1;
        res[0] = -1; 
        while(runner < pattern.length()){
            // while the symmetric structure cannot be extended, fall back to the next shorter border
            while(border >= 0 && pattern.charAt(runner)!= pattern.charAt(border)){
                border = res[border];
            }
            // either the border can be extended, or it fell back to -1 and restarts at 0
            border++;
            runner++;
            res[runner] = border;
        }
        return res;
    }
}
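
Some versions of this problem ask for the index of the first occurrence instead of the remaining substring. Below is a minimal sketch of that variant with a small driver. The names KmpIndexOf, indexOf and buildOverlap are hypothetical, and the convention of returning 0 for an empty needle and -1 for no match is an assumption, not part of the solution above.

```java
class KmpIndexOf {
    // Same preprocessing as above: overlap[i] is the border length of the prefix of length i.
    static int[] buildOverlap(String pattern) {
        int[] overlap = new int[pattern.length() + 1];
        overlap[0] = -1;
        int runner = 0, border = -1;
        while (runner < pattern.length()) {
            while (border >= 0 && pattern.charAt(runner) != pattern.charAt(border)) {
                border = overlap[border];
            }
            runner++;
            border++;
            overlap[runner] = border;
        }
        return overlap;
    }

    // Assumed convention: 0 for an empty needle, -1 when there is no match.
    static int indexOf(String haystack, String needle) {
        if (needle.isEmpty()) return 0;
        int[] overlap = buildOverlap(needle);
        int h = 0, n = 0;
        while (h < haystack.length()) {
            // on a mismatch, reuse the longest border instead of restarting
            while (n >= 0 && haystack.charAt(h) != needle.charAt(n)) {
                n = overlap[n];
            }
            h++;
            n++;
            if (n == needle.length()) return h - n; // start of the match
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("abcabcabd", "abcabd")); // prints 3
        System.out.println(indexOf("hello", "ll"));         // prints 2
        System.out.println(indexOf("aaaa", "b"));           // prints -1
    }
}
```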







ref:

1. http://tekmarathon.wordpress.com/2013/05/14/algorithm-to-find-substring-in-a-string-kmp-algorithm/

2. http://www.youtube.com/watch?v=1k2KDhcO_uo



