KMP


in_han


Article link: https://blog.csdn.net/in_han/article/details/11267047


String pattern-matching algorithms:

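Index(S, T, pos): return the position of the first occurrence of pattern T in main string S at or after position pos, or -1 if there is none.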

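1. Naive algorithm: compare characters one by one; on a mismatch, back up and start the next attempt.

Python code (note how far i backs up on a mismatch: to i - j + 1, one position past where the current attempt began):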

#!/usr/bin/python
# -*- coding: utf-8 -*-
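# Algorithm: naive (brute-force) pattern matching
# Returns the position of the first occurrence of template in source at or after index pos; -1 if absent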
def Index( source, template, pos):
    slen = len( source )
    tlen = len( template )
    i = pos
    j = 0
    while i<slen and j < tlen:
        if source[i] == template[j]:
            i += 1
            j += 1
        else:
            i = i - j + 1   # back up to one past where this attempt started
            j = 0
    if j== tlen:
        return i - tlen
    return -1

if __name__ == '__main__':
    source = 'abababcd'
    template = 'abc'
    print( source )
    print( template )
    print( Index(source, template, 0) )

Output:

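2. KMP algorithm

Whenever a character comparison fails during a matching pass, KMP does not move the pointer i back. Instead, it uses the "partial match" already found to slide the pattern as far to the right as possible, then resumes comparing.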

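next[j] is defined as:

1. If j = 0, i.e. T[j] is the first character: next[j] = -1, meaning the mismatch happened at the very first character (index 0) of the pattern.

2. next[j] = max{ k | 0 < k < j and T[0], T[1], ..., T[k-1] = T[j-k], T[j-k+1], ..., T[j-1] }, when this set is non-empty. In words, k is the length of the longest proper prefix of T[0..j-1] that is also a suffix of it.

3. next[j] = 0 otherwise.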
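The three cases can be checked mechanically. Below is a minimal sketch, assuming Python 3; `next_by_definition` is a hypothetical helper name introduced here, not part of the original code. It tries every k directly, so it is slow (roughly cubic in the pattern length) and is only meant for verifying a next array by hand:

```python
from typing import List


def next_by_definition(t: str) -> List[int]:
    """Hypothetical checker: compute next[] straight from the three-case definition."""
    nxt = []
    for j in range(len(t)):
        if j == 0:
            nxt.append(-1)                  # case 1: j is the first character
            continue
        best = 0                            # case 3: used when no k qualifies
        for k in range(1, j):               # case 2: 0 < k < j
            if t[:k] == t[j - k:j]:         # T[0..k-1] == T[j-k..j-1]
                best = k                    # k only grows, so the last hit is the max
        nxt.append(best)
    return nxt


print(next_by_definition("abaabcac"))       # [-1, 0, 0, 1, 1, 2, 0, 1]
```

The pattern `abaabcac` is the one used in the KMP test program further down.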

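Matching algorithm:

When the main string and the pattern match, i.e. S[i] == T[j], increment both i and j.

When they do not match, i.e. S[i] != T[j], move j back to next[j] and compare again.

When next[j] = -1, i.e. even the first pattern character fails to match, restart matching from the next position of the main string: i++, j = 0.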
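The three rules translate almost line for line into a loop. This is a minimal sketch, assuming the next array is already known; `kmp_match` and its `trace` return value are hypothetical names added only to make the point visible: the recorded values of i never decrease, because a mismatch moves only j.

```python
def kmp_match(s, t, nxt, pos=0):
    """Hypothetical helper: apply the three matching rules with a precomputed next array.

    Returns (index of the first match at or after pos, or -1;
             the value of i recorded at every loop iteration).
    """
    i, j = pos, 0
    trace = []
    while i < len(s) and j < len(t):
        trace.append(i)               # record i before every comparison step
        if j == -1 or s[i] == t[j]:
            # Match (or j fell off the front at -1): advance both pointers.
            # With j == -1, j += 1 resets j to 0 and i moves to the next character.
            i += 1
            j += 1
        else:
            # Mismatch: slide the pattern by setting j = next[j]; i stays put.
            j = nxt[j]
    found = i - len(t) if j == len(t) else -1
    return found, trace


s = "acabaabaabcacaabc"
t = "abaabcac"
nxt = [-1, 0, 0, 1, 1, 2, 0, 1]       # next[] for t, worked out by hand from the definition
found, trace = kmp_match(s, t, nxt)
print(found)                                           # 5
print(all(a <= b for a, b in zip(trace, trace[1:])))  # True: i never moves backwards
```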

Computing next:

By definition, next[0] = -1.

Suppose next[j] = k, i.e. T[0], T[1], ..., T[k-1] == T[j-k], T[j-k+1], ..., T[j-1]. What is next[j+1]?

1) If T[k] == T[j], then next[j+1] = next[j] + 1 = k + 1.

2) If T[k] != T[j]:

follow the chain tmp = next[k], next[next[k]], ... until T[tmp] == T[j]; then next[j+1] = tmp + 1. If the chain reaches -1 without finding such a match, next[j+1] = -1 + 1 = 0.
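The recurrence can be written with the two cases kept visibly separate. This is a sketch under the assumption of Python 3; `next_by_recurrence` is a hypothetical name, and it is a more explicit restatement of the recurrence, not the author's `get_next` (which folds both cases into a single loop).

```python
def next_by_recurrence(t):
    """Hypothetical helper: build next[] by extending next[j] = k or following the k-chain."""
    m = len(t)
    if m == 0:
        return []
    nxt = [0] * m
    nxt[0] = -1                      # by definition
    for j in range(m - 1):           # compute nxt[j + 1] from nxt[0..j]
        k = nxt[j]
        # Case 2: while T[k] != T[j], fall back along k = next[k]
        while k != -1 and t[k] != t[j]:
            k = nxt[k]
        # Case 1, or the chain reached -1: next[j + 1] = k + 1
        nxt[j + 1] = k + 1
    return nxt


print(next_by_recurrence("abaabcac"))   # [-1, 0, 0, 1, 1, 2, 0, 1]
```

Each step reuses values already computed, so the whole array is built in time linear in the pattern length.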

Python code:

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Algorithm: KMP

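# Compute the next array of the pattern
def get_next(template):
    tlen = len( template )
    if tlen == 0:
        return []
    nxt = [0] * tlen   # a list, so items can be assigned (range() is immutable in Python 3)
    i = 0
    nxt[0] = -1
    j = -1
    while i < tlen-1:
        # j == -1: fell off the front; template[i] == template[j]: extend the match
        if -1 == j or template[i] == template[j]:
            i += 1
            j += 1
            nxt[i] = j
        else:
            j = nxt[j]   # follow the next chain
    return nxt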


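def index_kmp( S, T, pos ):
    nxt = get_next( T )
    print( nxt )          # show the next array
    i = pos
    j = 0                 # start comparing at the first pattern character
    while i < len(S) and j < len(T):
        # j == -1 means even T[0] mismatched: move on to S[i+1] with j = 0
        if j == -1 or S[i] == T[j]:
            i += 1
            j += 1
        else:
            j = nxt[j]    # slide the pattern; i never moves back
    if j == len(T):
        return i - len(T)
    return -1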



if __name__ == '__main__':
    s = 'acabaabaabcacaabc'
    t = 'abaabcac'
    print( s )
    print( t )
    print(  index_kmp(s, t, 0 ) )

Output: