A Brief Look at KMP
What?
-
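What is KMP?
- KMP is an algorithm for the pattern-matching problem: deciding whether a target string (the longer text being searched) contains a pattern string (the shorter one). When a character fails to match, KMP uses what has already been matched to jump straight to the nearest position in the pattern from which matching can continue, instead of starting over from the beginning of the pattern.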
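For contrast, here is a minimal sketch of the brute-force approach that KMP improves on: after any mismatch it restarts the pattern from index 0 at the next target position. `naive_find` is a hypothetical helper name used only for this illustration; it is not part of the original notes.

```python
def naive_find(target: str, pattern: str) -> int:
    """Return the first index where pattern occurs in target, or -1."""
    n, m = len(target), len(pattern)
    for start in range(n - m + 1):
        j = 0
        # Compare the pattern from its very first character every time.
        while j < m and target[start + j] == pattern[j]:
            j += 1
        if j == m:
            return start
    return -1


print(naive_find("BBCABCDABABCDABCDABDE", "ABCDABD"))
```

Every failed attempt throws away the characters that did match; that discarded information is what the next array keeps.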
-
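Implementing KMP comes down to building the next array (the nextval variant is covered under How?). Once we have it, for example next = [-1,0,0,0,0,1,2] for the pattern 'ABCDABD', matching proceeds as follows: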
-
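
    def kmp_contains(target: str, needle: str, next: list) -> bool:
        # index is the current position in the pattern
        index = 0
        for c in target:
            if c == needle[index]:
                index += 1
            else:
                # Mismatch: follow next until c matches or we run off the front
                index = next[index]
                while index > -1 and c != needle[index]:
                    index = next[index]
                if index == -1:
                    # Nothing matches: start again from the beginning of the pattern
                    index = 0
                elif c == needle[index]:
                    index += 1
            if index == len(needle):
                return True
        return False

    target = 'BBCABCDABABCDABCDABDE'
    needle = 'ABCDABD'
    # The next array for needle, i.e. what buildNext(needle) returns
    next = [-1, 0, 0, 0, 0, 1, 2]
    print(kmp_contains(target, needle, next))
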
-
In short: when a character does not match, look up next to get the index in the pattern from which to continue comparing.
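To make the lookup concrete, the sketch below lists the pattern indices that would be tried, in order, after a mismatch at a given index. `fallback_chain` is a hypothetical helper written for this note, and the table is the one quoted above for 'ABCDABD'.

```python
def fallback_chain(next_table, index):
    """Indices visited by repeatedly applying next, ending at -1."""
    chain = [index]
    while index > -1:
        index = next_table[index]
        chain.append(index)
    return chain


next_table = [-1, 0, 0, 0, 0, 1, 2]   # table for 'ABCDABD'
print(fallback_chain(next_table, 6))  # mismatch at the final 'D'
```

For index 6 the chain is 6, 2, 0, -1: retry at 'C', then at 'A', and if neither matches give up on this target character and restart the pattern.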
Why?
- why does it work?
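- This video explains it well: https://www.bilibili.com/video/BV1PD4y1o7nd
- My understanding: take the pattern aabaaf. When f fails to match, we know aabaa has already matched, so the two target characters just before the mismatch are aa. That is also the pattern's first two characters, so there is no need to rematch them; matching continues directly at the pattern's third character, b.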
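The "aa" in that argument is the longest proper prefix of aabaa that is also a suffix of it. A brute-force sketch of that definition follows; it is deliberately slow and only meant to show what quantity is being computed. `longest_border` is a hypothetical name, not something from the video or the original notes.

```python
def longest_border(s: str) -> int:
    """Length of the longest proper prefix of s that is also a suffix of s."""
    for k in range(len(s) - 1, 0, -1):
        if s[:k] == s[-k:]:
            return k
    return 0


pattern = "aabaaf"
print(longest_border("aabaa"))  # the part matched before 'f' failed
# The same quantity for every shorter matched prefix of the pattern.
print([longest_border(pattern[:j]) for j in range(1, len(pattern))])
```

For "aabaa" the result is 2, which is exactly the index of b, the character the explanation says to compare next.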
How?
-
how can I get the next array?
-
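
    # A method of the Solution class shown in the Experiment section.
    def buildNext(self, needle: str):
        length = len(needle)
        j = 0
        k = -1
        # Initialise. -1 means the current target char cannot match the pattern
        # at all, so matching restarts from the next target char.
        next = [0 for n in range(length)]
        next[0] = -1
        while j < length - 1:
            # k == -1 covers initialising the second value, 0.
            # needle[j] == needle[k]: building on the previous equal prefix/suffix,
            # compare the next character; if it is equal too, the longer
            # prefix/suffix is also equal and its length is the previous one + 1.
            if k == -1 or needle[j] == needle[k]:
                k += 1
                j += 1
                next[j] = k
            else:
                # needle[j] != needle[k]: shorten to the next shorter equal
                # prefix/suffix, whose length is next[k], and compare again.
                k = next[k]
        return next
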
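To watch the two branches at work, here is a sketch that runs the same loop as a plain function and records every step. `build_next_traced` and the "extend"/"fallback" labels are hypothetical names introduced for this illustration.

```python
def build_next_traced(needle: str):
    """Same loop as buildNext, but also logs each branch taken."""
    table = [0] * len(needle)
    table[0] = -1
    log = []
    j, k = 0, -1
    while j < len(needle) - 1:
        if k == -1 or needle[j] == needle[k]:
            k += 1
            j += 1
            table[j] = k
            log.append(("extend", j, k))           # wrote next[j] = k
        else:
            log.append(("fallback", k, table[k]))  # k shrinks to next[k]
            k = table[k]
    return table, log


table, log = build_next_traced("aabaaf")
print(table)
for step in log:
    print(step)
```

On "aabaaf" the only fallbacks happen when b is compared against a: k drops from 1 to 0 and then to -1 before the loop can extend again.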
-

    def buildNextVal(self, needle: str):
        length = len(needle)
        j = 0
        k = -1
        next = [0 for n in range(length)]
        next[0] = -1
        while j < length - 1:
            if k == -1 or needle[j] == needle[k]:
                k += 1
                j += 1
                # nextval: when the current character equals the character at the
                # index next would point to, store that character's own next
                # value instead. This rules out a fallback whose comparison is
                # certain to fail and would only waste work. In practice, though,
                # building nextval seemed to take more time (at least in the
                # LeetCode tests).
                if needle[j] != needle[k]:
                    next[j] = k
                else:
                    # p[j] == p[next[j]] must not occur, so when it would,
                    # recurse one level: use next[k] instead of k.
                    next[j] = next[k]
            else:
                k = next[k]
        return next

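The rule in the comment can also be applied as a separate pass over an already built next table, which makes the difference between the two tables easy to see. This is a different formulation from the one-pass method above, shown only as a sketch; `nextval_from_next` is a hypothetical name and the next tables are hard-coded.

```python
def nextval_from_next(needle: str, next_table: list) -> list:
    """Derive nextval from next: skip fallbacks that land on the same character."""
    nextval = list(next_table)
    for j in range(1, len(needle)):
        k = next_table[j]
        if needle[j] == needle[k]:
            # Falling back to k would compare the same character again,
            # so reuse the value already computed for k.
            nextval[j] = nextval[k]
    return nextval


print(nextval_from_next("aaaab", [-1, 0, 1, 2, 3]))
print(nextval_from_next("ABCDABD", [-1, 0, 0, 0, 0, 1, 2]))
```

For "aaaab" the plain table steps back one a at a time, while nextval jumps straight to -1, because every one of those intermediate comparisons would fail in the same way.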
Experiment
-
LeetCode 28: https://leetcode-cn.com/problems/implement-strstr/submissions/
-
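Although it is rated easy and brute force also passes, this is a typical pattern-matching problem. The code is below, written in Python; other languages would look much the same.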
-

    class Solution:
        def strStr(self, haystack: str, needle: str) -> int:
            lhay = len(haystack)
            lnee = len(needle)
            if lnee == 0:
                return 0
            elif lhay < lnee:
                return -1
            # Feels like dynamic programming might work? Still, this is string matching.
            # KMP
            next = self.buildNextVal(needle)
            j = 0
            index = 0
            while index < lhay:
                if needle[j] == haystack[index]:
                    j += 1
                else:
                    # Mismatch: fall back through next
                    while j > -1 and needle[j] != haystack[index]:
                        j = next[j]
                    if j == -1:
                        j = 0
                    elif needle[j] == haystack[index]:
                        j += 1
                index += 1
                if j == lnee:
                    # index is already one past the last matched character
                    return index - lnee
            return -1

        def buildNext(self, needle: str):
            length = len(needle)
            j = 0
            k = -1
            next = [0 for n in range(length)]
            next[0] = -1
            while j < length - 1:
                if k == -1 or needle[j] == needle[k]:
                    k += 1
                    j += 1
                    next[j] = k
                else:
                    k = next[k]
            return next

        def buildNextVal(self, needle: str):
            length = len(needle)
            j = 0
            k = -1
            next = [0 for n in range(length)]
            next[0] = -1
            while j < length - 1:
                if k == -1 or needle[j] == needle[k]:
                    k += 1
                    j += 1
                    if needle[j] != needle[k]:
                        next[j] = k
                    else:
                        # p[j] == p[next[j]] must not occur, so recurse one
                        # level: use next[k] instead of k.
                        next[j] = next[k]
                else:
                    k = next[k]
            return next

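One cheap way to gain confidence in a KMP implementation before submitting is to compare it with Python's built-in `str.find` on many small random inputs. The sketch below does that for a compact function-style variant. It is not the submitted class above, and `build_next`, `kmp_find`, and `cross_check` are hypothetical names.

```python
import random


def build_next(needle: str) -> list:
    table = [-1] * len(needle)
    j, k = 0, -1
    while j < len(needle) - 1:
        if k == -1 or needle[j] == needle[k]:
            j += 1
            k += 1
            table[j] = k
        else:
            k = table[k]
    return table


def kmp_find(haystack: str, needle: str) -> int:
    if not needle:
        return 0
    table = build_next(needle)
    j = 0
    for i, c in enumerate(haystack):
        # Fall back until c matches needle[j] or j drops to -1.
        while j > -1 and needle[j] != c:
            j = table[j]
        j += 1
        if j == len(needle):
            return i - len(needle) + 1
    return -1


def cross_check(trials: int = 1000, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        # A two-letter alphabet makes repeats and partial matches frequent.
        hay = "".join(rng.choice("ab") for _ in range(rng.randint(0, 12)))
        pat = "".join(rng.choice("ab") for _ in range(rng.randint(0, 4)))
        if kmp_find(hay, pat) != hay.find(pat):
            return False
    return True


print(cross_check())
```

A small alphabet and short strings are a deliberate choice here: they hit the fallback path far more often than random text over a large alphabet would.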