KMP Algorithm
Take a string (the pattern) a b a a b c,
and number its characters in order with an index j: 0 1 2 3 4 5
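The core of the KMP algorithm is to first compute a value PM[j] for every character of the pattern, where j is the position of that character in the pattern (the string you are searching for). Specifically, PM[j] (written next[j] in some presentations) is the length of the longest proper prefix of pattern[0..j], that is, of the first j + 1 characters, that is also a proper suffix of it.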
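Computing the PM array:
a: a single character has no proper prefix or suffix, so PM[0] = 0;
a b: the only proper prefix is a and the only proper suffix is b, so PM[1] = 0;
a b a: the longest common prefix and suffix is a, so PM[2] = 1;
a b a a: the longest common prefix and suffix is a, so PM[3] = 1;
a b a a b: the longest common prefix and suffix is ab, so PM[4] = 2;
a b a a b c: no proper prefix equals a proper suffix, so PM[5] = 0.
The PM array of the pattern is therefore PM = [0, 0, 1, 1, 2, 0]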
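The hand calculation above can be reproduced with a short sketch. The function names pm_naive and build_pm are made up for illustration and do not come from these notes: pm_naive follows the definition literally, while build_pm is the usual linear-time construction that reuses entries already computed.

```python
def pm_naive(pattern):
    """PM[j] straight from the definition: the longest proper prefix
    of pattern[:j+1] that is also a suffix of it."""
    pm = []
    for j in range(len(pattern)):
        sub = pattern[: j + 1]
        best = 0
        for k in range(1, len(sub)):  # k < len(sub) keeps prefix and suffix proper
            if sub[:k] == sub[-k:]:
                best = k
        pm.append(best)
    return pm


def build_pm(pattern):
    """The same table in linear time, reusing earlier entries."""
    pm = [0] * len(pattern)
    k = 0  # length of the longest matching prefix/suffix found so far
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = pm[k - 1]  # fall back to the next shorter candidate
        if pattern[j] == pattern[k]:
            k += 1
        pm[j] = k
    return pm


print(pm_naive("abaabc"))  # [0, 0, 1, 1, 2, 0]
print(build_pm("abaabc"))  # [0, 0, 1, 1, 2, 0]
```

Both functions agree with the table worked out by hand. The naive version is easier to check against the definition, and the linear-time version is the one normally used in practice.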
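Main string: abaabaabcabaabc
Pattern: abaabc
Let i index the main string, j index the pattern, and m = 6 be the length of the pattern. When the two characters match, both i and j advance by 1. When they do not match and j > 0, j is updated to PM[j-1] while i stays where it is; if j is already 0, only i advances. After a complete match (j == m), j is likewise updated to PM[j-1], so the search can go on to find later occurrences.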
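The matching rule can be sketched as follows. This is a minimal illustration, not a definitive implementation: the names kmp_search and build_pm are assumptions, text plays the role of the main string, and returning an empty list for an empty pattern is an arbitrary choice made here.

```python
def build_pm(pattern):
    """PM[j] = length of the longest proper prefix of pattern[:j+1]
    that is also a suffix of it."""
    pm = [0] * len(pattern)
    k = 0
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = pm[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        pm[j] = k
    return pm


def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []  # assumption: an empty pattern yields no matches
    pm = build_pm(pattern)
    m = len(pattern)
    matches = []
    i = j = 0
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:                # full match ends just before i
                matches.append(i - j)
                j = pm[j - 1]         # keep going for later occurrences
        elif j > 0:
            j = pm[j - 1]             # mismatch: shift the pattern, i stays
        else:
            i += 1                    # mismatch at j == 0: move on in text
    return matches


print(kmp_search("abaabaabcabaabc", "abaabc"))  # [3, 9]
```

Note that i never moves backwards, which is why the search takes time linear in the length of the main string once the PM array is built.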
Detailed steps
i = 0, j = 0:
main[0] (a) == pattern[0] (a): match, i = 1, j = 1
i = 1, j = 1:
main[1] (b) == pattern[1] (b): match, i = 2, j = 2
i = 2, j = 2:
main[2] (a) == pattern[2] (a): match, i = 3, j = 3
i = 3, j = 3:
main[3] (a) == pattern[3] (a): match, i = 4, j = 4
i = 4, j = 4:
main[4] (b) == pattern[4] (b): match, i = 5, j = 5
i = 5, j = 5:
main[5] (a) != pattern[5] (c): mismatch, j = PM[5-1] = PM[4] = 2
i = 5, j = 2:
main[5] (a) == pattern[2] (a): match, i = 6, j = 3
i = 6, j = 3:
main[6] (a) == pattern[3] (a): match, i = 7, j = 4
i = 7, j = 4:
main[7] (b) == pattern[4] (b): match, i = 8, j = 5
i = 8, j = 5:
main[8] (c) == pattern[5] (c): match, i = 9, j = 6. Now j == m, so the pattern has been found, starting at position i - j = 3 in the main string. j is then updated to PM[6-1] = PM[5] = 0, giving i = 9, j = 0.
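Continuing with the rest of the main string:
i = 9, j = 0:
main[9] (a) == pattern[0] (a): match, i = 10, j = 1
i = 10, j = 1:
main[10] (b) == pattern[1] (b): match, i = 11, j = 2
i = 11, j = 2:
main[11] (a) == pattern[2] (a): match, i = 12, j = 3
i = 12, j = 3:
main[12] (a) == pattern[3] (a): match, i = 13, j = 4
i = 13, j = 4:
main[13] (b) == pattern[4] (b): match, i = 14, j = 5
i = 14, j = 5:
main[14] (c) == pattern[5] (c): match, i = 15, j = 6
Now j == m, so the pattern has been found again, starting at position i - j = 9 in the main string. The main string is exhausted (i = 15), so the search ends with matches at positions 3 and 9.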
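The walkthrough can also be regenerated mechanically, which is a handy way to check a hand trace. The sketch below uses a hypothetical helper kmp_trace (not from these notes) that takes the PM array as an argument and records one event per comparison, plus one event per complete match. It assumes a non-empty pattern.

```python
def kmp_trace(text, pattern, pm):
    """Record (i, j, outcome) for each comparison, with i and j taken
    before the comparison; 'found' events carry the values after it.
    Assumes pattern is non-empty and pm is its PM array."""
    m = len(pattern)
    events = []
    i = j = 0
    while i < len(text):
        if text[i] == pattern[j]:
            events.append((i, j, "match"))
            i += 1
            j += 1
            if j == m:
                events.append((i, j, f"found at {i - j}"))
                j = pm[j - 1]
        elif j > 0:
            events.append((i, j, f"mismatch, j -> {pm[j - 1]}"))
            j = pm[j - 1]
        else:
            events.append((i, j, "mismatch, i advances"))
            i += 1
    return events


for event in kmp_trace("abaabaabcabaabc", "abaabc", [0, 0, 1, 1, 2, 0]):
    print(event)
```

For the strings above, the single mismatch appears as (5, 5, 'mismatch, j -> 2'), and the two complete matches appear as (9, 6, 'found at 3') and (15, 6, 'found at 9').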