A Summary of the KMP Algorithm

俗今见

Last modified 2022-09-14 17:08:02


First published 2019-11-10 17:23:49

Article link: https://blog.csdn.net/weixin_44333471/article/details/102999685


Overview of the KMP Algorithm

KMP is an algorithm for efficient pattern matching between strings. It preprocesses the pattern string so that, while the pattern is matched against the target string, the index into the target string never has to move backward. This improves the running time from the brute-force O(m*n) to O(m + n), where n is the length of the target string and m is the length of the pattern string.

In the brute-force solution, when a mismatch is found while traversing the target string, the index i (the variable that traverses the target string) returns to position i-j+1 (j is the variable that traverses the pattern string). The result is the inefficient O(m*n) running time.
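To make the backtracking concrete, here is a minimal Python sketch of the brute-force matcher described above. The function name brute_force_search and the choice to return -1 when nothing is found are my own assumptions for illustration; the variables i and j follow the text.

```python
def brute_force_search(s: str, t: str) -> int:
    """Return the first index where pattern t occurs in target s, or -1."""
    i = 0  # index into the target string s
    j = 0  # index into the pattern string t
    while i < len(s) and j < len(t):
        if s[i] == t[j]:
            i += 1
            j += 1
        else:
            # Mismatch: i goes back to one past the previous starting point,
            # and j restarts from the beginning of the pattern.
            i = i - j + 1
            j = 0
    # The whole pattern was matched exactly when j reached len(t).
    return i - j if j == len(t) else -1


print(brute_force_search("abcabcabxyz", "abcabx"))
```

The line i = i - j + 1 is the step that KMP removes: every time it runs, characters of s that were already compared get compared again.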

To avoid this, the KMP algorithm, devised by three people (Knuth, Morris, and Pratt), never moves i backward.

When I first read this, I could not understand it. If i never goes back, how does the matching move forward?

The answer given by KMP is to move j back to an appropriate position so that the comparison can continue.

One thing to understand before analyzing this problem: whenever a mismatch is encountered, all the characters compared before it must have matched. Put literally, every character of the pattern string t before its j-th character equals the corresponding character of the target string s.

As a formula: when s[i] != t[j], s[i-j to i-1] == t[0 to j-1].

This is easy to see with target string s = 'aabegashga' and pattern string t = 'aabxye'.

When the comparison reaches the fourth character of s (indices start at 0), s[3] = 'e' and t[3] = 'x', which is clearly the mismatch case, while the three characters before it all match.
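The claim above can be checked directly on this example. The sketch below is only an illustration; the helper name first_mismatch is hypothetical and is not part of any KMP code.

```python
def first_mismatch(s, t):
    """Compare s and t from index 0 and stop at the first mismatch.

    Returns the pair (i, j) at the stopping point. Both start at 0 here,
    so they are equal when the loop stops.
    """
    i = j = 0
    while i < len(s) and j < len(t) and s[i] == t[j]:
        i += 1
        j += 1
    return i, j


s = "aabegashga"
t = "aabxye"
i, j = first_mismatch(s, t)
# At the mismatch, the j characters before s[i] equal the first j characters of t.
print(i, j, s[i - j:i] == t[:j])
```

In Python slice notation, s[i-j to i-1] is written s[i - j:i] and t[0 to j-1] is written t[:j].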

This is where the problem gets interesting.

Example: target string s = 'abcabcabxyz', pattern string t = 'abcabx'.

Walking through the comparison, i = 0 through i = 4 (j = 0 through j = 4) all match. At i = 5, j = 5 we have s[5] = 'c' and t[5] = 'x', a mismatch. At this point i does not move; instead, j moves back to position j = 2.

Then we continue the comparison (i++, j++) until j = 6 (t[6] = '\0'; at this point i = 9). The match has succeeded, and we return the position of t in s, which is i - j = 3.

Why does j go back to j = 2 instead of starting again from j = 0?

This comes down to some wonderful preprocessing that KMP does before pattern matching.

Let's look at the pattern string t = 'abcabx' and see if we can spot a bit of a pattern. The first two characters of t are the same as the fourth and fifth characters. So when the match fails at x, it is not necessary to move j back to j = 0 and start the match again; it is enough to move j back to j = 2 and continue from there. And this number 2 'coincidentally' equals the number of repeated characters (in the matched part 'abcab', the first two characters are the same as the last two).
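The number 2 can be found by brute force, simply by trying every possible length. The sketch below does exactly that; the function name longest_prefix_suffix is my own, and this is only a way to check the observation, not the way KMP computes the value.

```python
def longest_prefix_suffix(p: str) -> int:
    """Largest k < len(p) such that the first k characters of p
    equal its last k characters (0 if there is no such k)."""
    for k in range(len(p) - 1, 0, -1):  # try the longest candidates first
        if p[:k] == p[-k:]:
            return k
    return 0


# 'abcab' is the part of t = 'abcabx' that matched before the failure at 'x'.
print(longest_prefix_suffix("abcab"))
```

The length must be strictly smaller than the whole matched part, otherwise j would not move at all.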

However, this is not a coincidence; there is a reason behind it.

Many textbooks (at least the one I have seen) simply throw a 'next' array at you, which stores a value k for each character of the pattern string. Instead of explaining where the k value comes from, they hand you a piece of code and a string of formulas. (It was very confusing at the time.)

In fact, I think it does not matter what the next array is called or how it is presented. The algorithm is efficient as long as the array moves j back to the right place to continue matching when a match fails.

What follows is the method for finding where j should move back to.

I want to write this in two parts: one explains the classic algorithm printed in the book in plain language, and the other writes the algorithm from my own understanding.

Details of the classic (textbook) KMP code
[Image: textbook code for the KMP algorithm]
Always keep in mind that in this algorithm the value of next[j] is k, and k is the k in the statement 'the first k characters are the same as the last k characters', applied to the part of the pattern before t[j].

Furthermore, this code emphasizes that when s[i] != t[j], j moves back to next[j] instead of doing j++.
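As a rough sketch of the matching loop only, assuming the next array has already been computed and is passed in as a list: the function name kmp_search and the -1 return value for 'not found' are my own choices, and the array for 'abcabx' below was worked out by hand from the 'first k equals last k' rule with next[0] = -1.

```python
def kmp_search(s, t, nxt):
    """Return the first index where t occurs in s, or -1.

    nxt is the precomputed next array of t, with nxt[0] == -1.
    """
    i = 0  # index into the target string; it never moves backward
    j = 0  # index into the pattern string
    while i < len(s) and j < len(t):
        if j == -1 or s[i] == t[j]:
            # Either the characters match, or j fell off the front of the
            # pattern (j == -1) and the only option left is to advance i.
            i += 1
            j += 1
        else:
            j = nxt[j]  # mismatch: only j moves back
    return i - j if j == len(t) else -1


# next array of 'abcabx', worked out by hand (an assumption of this sketch)
print(kmp_search("abcabcabxyz", "abcabx", [-1, 0, 0, 0, 1, 2]))
```

On the earlier example the mismatch at i = 5, j = 5 sends j to nxt[5] = 2 while i stays at 5, which is the behaviour walked through above.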
A picture (as a memory aid):
[Image: diagram of j moving back to position k]
(Actually, this is just a review of why j goes back to j = k.)
The key point is how this code computes each next[j] value.

(From here on, it is important to understand that computing next is not pattern matching, so some operations cannot be carried over from the pattern-matching way of thinking.

Values are assigned to the next array only in the branch where k == -1 || t[j] == t[k].)
Step 1: if the mismatch happens when j = 0, j is already 0 and has no room to move back, so the only option is i++. That is why next[0] is initialized directly to -1.
The concise next[++j] = ++k in the code works together with the initialization next[0] = -1. It can be written out as next[j+1] = k+1; j++; k++, and while k has not fallen back (k == next[j]) this is the same as next[j+1] = next[j] + 1.

Step 2: similarly, when the mismatch happens at j = 1, j is already 1 and can only move back to j = 0, so we can also initialize next[1] = 0 directly (if the string has 2 or more characters).
This is not because the first 0 characters are the same as the last 0 characters.
Step 3: this code relies on the following rule (which is not hard to find yourself): when t[k] == t[j], with k == next[j], the relationship next[j+1] = next[j] + 1 holds.
In other words, the value of the next array at position j+1 is the value of the next array at position j plus 1.

(This is easier to understand with a drawing.)

[Image: diagram of the case t[k] == t[j]]
Proof of the formula: because next[j] == k, we have t[0 ~ k-1] == t[j-k ~ j-1] just before t[j].
If in addition t[k] == t[j], we get t[0 ~ k-1] + t[k] == t[j-k ~ j-1] + t[j].
That is, t[0 ~ k] == t[j-k ~ j], so next[j+1] == k + 1 == next[j] + 1.

But there is another case: t[k] != t[j]. Where does k go then? Does k reset to -1? Or to 0? We observe that j is unchanged at this time, that is, no operation is performed on j; instead k is updated directly with k = next[k].
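Both cases can be put together in a short sketch. This is my reconstruction of what the textbook-style computation commonly looks like, not the exact listing from the book; the function name get_next is an assumption, and the textbook's next[++j] = ++k is spelled out in three statements.

```python
def get_next(t):
    """Build the next array of pattern t, using the convention next[0] = -1."""
    nxt = [-1] * len(t)
    j = 0   # position whose next value is already known
    k = -1  # current candidate length, starting from next[0]
    while j < len(t) - 1:
        if k == -1 or t[j] == t[k]:
            # The prefix of length k extends by one character
            # (or k == -1 and we restart from length 0).
            j += 1
            k += 1
            nxt[j] = k  # same effect as next[++j] = ++k
        else:
            # t[k] != t[j]: j stays where it is, k falls back.
            k = nxt[k]
    return nxt


print(get_next("abcabx"))
```

Note that j only changes in the first branch, so each entry of the array is written exactly once.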

Here comes the essence:
Example: pattern string t = 'abababa' (this string is a good example for seeing what k = next[k] does).
Its next array should look like this: [-1, 0, 0, 1, 2, 3, 4]. Starting from k = 4, repeated k = next[k] gives 2, then 0, then -1: 'ab' is the longest shorter prefix of 'abab' that is also its suffix, after that only the empty prefix remains, and -1 means there is nothing left to try.
In short, when computing a new value of the next array you can reuse the values already stored in the earlier part of the array. This reduces the number of character comparisons, instead of starting from scratch and comparing the j-th character with the k-th character from k = 0.
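To see the reuse in action, the sketch below follows k = next[k] step by step for t = 'abababa'. It assumes the next array of this string under the next[0] = -1 convention is [-1, 0, 0, 1, 2, 3, 4], which I worked out by hand from the definition; the helper name fallback_chain is hypothetical.

```python
def fallback_chain(nxt, k):
    """Return k followed by every value reached by repeating k = nxt[k],
    stopping before the sentinel -1."""
    chain = []
    while k != -1:
        chain.append(k)
        k = nxt[k]
    return chain


t = "abababa"
nxt = [-1, 0, 0, 1, 2, 3, 4]  # assumed next array of t, computed by hand
j = 6                         # nxt[6] == 4, so t[0:4] == t[2:6]
chain = fallback_chain(nxt, nxt[j])
print(chain)
# Every length L in the chain satisfies: the first L characters of t
# equal the L characters just before t[j].
print(all(t[:L] == t[j - L:j] for L in chain))
```

Each value in the chain is a candidate that was already paid for when the earlier entries were computed, so no prefix has to be compared character by character again.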
The above is my understanding and summary of the KMP algorithm.
