String Matching Algorithms: KMP Notes (in English)

猴子数据分析

Category: Data Structures and Algorithms

Article link: https://blog.csdn.net/yangzhongblog/article/details/9174155

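This article explains in detail the key to the KMP algorithm: the partial match table. By working through examples of how the table is built and what its values mean, it helps the reader understand how the table lets a string search skip unnecessary comparisons and so run faster. KMP runs in O(m+n) time, which beats the O(m*n) of the naive matching algorithm.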

Original English article

Chinese version

Original text:

To match a substring of length m inside a string of length n, the naive matching algorithm takes O(m*n) time, whereas the KMP algorithm takes O(m+n).

For the past few days, I’ve been reading various explanations of the Knuth-Morris-Pratt string searching algorithm. For some reason, none of the explanations were doing it for me. I kept banging my head against a brick wall once I started reading “the prefix of the suffix of the prefix of the…”.

Finally, after reading the same paragraph of CLRS over and over for about 30 minutes, I decided to sit down, do a bunch of examples, and diagram them out. I now understand the algorithm, and can explain it. For those who think like me, here it is in my own words. As a side note, I’m not going to explain why it’s more efficient than naive string matching; that’s explained perfectly well in a multitude of places. I’m going to explain exactly how it works, as my brain understands it.

The Partial Match Table

The key to KMP, of course, is the partial match table. The main obstacle between me and understanding KMP was the fact that I didn’t quite fully grasp what the values in the partial match table really meant. I will now try to explain them in the simplest words possible.

Here’s the partial match table for the pattern “abababca”:

 
char:  | a | b | a | b | a | b | c | a |
index: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
value: | 0 | 0 | 1 | 2 | 3 | 4 | 0 | 1 |

If I have an eight-character pattern (let’s say “abababca” for the duration of this example), my partial match table will have eight cells. If I’m looking at the eighth and last cell in the table, I’m interested in the entire pattern (“abababca”). If I’m looking at the seventh cell in the table, I’m only interested in the first seven characters in the pattern (“abababc”); the eighth one (“a”) is irrelevant, and can go fall off a building or something. If I’m looking at the sixth cell in the table… you get the idea. Notice that I haven’t talked about what each cell means yet, but just what it’s referring to.
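As a small sketch of that idea, the Python snippet below lists which part of the pattern each cell refers to. The helper name `prefix_for_cell` is hypothetical, and the snippet uses the zero-based index from the table, so the "seventh cell" is index 6. It only shows what each cell looks at, not how its value is computed.

```python
def prefix_for_cell(pattern: str, index: int) -> str:
    """Return the part of the pattern that the cell at `index` refers to."""
    # Cell `index` covers the first index + 1 characters; the rest is ignored.
    return pattern[: index + 1]


pattern = "abababca"
for index in range(len(pattern)):
    print(index, prefix_for_cell(pattern, index))
# Last two lines printed:
# 6 abababc
# 7 abababca
```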