The KMP (Knuth-Morris-Pratt) Algorithm for String Matching: A Simple, Detailed Walkthrough (with Source Code)

LI WB

Category: Algorithms. Tags: algorithms, C++

Article link: https://blog.csdn.net/m0_71331863/article/details/134415714

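1. Introduction

Welcome to this CSDN blog post. Today we take a close look at how the KMP (Knuth-Morris-Pratt) algorithm is applied to string matching. KMP is an efficient string-matching algorithm: by preprocessing the pattern string, it can skip comparisons during matching that are known to be unnecessary. In this post we implement KMP in a short C++ program and walk through its key parts step by step.
The post is aimed at beginners and is meant to help them understand how KMP is implemented.
I give a detailed walkthrough together with the source code partly to push my own learning forward as well. Let us keep improving together.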

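1.1 Purpose of This Article

The goal of this article is to introduce the basic principle of the KMP algorithm and to help readers understand how it works in practice through a simple C++ implementation. We look closely at the Next function in the code, which builds the key piece of KMP: the next array.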

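2. Overview of the KMP Algorithm

KMP is an efficient string-matching algorithm. It preprocesses the pattern (the substring to search for) into a partial-match table, also called the next array, so that comparisons known to be unnecessary can be skipped during matching. This is what makes KMP more efficient than the naive string-matching algorithm.

2.1 How KMP Works

The core idea of KMP is this: when a mismatch is found in the main string, use what is already known about the matched part to jump straight to the next position where a match is still possible, keeping the number of comparisons as small as possible. The next array is what makes this jump possible. In the convention used in this article, next[0] is -1 and, for i >= 1, next[i] is the length of the longest proper prefix of the first i pattern characters that is also a suffix of them.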
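To make the meaning of the next array concrete, here is a minimal sketch that computes it straight from the prefix/suffix definition, without any of KMP's cleverness. The function name bruteForceNext is hypothetical and does not appear in the article's code, and the sketch assumes a non-empty pattern.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not from the article): builds next[] directly from its
// definition. next[0] is the sentinel -1; for i >= 1, next[i] is the length of
// the longest proper prefix of pattern[0..i) that is also a suffix of it.
std::vector<int> bruteForceNext(const std::string& pattern) {
    assert(!pattern.empty());  // assumption of this sketch
    const int n = static_cast<int>(pattern.size());
    std::vector<int> next(n, 0);
    next[0] = -1;
    for (int i = 1; i < n; ++i) {
        // Try the longest candidate length first and stop at the first hit.
        for (int len = i - 1; len > 0; --len) {
            // Compare the prefix of length len with the suffix ending at i.
            if (pattern.compare(0, len, pattern, i - len, len) == 0) {
                next[i] = len;
                break;
            }
        }
    }
    return next;
}
```

For the pattern "ababc" this yields -1 0 0 1 2, and for "aaaaa" it yields -1 0 1 2 3. The Next function discussed later produces the same values far more cheaply; this version only exists to show what the values mean.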

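2.2 Why Use KMP

Compared with the naive matching algorithm, the main advantage of KMP is that it performs fewer comparisons. In the naive algorithm, every mismatch moves the main-string pointer back to just after the start of the current attempt and restarts the pattern from its first character. KMP never moves the main-string pointer backwards: only the pattern pointer falls back, to the position stored in the precomputed next array. For a main string of length n and a pattern of length m, the naive algorithm takes O(n*m) time in the worst case, while KMP takes O(n + m).

Comparing the two matching processes side by side shows clearly how KMP reuses what it has already matched to avoid unnecessary comparisons.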
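The difference can be measured by counting character comparisons. The sketch below is an illustration only: the names countNaiveComparisons and countKmpComparisons are hypothetical, both functions stop at the first match, and both assume a non-empty pattern.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helpers for illustration; the names are not from the article.

// Naive search: on a mismatch, restart one position further in the text.
// Returns the number of character comparisons performed.
long countNaiveComparisons(const std::string& text, const std::string& pattern) {
    assert(!pattern.empty());
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pattern.size());
    long comparisons = 0;
    for (int start = 0; start + m <= n; ++start) {
        int k = 0;
        while (k < m) {
            ++comparisons;
            if (text[start + k] != pattern[k]) break;
            ++k;
        }
        if (k == m) break;  // first match found
    }
    return comparisons;
}

// KMP search using the article's convention (next[0] == -1).
// Returns the number of character comparisons performed.
long countKmpComparisons(const std::string& text, const std::string& pattern) {
    assert(!pattern.empty());
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pattern.size());
    std::vector<int> next(m, 0);
    next[0] = -1;
    for (int i = 0, j = -1; i < m - 1;) {
        if (j == -1 || pattern[i] == pattern[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
    long comparisons = 0;
    int i = 0, j = 0;
    while (i < n && j < m) {
        if (j == -1) {  // no prefix left to try: advance without comparing
            ++i;
            ++j;
            continue;
        }
        ++comparisons;
        if (text[i] == pattern[j]) {
            ++i;
            ++j;
        } else {
            j = next[j];  // the text index i never moves backwards
        }
    }
    return comparisons;
}
```

On the text "aaaaaaaaab" with the pattern "aaab", the naive counter reports 28 comparisons and the KMP counter reports 16. The gap widens as the repeated run in the text gets longer.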

In the next part, we look closely at the key functions in the code, especially Next, which builds this all-important next array.

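3. Code Walkthrough

3.1 The `Next` Function

The Next function plays a central role in KMP: it builds the next array, which supplies the information the matching phase relies on. Let us go through its implementation step by step: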

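// Builds the next array for sub_string. next must have room for
// sub_str_len + 1 ints, because the last loop iteration writes next[sub_str_len].
void Next(const string &sub_string, int next[], int sub_str_len) {
    next[0] = -1;
    int currentIndex = 0, subStringIndex = -1;
    while (currentIndex < sub_str_len) {
        if (subStringIndex == -1 || sub_string[subStringIndex] == sub_string[currentIndex]) {
            // Prefix extended by one character (or restarted from -1).
            subStringIndex++;
            currentIndex++;
            next[currentIndex] = subStringIndex;
        } else {
            // Mismatch: fall back to the next shorter candidate prefix.
            subStringIndex = next[subStringIndex];
        }
    }
    // Print the next array, one value per line.
    cout << "next array:" << endl;
    for (int i = 0; i < sub_str_len; ++i) {
        cout << next[i] << endl;
    }
}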

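3.1.1 Initialization and Basic Idea

next[0] = -1: the first element of the next array is initialized to -1.
currentIndex and subStringIndex: both are positions in the substring (the pattern), not in the main string. currentIndex marks the end of the suffix currently being examined, and subStringIndex marks the end of the prefix it is compared against, which is also the length of the prefix matched so far (-1 means there is no prefix left to fall back to).

3.1.2 Building the `next` Array

while (currentIndex < sub_str_len): the main loop, which walks through the whole substring. Because currentIndex is incremented before the write, the last iteration writes next[sub_str_len], so the array needs room for sub_str_len + 1 entries.
if (subStringIndex == -1 || sub_string[subStringIndex] == sub_string[currentIndex]):
- If subStringIndex is -1, or the two characters being compared match, do the following:
  - subStringIndex++: advance the prefix pointer.
  - currentIndex++: advance the current pointer.
  - next[currentIndex] = subStringIndex: record the matched prefix length at the current position of the next array.
else:
- If the characters do not match, move the prefix pointer back to next[subStringIndex] and try again.
Finally, a loop prints the next array.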
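The construction above can also be sketched as a function that returns the array instead of filling a caller-supplied buffer. This is a variation for illustration, not the article's code: the name buildNext is hypothetical, the result is a std::vector, the pattern is assumed non-empty, and the loop stops at len - 1 so that the vector holds exactly one entry per pattern character.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical variant of the article's Next function (buildNext is not a name
// from the article). Returns next[] with exactly pattern.size() entries.
std::vector<int> buildNext(const std::string& pattern) {
    assert(!pattern.empty());  // assumption of this sketch
    const int len = static_cast<int>(pattern.size());
    std::vector<int> next(len, 0);
    next[0] = -1;
    int currentIndex = 0;     // end of the suffix being extended
    int subStringIndex = -1;  // length of the matched prefix, or -1
    // Stop at len - 1 so that the last write is next[len - 1].
    while (currentIndex < len - 1) {
        if (subStringIndex == -1 ||
            pattern[subStringIndex] == pattern[currentIndex]) {
            ++subStringIndex;
            ++currentIndex;
            next[currentIndex] = subStringIndex;
        } else {
            // Fall back to the next shorter prefix that is also a suffix.
            subStringIndex = next[subStringIndex];
        }
    }
    return next;
}
```

Tracing "ababc" by hand: the first two positions find no repeated prefix, then "a" and "ab" are matched in turn, giving -1 0 0 1 2. For "abcabd" the result is -1 0 0 0 1 2.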

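3.2 KMP Matching in the Main Function

In the main function, we use the finished next array to carry out the KMP matching process. The implementation is as follows: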

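// Both indices start at 0; indexForMain never moves backwards.
int indexForSub = 0, indexForMain = 0;
while (indexForMain < main_str_len && indexForSub < sub_str_len) {
    if (indexForSub == -1 || main_string[indexForMain] == sub_string[indexForSub]) {
        // Match, or no prefix left to try (-1): advance both indices.
        indexForSub++;
        indexForMain++;
    } else {
        // Mismatch: move the substring index back using next.
        indexForSub = next[indexForSub];
    }
}
cout << "Start position: ";
if (indexForSub == sub_str_len) {
    cout << indexForMain - indexForSub;  // the whole substring matched
} else {
    cout << "wu";  // "wu" is pinyin for "none": no match found
}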

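3.2.1 The Matching Process

while (indexForMain < main_str_len && indexForSub < sub_str_len):
- The main loop, which matches the substring against the main string. It ends when either string is exhausted.
if (indexForSub == -1 || main_string[indexForMain] == sub_string[indexForSub]):
- If the substring pointer is -1, or the two current characters match, advance both pointers. The -1 test must come first so that sub_string is never indexed with -1.
else:
- If the characters do not match, move the substring pointer back using the next array. The main-string pointer stays where it is.

3.2.2 Output

If the loop ends with indexForSub equal to sub_str_len, the match succeeded and the program prints the start position, indexForMain - indexForSub. Otherwise it prints "wu" (pinyin for "none").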
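The two steps can be combined into one reusable function. The following is a minimal sketch under a few assumptions: the name kmpFind is hypothetical, the pattern is non-empty, and the function returns -1 for "no match" instead of printing "wu".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical wrapper (kmpFind is not a name from the article): returns the
// index of the first occurrence of pattern in text, or -1 if there is none.
int kmpFind(const std::string& text, const std::string& pattern) {
    assert(!pattern.empty());  // assumption of this sketch
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pattern.size());

    // Step 1: build next[] with the article's convention, next[0] == -1.
    std::vector<int> next(m, 0);
    next[0] = -1;
    for (int i = 0, j = -1; i < m - 1;) {
        if (j == -1 || pattern[i] == pattern[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }

    // Step 2: scan the text; indexForMain never moves backwards.
    int indexForMain = 0, indexForSub = 0;
    while (indexForMain < n && indexForSub < m) {
        if (indexForSub == -1 || text[indexForMain] == pattern[indexForSub]) {
            ++indexForMain;
            ++indexForSub;
        } else {
            indexForSub = next[indexForSub];
        }
    }
    return indexForSub == m ? indexForMain - indexForSub : -1;
}
```

For example, kmpFind("abababc", "ababc") returns 2: the first attempt fails at the fifth pattern character, next sends the pattern pointer back to index 2, and matching resumes without re-reading the main string. kmpFind("hello", "xyz") returns -1.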

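With these two key steps, building the next array and running the matching loop, the KMP implementation is complete and string matching becomes more efficient.

The complete program below can be compiled and run on sample input to check this behavior.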

Complete code:

#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Builds the next array for sub_string. next must have room for
// sub_str_len + 1 ints, because the last loop iteration writes next[sub_str_len].
void Next(const string &sub_string, int next[], int sub_str_len) {
    next[0] = -1;
    int currentIndex = 0, subStringIndex = -1;
    while (currentIndex < sub_str_len) {
        if (subStringIndex == -1 || sub_string[subStringIndex] == sub_string[currentIndex]) {
            subStringIndex++;
            currentIndex++;
            next[currentIndex] = subStringIndex;
        } else {
            // Mismatch: fall back to the next shorter candidate prefix.
            subStringIndex = next[subStringIndex];
        }
    }
    // Print the next array, one value per line.
    cout << "next array:" << endl;
    for (int i = 0; i < sub_str_len; ++i) {
        cout << next[i] << endl;
    }
}

int main() {
    string main_string, sub_string;
    cout << "Enter the main string" << endl;
    cin >> main_string;
    cout << "Enter the substring" << endl;
    cin >> sub_string;
    int sub_str_len = sub_string.size();
    int main_str_len = main_string.size();
    // Sized from the substring (instead of a fixed int next[100]) so that
    // long patterns cannot overflow the buffer.
    vector<int> next(sub_str_len + 1);
    Next(sub_string, next.data(), sub_str_len);
    int indexForSub = 0, indexForMain = 0;
    while (indexForMain < main_str_len && indexForSub < sub_str_len) {
        if (indexForSub == -1 || main_string[indexForMain] == sub_string[indexForSub]) {
            indexForSub++;
            indexForMain++;
        } else {
            indexForSub = next[indexForSub];
        }
    }
    cout << "Start position: ";
    if (indexForSub == sub_str_len) {
        cout << indexForMain - indexForSub;
    } else {
        cout << "wu";  // "wu" is pinyin for "none": no match found
    }
    cout << endl;

    return 0;
}

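KMP optimization:

index:   0   1   2   3   4
char:    a   a   a   a   a
next:   -1   0   1   2   3

Suppose position [2] of the substring fails to match the main string. Falling back to [1] cannot help: [1] holds the same character as [2], so it will fail against the same main-string character too. The optimization removes such wasted fallbacks.

After both indices have been advanced, if sub_string[currentIndex] == sub_string[subStringIndex] holds, the character at the current position equals the character at its fallback position. In that case we store the fallback position's own next value instead. In the example, next[1] is assigned to next[2].

The reasoning is as follows. A mismatch at currentIndex means the main-string character differs from sub_string[currentIndex]. If the fallback position subStringIndex holds that same character, the comparison there is certain to fail as well. We can therefore set next[currentIndex] directly to next[subStringIndex] and skip the comparison that is known to fail.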
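To see the effect on the "aaaaa" example, the optimized construction can be sketched as a function that returns the array. As before, this is an illustration under stated assumptions: the name buildOptimizedNext is hypothetical, the pattern is assumed non-empty, and the loop stops at len - 1 so that the result has one entry per pattern character.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not a name from the article): builds the optimized
// next[] array, in which a fallback never lands on a position holding the
// same character as the position that just failed.
std::vector<int> buildOptimizedNext(const std::string& pattern) {
    assert(!pattern.empty());  // assumption of this sketch
    const int len = static_cast<int>(pattern.size());
    std::vector<int> next(len, 0);
    next[0] = -1;
    int currentIndex = 0, subStringIndex = -1;
    while (currentIndex < len - 1) {
        if (subStringIndex == -1 ||
            pattern[currentIndex] == pattern[subStringIndex]) {
            ++subStringIndex;
            ++currentIndex;
            if (pattern[currentIndex] != pattern[subStringIndex]) {
                // Different characters: the fallback is worth trying.
                next[currentIndex] = subStringIndex;
            } else {
                // Same character: the fallback would fail too, so reuse the
                // fallback position's own (already optimized) value.
                next[currentIndex] = next[subStringIndex];
            }
        } else {
            subStringIndex = next[subStringIndex];
        }
    }
    return next;
}
```

For "aaaaa" the optimized array is -1 -1 -1 -1 -1, compared with -1 0 1 2 3 before: a mismatch anywhere in the run of a's now skips all the remaining a's in one step. For "ababc" the result is -1 0 -1 0 2.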

Complete code:

#include <iostream>
#include <string>
#include <vector>

using namespace std;

// Builds the optimized next array. next must have room for
// sub_str_len + 1 ints, because the last loop iteration writes next[sub_str_len].
void computeNextArray(const string &sub_string, int next[], int sub_str_len) {
    int currentIndex = 0, subStringIndex = -1;
    next[0] = -1;
    while (currentIndex < sub_str_len) {
        if (subStringIndex == -1 || sub_string[currentIndex] == sub_string[subStringIndex]) {
            subStringIndex++;
            currentIndex++;
            // On the last iteration currentIndex == sub_str_len; std::string
            // returns '\0' at that index, so the comparison is well defined.
            if (sub_string[currentIndex] != sub_string[subStringIndex]) {
                next[currentIndex] = subStringIndex;
            } else {
                // Same character at the fallback position: it would fail too.
                next[currentIndex] = next[subStringIndex];
            }
        } else {
            subStringIndex = next[subStringIndex];
        }
    }
}

int main() {
    string main_string, sub_string;
    cout << "Enter the main string:" << endl;
    cin >> main_string;
    cout << "Enter the substring:" << endl;
    cin >> sub_string;
    int sub_str_len = sub_string.size();
    int main_str_len = main_string.size();
    // Sized from the substring (instead of a fixed int next[100]).
    vector<int> next(sub_str_len + 1);
    computeNextArray(sub_string, next.data(), sub_str_len);

    // Print the next array, one value per line.
    cout << "next array:" << endl;
    for (int i = 0; i < sub_str_len; ++i) {
        cout << next[i] << endl;
    }

    int main_index = 0, sub_index = 0;
    while (main_index < main_str_len && sub_index < sub_str_len) {
        // Test sub_index == -1 first so sub_string is never indexed with -1.
        if (sub_index == -1 || main_string[main_index] == sub_string[sub_index]) {
            main_index++;
            sub_index++;
        } else {
            sub_index = next[sub_index];
        }
    }

    if (sub_index == sub_str_len) {
        cout << "Start position: " << main_index - sub_index << endl;
    } else {
        cout << "none" << endl;
    }
    return 0;
}
