Python coding with ChatGPT, Day 8 | String matching: the KMP algorithm

Luna_M

Last modified 2024-01-20 22:08:01

Category: Python Coding with ChatGPT. Tags: python, algorithms, programming languages, leetcode

First published 2024-01-19 16:08:45

Article link: https://blog.csdn.net/baidu_33000721/article/details/135697645

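Series articles
Python coding with ChatGPT, Day 1 | Binary search, remove element
Python coding with ChatGPT, Day 2 | Two pointers, sliding window, spiral matrix
Python coding with ChatGPT, Day 3 | Remove linked list elements, design linked list, reverse linked list
Python coding with ChatGPT, Day 4 | Other linked list operations: swap nodes in pairs, remove the Nth node from the end, linked list intersection, linked list cycle
Python coding with ChatGPT, Day 5 | Hash tables: valid anagram, intersection of two arrays, happy number, two sum
Python coding with ChatGPT, Day 6 | Hash tables: 4Sum II, ransom note, 3Sum, 4Sum
Python coding with ChatGPT, Day 7 | Strings: reverse string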

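Related problems

28. Implement strStr()
Given a haystack string and a needle string, find the first position (0-indexed) at which needle occurs in haystack. If it does not occur, return -1.

A note on the English idiom: "It's like looking for a needle in a haystack" describes searching for something that is very hard to find.

Text string: haystack
Pattern string: needle

Of course, Python's built-in methods solve this quickly:

Using index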

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        try:
            return haystack.index(needle)
        except ValueError:
            return -1

Using find

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        return haystack.find(needle)
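
For reference, the two built-ins differ only in how they report a missing substring. A minimal sketch using the standard `str` methods (the sample strings are illustrative, not taken from the problem statement):

```python
haystack = "sadbutsad"
needle = "sad"

# Both return the index of the first occurrence when the needle is present
first_by_find = haystack.find(needle)
first_by_index = haystack.index(needle)

# When the needle is absent, find returns -1 while index raises ValueError
missing_by_find = haystack.find("xyz")
try:
    haystack.index("xyz")
    index_raised = False
except ValueError:
    index_raised = True
```

This is why the index-based solution needs the try/except wrapper while the find-based one does not.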

Without the built-in methods, a brute-force approach has time complexity O(m*n).
Brute-force method:

def strStr(haystack, needle):
    needle_len = len(needle)
    if needle_len > len(haystack):
        return -1
    # Only start positions that leave room for the whole needle are worth trying
    for start in range(len(haystack) - needle_len + 1):
        count = 0
        i = start
        for j in range(needle_len):
            if haystack[i] == needle[j]:
                count += 1
                i += 1
            else:
                break
        # Every character of the needle matched, so the match begins at start
        if count == needle_len:
            return start
    return -1

KMP is an efficient string-matching algorithm that reduces the time complexity from O(m*n) to O(m+n).

Brute-force method 2:

class Solution(object):
    def strStr(self, haystack, needle):
        """
        :type haystack: str
        :type needle: str
        :rtype: int
        """
        m, n = len(haystack), len(needle)
        for i in range(m):
            if haystack[i:i+n] == needle:
                return i
        return -1

Video explanations

KMP: theory
KMP: code

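The KMP algorithm

Where does the name KMP come from?
It was invented by three researchers, Knuth, Morris, and Pratt, and the name is made of the initials of their surnames.

KMP is an efficient string-matching algorithm. Its main advantage is that it skips unnecessary comparisons during matching:
it never re-compares text characters that have already been matched, which is what makes it faster.

The core of KMP is a preprocessed array called the "partial match table" or the "longest common prefix and suffix table". The table is computed from the pattern string (the substring you are searching for) and determines how far to shift the pattern when a mismatch occurs.

Building the KMP table

The partial match table, or longest common prefix and suffix table, is also called the next array.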
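
To make the definition concrete, the table can be computed directly from it: entry i is the length of the longest proper prefix of `pattern[:i+1]` that is also a suffix of it. The sketch below does this by brute force. The function name `naive_prefix_table` and the sample pattern are illustrative assumptions, and this is much slower than the construction KMP actually uses.

```python
def naive_prefix_table(pattern):
    """Build the partial match table straight from its definition (slow, for illustration)."""
    table = []
    for end in range(1, len(pattern) + 1):
        sub = pattern[:end]
        best = 0
        # Try every proper prefix length; keep the longest that is also a suffix
        for k in range(1, len(sub)):
            if sub[:k] == sub[-k:]:
                best = k
        table.append(best)
    return table


print(naive_prefix_table("aabaaf"))  # [0, 1, 0, 1, 2, 0]
```

For example, the entry for "aabaa" is 2 because "aa" is both a prefix and a suffix, while the entry for "aabaaf" is 0 because no prefix ends in "f".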

ChatGPT's approach and code:

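def kmp_table(pattern):
    # The partial match table has the same length as the pattern
    table = [0] * len(pattern)

    # Initialize variables
    # length is the length of the longest equal prefix and suffix found so far
    # i walks through the pattern
    length, i = 0, 1

    while i < len(pattern):
        if pattern[i] == pattern[length]:
            # The matching prefix and suffix grow by one character
            length += 1
            table[i] = length
            i += 1
        else:
            if length != 0:
                # Fall back to the next-shorter equal prefix and suffix, then retry without advancing i
                length = table[length - 1]
            else:
                # No equal prefix and suffix
                table[i] = 0
                i += 1

    return table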
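
The fallback step `length = table[length - 1]` is the subtle part. As a rough illustration, the variant below runs the same construction but also records each fallback. The name `kmp_table_with_trace` and the tuple layout `(i, old_length, new_length)` are assumptions made for this sketch.

```python
def kmp_table_with_trace(pattern):
    table = [0] * len(pattern)
    fallbacks = []  # (i, length before fallback, length after fallback)
    length, i = 0, 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            table[i] = length
            i += 1
        elif length != 0:
            # Retry with the next-shorter prefix that is also a suffix
            fallbacks.append((i, length, table[length - 1]))
            length = table[length - 1]
        else:
            table[i] = 0
            i += 1
    return table, fallbacks


table, fallbacks = kmp_table_with_trace("aabaaf")
print(table)      # [0, 1, 0, 1, 2, 0]
print(fallbacks)  # [(2, 1, 0), (5, 2, 1), (5, 1, 0)]
```

At i = 5 the length drops from 2 to 1 to 0 while i stays put, which is how the construction avoids rescanning the pattern from the start.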

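KMP matching

With the partial match table, KMP can skip unnecessary comparisons during matching. When a mismatch occurs, the algorithm uses the table to decide how far to slide the pattern to the right instead of restarting the match from the beginning. This greatly improves matching efficiency.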

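def strStr(haystack, needle):
    if not needle:
        # An empty needle matches at position 0, as with str.find
        return 0
    if len(haystack) < len(needle):
        return -1
    # The next array; named next_table to avoid shadowing the built-in next()
    next_table = kmp_table(needle)
    i = 0  # position in haystack
    j = 0  # position in needle
    while i < len(haystack):
        if needle[j] == haystack[i]:
            j += 1
            i += 1
            if j == len(needle):
                return i - j
        else:
            if j > 0:
                # Mismatch after j matched characters: fall back in the needle without moving i
                j = next_table[j - 1]
            else:
                i += 1
    return -1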
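
To see the skip in action, here is a self-contained sketch of the same matching loop that also logs every fallback of `j`. The helper names `build_table` and `kmp_search_with_trace` and the sample strings are assumptions for illustration only.

```python
def build_table(pattern):
    """Partial match table: longest equal proper prefix/suffix length per position."""
    table = [0] * len(pattern)
    length, i = 0, 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            table[i] = length
            i += 1
        elif length != 0:
            length = table[length - 1]
        else:
            i += 1
    return table


def kmp_search_with_trace(haystack, needle):
    if not needle:
        return 0, []
    table = build_table(needle)
    jumps = []  # (i, j before fallback, j after fallback)
    i = j = 0
    while i < len(haystack):
        if haystack[i] == needle[j]:
            i += 1
            j += 1
            if j == len(needle):
                return i - j, jumps
        elif j > 0:
            # Keep i in place; reuse the part of the needle already known to match
            jumps.append((i, j, table[j - 1]))
            j = table[j - 1]
        else:
            i += 1
    return -1, jumps


index, jumps = kmp_search_with_trace("aabaabaaf", "aabaaf")
print(index)  # 3
print(jumps)  # [(5, 5, 2)]
```

The single jump at i = 5 moves j from 5 back to 2 rather than to 0: the "aa" just before the mismatch is already known to match the needle's first two characters, so the comparison resumes at the needle's third character.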

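Let n be the length of the text and m the length of the pattern. During matching, the position in the pattern is adjusted according to the prefix table while the position in the text never moves backward, so matching takes O(n). The next array must be built beforehand, which takes O(m). The overall time complexity of KMP is therefore O(n+m).

The brute-force solution is clearly O(n × m), so KMP greatly improves search efficiency in string matching.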
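
The gap is easiest to see on an adversarial input. The sketch below counts character comparisons for a brute-force scan and for KMP, using a text of repeated "a" and a needle that fails only on its last character. The function names and the chosen sizes are assumptions for illustration, and the KMP count includes the comparisons made while building the table.

```python
def brute_force_comparisons(haystack, needle):
    comparisons = 0
    n, m = len(haystack), len(needle)
    for start in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if haystack[start + j] != needle[j]:
                break
        else:
            # Inner loop finished without a mismatch: full match at start
            return start, comparisons
    return -1, comparisons


def kmp_comparisons(haystack, needle):
    comparisons = 0
    m = len(needle)
    if m == 0:
        return 0, 0
    # Build the partial match table, counting comparisons
    table = [0] * m
    length, i = 0, 1
    while i < m:
        comparisons += 1
        if needle[i] == needle[length]:
            length += 1
            table[i] = length
            i += 1
        elif length != 0:
            length = table[length - 1]
        else:
            i += 1
    # Scan the text, counting comparisons
    i = j = 0
    while i < len(haystack):
        comparisons += 1
        if haystack[i] == needle[j]:
            i += 1
            j += 1
            if j == m:
                return i - j, comparisons
        elif j > 0:
            j = table[j - 1]
        else:
            i += 1
    return -1, comparisons


text = "a" * 1000
pattern = "a" * 49 + "b"
print(brute_force_comparisons(text, pattern))
print(kmp_comparisons(text, pattern))
```

On this input the brute-force count grows like n × m, while the KMP count stays within a small constant multiple of n + m.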

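Extension

459. Repeated Substring Pattern

Video explanation

Another way to use KMP

Key analysis

If the last entry of the next array is nonzero and len(s) is divisible by len(s) - next[-1], then s is made of repeated copies of a shorter substring.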

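def repeatedSubstring(s):
    if not s:
        return False
    # The next array; named next_table to avoid shadowing the built-in next()
    next_table = kmp_table(s)

    # next_table[-1] is the length of the longest equal proper prefix and suffix of s
    return next_table[-1] != 0 and len(s) % (len(s) - next_table[-1]) == 0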
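
As a worked illustration of this test, the sketch below returns the repeating unit itself, which is the first `len(s) - longest` characters whenever the test passes, where `longest` is the last table entry. `prefix_table` and `repeating_unit` are hypothetical helper names introduced here.

```python
def prefix_table(s):
    table = [0] * len(s)
    length, i = 0, 1
    while i < len(s):
        if s[i] == s[length]:
            length += 1
            table[i] = length
            i += 1
        elif length != 0:
            length = table[length - 1]
        else:
            i += 1
    return table


def repeating_unit(s):
    """Return the shortest unit whose repetition (two or more times) gives s, or None."""
    if not s:
        return None
    longest = prefix_table(s)[-1]
    unit_len = len(s) - longest
    if longest != 0 and len(s) % unit_len == 0:
        return s[:unit_len]
    return None


print(repeating_unit("abcabcabc"))  # abc
print(repeating_unit("aba"))        # None
```

For "abcabcabc" the last table entry is 6, so the candidate unit length is 9 - 6 = 3, and 3 divides 9. For "aba" the last entry is 1, giving a candidate length of 2, which does not divide 3.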

Without KMP:

def repeatedSubstringPattern(s):
    ss = s[1:] + s[:-1]
    if s in ss:
        return True
    return False
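
The trick works because a string built from a repeated unit reappears inside `s + s` at an offset smaller than `len(s)`, and dropping the first and last characters rules out the two trivial matches. As a sanity check under that reading, the sketch below compares the trick against a direct divisor-based check on every string over the letters "a" and "b" up to length 8. The function names here are illustrative assumptions.

```python
from itertools import product


def by_doubling(s):
    # Same string as s[1:] + s[:-1]: s + s without its first and last characters
    return s in (s + s)[1:-1]


def by_divisors(s):
    n = len(s)
    # Try every unit length that divides n and is shorter than n
    for unit_len in range(1, n // 2 + 1):
        if n % unit_len == 0 and s[:unit_len] * (n // unit_len) == s:
            return True
    return False


mismatches = [
    "".join(chars)
    for n in range(1, 9)
    for chars in product("ab", repeat=n)
    if by_doubling("".join(chars)) != by_divisors("".join(chars))
]
print(mismatches)  # []
```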
