Algorithm practice check-in, day 35: Find the Index of the First Occurrence in a String [BF/RK/BM]

Article link: https://blog.csdn.net/weixin_45616285/article/details/128178984

Find the Index of the First Occurrence in a String

Difficulty: Medium
Given two strings haystack and needle, return the index of the first occurrence of needle in haystack (indices start at 0). If needle is not part of haystack, return -1.

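Example 1:

Input: haystack = "sadbutsad", needle = "sad"
Output: 0
Explanation: "sad" occurs at indices 0 and 6.
The first occurrence is at index 0, so we return 0.

Example 2:

Input: haystack = "leetcode", needle = "leeto"
Output: -1
Explanation: "leeto" does not occur in "leetcode", so we return -1.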
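As a quick baseline for checking the solutions that follow, the two examples can be reproduced with Python's built-in `str.find`, which returns the lowest matching index or -1. The helper name `reference_index` is made up for this sketch and is not part of the original post.

```python
def reference_index(haystack: str, needle: str) -> int:
    # str.find returns the lowest index where needle occurs, or -1 if it is absent.
    return haystack.find(needle)


print(reference_index("sadbutsad", "sad"))   # 0
print(reference_index("leetcode", "leeto"))  # -1
```

Any hand-written solution should agree with this helper on every input.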

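Solution 1: BF (brute-force) algorithm

Idea:
Compare character by character ($A[i]$ against $B[j]$, where $A$ is the main string and $B$ is the pattern). If the current characters match ($A[i] == B[j]$), move on to the next pair ($++i, ++j$). On a mismatch, $i$ backtracks and $j$ is reset to $0$ ($i = i - j + 1, j = 0$).

Time complexity: $O(m * n)$, growing with both the main string length $n$ and the pattern length $m$.
Space complexity: $O(1)$ for the pointer-based procedure described above; the slice `haystack[i:i+needle_length]` in the code below creates a temporary string of length $m$.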

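class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        needle_length = len(needle)
        # Try every start position where needle still fits inside haystack.
        for i in range(len(haystack) - needle_length + 1):
            for x, y in zip(haystack[i:i+needle_length], needle):
                if x != y:
                    break
            else:
                # The inner loop finished without a mismatch: full match at i.
                return i
        return -1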
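The idea paragraph describes explicit pointers $i$ and $j$ with the backtracking step $i = i - j + 1$, while the solution above uses `for`/`zip`. A minimal sketch of the pointer form follows, assuming the hypothetical function name `bf_search` and the parameter names `text` and `pattern`, none of which appear in the original.

```python
def bf_search(text: str, pattern: str) -> int:
    n, m = len(text), len(pattern)
    i = j = 0
    while i < n and j < m:
        if text[i] == pattern[j]:
            # Characters match: advance both pointers.
            i += 1
            j += 1
        else:
            # Mismatch: move i to the position after the start of this attempt,
            # and restart the pattern from its first character.
            i = i - j + 1
            j = 0
    # j == m means the whole pattern was consumed; the match started at i - m.
    return i - m if j == m else -1


print(bf_search("sadbutsad", "sad"))   # 0
print(bf_search("leetcode", "leeto"))  # -1
```

This version uses no slices, so its extra space really is $O(1)$.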

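Solution 2: RK (Rabin-Karp) algorithm

Idea:
Given the text $haystack$ and the pattern $needle$, a rolling hash quickly filters out the text positions that cannot match $needle$; the remaining positions are then verified character by character. Here the sum of the characters' $ord$ values stands in for the hash, which keeps the numbers small and easy to compute.

Time complexity: $O(n + m)$ on average, where $n$ is the length of the text $haystack$ and $m$ is the length of the pattern $needle$. The worst case is $O(m * n)$, reached when many windows share the pattern's hash; the additive $ord$ sum collides for every window that is an anagram of $needle$.
Space complexity: $O(1)$.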

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        needle_length = len(needle)
        # "Hash" of the pattern and of the first window: sum of code points.
        needle_sum = sum(ord(i) for i in needle)
        haystack_sum = sum(ord(i) for i in haystack[:needle_length])
        for i in range(len(haystack) - needle_length + 1):
            if i != 0:
                # Roll the window: drop the character that left, add the one that entered.
                haystack_sum = haystack_sum - ord(haystack[i-1]) + ord(haystack[i+needle_length-1])
            if needle_sum == haystack_sum:
                # Equal sums only suggest a match; verify character by character.
                for x, y in zip(haystack[i:i+needle_length], needle):
                    if x != y:
                        break
                else:
                    return i
        return -1
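The solution above uses an additive hash. For comparison, here is a sketch of the more common polynomial rolling hash, which is position-sensitive and so collides far less often. This is a swapped-in variant, not the author's method: the function name `rk_search`, the base 256 and the modulus 1_000_000_007 are all assumptions chosen for illustration.

```python
def rk_search(text: str, pattern: str, base: int = 256, mod: int = 1_000_000_007) -> int:
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Weight of the leading character of a window: base^(m-1) mod mod.
    high = pow(base, m - 1, mod)
    p_hash = w_hash = 0
    for k in range(m):
        p_hash = (p_hash * base + ord(pattern[k])) % mod
        w_hash = (w_hash * base + ord(text[k])) % mod
    for i in range(n - m + 1):
        # Equal hashes can still be a collision, so confirm with a direct comparison.
        if w_hash == p_hash and text[i:i + m] == pattern:
            return i
        if i < n - m:
            # Remove text[i], shift the window by one position, append text[i + m].
            w_hash = ((w_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return -1


print(rk_search("sadbutsad", "sad"))   # 0
print(rk_search("leetcode", "leeto"))  # -1
```

With this hash, a window such as "bca" no longer shares a hash value with the pattern "abc" just because it contains the same letters.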

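Solution 3: BM (Boyer-Moore) algorithm, recomputing shifts at every mismatch instead of caching them in precomputed tables

Idea:
Let $T$ denote $haystack$ and $p$ denote $needle$. Given the text $T$ and the pattern $p$, the BM algorithm first preprocesses the pattern $p$. During matching, when a character of $T$ fails to match $p$, heuristic rules let the pattern skip as many alignments that cannot match as possible and slide several positions to the right at once.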

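The $BM$ algorithm proceeds as follows:

Compute the length $n$ of the text $T$ and the length $m$ of the pattern $p$.
Set a left pointer $left$ and let $differ$ be the difference between the two lengths, which is also the largest value $left$ may take. While $left <= differ$, run the loop body and update $left$ with the $BM$ rules, comparing from the last character of the pattern backwards:
- If the text character $T[left + j]$ equals $p[j]$, continue with the previous character.
  1. If the whole pattern has been matched, return $True$.
- If the text character $T[left + j]$ differs from $p[j]$:
  1. Compute the shift $bad\_characters\_move$ under the "bad character rule" from the bad-character position table.
  2. Compute the shift $good\_suffix\_move$ under the "good suffix rule" from the good-suffix shift table.
  3. Return the larger of the two shifts, $max(bad\_characters\_move, good\_suffix\_move)$.
If $left$ moves past $differ$ without finding a match, return -1. If a match is found, return $left$.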

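Time complexity: $O(n / m)$ in the best case, $O(m * n)$ in the worst case. These bounds count character comparisons; the code below also slices a window of length $m$ at every alignment, which costs $O(m)$ each time.
Space complexity: $O(suffix)$, where $suffix$ is the length of the matched suffix, which is shorter than the pattern $p$; the window slice passed to `bm` adds $O(m)$.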

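class Solution:
    def bm(self, str1, str2, str2_length):
        # str1 is the current window of the text, str2 is the pattern.
        # Compare from right to left; return True on a full match, otherwise the shift.
        for i in range(str2_length-1, -1, -1):
            if str1[i] != str2[i]:
                # Bad character: align str1[i] with its rightmost occurrence left of i in the pattern
                bad_char_skewing = i + 1
                for j in range(i-1, -1, -1):
                    if str2[j] == str1[i]:
                        bad_char_skewing = i - j
                        break
                # Good suffix: the part of the window that already matched
                suffix = str1[i+1:]
                suffix_length = len(suffix)
                if suffix:
                    good_suffix_skewing = 0
                    # Look for the rightmost earlier occurrence of the suffix, ending before index j
                    for j in range(str2_length-1, suffix_length-1, -1):
                        if str2[j-suffix_length:j] == suffix:
                            good_suffix_skewing = str2_length - j
                            break
                    return max(bad_char_skewing, good_suffix_skewing)
                return bad_char_skewing
        return True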

    def strStr(self, haystack: str, needle: str) -> int:
        haystack_length = len(haystack)
        needle_length = len(needle)
        difference_length = haystack_length - needle_length
        left = 0
        while left <= difference_length:
            skewing = self.bm(haystack[left: left+needle_length], needle, needle_length)
            if skewing is True:
                return left
            left += skewing
        return -1
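The solution above recomputes the bad-character shift by scanning the pattern at every mismatch. A sketch of the cached alternative follows: it builds a last-occurrence table once and applies only the bad character rule, with no good suffix rule, so it is a simplified variant rather than full Boyer-Moore. The function name `bm_bad_char_search` and the table name `last` are made up for this example.

```python
def bm_bad_char_search(text: str, pattern: str) -> int:
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # Rightmost index of each character in the pattern, built once up front.
    last = {ch: idx for idx, ch in enumerate(pattern)}
    left = 0
    while left <= n - m:
        # Compare the window against the pattern from right to left.
        j = m - 1
        while j >= 0 and text[left + j] == pattern[j]:
            j -= 1
        if j < 0:
            return left
        # Align the mismatched text character with its rightmost occurrence
        # in the pattern; always move at least one position.
        left += max(1, j - last.get(text[left + j], -1))
    return -1


print(bm_bad_char_search("sadbutsad", "sad"))   # 0
print(bm_bad_char_search("leetcode", "leeto"))  # -1
```

The table costs extra memory proportional to the number of distinct characters in the pattern, in exchange for a constant-time lookup at each mismatch.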