代码随想录 Algorithm Training Camp Day 9 | 28. Implement strStr(), 459. Repeated Substring Pattern, String Summary, Two-Pointer Review | Python | Personal Notes

修远Python

Category: 代码随想录 Algorithm Training Camp. Tags: python, algorithms, programming languages

Article link: https://blog.csdn.net/Xiu_Yuan123/article/details/137629325

Contents

28. Implement strStr()
459. Repeated Substring Pattern
String Summary
Two-Pointer Review
Previously Overlooked Points
Personal Takeaways

28. Implement strStr()

代码随想录: 28. Implement strStr()
LeetCode: 28. Implement strStr()

My attempt

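class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        left = 0   # current index in haystack
        right = 0  # current index in needle
        size1 = len(haystack)
        size2 = len(needle)
        next_start = 0      # second candidate start, valid only while flag_next is True
        flag_next = False   # whether a second candidate start has been recorded
        flag_match = False  # whether a partial match is in progress
        if size2 == 0:
            return 0  # an empty needle matches at index 0
        if size1 < size2:
            return -1
        while left < size1:
            if haystack[left] == needle[right]:
                # Inside a partial match, remember the first later position
                # that could also start a match (it equals needle[0]).
                if right > 0 and haystack[left] == needle[0] and not flag_next:
                    next_start = left
                    flag_next = True
                left += 1
                right += 1
                flag_match = True
                if right == size2:
                    return left - size2
            else:
                if flag_match:
                    # Undone by the += 1 below, so this character is re-checked against needle[0].
                    left -= 1
                    flag_match = False
                if flag_next:
                    left = next_start  # restart from the recorded candidate
                    flag_next = False
                else:
                    left += 1
                right = 0
        return -1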

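Reading the article

Approach

This problem is a classic application of KMP, which relies on a prefix table.
The algorithm was devised by three researchers, Knuth, Morris, and Pratt, and is named KMP after the initials of their surnames.
The core idea of KMP: when a mismatch occurs, use what is already known about the part of the text that matched so far, so that matching does not have to restart from the beginning.
The job of the prefix table: when the current position fails to match, it tells you which position in the pattern to fall back to, so that matching resumes from the longest part that is still known to match.
Prefix table: for each index i, the length of the longest proper prefix of the substring ending at i (inclusive) that is also a suffix of that substring.
Prefix table and next array: the next array can be the prefix table itself, but many implementations subtract one from every entry of the prefix table (so the first entry is -1) and use the result as the next array. Shifting the table right by one position with -1 as the first entry is a different variant that is also common; the code below uses the subtract-one form.
Time complexity: let n be the length of the text and m the length of the pattern. During matching, the prefix table only ever moves the match position forward through the text, so matching is O(n). Building the next array beforehand is O(m). The whole KMP algorithm is therefore O(n + m). The brute-force approach is O(n × m) in the worst case, so KMP makes string matching substantially faster.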
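For comparison with the O(n + m) bound above, here is a minimal sketch of the brute-force O(n × m) matcher. The function name `brute_force_find` is made up for this illustration and does not come from the original article.

```python
def brute_force_find(haystack: str, needle: str) -> int:
    """Return the first index of needle in haystack, or -1 (O(n * m) worst case)."""
    n, m = len(haystack), len(needle)
    for start in range(n - m + 1):
        # Compare the pattern against the window starting at `start`.
        k = 0
        while k < m and haystack[start + k] == needle[k]:
            k += 1
        if k == m:
            return start
    return -1


print(brute_force_find("sadbutsad", "sad"))   # 0
print(brute_force_find("leetcode", "leeto"))  # -1
```

Every window restarts the comparison from the first pattern character, which is exactly the repeated work that the prefix table avoids.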
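Personal takeaway: the idea is understandable. My own attempt already remembers where the first character of the pattern reappears; the prefix table essentially generalizes this to remembering match positions for several characters, which speeds up matching.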
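To make the prefix-table definition concrete, the sketch below computes the table directly from the definition (slow, but easy to check by hand) and then derives the minus-one form from it. The helper name `prefix_table_by_definition` and the sample string "aabaaf" are chosen for this illustration only.

```python
def prefix_table_by_definition(s: str) -> list:
    """For each i, the length of the longest proper prefix of s[:i+1] that is also its suffix."""
    table = []
    for i in range(len(s)):
        sub = s[: i + 1]
        best = 0
        # Try every proper prefix length (shorter than the substring itself).
        for length in range(1, len(sub)):
            if sub[:length] == sub[-length:]:
                best = length
        table.append(best)
    return table


table = prefix_table_by_definition("aabaaf")
print(table)                   # [0, 1, 0, 1, 2, 0]
print([v - 1 for v in table])  # [-1, 0, -1, 0, 1, -1]
```

The second line of output is the minus-one form, where every entry of the prefix table is decreased by one.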

Building the prefix table

Prefix table (minus one): the more widely used form

class Solution:
    def getNext(self, nxt, s):
        # nxt[i] = (length of the longest equal proper prefix and suffix of s[:i+1]) - 1
        j = -1
        nxt[0] = j
        for i in range(1, len(s)):
            while j >= 0 and s[i] != s[j+1]:
                j = nxt[j]  # mismatch: fall back to a shorter prefix
            if s[i] == s[j+1]:
                j += 1  # match: extend the current prefix by one character
            nxt[i] = j

    def strStr(self, haystack: str, needle: str) -> int:
        if not needle:
            return 0
        nxt = [0] * len(needle)
        self.getNext(nxt, needle)
        j = -1  # index of the last matched character in needle
        for i in range(len(haystack)):
            while j >= 0 and haystack[i] != needle[j+1]:
                j = nxt[j]
            if haystack[i] == needle[j+1]:
                j += 1
            if j == len(needle) - 1:
                return i - len(needle) + 1  # start index of the match
        return -1

Prefix table (not minus one): easier to understand

from typing import List


class Solution:
    def getNext(self, next: List[int], s: str) -> None:
        # next[i] = length of the longest equal proper prefix and suffix of s[:i+1]
        j = 0
        next[0] = 0
        for i in range(1, len(s)):
            while j > 0 and s[i] != s[j]:
                j = next[j - 1]  # fall back to the previous matched position j, then compare again
            if s[i] == s[j]:
                j += 1
            next[i] = j

    def strStr(self, haystack: str, needle: str) -> int:
        if len(needle) == 0:
            return 0
        next = [0] * len(needle)
        self.getNext(next, needle)
        j = 0  # number of needle characters matched so far
        for i in range(len(haystack)):
            while j > 0 and haystack[i] != needle[j]:
                j = next[j - 1]
            if haystack[i] == needle[j]:
                j += 1
            if j == len(needle):
                return i - len(needle) + 1  # start index of the match
        return -1

Using index:

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        try:
            return haystack.index(needle)
        except ValueError:
            return -1

Using find:

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        return haystack.find(needle)

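459. Repeated Substring Pattern

代码随想录: 459. Repeated Substring Pattern
LeetCode: 459. Repeated Substring Pattern

My attempt

No idea how to approach it.

Reading the article

Three methods: brute force, move-and-match, and KMP.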

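Brute force

One for loop picks the end position of the candidate substring, and a nested for loop checks whether repeating that substring rebuilds the whole string, so the time complexity is O(n^2).
Key point: only substrings that start at the first character need to be considered (the string must be built from repetitions of the substring), so a single for loop over the end position is enough. The loop also only needs to run up to the middle of the string, because a substring longer than half the string cannot be repeated to form it.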
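The brute-force idea can be sketched as follows. This is one possible way to write it, not the article's own code, and the function name `repeated_substring_brute_force` is made up for this illustration.

```python
def repeated_substring_brute_force(s: str) -> bool:
    n = len(s)
    # The candidate unit is s[:length]; it can be at most half of the string.
    for length in range(1, n // 2 + 1):
        if n % length != 0:
            continue  # the unit must divide the string evenly
        # Every character must equal the one `length` positions earlier.
        if all(s[i] == s[i - length] for i in range(length, n)):
            return True
    return False


print(repeated_substring_brute_force("abab"))  # True
print(repeated_substring_brute_force("aba"))   # False
```

The outer loop fixes the end of the candidate substring and the inner check walks the rest of the string, which gives the O(n^2) upper bound.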

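Move-and-match

When a string s such as abcabc is made of a repeated substring, its front part and its back part consist of the same substrings.
Therefore in s + s, the back part of the first copy followed by the front part of the second copy forms another s. Because s + s trivially contains s at its very start and at position n, the first and last characters of s + s are removed before searching; if s still appears in what remains, s is made of a repeated substring.
Time complexity: O(n), assuming a linear-time substring search
Space complexity: O(n) for the concatenated string s + s (O(1) if that string is not counted)
One remaining issue with this approach is that we still have to check whether s occurs in s + s. It is tempting to call a library function such as contains or find and overlook the cost of that function (a brute-force search is O(m * n); library implementations are typically O(m + n)).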

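KMP

In a string made of a repeated substring, the part not covered by the longest equal prefix and suffix is the smallest repeating unit. Take s = abababab: the longest equal prefix and suffix is ababab (length 6), and the remaining ab is the smallest repeating unit. So s is made of a repeated substring exactly when that longest length is nonzero and len(s) is divisible by len(s) minus that length. This can be verified by working through examples.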
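As a worked example of the statement above, the sketch below builds the prefix table (not minus one) and returns the smallest repeating unit itself rather than just True or False. The function name `smallest_repeating_unit` and the choice to return None when there is no such unit are assumptions made for this illustration.

```python
def smallest_repeating_unit(s: str):
    """Return the smallest unit that repeats to form s, or None if there is none."""
    n = len(s)
    if n == 0:
        return None
    # Prefix table (not minus one): nxt[i] is the longest equal prefix/suffix length of s[:i+1].
    nxt = [0] * n
    j = 0
    for i in range(1, n):
        while j > 0 and s[i] != s[j]:
            j = nxt[j - 1]
        if s[i] == s[j]:
            j += 1
        nxt[i] = j
    unit_len = n - nxt[-1]  # part not covered by the longest equal prefix and suffix
    if nxt[-1] != 0 and n % unit_len == 0:
        return s[:unit_len]
    return None


print(smallest_repeating_unit("abababab"))  # ab
print(smallest_repeating_unit("abac"))      # None
```

For abababab the table ends in 6, so the uncovered part has length 8 - 6 = 2, and 8 is divisible by 2.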

Prefix table (minus one): the more widely used form

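class Solution:
    def repeatedSubstringPattern(self, s: str) -> bool:
        if len(s) == 0:
            return False
        nxt = [0] * len(s)
        self.getNext(nxt, s)
        # nxt[-1] + 1 is the length of the longest equal prefix and suffix of s;
        # the uncovered part, of length len(s) - (nxt[-1] + 1), must divide len(s).
        if nxt[-1] != -1 and len(s) % (len(s) - (nxt[-1] + 1)) == 0:
            return True
        return False

    def getNext(self, nxt, s):
        # Prefix table with every entry decreased by one.
        nxt[0] = -1
        j = -1
        for i in range(1, len(s)):
            while j >= 0 and s[i] != s[j+1]:
                j = nxt[j]  # mismatch: fall back to a shorter prefix
            if s[i] == s[j+1]:
                j += 1
            nxt[i] = j
        return nxt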

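Prefix table (not minus one): easier to understand, but once you have finished 28. Implement strStr() and understood the idea of the prefix table, it is still worth reading the minus-one version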

class Solution:
    def repeatedSubstringPattern(self, s: str) -> bool:
        if len(s) == 0:
            return False
        nxt = [0] * len(s)
        self.getNext(nxt, s)
        # nxt[-1] is the length of the longest equal prefix and suffix of s;
        # the uncovered part, of length len(s) - nxt[-1], must divide len(s).
        if nxt[-1] != 0 and len(s) % (len(s) - nxt[-1]) == 0:
            return True
        return False

    def getNext(self, nxt, s):
        nxt[0] = 0
        j = 0
        for i in range(1, len(s)):
            while j > 0 and s[i] != s[j]:
                j = nxt[j - 1]  # mismatch: fall back to a shorter prefix
            if s[i] == s[j]:
                j += 1
            nxt[i] = j
        return nxt

Using find

class Solution:
    def repeatedSubstringPattern(self, s: str) -> bool:
        n = len(s)
        if n <= 1:
            return False
        # s + s without its first and last characters, so only a match in the middle counts
        ss = s[1:] + s[:-1]
        return ss.find(s) != -1

String Summary

代码随想录: String Summary

Two-Pointer Review

代码随想录: Two-Pointer Review

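Previously Overlooked Points

KMP algorithm
Basic Python string library functions (string is a str)
- string.find(substring): returns the index of the first occurrence of substring in string, or -1 if it is not found. index does the same but raises ValueError when the substring is not found, so find is the more convenient choice here.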
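A short illustration of the difference between find and index; the sample string is arbitrary.

```python
text = "hello world"

print(text.find("world"))   # 6
print(text.find("python"))  # -1

try:
    text.index("python")
except ValueError:
    print("index raised ValueError")
```

With find, the not-found case is an ordinary return value, which is why the strStr solution based on find needs no try/except.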

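Personal Takeaways

Time to complete: 3h30min.
Reflection: KMP is the hard part. It can be understood with careful reading, but it will take several passes to master.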
