[python]leetcode(438). Find All Anagrams in a String

PKU_Jade

Article link: https://blog.csdn.net/PKU_Jade/article/details/78011463

Problem

Given a string s and a non-empty string p, find all the start indices
of p’s anagrams in s.

The strings consist of lowercase English letters only, and the length of
both strings s and p will not be larger than 20,100.

The order of output does not matter.

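Analysis

My first idea was:
Store the letters of p in a hash table, then build a second hash table for the current substring of s, adding and counting each character as it is scanned.
If, once the scan of a substring finishes, no character has exceeded its count in p and no character outside p has appeared, the match succeeds.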

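If the previous match succeeded, we only need to compare the character that just left the front of the window with the new last character of the window of length len(p). If they are equal, the match succeeds again; otherwise we fall into one of the two cases below.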

If a character that does not occur in p appears, the scan jumps directly to the position after that character.

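If some character exceeds its allowed count, find the first occurrence of that character from the start of the window, skip past it, and begin the next match from there.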
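The three cases above can be sketched roughly as follows. This is only one possible reading of the idea, not the author's original code: the function name `find_anagrams_with_skips` and the use of `Counter` are assumptions made for illustration. The "previous match succeeded" case falls out of the surplus rule, because the new character is in surplus exactly until the old front character has been dropped.

```python
from collections import Counter


def find_anagrams_with_skips(s, p):
    """Sketch of the first idea: count characters and skip on a mismatch."""
    need = Counter(p)      # how many of each letter p contains
    window = Counter()     # letters currently inside s[begin:end + 1]
    ans = []
    begin = 0
    for end, ch in enumerate(s):
        if ch not in need:
            # A character outside p: no window may contain it,
            # so restart right after it.
            window.clear()
            begin = end + 1
            continue
        window[ch] += 1
        # Too many copies of ch: drop characters from the front
        # until the surplus is gone.
        while window[ch] > need[ch]:
            window[s[begin]] -= 1
            begin += 1
        # No surplus and the right length means every count matches p.
        if end - begin + 1 == len(p):
            ans.append(begin)
    return ans


print(find_anagrams_with_skips("cbaebabacd", "abc"))  # [0, 6]
```

Because the window never holds a surplus or a foreign character, reaching length len(p) is enough to conclude that it is an anagram of p.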

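Improvements

The algorithm above can be improved in two ways:

Use two indices (begin, end) and traverse s as a sliding window; the main benefit is that end never has to be moved back.
Use an array-style hash table (defaultdict(int) works). This is more convenient because every value starts at zero, and characters that are not in p are also decremented by 1, so that when begin later passes over them we can tell whether they belong to the pattern.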

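How to read it:
end subtracts 1 for every character it passes and begin adds 1 for every character it passes, so the dictionary always describes the characters between begin and end. Each value is the number of that character the current substring still needs in order to match p (a negative value means the substring has more than it needs).

At the same time, count records whether the current substring has completed the match. count is a running summary of the dictionary (how many characters are still missing), so we never have to traverse the dictionary to check it.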
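To make the bookkeeping described above concrete, here is a small sketch that rebuilds the dictionary for a given window from scratch. The helper names `still_needed` and `missing_count` are hypothetical and exist only for this illustration; the real algorithm updates the dictionary incrementally instead of recomputing it.

```python
from collections import defaultdict


def still_needed(p, window):
    """Dictionary state after counting p (+1) and the window (-1)."""
    d = defaultdict(int)
    for ch in p:
        d[ch] += 1       # p asks for this character
    for ch in window:
        d[ch] -= 1       # the window supplies this character
    return dict(d)


def missing_count(d):
    """Sum of the positive entries: what count tracks incrementally."""
    return sum(v for v in d.values() if v > 0)


state = still_needed("abc", "cbe")
print(state)                 # {'a': 1, 'b': 0, 'c': 0, 'e': -1}
print(missing_count(state))  # 1
```

The window "cbe" still lacks one 'a' and carries an unwanted 'e', so the missing count is 1. A window that is an anagram of p would give 0.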

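from collections import defaultdict


class Solution(object):
    def findAnagrams(self, s, p):
        begin, end = 0, 0
        # count: number of characters the window still lacks to match p
        count = len(p)
        ans = []
        # d[c]: how many more of c the window needs (negative means surplus)
        d = defaultdict(int)
        for i in p:
            d[i] += 1

        while end < len(s):
            # s[end] enters the window; it only helps if it is still needed
            if d[s[end]] > 0:
                count -= 1
            d[s[end]] -= 1
            end += 1

            # match found
            if count == 0:
                ans.append(begin)

            # the window is as long as p: move begin forward
            if end - begin == len(p):
                d[s[begin]] += 1
                begin += 1
                # a value >= 1 after the increment means the window still needs
                # the character begin just dropped, so it must be made up later
                if d[s[begin-1]] >= 1:
                    count += 1

        return ans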
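The improvements mention an array-style hash table. As a hedged sketch of what that could look like, the variant below swaps the defaultdict for a fixed list of 26 counters, which is valid only because the problem guarantees lowercase English letters. The function name `find_anagrams_array` and the variable names are assumptions for illustration; the loop follows the same logic as the solution above.

```python
def find_anagrams_array(s, p):
    """Sliding window with a 26-entry list instead of a dictionary."""
    base = ord("a")
    need = [0] * 26          # need[i] > 0: the window still lacks that letter
    for ch in p:
        need[ord(ch) - base] += 1

    missing = len(p)         # total number of letters still lacking
    ans = []
    begin = 0
    for end, ch in enumerate(s):
        i = ord(ch) - base
        if need[i] > 0:      # this letter was actually needed
            missing -= 1
        need[i] -= 1

        if missing == 0:     # the window s[begin:end + 1] is an anagram of p
            ans.append(begin)

        if end + 1 - begin == len(p):
            # The window is full: release the letter at begin.
            j = ord(s[begin]) - base
            need[j] += 1
            if need[j] > 0:  # the released letter has to be found again
                missing += 1
            begin += 1
    return ans


print(find_anagrams_array("cbaebabacd", "abc"))  # [0, 6]
print(find_anagrams_array("abab", "ab"))         # [0, 1, 2]
```

A list avoids hashing and keeps memory constant, at the cost of assuming the alphabet up front; the dictionary version needs no such assumption.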