Problem
Given a string s and a non-empty string p, find all the start indices of p’s anagrams in s.
Strings consist of lowercase English letters only and the length of both strings s and p will not be larger than 20,100.
The order of output does not matter.
Example 1:
Input:
s: "cbaebabacd" p: "abc"
Output:
[0, 6]
Explanation:
The substring with start index = 0 is "cba", which is an anagram of "abc".
The substring with start index = 6 is "bac", which is an anagram of "abc".
Example 2:
Input:
s: "abab" p: "ab"
Output:
[0, 1, 2]
Explanation:
The substring with start index = 0 is "ab", which is an anagram of "ab".
The substring with start index = 1 is "ba", which is an anagram of "ab".
The substring with start index = 2 is "ab", which is an anagram of "ab".
Reference Answer
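Approach
This problem is a test of time complexity. Checking every slice of s separately to see whether it is an anagram of p is too slow and will time out.
The approach that gets accepted (AC) uses a sliding window: the count of each character entering the window is incremented by 1, and the count of each character leaving the window is decremented by 1.
That way a single pass over s is enough, and there is no need to recount every slice to decide whether it is an anagram.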
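The window update described above can be sketched without any library, using two fixed-size count lists. This is a minimal illustration, not the reference answer: the names find_anagrams, need and window are made up here, and the code assumes lowercase English letters only, as the problem statement promises.

```python
def find_anagrams(s, p):
    """Return start indices of p's anagrams in s using fixed-size count lists."""
    n, m = len(s), len(p)
    if m == 0 or n < m:
        return []
    base = ord("a")
    need = [0] * 26    # letter counts of p
    window = [0] * 26  # letter counts of the current window of s
    for ch in p:
        need[ord(ch) - base] += 1
    result = []
    for i, ch in enumerate(s):
        window[ord(ch) - base] += 1            # character entering the window
        if i >= m:
            window[ord(s[i - m]) - base] -= 1  # character leaving the window
        if i >= m - 1 and window == need:      # window is full and counts match
            result.append(i - m + 1)
    return result


print(find_anagrams("cbaebabacd", "abc"))  # [0, 6]
print(find_anagrams("abab", "ab"))         # [0, 1, 2]
```

Each step touches two counters and compares two 26-element lists, so the whole scan is linear in len(s) up to that constant factor.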
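Counter makes this easy: to decide whether two strings are anagrams of each other, count the character frequencies of each and compare them with ==.
Note that when a character's count drops to 0, it has to be removed from the Counter, because Counter({'a': 0, 'b': 1}) is not equal to Counter({'b': 1}). This holds for Python versions before 3.10; from 3.10 on, equality treats missing keys as zero counts, so the removal is no longer required but remains harmless.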
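A small sketch of the zero-count issue: after a decrement, the key stays in the Counter with value 0. The helper drop_zero_counts is a hypothetical name used only for illustration; it relies on the unary plus operator of Counter, which returns a copy containing only strictly positive counts.

```python
from collections import Counter


def drop_zero_counts(counter):
    """Return a copy of counter without zero or negative entries."""
    return +counter  # unary plus keeps only strictly positive counts


window = Counter("ab")
window["a"] -= 1                       # 'a' leaves the window; its count is now 0
print(dict(window))                    # {'a': 0, 'b': 1}
print(dict(drop_zero_counts(window)))  # {'b': 1}
```

Inside the sliding-window loop, deleting the single key that just reached 0 (as the solutions below do) is cheaper than copying the whole Counter on every step.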
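It only now dawned on me that importing library packages is fair game when practicing problems. That said, the decisive part is the incremental update: counting the characters of p with a dict and then recounting every window of s from scratch before comparing still does not work, because its time complexity is O(n^2) in the worst case and it fails to pass.
So the solutions below use Counter, Python's built-in counting class, and update it as the window slides.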
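For comparison, the quadratic approach that times out can be sketched as follows. It rebuilds the counts of every window from scratch, so the cost grows with len(s) * len(p). The function name find_anagrams_bruteforce is hypothetical; the sketch is only meant as a slow baseline for cross-checking a faster solution on small inputs.

```python
from collections import Counter


def find_anagrams_bruteforce(s, p):
    """Recount every window from scratch; roughly O(len(s) * len(p))."""
    m = len(p)
    target = Counter(p)
    # range() is empty when s is shorter than p, so no extra guard is needed.
    return [i for i in range(len(s) - m + 1) if Counter(s[i:i + m]) == target]


print(find_anagrams_bruteforce("cbaebabacd", "abc"))  # [0, 6]
```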
Code
from collections import Counter

class Solution(object):
    def findAnagrams(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: List[int]
        """
        answer = []
        m, n = len(s), len(p)
        if m < n:
            return answer
        pCounter = Counter(p)
        # Start with the first n - 1 characters; the loop adds the n-th.
        sCounter = Counter(s[:n-1])
        for index in range(n - 1, m):
            # Character entering the window on the right.
            sCounter[s[index]] += 1
            if sCounter == pCounter:
                answer.append(index - n + 1)
            # Character leaving the window on the left.
            sCounter[s[index - n + 1]] -= 1
            # Drop zero counts so that the == comparison stays exact.
            if sCounter[s[index - n + 1]] == 0:
                del sCounter[s[index - n + 1]]
        return answer
Second Version:
Alternatively, the same idea can be written differently. The code is as follows:
import collections

class Solution:
    def findAnagrams(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: List[int]
        """
        if not s or len(s) < len(p):
            return []
        ls, lp = len(s), len(p)
        cp = collections.Counter(p)
        cs = collections.Counter()
        ans = []
        for i in range(ls):
            # Character entering the window on the right.
            cs[s[i]] += 1
            if i >= lp:
                # Character leaving the window on the left.
                cs[s[i - lp]] -= 1
                # Drop zero counts so that the == comparison stays exact.
                if cs[s[i - lp]] == 0:
                    del cs[s[i - lp]]
            if cs == cp:
                ans.append(i - lp + 1)
        return ans
C++ version
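Use two hash tables (here, 256-entry count arrays) to record the character counts of p and of the first p.size() characters of s, then compare them. If they are equal, add 0 to the result res. Then walk through the remaining characters of s: each step adds one new character on the right and removes one old character on the left, and the two tables are compared again. See the code below: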
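#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    vector<int> findAnagrams(string s, string p) {
        const int n = s.size(), k = p.size();
        // Guard: if s is shorter than p, s[i] below would read past the end of s.
        if (n < k) return {};
        vector<int> res, m1(256, 0), m2(256, 0);
        // Count the first k characters of s and all characters of p.
        for (int i = 0; i < k; ++i) {
            ++m1[s[i]]; ++m2[p[i]];
        }
        if (m1 == m2) res.push_back(0);
        for (int i = k; i < n; ++i) {
            ++m1[s[i]];      // character entering on the right
            --m1[s[i - k]];  // character leaving on the left
            if (m1 == m2) res.push_back(i - k + 1);
        }
        return res;
    }
};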
Note
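- C++ often has an edge for these problems; for example, the frequently used for loop is somewhat more convenient than Python's range-style loops (personal impression).
- Python can lean on its standard packages; when a hand-written implementation fails to get accepted (AC), it is best to reach for them.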
References
[1] http://www.cnblogs.com/grandyang/p/6014408.html
[2] https://blog.csdn.net/fuxuemingzhu/article/details/79184109
[3] http://bookshadow.com/weblog/2016/10/23/leetcode-find-all-anagrams-in-a-string/