Repeated DNA Sequences - LeetCode
Problem:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
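Analysis:
This problem again takes advantage of the fact that a dictionary lookup takes O(1) time on average. My solution was accepted, but the problem also hints that it can be solved with bit manipulation. I have always admired bit manipulation without daring to use it myself, so I have appended someone else's bit-manipulation solution at the end for anyone interested in studying it.
I have seen others say that storing the substrings directly as strings leads to Memory Limit Exceeded, but I did not run into this; my submissions passed every time, so I did not convert the substrings to integers. Doing that conversion does save some space, though, so it is worth mentioning here.
Code: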
class Solution:
    # @param s, a string
    # @return a list of strings
    def findRepeatedDnaSequences(self, s):
        if not s:
            return []
        res = []
        dic = {}
        for i in range(len(s) - 9):
            if s[i:i+10] not in dic:
                dic[s[i:i+10]] = 1
            else:
                res.append(s[i:i+10])
        res = list(set(res))  # res may contain duplicates, so I deduplicate it with a set
        return res
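The integer conversion mentioned in the analysis could look roughly like the sketch below. It is not the code I submitted: the function name, the 2-bit code table, and the `k` parameter are all assumptions made for illustration. Each nucleotide is packed into 2 bits, so a 10-letter window fits in a 20-bit integer, and the window is updated in place instead of slicing a new string at every position.

```python
def find_repeated_dna_sequences_encoded(s, k=10):
    """Return every length-k substring of s that occurs more than once."""
    # Hypothetical 2-bit code per nucleotide (an assumption, not from the post).
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    mask = (1 << (2 * k)) - 1  # keep only the lowest 2*k bits (20 bits for k=10)

    seen = set()      # integer keys of windows seen at least once
    repeated = set()  # integer keys of windows already reported
    result = []
    key = 0
    for i, ch in enumerate(s):
        # Shift the previous window left by one nucleotide and append the new one.
        key = ((key << 2) | code[ch]) & mask
        if i < k - 1:
            continue  # the first full window ends at index k - 1
        if key in seen:
            if key not in repeated:
                repeated.add(key)
                result.append(s[i - k + 1:i + 1])
        else:
            seen.add(key)
    return result


print(find_repeated_dna_sequences_encoded("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"))
# prints ['AAAAACCCCC', 'CCCCCAAAAA']
```

The sets hold small integers rather than 10-character strings, which is where the space saving comes from. Results are reported in the order in which each sequence is seen for the second time.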
Appendix:
Bit-manipulation C++ code:
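#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

vector<string> findRepeatedDnaSequences(string s) {
    unordered_map<int, int> m;
    vector<string> r;
    int t = 0, i = 0, ss = s.size();
    while (i < ss)
        // The low 3 bits of each letter are distinct; the mask keeps 30 bits (10 letters).
        if (m[t = (t << 3 | s[i++] & 7) & 0x3FFFFFFF]++ == 1)
            r.push_back(s.substr(i - 10, 10));
    return r;
}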
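To see why the `& 7` trick in the C++ loop works, here is a rough Python port; the function name is made up for illustration. The low 3 bits of the ASCII codes of A, C, G, and T are 1, 3, 7, and 4, so every letter gets a distinct nonzero 3-bit code, and the mask `0x3FFFFFFF` keeps the most recent 10 letters (30 bits).

```python
def find_repeated_dna_sequences_3bit(s):
    """Python port (illustrative only) of the C++ bit-manipulation loop."""
    counts = {}  # window key -> number of times that key has been seen
    result = []
    t = 0
    for i, ch in enumerate(s):
        # Append the low 3 bits of the letter, then keep only the last 30 bits.
        t = ((t << 3) | (ord(ch) & 7)) & 0x3FFFFFFF
        counts[t] = counts.get(t, 0) + 1
        # Report a window exactly once, when its count reaches 2.
        # Keys built from fewer than 10 letters have leading zero groups,
        # and no letter encodes to 0, so they never match a full window.
        if counts[t] == 2:
            result.append(s[i - 9:i + 1])
    return result


print({c: ord(c) & 7 for c in "ACGT"})
# prints {'A': 1, 'C': 3, 'G': 7, 'T': 4}
print(find_repeated_dna_sequences_3bit("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"))
# prints ['AAAAACCCCC', 'CCCCCAAAAA']
```

Because no letter maps to 0, the loop does not need a special case for the first nine positions, which is what lets the C++ version stay so short.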