Problem:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
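Solution:
I took the simplest approach: a lookup table records how many times each 10-letter sequence occurs, and a sequence is added to the result set as soon as its count reaches two. (The code below uses std::map, an ordered map, in the hash-table role.)
There are many ways to optimize this, but the overall idea is much the same, so I won't list them one by one here.
Code: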
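#include <map>
#include <string>
#include <vector>
using namespace std;

vector<string> findRepeatedDnaSequences(string s) {
    vector<string> res;
    map<string, int> m;  // 10-letter sequence -> number of occurrences
    int n = s.size();
    if (n < 10)
        return res;  // too short to contain any 10-letter sequence
    // Count the first window, then slide it one character at a time.
    string seq = s.substr(0, 10);
    m[seq]++;
    for (int i = 10; i < n; i++) {
        seq.erase(0, 1);  // drop the oldest character
        seq += s[i];      // append the next one
        ++m[seq];
        // Report a sequence exactly once, when its count first reaches 2.
        if (m[seq] == 2)
            res.push_back(seq);
    }
    return res;
}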
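
The post says optimizations exist but does not name them. One common option is sketched here as an illustration, not as the author's method. It packs each nucleotide into 2 bits, so a 10-letter window fits in a 20-bit integer and the map key becomes a number instead of a string. The function name findRepeatedDnaSequencesBits and the encoding A=0, C=1, G=2, T=3 are assumptions made for this sketch, and it assumes the input contains only those four letters.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical variant of the solution above: same result, integer keys.
std::vector<std::string> findRepeatedDnaSequencesBits(const std::string& s) {
    const std::size_t kLen = 10;
    const std::uint32_t kMask = (1u << (2 * kLen)) - 1;  // low 20 bits
    std::vector<std::string> res;
    if (s.size() < kLen)
        return res;

    // Assumed encoding; any character other than A, C, G is treated as T.
    auto code = [](char c) -> std::uint32_t {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3;
        }
    };

    std::unordered_map<std::uint32_t, int> count;  // window key -> occurrences
    std::uint32_t key = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Shift in the new letter; the mask drops the letter that left the window.
        key = ((key << 2) | code(s[i])) & kMask;
        if (i + 1 < kLen)
            continue;  // the first full window ends at index kLen - 1
        if (++count[key] == 2)
            res.push_back(s.substr(i + 1 - kLen, kLen));
    }
    return res;
}
```

For the example string from the problem statement, this sketch returns the same two sequences in the same order as the code above. The gain is that each step does constant-size integer work instead of erasing from and hashing or comparing a 10-character string.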