All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].
Company Tags: LinkedIn
Tags: Hash Table, Bit Manipulation
If we set aside the bit-manipulation approach, this problem comes down to essentially one line of code:
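#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        unordered_map<string, int> mp;  // 10-letter window -> times seen so far
        int n = s.size();
        vector<string> res;
        for (int i = 0; i < n - 9; ++i) {
            // Post-increment yields the old count, so == 1 fires exactly on the second sighting.
            if (mp[s.substr(i, 10)]++ == 1) res.push_back(s.substr(i, 10));
        }
        return res;
    }
};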
But it is slow. Then I saw an 8 ms solution in the Discuss forum. Bit-manipulation problems like this always terrify me! Ugh! Let's work through it carefully:
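#include <string>
#include <vector>
using namespace std;

vector<string> findRepeatedDnaSequences(string s) {
    // One counter per possible 20-bit key (4^10 = 1048576); heap-allocated to spare the stack.
    vector<unsigned char> hashMap(1048576, 0);
    vector<string> ans;
    int len = s.size(), hashNum = 0;
    if (len < 11) return ans;  // a repeat needs at least two windows
    // (c - 'A' + 1) % 5 maps A->1, C->3, G->2, T->0: a distinct 2-bit code per base.
    for (int i = 0; i < 9; ++i)
        hashNum = hashNum << 2 | (s[i] - 'A' + 1) % 5;
    for (int i = 9; i < len; ++i) {
        // Shift in the next base; the 0xfffff mask keeps only the last 10 bases.
        hashNum = (hashNum << 2 | (s[i] - 'A' + 1) % 5) & 0xfffff;
        // Report on the second sighting only; capping the count at 2 avoids 8-bit wraparound.
        if (hashMap[hashNum] < 2 && ++hashMap[hashNum] == 2)
            ans.push_back(s.substr(i - 9, 10));
    }
    return ans;
}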
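
To convince myself that the hash really is collision-free, it helps to pull the encoding out into small pieces. The sketch below is only my own illustration, not part of the Discuss solution: the helper names encodeBase, windowKey, and rollKey are hypothetical, and it assumes the input contains only the letters A, C, G, and T. Each base gets a distinct 2-bit code, so a 10-letter window fits exactly into 20 bits, and sliding the window is just a shift, an OR, and a mask.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: maps a base to the 2-bit code used by the hash above.
// ASCII: 'A'=65, 'C'=67, 'G'=71, 'T'=84, so (c - 'A' + 1) % 5 gives
// A -> 1, C -> 3, G -> 2, T -> 0.
int encodeBase(char c) {
    assert(c == 'A' || c == 'C' || c == 'G' || c == 'T');
    return (c - 'A' + 1) % 5;
}

// Hypothetical helper: packs a 10-letter window into a 20-bit key, base by base.
std::uint32_t windowKey(const std::string& window) {
    assert(window.size() == 10);
    std::uint32_t key = 0;
    for (char c : window)
        key = (key << 2 | encodeBase(c)) & 0xfffff;
    return key;
}

// Hypothetical helper: slides the window one base to the right in O(1).
// The shift pushes the oldest base past bit 19, and the mask discards it.
std::uint32_t rollKey(std::uint32_t key, char next) {
    return (key << 2 | encodeBase(next)) & 0xfffff;
}
```

Because the four codes are distinct and the key holds exactly ten of them, two windows share a key only if they are the same string, which is why a plain array of 4^10 counters can stand in for the hash table.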