Problem:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = “AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT”,
Return:
[“AAAAACCCCC”, “CCCCCAAAAA”].
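Solution:
The problem statement is not hard to understand, but at first I thought there were more answers. Then I noticed that the first run of consecutive C's has 5 characters while the second has 6, so only these two answers exist.
The idea is to slide a 10-character window over s, store each substring as a key in a map, and count how many times it occurs.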
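#include <map>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> vc;
        // A string of 10 or fewer characters cannot hold a repeated 10-letter window.
        if (s.length() <= 10) {
            return vc;
        }
        map<string, int> counts;  // substring -> occurrences seen so far
        for (size_t i = 0; i + 10 <= s.length(); i++) {
            string subs = s.substr(i, 10);
            // operator[] value-initializes a missing key to 0, so ++ covers both cases.
            if (++counts[subs] == 2) {
                vc.push_back(subs);  // report each repeated sequence exactly once
            }
        }
        return vc;
    }
};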
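Although this was accepted (AC), it took 208 ms, which is fairly slow, so I looked up other solutions online. One interesting approach goes like this:
1. Encode the four characters in two bits each: A = 00, C = 01, G = 10, T = 11.
2. A 10-character string can then be represented as a 20-bit binary number. An int has 4 bytes (32 bits), so it can hold a whole substring.
3. To move to the next substring, shift the value left by two bits, put the new character's code in the lowest two bits, and clear everything above the low 20 bits.
4. Store the values that have been seen in a hash table.
This solves the problem in linear time. Impressive!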
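The encoding and window update in steps 1 to 3 can be sketched roughly as follows. This is only an illustration: the helper names `nucleotideCode`, `encodeWindow`, and `rollWindow` are hypothetical, and the sketch assumes the input contains only the letters A, C, G, and T.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: 2-bit code for one nucleotide (A=00, C=01, G=10, T=11).
inline std::uint32_t nucleotideCode(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // assumption: the only other input is 'T'
    }
}

// Hypothetical helper: pack the 10 nucleotides starting at `start`
// into the low 20 bits of an integer, first character in the highest bits.
inline std::uint32_t encodeWindow(const std::string& s, std::size_t start) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 10; ++i) {
        value = (value << 2) | nucleotideCode(s[start + i]);
    }
    return value;
}

// Hypothetical helper: advance the window by one character. The new code
// enters at the lowest two bits and the oldest character is masked off.
inline std::uint32_t rollWindow(std::uint32_t value, char next) {
    const std::uint32_t mask = (1u << 20) - 1;  // keep only the low 20 bits
    return ((value << 2) | nucleotideCode(next)) & mask;
}
```

For example, `encodeWindow("AAAAACCCCC", 0)` gives `0x155`, and rolling that value with `'A'` gives `0x554`, the same value as encoding "AAAACCCCCA" directly. Each update is a constant number of bit operations, which is where the linear running time comes from.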
Lesson learned: from now on I should think more about how to save time.
The code:
#include <cstring>
#include <string>
#include <unordered_set>
#include <vector>

// One flag per possible 20-bit window code (4^10 = 2^20 values).
bool hashMap[1024 * 1024];

class Solution {
public:
    std::vector<std::string> findRepeatedDnaSequences(std::string s);
};

std::vector<std::string> Solution::findRepeatedDnaSequences(std::string s) {
    std::vector<std::string> rel;
    if (s.length() <= 10) {
        return rel;
    }
    // Map each nucleotide to a 2-bit code, indexed by (character - 'A').
    unsigned char convert[26] = {0};
    convert[0] = 0;   // 'A' - 'A' -> 00
    convert[2] = 1;   // 'C' - 'A' -> 01
    convert[6] = 2;   // 'G' - 'A' -> 10
    convert[19] = 3;  // 'T' - 'A' -> 11
    // Encode the first 10 characters as the initial 20-bit window value.
    std::memset(hashMap, false, sizeof(hashMap));
    int hashValue = 0;
    for (int pos = 0; pos < 10; ++pos) {
        hashValue <<= 2;
        hashValue |= convert[s[pos] - 'A'];
    }
    hashMap[hashValue] = true;
    // Codes already added to the result, so each sequence is reported once.
    std::unordered_set<int> strHashValue;
    // Slide the window one character at a time.
    for (std::size_t pos = 10; pos < s.length(); ++pos) {
        hashValue <<= 2;
        hashValue |= convert[s[pos] - 'A'];
        hashValue &= ~(0x300000);  // clear bits 20-21, keeping the low 20 bits
        if (hashMap[hashValue]) {
            if (strHashValue.find(hashValue) == strHashValue.end()) {
                rel.push_back(s.substr(pos - 9, 10));
                strHashValue.insert(hashValue);
            }
        } else {
            hashMap[hashValue] = true;
        }
    }
    return rel;
}
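As a possible variation on the code above (not the version from the write-up I found), the global array and the unordered_set could be replaced by two local bit vectors, one marking codes that have been seen and one marking codes already reported. The function name `findRepeatedDnaSequencesLocal` is hypothetical, and the sketch assumes the input contains only A, C, G, and T.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical variant: same 2-bit rolling code, but all state is local.
std::vector<std::string> findRepeatedDnaSequencesLocal(const std::string& s) {
    std::vector<std::string> result;
    if (s.length() <= 10) {
        return result;
    }
    const std::uint32_t mask = (1u << 20) - 1;    // low 20 bits = 10 nucleotides
    std::vector<bool> seen(1u << 20, false);      // code has occurred at least once
    std::vector<bool> reported(1u << 20, false);  // code is already in the result
    std::uint32_t code = 0;
    for (std::size_t pos = 0; pos < s.length(); ++pos) {
        std::uint32_t bits = 0;  // 'A' -> 00 (assumption: input is only A/C/G/T)
        switch (s[pos]) {
            case 'C': bits = 1; break;
            case 'G': bits = 2; break;
            case 'T': bits = 3; break;
            default: break;
        }
        code = ((code << 2) | bits) & mask;
        if (pos < 9) {
            continue;  // the first full window ends at index 9
        }
        if (!seen[code]) {
            seen[code] = true;
        } else if (!reported[code]) {
            reported[code] = true;
            result.push_back(s.substr(pos - 9, 10));
        }
    }
    return result;
}
```

On the sample input "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" this returns "AAAAACCCCC" followed by "CCCCCAAAAA", in the order in which each second occurrence is found. Keeping the state local avoids the memset of a shared global on every call, at the cost of allocating the two bit vectors each time.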