All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].
Approach 1:
Use a HashMap directly. The Java implementation is as follows:
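import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

static public List<String> findRepeatedDnaSequences(String s) {
    List<String> list = new ArrayList<String>();
    // Maps each 10-letter window to the number of times it has been seen.
    HashMap<String, Integer> hm = new HashMap<String, Integer>();
    for (int i = 0; i <= s.length() - 10; i++) {
        String sub = s.substring(i, i + 10);
        Integer seen = hm.get(sub);
        if (seen == null) {
            hm.put(sub, 1);
        } else {
            // Report a window only on its second occurrence, so a sequence
            // that appears three or more times is not listed repeatedly.
            if (seen == 1)
                list.add(sub);
            hm.put(sub, seen + 1);
        }
    }
    return list;
}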
Result: Memory Limit Exceeded.
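Approach 2:
Simulate a hash table. Map A, C, G, and T to 0, 1, 2, and 3, so each letter takes 2 bits and a 10-letter window fits in a 20-bit hash code. Compute the hash code of every 10-letter window, and output the window when the count stored for its hash code is exactly 1, that is, when the window is seen for the second time. The Java implementation is as follows: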
// Encode a nucleotide as a 2-bit value: A=0, C=1, G=2, T=3.
static int getValue(char ch) {
    if (ch == 'A')
        return 0;
    else if (ch == 'C')
        return 1;
    else if (ch == 'G')
        return 2;
    else
        return 3;
}
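static public List<String> findRepeatedDnaSequences(String s) {
    List<String> list = new ArrayList<String>();
    // A string of 10 letters or fewer cannot contain a repeated 10-letter window.
    if (s.length() <= 10)
        return list;
    // Hash codes range from 0 to (1 << 20) - 1, so the table needs 1 << 20 slots.
    // (A size of (1 << 20) - 1 would overflow on the window "TTTTTTTTTT".)
    int[] count = new int[1 << 20];
    int mask = (1 << 20) - 1; // keeps only the low 20 bits (the last 10 letters)
    int hash = 0;
    // Preload the first 9 letters; the loop below appends the 10th.
    for (int i = 0; i < 9; i++)
        hash = (hash << 2) | getValue(s.charAt(i));
    for (int i = 9; i < s.length(); i++) {
        // Shift in the next letter and drop the letter that left the window.
        hash = mask & ((hash << 2) | getValue(s.charAt(i)));
        // Output a window only the second time it is seen.
        if (count[hash] == 1)
            list.add(s.substring(i - 9, i + 1));
        count[hash]++;
    }
    return list;
}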
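To try Approach 2 on the example from the problem statement, the sketch below wraps the same idea in a self-contained class. The class name DnaRepeats and the helper name code are made up for this illustration, and the byte table that stops counting at 2 is a small variation on the int table above (an assumption made here to shrink the table, not part of the original solution).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class, named only for this illustration.
class DnaRepeats {
    // Same 2-bit mapping as getValue: A=0, C=1, G=2, T=3.
    static int code(char ch) {
        switch (ch) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3;
        }
    }

    static List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<>();
        final int mask = (1 << 20) - 1;   // low 20 bits = last 10 letters
        byte[] count = new byte[1 << 20]; // one slot per possible window; capped at 2
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new letter and drop whatever fell out of the window.
            hash = ((hash << 2) | code(s.charAt(i))) & mask;
            if (i < 9)
                continue; // the window does not hold 10 letters yet
            if (count[hash] == 1)
                result.add(s.substring(i - 9, i + 1)); // second sighting: report once
            if (count[hash] < 2)
                count[hash]++; // stop at 2 so the byte never overflows
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
        // prints [AAAAACCCCC, CCCCCAAAAA]
    }
}
```

Because the 20-bit code is unique for every 10-letter window, two windows share a slot only when they are the same string, so no string comparison is needed.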