All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example, given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return ["AAAAACCCCC", "CCCCCAAAAA"].
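Comparing the substrings directly as strings exceeds the time limit, so use a hash instead: each letter is encoded in 2 bits, so a 10-letter window fits exactly in a 20-bit integer. The code is as follows: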
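import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        // A set, so a sequence that occurs three or more times is reported once.
        Set<String> result = new HashSet<>();
        // Each nucleotide maps to a 2-bit code.
        Map<Character, Integer> map = new HashMap<>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);
        // With 10 or fewer letters there is at most one window, so nothing can repeat.
        if (s.length() <= 10) {
            return new ArrayList<String>();
        }
        int hash = 0;
        // Hashes of every 10-letter window seen so far.
        Set<Integer> visited = new HashSet<>();
        for (int i = 0; i < s.length(); i++) {
            // Shift in the 2-bit code of the current letter.
            hash = (hash << 2) + map.get(s.charAt(i));
            if (i >= 9) {
                // Keep only the low 20 bits, i.e. the last 10 letters.
                hash &= (1 << 20) - 1;
                if (visited.contains(hash)) {
                    result.add(s.substring(i - 9, i + 1));
                } else {
                    visited.add(hash);
                }
            }
        }
        return new ArrayList<String>(result);
    }
}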
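To see why the 20-bit mask works, here is a minimal sketch of the encoding on its own. The class `DnaHashDemo` and the helpers `code`, `encode`, and `roll` are hypothetical names introduced for illustration and are not part of the solution above. The sketch assumes the input contains only the letters A, C, G, and T. It shows that sliding the window one letter to the right (shift by 2 bits, add the new letter, mask to 20 bits) gives the same value as encoding the new window from scratch.

```java
class DnaHashDemo {
    // Same 2-bit codes as the solution above: A=0, C=1, G=2, T=3.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Encode a window of up to 10 letters from scratch, 2 bits per letter.
    static int encode(String window) {
        int hash = 0;
        for (int i = 0; i < window.length(); i++) {
            hash = (hash << 2) | code(window.charAt(i));
        }
        return hash;
    }

    // Slide a 10-letter window one letter to the right: shift in the new
    // letter, then mask to 20 bits so the oldest letter falls off.
    static int roll(int hash, char next) {
        return ((hash << 2) | code(next)) & ((1 << 20) - 1);
    }

    public static void main(String[] args) {
        int h = encode("AAAAACCCCC");
        System.out.println(h);                                     // 341
        System.out.println(roll(h, 'A'));                          // 1364
        System.out.println(roll(h, 'A') == encode("AAAACCCCCA"));  // true
    }
}
```

Because there are only 4^10 = 2^20 possible 10-letter windows, this encoding is one-to-one, so two windows with equal hashes are guaranteed to be the same sequence and no string comparison is needed to confirm a match.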