All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].
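As a baseline, the most direct approach stores every 10-letter window as a String in a set and reports a window the second time it shows up. The sketch below only illustrates that idea; the class name NaiveRepeatedDna and the method name findRepeated are hypothetical and not part of the original problem.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical baseline: keeps each 10-letter window as a String key.
class NaiveRepeatedDna {
    static List<String> findRepeated(String s) {
        Set<String> seen = new HashSet<>();
        // LinkedHashSet keeps repeats in the order they were first detected.
        Set<String> repeated = new LinkedHashSet<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // add() returns false when the window was already in the set.
            if (!seen.add(window)) {
                repeated.add(window);
            }
        }
        return new ArrayList<>(repeated);
    }
}
```

Every distinct window is kept as a separate String object here, which is the cost that an integer encoding avoids.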
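The main point of this problem is implementing your own hash code for the strings: using the String itself as the HashMap key times out, so each 10-letter sequence is encoded as an int instead.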
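import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> list = new ArrayList<String>();
        if (s.length() < 10) return list;
        // Maps the integer code of a 10-letter window to how often it has been seen.
        Map<Integer, Integer> map = new HashMap<Integer, Integer>();
        for (int i = 0; i < s.length() - 9; i++) {
            String t = s.substring(i, i + 10);
            int hash = encode(t);
            if (!map.containsKey(hash)) {
                map.put(hash, 1);
            } else {
                int count = map.get(hash) + 1;
                map.put(hash, count);
                // Report a sequence only on its second occurrence, which
                // avoids a linear list.contains() scan for every repeat.
                if (count == 2) list.add(t);
            }
        }
        return list;
    }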
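    // Encodes a sequence as a base-4 number: A=0, C=1, G=2, T=3.
    // Ten letters need only 20 bits, so distinct sequences get distinct ints.
    int encode(String s) {
        if (s == null || s.length() == 0) return 0;
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            int curr = 0;
            switch (s.charAt(i)) {
                case 'A':
                    curr = 0;
                    break;
                case 'C':
                    curr = 1;
                    break;
                case 'G':
                    curr = 2;
                    break;
                case 'T':
                    curr = 3;
                    break;
                default:
                    // Any other character falls back to 0.
                    break;
            }
            // Shift the digits so far by one base-4 place, then append the new one.
            hash = hash * 4 + curr;
        }
        return hash;
    }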
}
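The solution above re-encodes all ten letters of every window. One possible refinement, sketched here as a variant rather than as part of the original solution, updates the code incrementally: shift in two bits for the new letter and mask off everything above the low 20 bits. The names RollingRepeatedDna, findRepeated, code, WINDOW and MASK are hypothetical, and unlike the solution above this sketch rejects characters other than A, C, G and T.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical variant: sliding 20-bit code, two bits per nucleotide.
class RollingRepeatedDna {
    private static final int WINDOW = 10;
    // Keeps only the low 2 * WINDOW = 20 bits of the running code.
    private static final int MASK = (1 << (2 * WINDOW)) - 1;

    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    static List<String> findRepeated(String s) {
        List<String> result = new ArrayList<>();
        Map<Integer, Integer> counts = new HashMap<>();
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // Append the new letter and drop the letter that left the window.
            hash = ((hash << 2) | code(s.charAt(i))) & MASK;
            if (i < WINDOW - 1) continue; // first full window ends at index 9
            int seen = counts.merge(hash, 1, Integer::sum);
            // Build the substring only when a repeat is first detected.
            if (seen == 2) result.add(s.substring(i - WINDOW + 1, i + 1));
        }
        return result;
    }
}
```

With this variant each step does constant work on the code, and a substring is created only for windows that are actually reported.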