All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
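I couldn't solve this one myself. Looking at solutions online, nearly all of them convert each 10-letter substring into a number, store that number in a hash table, and use the table to check for repeats.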
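To make the "string to number" idea concrete, here is a minimal sketch. It assumes a two-bit code per letter (A=0, C=1, G=2, T=3); the class name `DnaEncodingSketch` and the method `encode` are made up for illustration and do not come from either solution in this post.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, only meant to illustrate the encoding idea.
class DnaEncodingSketch {
    // Packs a window of at most 16 letters from {A, C, G, T} into an int,
    // two bits per letter (assumed mapping: A=0, C=1, G=2, T=3).
    static int encode(String window) {
        int code = 0;
        for (int i = 0; i < window.length(); i++) {
            code <<= 2; // make room for the next two-bit letter
            switch (window.charAt(i)) {
                case 'A': code |= 0; break;
                case 'C': code |= 1; break;
                case 'G': code |= 2; break;
                case 'T': code |= 3; break;
                default:
                    throw new IllegalArgumentException("not a nucleotide: " + window.charAt(i));
            }
        }
        return code;
    }

    public static void main(String[] args) {
        String s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT";
        // Count how often each encoded 10-letter window occurs.
        Map<Integer, Integer> counts = new HashMap<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            counts.merge(encode(s.substring(i, i + 10)), 1, Integer::sum);
        }
        System.out.println(counts.get(encode("AAAAACCCCC"))); // prints 2
        System.out.println(counts.get(encode("CCCCCAAAAA"))); // prints 2
    }
}
```

Ten letters need only 20 bits, so an `int` key is enough and the hash table never has to store the substrings themselves.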
Source1 (got MTL)
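// Method of class Solution; needs java.util.ArrayList, java.util.HashMap and java.util.List.
public List<String> findRepeatedDnaSequences(String s) {
    List<String> res = new ArrayList<String>();
    if(s.length() <= 10) return res;
    int[] a = new int['T' + 1]; // sized up to ASCII 'T' so each letter can index the array directly
    char[] b = {'A', 'C', 'G', 'T'};
    a['A'] = 0;
    a['C'] = 1;
    a['G'] = 2;
    a['T'] = 3;
    HashMap<Long, Integer> hm = new HashMap<Long, Integer>(); // the key type must be Long, not long
    for(int i = 0; i < s.length() - 9; i++){
        // Encode s[i..i+9] as a decimal number whose digits are 0-3.
        long sum = 0;
        for(int j = i; j <= i + 9; j++){
            sum = sum * 10 + a[s.charAt(j)];
        }
        if(!hm.containsKey(sum)){
            hm.put(sum, 1);
        }
        else{
            if(hm.get(sum) == 1){
                // Second occurrence: decode the key back into its 10 letters.
                long code = sum;
                String temp = "";
                for(int j = 9; j >= 0; j--){
                    int k = (int)(code % 10);
                    temp = b[k] + temp; // the lowest digit is the last letter, so prepend
                    code /= 10;
                }
                res.add(temp);
            }
            hm.put(sum, hm.get(sum) + 1); // always count, so later occurrences are not reported again
        }
    }
    return res;
}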
Test
public static void main(String[] args){
    String s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT";
    System.out.println(new Solution().findRepeatedDnaSequences(s));
}
Source2
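// Method of class Solution; needs java.util.ArrayList, java.util.HashSet and java.util.List.
public List<String> findRepeatedDnaSequences(String s) {
    HashSet<Integer> a = new HashSet<>(); // codes seen at least once
    HashSet<Integer> b = new HashSet<>(); // codes already added to res
    List<String> res = new ArrayList<>();
    char[] map = new char[26]; // 'A' keeps the default value 0
    map['C' - 'A'] = 1;
    map['G' - 'A'] = 2;
    map['T' - 'A'] = 3;
    for(int i = 0; i < s.length() - 9; i++){
        int sum = 0;
        for(int j = i; j < i + 10; j++){
            sum <<= 2; // the values 2 and 3 need two bits, so shift by two bits per letter
            sum |= map[s.charAt(j) - 'A'];
        }
        // Neat trick: a HashSet holds no duplicates, so add() returns false when the
        // element is already present. !a.add(sum) is true from the second occurrence
        // onward, and b.add(sum) is true only the first time that happens, so each
        // repeated sequence is added to res exactly once.
        if(!a.add(sum) && b.add(sum)){
            res.add(s.substring(i, i + 10));
        }
    }
    return res;
}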
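Source2 re-encodes all ten letters for every window. A possible refinement, which is not part of the solutions referenced above, is to roll the code forward: shift in the new letter and mask off the letter that left the window. The sketch below assumes the input contains only A, C, G and T; the class name `RollingDnaSketch` and the helper `value` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical rolling variant of Source2, shown only as a sketch.
class RollingDnaSketch {
    static List<String> findRepeatedDnaSequences(String s) {
        final int len = 10;
        final int mask = (1 << (2 * len)) - 1; // keeps only the lowest 20 bits
        Set<Integer> seen = new HashSet<>();
        Set<Integer> reported = new HashSet<>();
        List<String> res = new ArrayList<>();
        int code = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new letter; the mask drops the letter that left the window.
            code = ((code << 2) | value(s.charAt(i))) & mask;
            if (i < len - 1) continue; // the first full window ends at index len - 1
            // Same two-set trick as Source2: report a code on its second occurrence only.
            if (!seen.add(code) && reported.add(code)) {
                res.add(s.substring(i - len + 1, i + 1));
            }
        }
        return res;
    }

    // Two-bit value of a nucleotide (assumes the input holds only A, C, G, T).
    static int value(char c) {
        switch (c) {
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return 0; // 'A'
        }
    }

    public static void main(String[] args) {
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
        // prints [AAAAACCCCC, CCCCCAAAAA]
    }
}
```

Each step does a constant amount of work, so the scan is linear in the length of `s` instead of doing ten letter lookups per window.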