All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
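Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return ["AAAAACCCCC", "CCCCCAAAAA"].
Slide a 10-character window across s and store each substring in a HashSet. If a substring is already in the set, it is a repeat and goes into the result. The result itself must also be kept free of duplicates. In a string of 12 A's, "AAAAAAAAAAAA", the window "AAAAAAAAAA" occurs three times but should be reported only once. The code is as follows: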
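import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        Set<String> seen = new HashSet<>();   // every 10-letter window seen so far
        Set<String> added = new HashSet<>();  // windows already in res (O(1) lookup instead of res.contains)
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // Set.add returns false when the window was already present, i.e. it repeats;
            // added.add then ensures each repeat is reported only once
            if (!seen.add(window) && added.add(window)) {
                res.add(window);
            }
        }
        return res;
    }
}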
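A common variant, not described above, avoids hashing 10-character strings by packing each window into an int. Each nucleotide takes 2 bits, so a 10-letter window fits in 20 bits and can be updated in constant time per step by shifting in the next nucleotide and masking off the oldest one. The sketch below assumes the input contains only A, C, G and T; the class name DnaBitWindow and the helper code() are hypothetical. It keeps the same two-set de-duplication as the solution above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DnaBitWindow {
    // Hypothetical helper: maps each nucleotide to a 2-bit code.
    // Assumes the alphabet is exactly A, C, G, T.
    private static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    public static List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();   // encoded windows seen at least once
        Set<Integer> added = new HashSet<>();  // encoded windows already reported
        final int mask = (1 << 20) - 1;        // keep only the last 10 nucleotides (20 bits)
        int key = 0;
        for (int i = 0; i < s.length(); i++) {
            // shift in the new nucleotide; the mask drops the one that left the window
            key = ((key << 2) | code(s.charAt(i))) & mask;
            if (i >= 9) { // window s[i-9..i] is now complete
                if (!seen.add(key) && added.add(key)) {
                    res.add(s.substring(i - 9, i + 1)); // build the string only when reporting
                }
            }
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

Running main on the example prints [AAAAACCCCC, CCCCCAAAAA]. Because the 2-bit encoding is a bijection on 10-letter windows, two windows share a key only if they are the same string, so no collision check is needed; substrings are allocated only for the repeats that end up in the result.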