Problem Description
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Example:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]
Accepted Solution 1
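import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class _187FindRepeatedDnaSequences {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> resList = new ArrayList<>();
        Set<Integer> words = new HashSet<>();       // encodings of every window seen so far
        Set<Integer> doublewords = new HashSet<>(); // prevents adding the same string to resList twice
        // Map each nucleotide to a 2-bit code: A = 0, C = 1, G = 2, T = 3.
        int[] maps = new int[26];
        // maps['A' - 'A'] = 0; is the array default, so it need not be set explicitly
        maps['C' - 'A'] = 1;
        maps['G' - 'A'] = 2;
        maps['T' - 'A'] = 3;
        for (int i = 0; i + 9 < s.length(); i++) { // note: i + 9 is the last index of the window
            // Pack the 10 letters starting at i into a 20-bit integer.
            int val = 0;
            for (int j = 0; j < 10; j++) {
                val <<= 2;
                val |= maps[s.charAt(j + i) - 'A'];
            }
            // Record the substring only the first time its encoding repeats.
            if (!words.add(val) && doublewords.add(val)) {
                resList.add(s.substring(i, i + 10));
            }
        }
        return resList;
    }
}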
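Solution 1 rebuilds the 20-bit value for every window with an inner loop over 10 characters. A possible variant, sketched below, keeps a rolling value instead: shift in the 2 bits of the new character and mask off everything above the low 20 bits, so each step costs O(1). The class name RollingDnaHash, the helper code, and the constants WINDOW and MASK are hypothetical names chosen for this sketch and are not part of the original solution; it also assumes the input contains only A, C, G, and T.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: rolling 2-bit encoding of a 10-letter window.
class RollingDnaHash {
    private static final int WINDOW = 10;
    // Keeps only the low 20 bits (2 bits per letter, 10 letters).
    private static final int MASK = (1 << (2 * WINDOW)) - 1;

    // Maps a nucleotide to its 2-bit code; assumes input is limited to A, C, G, T.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("Unexpected nucleotide: " + c);
        }
    }

    static List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();     // encodings of windows seen so far
        Set<Integer> reported = new HashSet<>(); // encodings already added to result
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new letter and drop the letter that left the window.
            hash = ((hash << 2) | code(s.charAt(i))) & MASK;
            if (i < WINDOW - 1) {
                continue; // the first full window ends at index WINDOW - 1
            }
            if (!seen.add(hash) && reported.add(hash)) {
                result.add(s.substring(i - WINDOW + 1, i + 1));
            }
        }
        return result;
    }
}
```

Calling RollingDnaHash.findRepeatedDnaSequences on the example input from the problem statement should return the same two sequences as Solution 1, in the order in which each repeat is first detected.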
Accepted Solution 2
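// Requires java.util.ArrayList, HashSet, List, and Set.
public List<String> findRepeatedDnaSequences(String s) {
    // seen holds every 10-letter window; repeated holds those seen more than once.
    Set<String> seen = new HashSet<>(), repeated = new HashSet<>();
    for (int i = 0; i + 9 < s.length(); i++) {
        String ten = s.substring(i, i + 10);
        if (!seen.add(ten)) {
            repeated.add(ten);
        }
    }
    return new ArrayList<>(repeated);
}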
Note: The problem and both solutions are from LeetCode.