Problem:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Example:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]
Analysis:
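The idea is to maintain a Map keyed by sequence. A single pass over the string visits every 10-letter sequence and increments its count, which is stored as the Map value. Finally, all entries of the Map are traversed to collect every sequence that occurs more than once. The code is as follows: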
Code:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> out = new ArrayList<String>();
        // A string shorter than 10 characters contains no 10-letter sequence.
        if (s.length() < 10) {
            return out;
        }
        // Maps each 10-letter sequence to the number of times it occurs.
        Map<String, Integer> temp = new HashMap<String, Integer>();
        // Sliding window holding the most recent characters (at most 10).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(s.charAt(i));
            // Drop the oldest character once the window grows past 10.
            if (sb.length() == 11) {
                sb.deleteCharAt(0);
            }
            // Count the window only once it is full.
            if (sb.length() == 10) {
                String key = sb.toString();
                temp.put(key, temp.getOrDefault(key, 0) + 1);
            }
        }
        // Collect every sequence that was seen more than once.
        for (Map.Entry<String, Integer> entry : temp.entrySet()) {
            if (entry.getValue() > 1) {
                out.add(entry.getKey());
            }
        }
        return out;
    }
}
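Alternative sketch:
The counting Map above can also be replaced by two sets, since only "seen once" versus "seen again" matters. The following is a minimal sketch of that variant, not the solution above; the class name RepeatedDnaSketch and the method name findRepeated are made up for illustration. It uses substring instead of a StringBuilder window, and a LinkedHashSet so the result keeps the order in which repeats are first detected (a plain HashMap gives no ordering guarantee).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, named here only for illustration.
class RepeatedDnaSketch {
    static List<String> findRepeated(String s) {
        // Every 10-letter window encountered so far.
        Set<String> seen = new HashSet<>();
        // Windows encountered at least twice, in order of first repeat.
        Set<String> repeated = new LinkedHashSet<>();
        // The loop body never runs when s has fewer than 10 characters.
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // Set.add returns false when the element is already present.
            if (!seen.add(window)) {
                repeated.add(window);
            }
        }
        return new ArrayList<>(repeated);
    }
}
```

For the example input "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", RepeatedDnaSketch.findRepeated returns ["AAAAACCCCC", "CCCCCAAAAA"]. Both versions do one pass over the string and build a 10-character key per position, so the trade-off is mainly readability and a deterministic output order.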