Problem:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
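Approach and analysis: The input string contains only the four letters A, C, G, and T. Every window of 10 consecutive letters is treated as a substring, and the task is to find the substrings that occur more than once. A hash table solves this directly: scan the string and check each substring. If it is not yet in the table, add it with a count of 1. If it is in the table with a count of 1, add it to the result and update the count; otherwise keep scanning. An alternative is to encode each of A, C, G, and T in 3 bits, so that 10 letters take 30 bits and fit in a single integer key (2 bits per letter would also be enough, since there are only four letters).
Code: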
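import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        // Maps each 10-letter substring to 1 (seen once) or 2 (already reported).
        Hashtable<String, Integer> temp = new Hashtable<>();
        // Visit every substring of length 10. If it is not in the table, add it;
        // if it is there and has been seen exactly once, add it to the result.
        for (int i = 0; i < s.length() - 9; i++) {
            String subString = s.substring(i, i + 10);
            if (temp.containsKey(subString)) {
                int count = temp.get(subString);
                // A count of 1 means this is the second occurrence: report it once.
                // Any other count means it was already reported, so keep scanning.
                if (count == 1) {
                    temp.put(subString, 2); // put overwrites the old count
                    res.add(subString);
                }
            } else {
                temp.put(subString, 1);
            }
        }
        return res;
    }
}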
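
The bit-encoding alternative mentioned in the analysis could look roughly like the sketch below. It is not part of the original solution: the class name DnaBitEncoding and the use of two HashSet instances are assumptions made for illustration. It relies on the fact that the low 3 bits of the ASCII codes of A, C, G, and T are all different (1, 3, 7, and 4), so `c & 7` gives a 3-bit code per letter, and a rolling 30-bit integer identifies each 10-letter window.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class name; a sketch of the 3-bit encoding idea, assuming
// the input contains only the characters A, C, G, and T.
class DnaBitEncoding {
    static List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();     // keys of windows seen at least once
        Set<Integer> reported = new HashSet<>(); // keys of windows already in res
        int key = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the 3-bit code of the new letter and keep only the
            // low 30 bits, i.e. the codes of the last 10 letters.
            key = ((key << 3) | (s.charAt(i) & 7)) & 0x3FFFFFFF;
            if (i < 9) {
                continue; // the first full window ends at index 9
            }
            // Set.add returns false if the key was already present.
            if (!seen.add(key) && reported.add(key)) {
                res.add(s.substring(i - 9, i + 1));
            }
        }
        return res;
    }
}
```

Compared with the Hashtable version, this stores Integer keys instead of 10-character strings, and a substring is only materialized when a repeat is actually reported. The results come out in the same order, namely the order in which each second occurrence is found.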