All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Example:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]
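Problem:
Treat the DNA sequence as a string containing only the four characters ['A','C','G','T']. Given a DNA string, find all substrings of length 10 that occur more than once.
Approach:
Build a hashmap from each substring to its number of occurrences, then iterate over the map to collect the repeated ones. This is not very efficient: it was faster than only 37% of Java submissions.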
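import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        // Map each 10-letter substring to its number of occurrences,
        // then return the substrings whose count is greater than 1.
        Map<String,Integer> string_map = new HashMap<String,Integer>();
        List<String> result = new ArrayList<String>();
        // The bound must be i < s.length()-9. With i < s.length()-10 the last
        // window, which starts at index s.length()-10, would be skipped.
        for(int i=0; i<s.length()-9; i++)
        {
            // substring(i,i+10), not substring(i,i+9): in Java's substring the
            // start index is inclusive and the end index is exclusive.
            String str = s.substring(i,i+10);
            if(!string_map.containsKey(str))
                string_map.put(str,1);
            else
            {
                int count = string_map.get(str);
                string_map.put(str, count+1);
            }
        }
        // Collect every substring that was seen more than once.
        for (String key : string_map.keySet())
        {
            if(string_map.get(key)>1)
            {
                result.add(key);
            }
        }
        return result;
    }
}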
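Since the hashmap version allocates a new 10-character String for every window, one possible way to cut that cost is to pack each window into an int, using 2 bits per nucleotide (20 bits for 10 letters), and slide it along the string. The following is only a sketch of that idea, not the solution above: the class name RepeatedDnaSketch, the helper code() and the A=0, C=1, G=2, T=3 encoding are assumptions made for illustration, and I have not measured how it ranks against other submissions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RepeatedDnaSketch {
    private static final int WINDOW = 10;
    // Keeps only the low 20 bits, i.e. the most recent 10 nucleotides.
    private static final int MASK = (1 << (2 * WINDOW)) - 1;

    // Hypothetical helper: assumed 2-bit code for each nucleotide.
    private static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("unexpected nucleotide: " + c);
        }
    }

    public static List<String> findRepeated(String s) {
        Set<Integer> seen = new HashSet<>();      // windows seen at least once
        Set<Integer> reported = new HashSet<>();  // windows already added to result
        List<String> result = new ArrayList<>();
        int window = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new nucleotide and drop the one that left the window.
            window = ((window << 2) | code(s.charAt(i))) & MASK;
            if (i < WINDOW - 1) {
                continue; // fewer than 10 letters read so far
            }
            // seen.add returns false if this window occurred before;
            // reported.add returns true only the first time we report it.
            if (!seen.add(window) && reported.add(window)) {
                result.add(s.substring(i - WINDOW + 1, i + 1));
            }
        }
        return result;
    }
}
```

Calling RepeatedDnaSketch.findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT") should return the two sequences from the example. A substring is built only when a repeat is first detected, so the number of String allocations depends on the number of distinct repeated sequences rather than on the length of the input.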
These are the blogger's study notes. Please credit the source when reposting, thanks~