LeetCode Repeated DNA Sequences

Original post, 2015-07-08 18:16:19

Description:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Solution:

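The most direct approach is to traverse the string and store every 10-letter substring in a TreeMap, keyed by the String itself. That version got MLE (Memory Limit Exceeded). (I have to complain about LeetCode here: how are we supposed to pick an approach when the data range isn't given!?)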
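For reference, a string-keyed version along those lines might look like the sketch below. This is not the original submission (which is not shown in this post); the class name NaiveRepeatedDna and method name findRepeated are made up for illustration. Every distinct 10-letter window is kept as its own String key, which is where the memory goes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical reconstruction of the memory-hungry first attempt.
class NaiveRepeatedDna {
	static List<String> findRepeated(String s) {
		// One String key per distinct 10-letter window.
		Map<String, Integer> counts = new TreeMap<String, Integer>();
		for (int i = 0; i + 10 <= s.length(); i++) {
			counts.merge(s.substring(i, i + 10), 1, Integer::sum);
		}

		// Keep only the windows seen more than once.
		List<String> result = new ArrayList<String>();
		for (Map.Entry<String, Integer> entry : counts.entrySet()) {
			if (entry.getValue() > 1) {
				result.add(entry.getKey());
			}
		}
		return result;
	}

	public static void main(String[] args) {
		System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
	}
}
```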

So the next step is to find a way to reduce memory usage.

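DNA has only four letters, A, G, C, and T, so each one can be mapped to one of 0, 1, 2, 3. Some people instead use the low three bits of each letter's character code, taken as an int.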
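The low-three-bits variant works because the ASCII codes of the four letters happen to differ in their lowest three bits. A small sketch to check this (the class name LowBitsMapping and method name lowThreeBits are hypothetical):

```java
// Sketch: the low three bits of 'A', 'C', 'G', 'T' are all different.
class LowBitsMapping {
	// Keep only the lowest 3 bits of the character code.
	static int lowThreeBits(char ch) {
		return ch & 7;
	}

	public static void main(String[] args) {
		// 'A' = 65, 'C' = 67, 'G' = 71, 'T' = 84
		for (char ch : "ACGT".toCharArray()) {
			System.out.println(ch + " -> " + lowThreeBits(ch));
		}
	}
}
```

With this mapping each letter costs 3 bits instead of 2, so a 10-letter window needs 30 bits. That still fits in an int, but the mask would have to be 0x3FFFFFFF rather than 0xFFFFF.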

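So each letter can be stored in 2 bits: in binary, 0, 1, 2, 3 are 00, 01, 10, 11. Ten letters take 20 bits, so the largest possible value is twenty 1 bits, which is 0xFFFFF in hexadecimal.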
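To make the packing concrete, here is a minimal standalone sketch that packs one 10-letter window into an int and unpacks it again. The names DnaPacking, code, pack, and unpack are made up for this illustration and assume the A=0, C=1, G=2, T=3 mapping.

```java
// Sketch of 2-bit packing for a 10-letter DNA window.
class DnaPacking {
	// Map a nucleotide letter to its 2-bit code.
	static int code(char ch) {
		switch (ch) {
		case 'A':
			return 0;
		case 'C':
			return 1;
		case 'G':
			return 2;
		case 'T':
			return 3;
		default:
			throw new IllegalArgumentException("not a nucleotide: " + ch);
		}
	}

	// Pack letters left to right, keeping only the low 20 bits.
	static int pack(String window) {
		int packed = 0;
		for (int i = 0; i < window.length(); i++) {
			packed = (packed << 2 | code(window.charAt(i))) & 0xFFFFF;
		}
		return packed;
	}

	// Rebuild the 10 letters, reading 2 bits at a time from the low end.
	static String unpack(int packed) {
		char[] letters = new char[10];
		for (int i = 9; i >= 0; i--) {
			letters[i] = "ACGT".charAt(packed & 3);
			packed >>= 2;
		}
		return new String(letters);
	}

	public static void main(String[] args) {
		System.out.println(Integer.toHexString(pack("TTTTTTTTTT"))); // fffff
		System.out.println(unpack(pack("AAAAACCCCC")));
	}
}
```

Because every window becomes an Integer key below 2^20, the map never has to hold the substrings themselves.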

import java.util.*;

public class Solution {
	public List<String> findRepeatedDnaSequences(String s) {
		List<String> list = new ArrayList<String>();
		TreeMap<Integer, Integer> map = new TreeMap<Integer, Integer>();

		if (s.length() < 10)
			return list;

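		// temp holds the current window, packed at 2 bits per letter.
		int temp = 0, num;
		// Pre-load the first 9 letters so each step of the main loop completes a window.
		for (int i = 0; i < 9; i++) {
			temp = temp << 2 | convert(s.charAt(i));
		}
		for (int i = 9; i < s.length(); i++) {
			// Shift in the next letter and keep only the low 20 bits (10 letters).
			temp = (temp << 2 | convert(s.charAt(i))) & 0xFFFFF;
			if (map.containsKey(temp)) {
				num = map.get(temp);
				map.put(temp, num + 1);
			} else
				map.put(temp, 1);
		}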

		String neo;
		Iterator<Integer> ite = map.keySet().iterator();
		while (ite.hasNext()) {
			temp = ite.next();
			num = map.get(temp);
			// Skip windows that occurred only once.
			if (num == 1)
				continue;
			// Decode from the low bits upward, prepending each letter.
			neo = "";
			for (int i = 0; i < 10; i++) {
				neo = (char) convert(temp % 4) + neo;
				temp >>= 2;
			}
			list.add(neo);
		}

		return list;
	}

	// Two-way mapping: letters 'A', 'C', 'G', 'T' to codes 0..3, and codes 0..3 back to letters.
	int convert(int ch) {
		switch (ch) {
		case 'A':
			return 0;
		case 'C':
			return 1;
		case 'G':
			return 2;
		case 'T':
			return 3;
		case 0:
			return 'A';
		case 1:
			return 'C';
		case 2:
			return 'G';
		case 3:
			return 'T';
		}
		return 0;
	}
}

