LeetCode Repeated DNA Sequences

Original post, 2015-07-08 18:16:19

Description:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Solution:

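The most direct approach is to walk the string and store every 10-letter substring as a String in a TreeMap. That got MLE (Memory Limit Exceeded). (A gripe about LeetCode here: how are we supposed to solve a problem without being told the data range!?)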

So the next step is to find a way to reduce memory use.

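DNA has only four possible letters, A, G, C, and T, so they can simply be mapped to 0, 1, 2, 3. Some people instead use the low three bits of each letter's character code as its binary representation.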
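As a side note on the low-three-bits variant, here is a minimal sketch of how it could work. The class name LowBitsKey and the method key are hypothetical names chosen for illustration, not taken from any particular submission. It relies on the fact that the ASCII codes of A, C, G, and T end in the distinct bit patterns 001, 011, 111, and 100, so ten letters fit in 30 bits of an int.

```java
final class LowBitsKey {
    // Packs a window of letters into an int, 3 bits per letter.
    // Assumes the input contains only 'A', 'C', 'G', 'T'.
    static int key(String window) {
        int k = 0;
        for (int i = 0; i < window.length(); i++) {
            // (c & 7) keeps the low three bits of the character code;
            // the mask 0x3FFFFFFF keeps only the last ten letters (30 bits).
            k = (k << 3 | (window.charAt(i) & 7)) & 0x3FFFFFFF;
        }
        return k;
    }
}
```

This variant needs no lookup from letter to code, at the cost of 30 bits per key instead of 20.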

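With the 0 to 3 mapping, two bits per letter are enough: 0, 1, 2, 3 in binary are 00, 01, 10, 11. A sequence has ten letters, so it takes 20 bits in total, and the largest possible value is twenty 1 bits, which is 0xFFFFF in hexadecimal.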
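To make the 2-bit packing concrete, here is a minimal standalone sketch. The class DnaCodec and its methods code, encode, and decode are hypothetical helper names used only for this illustration. It shows how a 10-letter window becomes a 20-bit int and how that int is turned back into letters.

```java
final class DnaCodec {
    // Letter -> 2-bit code, using the A=0, C=1, G=2, T=3 mapping.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Packs letters left to right; the 0xFFFFF mask drops anything
    // older than the last ten letters, so this also works as a rolling key.
    static int encode(String letters) {
        int key = 0;
        for (int i = 0; i < letters.length(); i++) {
            key = (key << 2 | code(letters.charAt(i))) & 0xFFFFF;
        }
        return key;
    }

    // Unpacks a 20-bit key into its 10 letters, last letter first.
    static String decode(int key) {
        char[] out = new char[10];
        for (int i = 9; i >= 0; i--) {
            out[i] = "ACGT".charAt(key & 3);
            key >>= 2;
        }
        return new String(out);
    }
}
```

For example, "AAAAACCCCC" packs to 0x155 (five 00 pairs followed by five 01 pairs), and "TTTTTTTTTT" packs to 0xFFFFF, the maximum.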

import java.util.*;

public class Solution {
	public List<String> findRepeatedDnaSequences(String s) {
		List<String> list = new ArrayList<String>();
		TreeMap<Integer, Integer> map = new TreeMap<Integer, Integer>();

		if (s.length() < 10)
			return list;

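		// temp holds the current window, packed 2 bits per letter.
		int temp = 0, num;
		// Preload the first 9 letters; the loop below shifts in the 10th.
		for (int i = 0; i < 9; i++) {
			temp = temp << 2 | convert(s.charAt(i));
		}
		for (int i = 9; i < s.length(); i++) {
			// Shift in the next letter and keep only the low 20 bits (10 letters).
			temp = (temp << 2 | convert(s.charAt(i))) & 0xFFFFF;
			// Count how many times this window has been seen.
			if (map.containsKey(temp)) {
				num = map.get(temp);
				map.put(temp, num + 1);
			} else
				map.put(temp, 1);
		}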

		// Decode every key that occurred more than once back into its 10 letters.
		String neo;
		Iterator<Integer> ite = map.keySet().iterator();
		while (ite.hasNext()) {
			temp = ite.next();
			num = map.get(temp);
			if (num == 1)
				continue;
			neo = "";
			for (int i = 0; i < 10; i++) {
				// The lowest 2 bits are the last letter, so prepend as we go.
				neo = (char) convert(temp % 4) + neo;
				temp >>= 2;
			}
			list.add(neo);
		}

		return list;
	}

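	// Maps both ways: letter -> 2-bit code, and code (0..3) -> letter.
	// The two cases cannot collide because the letters' character codes are all above 3.
	int convert(int ch) {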
		switch (ch) {
		case 'A':
			return 0;
		case 'C':
			return 1;
		case 'G':
			return 2;
		case 'T':
			return 3;
		case 0:
			return 'A';
		case 1:
			return 'C';
		case 2:
			return 'G';
		case 3:
			return 'T';
		}
		return 0;
	}
}

