LeetCode Repeated DNA Sequences

Tags: java, leetcode, Binary

Description:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Solution:

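The most direct approach is to walk through the string and store every 10-letter substring in a TreeMap, keyed by the String itself. That got MLE (Memory Limit Exceeded). (I have to complain about LeetCode here: how are we supposed to solve this when no data range is given!?)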
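For reference, the direct approach described above might look roughly like the sketch below. The class name `NaiveRepeatedDna` and method name `find` are made up for illustration, and this is a reconstruction rather than the code that was actually submitted. Every distinct window becomes its own String key, which is presumably where the memory goes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the String-keyed approach (not the original submission).
class NaiveRepeatedDna {
    static List<String> find(String s) {
        // One map entry per distinct 10-letter window, keyed by the substring itself.
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            counts.merge(s.substring(i, i + 10), 1, Integer::sum);
        }
        // Keep only the windows that were seen more than once.
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Calling `NaiveRepeatedDna.find` on the example string from the description gives the two expected sequences, so the logic is fine; only the memory footprint is the problem.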

So we need a way to cut down the memory usage.

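DNA only has four possible letters, A, G, C and T, so each letter can be mapped to one of 0, 1, 2, 3. Some people instead use the last three bits of each letter's int value in binary.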
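The two mappings mentioned above can be compared side by side in a small sketch. The class and method names (`NucleotideCodes`, `twoBitCode`, `threeBitCode`) are invented for this example. The three-bit variant relies on the character codes of 'A', 'C', 'G', 'T' (65, 67, 71, 84) having distinct low three bits.

```java
// Hypothetical helper showing both encodings; names are illustrative only.
class NucleotideCodes {
    // Dense mapping: A=0, C=1, G=2, T=3, so each letter fits in 2 bits.
    static int twoBitCode(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Low three bits of the character code: A=1, C=3, G=7, T=4.
    // No lookup is needed, but each letter costs 3 bits instead of 2.
    static int threeBitCode(char c) {
        return c & 7;
    }
}
```

With three bits per letter, a 10-letter window needs 30 bits, which still fits in a 32-bit int. The two-bit mapping needs only 20.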

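So two bits are enough to store each letter: 0, 1, 2, 3 in binary are 00, 01, 10, 11. A sequence has ten letters, which takes 20 bits, so the largest possible value is twenty 1 bits, or 0xFFFFF in hexadecimal.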
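To make the 20-bit packing concrete, here is a small sketch of packing a window, sliding it by one letter, and unpacking it again. The names `DnaPacking`, `pack`, `slide` and `unpack` are hypothetical, and the sketch assumes the input contains only the letters A, C, G, T.

```java
// Hypothetical sketch of the 2-bits-per-letter packing; assumes valid A/C/G/T input.
class DnaPacking {
    static final int MASK = 0xFFFFF; // twenty 1 bits: room for exactly 10 letters

    // A=0, C=1, G=2, T=3 (returns -1 for any other character, which this sketch does not handle).
    static int code(char c) {
        return "ACGT".indexOf(c);
    }

    // Packs the first 10 letters of the window, first letter in the highest bits.
    static int pack(String window) {
        int bits = 0;
        for (int i = 0; i < 10; i++) {
            bits = (bits << 2) | code(window.charAt(i));
        }
        return bits;
    }

    // Shifts in the next letter; the mask drops the letter that left the window.
    static int slide(int bits, char next) {
        return ((bits << 2) | code(next)) & MASK;
    }

    // Reads two bits at a time from the low end to rebuild the 10 letters.
    static String unpack(int bits) {
        char[] out = new char[10];
        for (int i = 9; i >= 0; i--) {
            out[i] = "ACGT".charAt(bits & 3);
            bits >>= 2;
        }
        return new String(out);
    }
}
```

For example, `pack("TTTTTTTTTT")` is 0xFFFFF, and sliding the packed value of "AAAAACCCCC" by 'A' gives the same int as packing "AAAACCCCCA" directly.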

import java.util.*;

public class Solution {
	public List<String> findRepeatedDnaSequences(String s) {
		List<String> list = new ArrayList<String>();
		TreeMap<Integer, Integer> map = new TreeMap<Integer, Integer>();

		if (s.length() < 10)
			return list;

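		// Pre-load the first 9 letters so that each step of the next loop completes one window.
		int temp = 0, num;
		for (int i = 0; i < 9; i++) {
			temp = temp << 2 | convert(s.charAt(i));
		}
		// Slide the window: shift in the next letter and keep only the low 20 bits.
		for (int i = 9; i < s.length(); i++) {
			temp = (temp << 2 | convert(s.charAt(i))) & 0xFFFFF;
			if (map.containsKey(temp)) {
				num = map.get(temp);
				map.put(temp, num + 1);
			} else
				map.put(temp, 1);
		}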

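		// Decode every key that was seen more than once back into its 10-letter string.
		String neo;
		Iterator<Integer> ite = map.keySet().iterator();
		while (ite.hasNext()) {
			temp = ite.next();
			num = map.get(temp);
			if (num == 1)
				continue;
			neo = "";
			// Read two bits at a time from the low end, prepending each decoded letter.
			for (int i = 0; i < 10; i++) {
				neo = (char) convert(temp % 4) + neo;
				temp >>= 2;
			}
			list.add(neo);
		}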

		return list;
	}

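	// Two-way mapping: a letter ('A', 'C', 'G', 'T') maps to its code 0..3,
	// and a code 0..3 maps back to its letter.
	int convert(int ch) {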
		switch (ch) {
		case 'A':
			return 0;
		case 'C':
			return 1;
		case 'G':
			return 2;
		case 'T':
			return 3;
		case 0:
			return 'A';
		case 1:
			return 'C';
		case 2:
			return 'G';
		case 3:
			return 'T';
		}
		return 0;
	}
}

