Leetcode Repeated DNA sequences

proudmore

Categories: leetcode, leetcode-hashmap, leetcode-bit manipulation

Article link: https://blog.csdn.net/proudmore/article/details/45559703

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = “AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT”,
Return:
[“AAAAACCCCC”, “CCCCCAAAAA”].

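There are 4^10 (1,048,576) possible 10-letter strings in total, so KMP is not feasible.
At first I used two HashSets, one holding the sequences seen exactly once and one holding those seen more than once, and kept getting Memory Limit Exceeded. Switching to a single HashMap passed. My guess is that automatic resizing is to blame: the first set grows large, then elements are removed from it and added to the second, but the first set never shrinks.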

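The tags include bit manipulation, so I guessed that the Memory Limit Exceeded was meant to be solved by encoding the strings to save space. First, consider encoding A, C, G, and T in binary: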

A -> 00
C -> 01
G -> 10
T -> 11

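A 10-letter string needs 20 bits of encoding. An int is normally 4 bytes, that is 32 bits, which is enough. A char is 2 bytes, so the string originally needs 20 bytes and now needs only 4. For example: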
ACGTACGTAC -> 00011011000110110001
AAAAAAAAAA -> 00000000000000000000

But since the HashMap version passed, I couldn't be bothered to write the encoded one...
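For reference, the 2-bit mapping above could be sketched as a small helper. This is only an illustration, not code from the original solution; the class name `DnaEncoding` and the methods `encode` and `bits` are made up for this example.

```java
final class DnaEncoding {
    // Hypothetical helper: packs a sequence of up to 16 nucleotides into an int,
    // 2 bits per letter, first letter in the most significant position.
    static int encode(String seq) {
        int code = 0;
        for (int i = 0; i < seq.length(); i++) {
            code = (code << 2) | bits(seq.charAt(i));
        }
        return code;
    }

    // A -> 00, C -> 01, G -> 10, T -> 11, as in the table above.
    static int bits(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }
}
```

With this mapping, `DnaEncoding.encode("ACGTACGTAC")` gives the 20-bit value 00011011000110110001 and `DnaEncoding.encode("AAAAAAAAAA")` gives 0, matching the two examples above.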

[code]

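import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<String>();
        // A string of 10 or fewer characters cannot contain a repeated 10-letter window.
        if (s.length() <= 10) return result;

        // Count how many times each 10-letter window occurs.
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            Integer seen = counts.get(window);
            counts.put(window, seen == null ? 1 : seen + 1);
        }

        // Collect every window that occurred more than once.
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) result.add(entry.getKey());
        }
        return result;
    }
}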
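
For comparison, here is a rough sketch of what the encoded version might look like. It is not the code that was submitted: it assumes the input contains only A, C, G, and T, keeps a rolling 20-bit code for the current window, and stores those codes in two sets of `Integer` keys instead of 10-char `String` keys. The class name `EncodedSolution` and the helper `bits` are made up for this example, and nothing is ever removed from either set.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class EncodedSolution {
    static List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<>();
        if (s.length() <= 10) return result;

        Set<Integer> seen = new HashSet<>();      // codes of windows seen at least once
        Set<Integer> reported = new HashSet<>();  // codes already added to the result
        final int mask = (1 << 20) - 1;           // keep only the low 20 bits (10 letters)
        int code = 0;

        for (int i = 0; i < s.length(); i++) {
            // Shift in the next letter and drop the letter that left the window.
            code = ((code << 2) | bits(s.charAt(i))) & mask;
            if (i < 9) continue;                  // window not yet 10 letters long
            // add() returns false if the code was already present.
            if (!seen.add(code) && reported.add(code)) {
                result.add(s.substring(i - 9, i + 1));
            }
        }
        return result;
    }

    // A -> 00, C -> 01, G -> 10, T -> 11
    private static int bits(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }
}
```

Unlike the HashMap version, this sketch returns the repeated sequences in the order in which their second occurrence is found, so the example input from the problem statement yields AAAAACCCCC followed by CCCCCAAAAA.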
