187 Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output: ["AAAAACCCCC", "CCCCCAAAAA"]

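The first approach that comes to mind is a simple, direct one. Slide a window of length 10 from left to right, put each substring into a HashMap, and increment its counter. At the end, output every string in the HashMap whose counter is greater than 1. The time complexity is O(n) if the window length 10 is treated as a constant (strictly O(10n), since each substring is copied and hashed). Because the HashMap stores every length-10 substring, the space complexity is O(10n).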
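A minimal sketch of this straightforward approach, assuming the input contains only A, C, G and T. The class name `NaiveDnaRepeats` and method `findRepeated` are hypothetical, chosen only for illustration. The map keys are the substrings themselves, which is where the O(10n) space comes from.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class illustrating the HashMap<String, Integer> approach.
class NaiveDnaRepeats {
    // Window length given by the problem statement.
    private static final int LEN = 10;

    // Count every length-10 substring, then keep those seen more than once.
    static List<String> findRepeated(String s) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + LEN <= s.length(); i++) {
            String window = s.substring(i, i + LEN); // copies 10 chars per window
            counts.merge(window, 1, Integer::sum);    // increment the counter
        }
        List<String> res = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                res.add(e.getKey());
            }
        }
        return res;
    }
}
```

For the example input this returns "AAAAACCCCC" and "CCCCCAAAAA", in HashMap iteration order (LeetCode accepts any order).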

Since the string contains only the four characters A, C, G, and T, we can map each character to 2 bits:

A -> 00

C -> 01

G -> 10

T -> 11
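The same mapping can be written without a HashMap. This is a small sketch, assuming only the four nucleotide letters appear; the class `NucleotideCode` and its methods are hypothetical names. `encode` uses a switch, and `decode` relies on the fact that the 2-bit code is exactly the character's index in the string "ACGT".

```java
// Hypothetical helper showing the 2-bit nucleotide mapping.
class NucleotideCode {
    // A -> 00, C -> 01, G -> 10, T -> 11
    static int encode(char ch) {
        switch (ch) {
            case 'A': return 0b00;
            case 'C': return 0b01;
            case 'G': return 0b10;
            case 'T': return 0b11;
            default: throw new IllegalArgumentException("not a nucleotide: " + ch);
        }
    }

    // Inverse mapping: the code is the index of the nucleotide in "ACGT".
    static char decode(int code) {
        return "ACGT".charAt(code & 0b11);
    }
}
```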

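Each string of length 10 therefore maps to 20 bits, which is fewer than the 32 bits of an int, so the whole string can be encoded as a single integer. The time complexity is still O(n), but the space complexity drops from O(10n) to O(n), because the map now stores one integer per window instead of a 10-character string.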
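To make the packing concrete, here is a sketch that shifts the running value left by 2 bits and ORs in each new nucleotide's code, so the first character ends up in the highest bits. The class `WindowPacker` and method `pack` are hypothetical names, and the input is assumed to be exactly 10 valid nucleotides.

```java
// Hypothetical helper: pack a 10-nucleotide window into the low 20 bits of an int.
class WindowPacker {
    static int pack(String window) {
        if (window.length() != 10) {
            throw new IllegalArgumentException("expected 10 characters");
        }
        int x = 0;
        for (int i = 0; i < window.length(); i++) {
            int code = "ACGT".indexOf(window.charAt(i)); // A=0, C=1, G=2, T=3
            if (code < 0) {
                throw new IllegalArgumentException("not a nucleotide: " + window.charAt(i));
            }
            x = (x << 2) | code; // make room for 2 bits, then append the code
        }
        return x;
    }
}
```

For example, `pack("AAAAACCCCC")` returns 341 (binary 0101010101 preceded by ten zero bits), and `pack("CCCCCAAAAA")` returns 349184, which is 341 shifted left by 10. Since at most 20 bits are used, the result is always a non-negative int.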


import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Solution {
    // Length of the sequences we are looking for.
    private static final int LEN = 10;

    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        if (s.length() < LEN) {
            return res;
        }

        // Encode each nucleotide as a 2-bit value.
        Map<Character, Integer> charMap = new HashMap<>();
        charMap.put('A', 0);
        charMap.put('C', 1);
        charMap.put('G', 2);
        charMap.put('T', 3);

        // Reverse mapping, used to decode a 20-bit key back into a string.
        Map<Integer, Character> intMap = new HashMap<>();
        intMap.put(0, 'A');
        intMap.put(1, 'C');
        intMap.put(2, 'G');
        intMap.put(3, 'T');

        // Count how many times each 20-bit encoded window occurs.
        Map<Integer, Integer> map = new HashMap<>();

        for (int i = 0; i < s.length() - LEN + 1; i++) {
            String key = s.substring(i, i + LEN);
            int hashKey = strToInt(key, charMap);
            map.put(hashKey, map.getOrDefault(hashKey, 0) + 1);
        }

        // Decode every key that occurred more than once.
        for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
            if (entry.getValue() > 1) {
                res.add(intToStr(entry.getKey(), intMap));
            }
        }
        return res;
    }

    // Pack a 10-character string into the low 20 bits of an int, 2 bits per character.
    private int strToInt(String s, Map<Character, Integer> charMap) {
        assert s.length() == LEN;
        int x = 0;
        for (int i = 0; i < LEN; i++) {
            char ch = s.charAt(i);
            x = (x << 2) + charMap.get(ch);
        }
        return x;
    }

    // Unpack a 20-bit key: read 2 bits at a time from the low end, pad the
    // missing leading 'A's (code 00), then reverse into the original order.
    private String intToStr(int x, Map<Integer, Character> intMap) {
        StringBuilder sb = new StringBuilder();
        while (x > 0) {
            char ch = intMap.get(x & 3);
            sb.append(ch);
            x = x >> 2;
        }

        while (sb.length() < LEN) {
            sb.append(intMap.get(0));
        }
        return sb.reverse().toString();
    }
}
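The solution above rebuilds each 20-bit key from a fresh substring, so every window still costs about 10 steps. A commonly used variant, not the author's code, updates the key incrementally: shift left by 2, OR in the newest nucleotide, and mask to 20 bits so the oldest nucleotide falls off. The sketch below assumes the input contains only A, C, G and T; the class `RollingDnaRepeats` and the `MASK` constant are hypothetical names. Two HashSets replace the counter map: one records keys seen at least once, the other makes sure each repeated sequence is reported only once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical rolling-key variant of the 2-bit encoding.
class RollingDnaRepeats {
    private static final int LEN = 10;
    // Keeps only the low 2 * LEN = 20 bits of the rolling key.
    private static final int MASK = (1 << (2 * LEN)) - 1;

    static List<String> findRepeated(String s) {
        List<String> res = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();  // keys of windows seen at least once
        Set<Integer> added = new HashSet<>(); // keys already reported
        int key = 0;
        for (int i = 0; i < s.length(); i++) {
            // Assumes s.charAt(i) is one of A, C, G, T (indexOf would return -1 otherwise).
            key = ((key << 2) | "ACGT".indexOf(s.charAt(i))) & MASK;
            if (i < LEN - 1) {
                continue; // the first full window ends at index LEN - 1
            }
            // seen.add returns false if the key was already present;
            // added.add returns true only the first time we report it.
            if (!seen.add(key) && added.add(key)) {
                res.add(s.substring(i - LEN + 1, i + 1));
            }
        }
        return res;
    }
}
```

Here a substring is created only when a sequence is reported, and each step does constant work, so the scan itself is O(n) regardless of the window length.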
