Leetcode 187. Repeated DNA Sequences: Solutions

Leetcode 187. Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G,
and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes
useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings)
that occur more than once in a DNA molecule.

Example:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output: ["AAAAACCCCC", "CCCCCAAAAA"]

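Straightforward approach
Store the number of occurrences of every 10-letter substring in a hash map, then traverse the map and collect the substrings that occur more than once.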

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> result;
        // A repeat needs at least two windows, i.e. more than 10 characters.
        if (s.length() <= 10)
            return result;
        // Count the occurrences of every 10-letter substring.
        unordered_map<string, int> hash_map;
        // Use i + 10 <= length so that the last window is included.
        for (size_t i = 0; i + 10 <= s.length(); i++)
        {
            hash_map[s.substr(i, 10)] += 1;
        }
        // Collect the substrings that occur more than once.
        for (auto it = hash_map.begin(); it != hash_map.end(); it++)
        {
            if (it->second > 1)
                result.push_back(it->first);
        }
        return result;
    }
};



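Using bit manipulation
Since there are only 4 letters, we can treat A, C, G, and T as the binary values 00, 01, 10, and 11. Ten letters then take 20 bits, so every 10-letter substring becomes a single number. One set stores the values that have been seen at least once, and a second set stores the values that have been seen two or more times.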
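
As a quick illustration of the encoding described above, here is a minimal sketch. The helper name `encodeSequence` is hypothetical and does not appear in the original solutions; it only shows how a string is packed two bits per letter.

```cpp
#include <string>

// Hypothetical helper (not part of the original post): packs a DNA string of
// up to 15 letters into an int, two bits per letter (A=00, C=01, G=10, T=11).
// Any character other than C, G, or T is treated as 'A'.
int encodeSequence(const std::string& seq) {
    int v = 0;
    for (char c : seq) {
        int code = 0;              // 'A'
        if (c == 'C') code = 1;
        else if (c == 'G') code = 2;
        else if (c == 'T') code = 3;
        v = (v << 2) | code;       // make room for two bits, then append the code
    }
    return v;
}
```

For example, "AAAAACCCCC" encodes to binary 0101010101 = 341, and "CCCCCAAAAA" is the same pattern shifted left by 10 bits, 349184.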

C++ version

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        set<int> words;     // encodings seen at least once
        set<int> twoWords;  // encodings already reported as repeated
        vector<string> rv;
        // 2-bit code per letter, indexed by letter - 'A'; 'A' stays 0.
        char map[26] = {0};
        map['C' - 'A'] = 1;
        map['G' - 'A'] = 2;
        map['T' - 'A'] = 3;
        if (s.length() <= 10)
            return rv;
        for (size_t i = 0; i + 10 <= s.length(); i++) {
            // Pack the 10 letters starting at i into 20 bits.
            int v = 0;
            for (size_t j = i; j < i + 10; j++) {
                v <<= 2;
                v |= map[s[j] - 'A'];
            }
            // Report a value the first time it is seen again.
            if (words.find(v) != words.end() && twoWords.find(v) == twoWords.end())
            {
                rv.push_back(s.substr(i, 10));
                twoWords.insert(v);
            }
            else
                words.insert(v);
        }
        return rv;
    }
};

Java version

public List<String> findRepeatedDnaSequences(String s) {
    Set<Integer> words = new HashSet<>();
    Set<Integer> doubleWords = new HashSet<>();
    List<String> rv = new ArrayList<>();
    char[] map = new char[26];
    //map['A' - 'A'] = 0;
    map['C' - 'A'] = 1;
    map['G' - 'A'] = 2;
    map['T' - 'A'] = 3;

    for(int i = 0; i < s.length() - 9; i++) {
        int v = 0;
        for(int j = i; j < i + 10; j++) {
            v <<= 2;
            v |= map[s.charAt(j) - 'A'];
        }
        if(!words.add(v) && doubleWords.add(v)) {
            rv.add(s.substring(i, i + 10));
        }
    }
    return rv;
}
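
Both versions above rebuild the 20-bit value from scratch for every window. A possible variation, which is not part of the original post or the referenced discussion, is to update the value incrementally: shift in the new letter and mask off everything above the low 20 bits. The sketch below assumes the input contains only A, C, G, and T; the names `nucleotideCode` and `findRepeatedDnaSequencesRolling` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: maps a nucleotide to its 2-bit code (A=0, C=1, G=2, T=3).
static int nucleotideCode(char c) {
    switch (c) {
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return 0;  // 'A' (input is assumed to be valid DNA)
    }
}

// Sliding-window variant of the bit-manipulation approach.
std::vector<std::string> findRepeatedDnaSequencesRolling(const std::string& s) {
    std::vector<std::string> result;
    const std::size_t kLen = 10;
    if (s.size() <= kLen) return result;

    const int kMask = (1 << 20) - 1;         // keeps only the last 10 letters
    std::unordered_set<int> seen, reported;  // seen once / already reported
    int v = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Append the new letter and drop the one that left the window.
        v = ((v << 2) | nucleotideCode(s[i])) & kMask;
        if (i + 1 < kLen) continue;          // window not full yet
        // insert().second is false when the value was already present.
        if (!seen.insert(v).second && reported.insert(v).second) {
            result.push_back(s.substr(i + 1 - kLen, kLen));
        }
    }
    return result;
}
```

Each character is then processed once instead of ten times, and the two-set logic mirrors the `!words.add(v) && doubleWords.add(v)` check in the Java version.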

Reference: https://leetcode.com/problems/repeated-dna-sequences/discuss/53867/Clean-Java-solution-(hashmap-%2B-bits-manipulation)
