187. Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].


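Each nucleotide can be represented with two bits:

A: 00 -> 0

C: 01 -> 1

G: 10 -> 2

T: 11 -> 3

A 10-character sequence therefore needs 20 bits and has 2^20 possible encodings. Since 2^20 < 2^32, every encoding fits in a 32-bit int.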
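To make the two-bit packing concrete, here is a minimal sketch that encodes a single 10-letter window with A = 00, C = 01, G = 10, T = 11. The helper name `encodeWindow` is hypothetical and used only for illustration, and the sketch assumes the window contains nothing but A, C, G, and T.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: pack a 10-letter window into the low 20 bits of an int,
// two bits per nucleotide (A = 00, C = 01, G = 10, T = 11).
// Assumes the window holds only the letters A, C, G, T.
int encodeWindow(const std::string& window) {
    assert(window.size() == 10);
    int code = 0;
    for (char c : window) {
        code <<= 2;  // make room for the next two bits
        switch (c) {
            case 'C': code |= 1; break;
            case 'G': code |= 2; break;
            case 'T': code |= 3; break;
            default: break;  // 'A' contributes 00
        }
    }
    return code;
}
```

Under this mapping "AAAAAAAAAA" encodes to 0 and "TTTTTTTTTT" to 2^20 - 1 = 1048575, the smallest and largest possible codes, so two windows of length 10 share a code only when they are the same string.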

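#include <map>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> res;
        int len = s.size();
        if (len < 10) return res;
        map<int, int> m;  // encoded window -> number of occurrences so far
        for (int i = 0; i <= len - 10; i++) {
            string sub = s.substr(i, 10);
            int code = encode(sub);
            // Report each window exactly once, on its second occurrence.
            if (++m[code] == 2) res.push_back(sub);
        }
        return res;
    }
private:
    // Pack a window into an int, two bits per nucleotide:
    // A = 00, C = 01, G = 10, T = 11.
    int encode(const string& sub) {
        int code = 0;
        for (size_t i = 0; i < sub.size(); i++) {
            code <<= 2;
            switch (sub[i]) {
                case 'A': code |= 0; break;
                case 'C': code |= 1; break;
                case 'G': code |= 2; break;
                case 'T': code |= 3; break;
            }
        }
        return code;
    }
};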
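
The solution above re-encodes every window from scratch. A possible variant, sketched here rather than taken from the original post, keeps one running code and updates it per character: shift in two new bits and mask off everything above bit 19. The names `baseBits` and `findRepeatedRolling` are hypothetical, `std::unordered_map` is swapped in for `map`, and the input is assumed to contain only A, C, G, and T.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: two-bit value of one nucleotide (A=0, C=1, G=2, T=3).
static int baseBits(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
    }
    assert(false && "input is assumed to contain only A, C, G, T");
    return 0;
}

// Sliding-window sketch: the 20-bit code of the current window is derived
// from the previous one instead of being recomputed from the substring.
std::vector<std::string> findRepeatedRolling(const std::string& s) {
    const std::size_t kLen = 10;
    const int kMask = (1 << 20) - 1;  // keep only the low 20 bits
    std::vector<std::string> res;
    std::unordered_map<int, int> seen;  // code -> occurrences so far
    int code = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Shift in the new nucleotide; the mask drops the one that left the window.
        code = ((code << 2) | baseBits(s[i])) & kMask;
        if (i + 1 < kLen) continue;  // window not yet full
        // Report each sequence once, when it is seen for the second time.
        if (++seen[code] == 2) res.push_back(s.substr(i + 1 - kLen, kLen));
    }
    return res;
}
```

For s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" this returns ["AAAAACCCCC", "CCCCCAAAAA"], matching the example in the problem statement. A substring is only built for sequences that actually repeat, which avoids one `substr` copy per position.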

