Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].


Solution approach:

This problem mainly relies on two techniques: bit manipulation and hashing.

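Consider encoding A, C, G, and T in binary: A -> 00, C -> 01, G -> 10, T -> 11. Under this encoding, every 10-character substring corresponds to a single number that is 20 bits wide. An int is typically 4 bytes (32 bits), so one int is enough to represent a 10-character substring. For example: ACGTACGTAC -> 00011011000110110001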
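The encoding can be illustrated with a minimal sketch. The helper names `encodeBase` and `encodeSequence` are hypothetical (they do not appear in the original solution), and the sketch assumes the input contains only the characters A, C, G, and T.

```cpp
#include <string>

// Hypothetical helper: map one nucleotide to its 2-bit code
// (A = 00, C = 01, G = 10, T = 11).
// Assumption: the input contains only 'A', 'C', 'G', 'T'.
int encodeBase(char c)
{
    switch (c)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3; // 'T'
    }
}

// Hypothetical helper: pack a short sequence (10 characters here)
// into an int, 2 bits per character, first character in the highest bits.
int encodeSequence(const std::string& seq)
{
    int val = 0;
    for (char c : seq)
    {
        val <<= 2;            // make room for the next character
        val |= encodeBase(c); // append its 2-bit code
    }
    return val;
}
```

With these helpers, `encodeSequence("ACGTACGTAC")` yields the bit pattern 00011011000110110001 from the example, which is 0x1B1B1 in hexadecimal.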

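A 20-bit binary number has at most 2^20 possible values, so the hash table needs 2^20 entries, i.e. 1024 * 1024. The hash table is designed as bool hashTable[1024 * 1024]; (the code below names it exist).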
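As a rough sketch of such a table, the following uses a `std::vector<bool>` instead of a raw array. This is a swapped-in choice, not the layout described above: a plain bool array of this size usually occupies about 1 MB, which may be too large for the stack on platforms with a small default stack. The `SeenTable` type and its `checkAndMark` method are hypothetical names introduced only for illustration.

```cpp
#include <vector>

// Hypothetical wrapper around a table with one flag per 20-bit value.
struct SeenTable
{
    std::vector<bool> seen;

    // 1 << 20 == 1024 * 1024 entries, all initially false
    SeenTable() : seen(1 << 20, false) {}

    // Returns true if val was already recorded;
    // otherwise records it and returns false.
    // Assumption: 0 <= val < 2^20.
    bool checkAndMark(int val)
    {
        if (seen[val]) return true;
        seen[val] = true;
        return false;
    }
};
```

Because every possible 20-bit value has its own slot, the lookup is a direct index with no collisions, which is why no real hash function is needed here.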

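While scanning the string, moving the window one character to the right is equivalent to shifting the corresponding int value left by 2 bits, setting its lowest 2 bits to the code of the new character, and finally clearing the 2 bits that were pushed above the 20-bit window (bits 20 and 21). Once the value val of the current substring is known, check whether that value has been seen before: if not, set hashTable[val] to true; otherwise, store the current substring in a set container, so that a sequence repeated several times is reported only once.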
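The window update described above can be sketched as a single step function. The names `baseCode` and `slideWindow` are hypothetical, and the sketch assumes the incoming value already fits in 20 bits and that the input contains only A, C, G, and T.

```cpp
// Hypothetical helper: 2-bit code of one nucleotide
// (A = 0, C = 1, G = 2, T = 3). Assumes valid input.
int baseCode(char c)
{
    switch (c)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3; // 'T'
    }
}

// Hypothetical helper: given the 20-bit value of the current 10-character
// window, return the value of the window shifted one character to the right.
int slideWindow(int val, char next)
{
    val <<= 2;             // shift out room for the new character
    val |= baseCode(next); // put the new character in the lowest 2 bits
    val &= ~(0x300000);    // clear bits 20 and 21 (the character that left)
    return val;
}
```

For example, starting from the value of "ACGTACGTAC" (0x1B1B1) and appending 'G' gives 0x6C6C6, which is the encoding of "CGTACGTACG". Each step costs a constant number of bit operations, so the whole scan is linear in the length of the string.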

Code:

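#include <iostream>
#include <string>
#include <vector>
#include <set>
#include <map>
using namespace std;

vector<string> findRepeatedDnaSequences(string s)
{
    vector<string> res;
    if(s.length() < 10) return res;

    // 2-bit code for each nucleotide
    map<char,int> mp;
    mp['A'] = 0;
    mp['C'] = 1;
    mp['G'] = 2;
    mp['T'] = 3;

    // exist[v] is true once the 10-letter sequence encoded as v has been seen;
    // a vector keeps the 2^20 flags off the stack
    vector<bool> exist(1024*1024, false);

    // encode the first 10 characters
    int val = 0;
    for(int i = 0; i < 10; i ++)
    {
        val <<= 2;
        val |= mp[s[i]];
    }
    exist[val] = true;

    set<string> tmp; // each repeated sequence is stored only once
    for(size_t i = 10; i < s.length(); i ++)
    {
        val <<= 2;          // make room for the new character
        val |= mp[s[i]];    // append its 2-bit code
        val &= ~(0x300000); // clear bits 20-21 (the character that left the window)
        if(exist[val]) tmp.insert(s.substr(i-9,10));
        else exist[val] = true;
    }
    // copy the repeated sequences into the result in sorted order
    set<string>::iterator it = tmp.begin();
    while(it != tmp.end())
    {
        res.push_back(*it);
        it++;
    }
    return res;
}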

int main()
{
    vector<string> res;
    string s;
    cin >> s;
    res = findRepeatedDnaSequences(s);
    for(size_t i = 0; i < res.size(); i ++)
    {
        cout<<res[i]<<' ';
    }
}

