LeetCode 187. Repeated DNA Sequences | Bit Storage

https://leetcode.com/problems/repeated-dna-sequences/description/

This problem is a bit dull.

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        // Count every 10-letter substring.
        unordered_map<string, int> rec;
        for (size_t i = 0; i + 10 <= s.size(); i++) {
            rec[s.substr(i, 10)]++;
        }
        // Collect the substrings that occur more than once.
        vector<string> ans;
        for (const auto& entry : rec) {
            if (entry.second > 1) {
                ans.push_back(entry.first);
            }
        }
        return ans;
    }
};

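The Discuss solution relies on the fact that the ASCII codes of A, T, C, and G differ in their low bits, so 3 bits are enough to represent one letter. A 10-letter substring then needs only 30 bits and fits in a single int.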

https://leetcode.com/problems/repeated-dna-sequences/discuss/53877/I-did-it-in-10-lines-of-C++

The main idea is to store each substring as an int in the map to stay within the memory limit.

There are only four possible characters, A, C, G, and T, but I want to use 3 bits per letter instead of 2.

Why? It’s easier to code.

A is 0x41, C is 0x43, G is 0x47, T is 0x54. Still don’t see it? Let me write it in octal.

A is 0101, C is 0103, G is 0107, T is 0124. The last octal digit is different for each of the four letters. That’s all we need!

We can simply use s[i] & 7 to get the last octal digit, which is just the last 3 bits. That is much easier than a lookup table, a switch, or a chain of if/else, right?
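To make the encoding concrete, here is a minimal sketch of the 3-bit mapping and of packing one 10-letter window into an int. The helper names code3 and pack10 are hypothetical and do not appear in the original answer; the input is assumed to contain only uppercase A, C, G, and T.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: keep only the low 3 bits of a nucleotide character.
// A (0x41) -> 1, C (0x43) -> 3, G (0x47) -> 7, T (0x54) -> 4.
inline int code3(char c) {
    return c & 7;
}

// Hypothetical helper: pack the 10 letters starting at pos into one
// 30-bit value, 3 bits per letter. Assumes pos + 10 <= s.size().
inline int pack10(const std::string& s, std::size_t pos) {
    int t = 0;
    for (std::size_t k = 0; k < 10; ++k)
        t = (t << 3) | code3(s[pos + k]);
    return t;
}
```

Because the four codes are distinct, two windows pack to the same value only when they contain the same letters in the same order, so the packed int can stand in for the substring as a map key.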

We don’t really need to generate the substring from the int. While counting the number of occurrences, we can push the substring into result as soon as the count becomes 2, so there won’t be any duplicates in the result.

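vector<string> findRepeatedDnaSequences(string s) {
    unordered_map<int, int> m;   // encoded 10-letter window -> occurrence count
    vector<string> r;
    int t = 0, i = 0, ss = s.size();
    // Pack the first 9 letters, 3 bits each.
    while (i < 9)
        t = (t << 3) | (s[i++] & 7);
    // Slide the window: keep the low 30 bits (10 letters) and append the next letter.
    while (i < ss)
        if (m[t = ((t << 3) & 0x3FFFFFFF) | (s[i++] & 7)]++ == 1)
            r.push_back(s.substr(i - 10, 10));   // report on the second occurrence only
    return r;
}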

BTW, the OJ doesn’t seem to have test cases where the given string is shorter than 9 characters, so I didn’t check for that, to keep the code simpler.
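If short inputs do need to be handled, the length check could be added along these lines. This is only a sketch, not the original author's code: the function name findRepeatedDnaSequencesSafe is made up for illustration, and it uses an unsigned accumulator so that the left shift cannot overflow a signed int.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Sketch: the same 3-bit rolling encoding, plus an explicit length check.
std::vector<std::string> findRepeatedDnaSequencesSafe(const std::string& s) {
    std::vector<std::string> result;
    if (s.size() < 10) return result;   // no 10-letter window exists

    std::unordered_map<unsigned, int> count;
    unsigned t = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Keep the low 30 bits (drops the oldest letter), append the new one.
        t = ((t << 3) & 0x3FFFFFFFu) | (s[i] & 7u);
        // A full window ends at index i once at least 10 letters were read.
        if (i >= 9 && count[t]++ == 1)
            result.push_back(s.substr(i - 9, 10));
    }
    return result;
}
```

For example, calling it on "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" yields "AAAAACCCCC" and "CCCCCAAAAA", while any string shorter than 10 characters yields an empty vector.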

Any suggestions?

Update:

I realised that I can use s[i] >> 1 & 3 to get 2 bits, but then I won’t be able to remove the first loop as 1337c0d3r suggested.
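A 2-bit variant based on (s[i] >> 1) & 3 might look like the sketch below. It maps A to 0, C to 1, T to 2, and G to 3. The priming loop presumably has to stay because A encodes to 0: a partially filled window such as "AAA" would be indistinguishable from a full window of ten A's, whereas with the 3-bit codes no letter encodes to 0. The function name and the length guard are additions for illustration, not part of the original answer.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Sketch: 2 bits per letter, so a 10-letter window fits in 20 bits.
std::vector<std::string> findRepeatedDnaSequences2Bit(const std::string& s) {
    std::vector<std::string> result;
    if (s.size() < 10) return result;   // guard added for illustration

    std::unordered_map<unsigned, int> count;
    unsigned t = 0;
    std::size_t i = 0;
    // Prime the window with the first 9 letters (the "first loop").
    while (i < 9)
        t = (t << 2) | ((s[i++] >> 1) & 3u);
    // Slide the window: keep the low 20 bits, append the next letter.
    while (i < s.size()) {
        t = ((t << 2) & 0xFFFFFu) | ((s[i++] >> 1) & 3u);
        if (count[t]++ == 1)            // second occurrence of this window
            result.push_back(s.substr(i - 10, 10));
    }
    return result;
}
```

The keys now fit in 20 bits, so a flat array of 2^20 counters could replace the hash map if memory access patterns matter more than sparsity; that trade-off is not discussed in the original post.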


