LeetCode 187. Repeated DNA Sequences

Spade_

Category: C++. Tags: Algorithms and Data Structures, Bit Manipulation

Article link: https://blog.csdn.net/Spade_/article/details/79588342

187. Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,

Given s =
“AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT”,

Return: ["AAAAACCCCC", "CCCCCAAAAA"].

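In short: find every 10-letter substring of s that occurs more than once.
Note that occurrences may overlap, and each repeated substring is reported only once: "AAAAAAAAAAA" (11 A's) returns ["AAAAAAAAAA"].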
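
Before any bit tricks, the requirement above can be checked with a straightforward baseline. The following is only a minimal sketch, not the solution from this post: it counts every 10-letter window as a plain string, and the name `findRepeatedNaive` is made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Baseline sketch (hypothetical helper, not from the original post):
// count each 10-letter window as a string and report it on its second occurrence.
std::vector<std::string> findRepeatedNaive(const std::string& s) {
    std::vector<std::string> result;
    std::unordered_map<std::string, int> seen;  // window -> number of occurrences so far
    for (std::size_t i = 0; i + 10 <= s.size(); ++i) {
        std::string window = s.substr(i, 10);
        // The count reaches 2 exactly once per distinct window,
        // so a substring repeated many times is still reported only once.
        if (++seen[window] == 2) result.push_back(window);
    }
    return result;
}
```

With 11 A's there are two overlapping windows, both "AAAAAAAAAA", so the result contains that string exactly once. The cost of this baseline is that every window is stored and hashed as a 10-character string, which is what the 2-bit encoding below avoids.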

My code:

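    /*  "AAAAAAAAAAA" (11 A's) returns ["AAAAAAAAAA"]
        There are only four letters, A, C, G and T, so we can encode them as 00, 01, 10 and 11.
        A 10-letter subsequence takes 2*10 = 20 bits, so it fits in an int.
        E.g. AAACGGGTTT => 00 00 00 01 10 10 10 11 11 11
        There are s.length() - 9 subsequences in total; loop over start indices [0, s.length() - 10].
        For each subsequence:
        1. If it is not in the set of seen subsequences, this is its first occurrence: add it to the set.
        2. If it is in the set but not in the result array, this is its second occurrence and it has not been reported yet: add it to the result array.
        3. If it is in the set and also in the result array, this is its third or later occurrence: ignore it.
        Because lookup in vector<T> is O(n), we use a second set, resultSet, to record the subsequences already added to the result array.
    */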
    #include <string>
    #include <unordered_set>
    #include <vector>
    using namespace std;

    vector<string> findRepeatedDnaSequences(string s) {
        int len = s.size();
        if (len < 10) return vector<string>();   // no 10-letter window exists

        vector<string> result;
        unordered_set<int> oneSet;      // encoded sequences seen at least once
        unordered_set<int> resultSet;   // encoded sequences already added to result
        char map[26] = {0};             // 2-bit code for each letter, indexed by letter - 'A'
        map['A' - 'A'] = 0; map['C' - 'A'] = 1;
        map['G' - 'A'] = 2; map['T' - 'A'] = 3;

        // there are len - 9 ten-letter sequences in total
        for (int i = 0; i <= len - 10; ++i) {
            int seq = 0;
            int seqEnd = i + 10;    // one past the end of the current 10-letter sequence
            // pack the 10-letter sequence into a 20-bit int
            for (int j = i; j < seqEnd; ++j) {
                int state = map[s[j] - 'A'];   // e.g. A => 00
                seq <<= 2;      // shift left by two bits
                seq |= state;   // append the new letter's code
            }
            if (!oneSet.count(seq)) {
                oneSet.emplace(seq);            // first occurrence
            } else if (!resultSet.count(seq)) {
                resultSet.emplace(seq);         // second occurrence: report it once
                result.push_back(s.substr(i, 10));
            }
            // third or later occurrence: already reported, nothing to do
        }

        return result;
    }
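
The code above rebuilds the 20-bit value from scratch for every window. As a possible refinement (a sketch of a sliding-window variant, not the method used in this post), the value can be updated in O(1) per step: shift in the new letter's two bits and mask off everything above bit 19. The names `nucleotideCode`, `encodeWindow` and `findRepeatedRolling` are hypothetical, and the input is assumed to contain only A, C, G and T.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// 2-bit code per nucleotide: A=00, C=01, G=10, T=11.
// Assumption: the input contains only these four letters.
inline std::uint32_t nucleotideCode(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // treated as 'T'
    }
}

// Pack the 10 letters starting at `start` into a 20-bit integer.
// The caller must ensure start + 10 <= s.size().
std::uint32_t encodeWindow(const std::string& s, std::size_t start) {
    std::uint32_t code = 0;
    for (std::size_t j = start; j < start + 10; ++j)
        code = (code << 2) | nucleotideCode(s[j]);
    return code;
}

// Sliding-window variant: each step shifts in one letter and drops the oldest.
std::vector<std::string> findRepeatedRolling(const std::string& s) {
    std::vector<std::string> result;
    if (s.size() < 10) return result;

    const std::uint32_t mask = (1u << 20) - 1;    // keep only the low 20 bits
    std::unordered_set<std::uint32_t> seen;       // codes seen at least once
    std::unordered_set<std::uint32_t> reported;   // codes already added to result
    std::uint32_t code = 0;

    for (std::size_t i = 0; i < s.size(); ++i) {
        code = ((code << 2) | nucleotideCode(s[i])) & mask;
        if (i < 9) continue;  // the first full window ends at index 9
        // insert(...).second is false when the code was already present
        if (!seen.insert(code).second && reported.insert(code).second)
            result.push_back(s.substr(i - 9, 10));
    }
    return result;
}
```

For the example in the comment, `encodeWindow("AAACGGGTTT", 0)` gives the bit pattern 00 00 00 01 10 10 10 11 11 11, which is 6847 in decimal. The two-set logic is the same as in the code above; only the way each window's integer is produced differs.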
