LeetCode Notes: 187. Repeated DNA Sequences

chenxy132

Article link: https://blog.csdn.net/chenxy132/article/details/88680402

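Problem:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output: ["AAAAACCCCC", "CCCCCAAAAA"]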

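Approach: Code 1 comes from an expert's post I found online. Original link: https://blog.csdn.net/xudli/article/details/43666725

Original link for Code 2: https://blog.csdn.net/dddongdong/article/details/43758603

Code 1 runs faster, but I find it a bit hard to follow.

Start with Code 2. The basic idea is to create a HashMap, store every substring of length 10, and finally output the substrings that occur more than once. Storing the substrings themselves, however, runs into a Memory Limit Exceeded error. Because the string contains only 4 distinct letters, each letter can be encoded in 2 bits, so every 10-letter substring maps to a unique 20-bit integer. That integer is used as the HashMap key, and the value is a boolean: false means the substring has been seen once, true means it has already been added to the result.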
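The integer encoding described above can be sketched on its own. This is a minimal illustration, not part of either original solution: the class name `DnaEncoding` and the `decode` helper are hypothetical, added only to show that the mapping is reversible and therefore unique for a fixed length.

```java
class DnaEncoding {
    // Hypothetical helper: the index of a letter in this string is its 2-bit code
    // (A=0, C=1, G=2, T=3), the same mapping both solutions use.
    static final String LETTERS = "ACGT";

    // Pack a DNA string into an int, 2 bits per letter (10 letters use 20 bits).
    static int encode(String sub) {
        int n = 0;
        for (int i = 0; i < sub.length(); i++) {
            int code = LETTERS.indexOf(sub.charAt(i));
            if (code < 0) {
                throw new IllegalArgumentException("not a nucleotide: " + sub.charAt(i));
            }
            n = (n << 2) | code;
        }
        return n;
    }

    // Unpack an int back into a string of the given length.
    static String decode(int n, int length) {
        char[] out = new char[length];
        for (int i = length - 1; i >= 0; i--) {
            out[i] = LETTERS.charAt(n & 3); // lowest 2 bits hold the last letter
            n >>>= 2;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(encode("AAAAACCCCC"));             // 341
        System.out.println(encode("TTTTTTTTTT"));             // 1048575, the largest 20-bit value
        System.out.println(decode(encode("CCCCCAAAAA"), 10)); // CCCCCAAAAA
    }
}
```

Since every code fits in 20 bits, the map stores small integer keys rather than 10-character strings, which is the saving the approach relies on.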

Code:

(1) Code 1

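import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<String>();
        // Fewer than 11 characters cannot contain two different 10-letter windows.
        if (s == null || s.length() < 11) return res;
        int hash = 0;

        // Each nucleotide is encoded in 2 bits.
        Map<Character, Integer> map = new HashMap<Character, Integer>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);

        Set<Integer> set = new HashSet<Integer>();    // window codes seen so far
        Set<Integer> unique = new HashSet<Integer>(); // window codes already added to res

        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Shift in the 2-bit code of the new character.
            hash = (hash << 2) + map.get(c);
            if (i >= 9) {
                // Keep only the low 20 bits, i.e. the last 10 characters.
                hash &= (1 << 20) - 1;
                if (set.contains(hash) && !unique.contains(hash)) {
                    res.add(s.substring(i - 9, i + 1));
                    unique.add(hash);
                } else {
                    set.add(hash);
                }
            }
        }
        return res;
    }
}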
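The part of Code 1 that is hard to follow is the sliding update of `hash`. The sketch below isolates that step under the same 2-bit mapping. The names `RollingWindowDemo`, `code`, `encode`, and `roll` are hypothetical and chosen only for this illustration. It checks that shifting in one letter and masking to 20 bits gives the same value as re-encoding the whole 10-letter window from scratch.

```java
class RollingWindowDemo {
    static final int MASK = (1 << 20) - 1; // low 20 bits = 10 letters x 2 bits

    // 2-bit code per nucleotide, same mapping as Code 1.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Encode a window of at most 10 letters from scratch.
    static int encode(String window) {
        int h = 0;
        for (int i = 0; i < window.length(); i++) {
            h = (h << 2) | code(window.charAt(i));
        }
        return h;
    }

    // Slide the window one step: shift in the new letter, mask off the oldest one.
    static int roll(int hash, char next) {
        return ((hash << 2) | code(next)) & MASK;
    }

    public static void main(String[] args) {
        String s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT";
        int hash = encode(s.substring(0, 10));
        for (int i = 10; i < s.length(); i++) {
            hash = roll(hash, s.charAt(i));
            String window = s.substring(i - 9, i + 1);
            if (hash != encode(window)) {
                throw new AssertionError("mismatch at window " + window);
            }
        }
        System.out.println("rolling update matches full re-encoding for every window");
    }
}
```

Because each step does constant work instead of re-reading 10 characters, this is a plausible reason Code 1 runs faster than Code 2, which calls `substring` and re-hashes every window.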

(2) Code 2

import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;

public class Solution {
    // Encode a DNA string as an integer, 2 bits per letter (A=0, C=1, G=2, T=3).
    private int myHash(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            n <<= 2;
            char c = s.charAt(i);
            if (c == 'C') {
                n += 1;
            } else if (c == 'G') {
                n += 2;
            } else if (c == 'T') {
                n += 3;
            }
        }
        return n;
    }

    public List<String> findRepeatedDnaSequences(String s) {
        List<String> list = new LinkedList<String>();
        if (s == null || s.length() < 10) return list;

        // Key: encoded substring. Value: false = seen once, true = already in the result.
        HashMap<Integer, Boolean> table = new HashMap<Integer, Boolean>();
        int L = 10;
        for (int i = 0; i <= s.length() - L; i++) {
            String sub = s.substring(i, i + L);
            int hs = myHash(sub);
            if (table.containsKey(hs)) {
                if (!table.get(hs)) list.add(sub);
                table.put(hs, true); // prevents adding the same string more than once
            } else {
                table.put(hs, false);
            }
        }
        return list;
    }
}
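To sanity-check either version on the sample input, a plain string-counting baseline can serve as a reference. This is the straightforward HashMap idea mentioned in the approach section, before the integer encoding is applied. The class name `NaiveRepeatedDna` and method name `findRepeated` are hypothetical, and this sketch makes no attempt to address the memory issue noted earlier.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NaiveRepeatedDna {
    // Count every 10-letter substring directly; report it when it is seen a second time.
    static List<String> findRepeated(String s) {
        List<String> result = new ArrayList<>();
        if (s == null) return result;
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String sub = s.substring(i, i + 10);
            int seen = counts.merge(sub, 1, Integer::sum); // new count after this occurrence
            if (seen == 2) {
                result.add(sub); // added exactly once, on the second occurrence
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

Both Code 1 and Code 2 should return the same two sequences for this input, in the same order, since each reports a sequence at its second occurrence.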
