Repeated DNA Sequences

Nereus_Li

Category: LeetCode

Article link: https://blog.csdn.net/li_chihang/article/details/44024753

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

#include<iostream>
#include<vector>
#include<map>
#include<string>
using namespace std;

// Count every 10-letter substring in a map keyed by the substring itself.
// Result on LeetCode: Memory Limit Exceeded.
vector<string> findRepeatedDnaSequences(string s) {
	vector<string>   ResultString;
	map<string, int> MapStringCount;
	if (s.size() <= 10)
		return ResultString;
	// Use i + 10 <= s.size() so that the last window is examined too.
	for (size_t i = 0; i + 10 <= s.size(); ++i) {
		const string window = s.substr(i, 10);
		// operator[] starts a missing count at 0; record the window
		// exactly once, the second time it is seen.
		if (++MapStringCount[window] == 2)
			ResultString.push_back(window);
	}
	return ResultString;
}

// Improved hash key.
// Build the key with bit operations: each nucleotide takes 2 bits
// (A=0, C=1, G=2, T=3), so a 10-letter window fits in 20 bits of an int.
int myhashkey(const string& s)
{
	int n = 0;
	for (size_t i = 0; i != s.size(); ++i)
	{
		n <<= 2;
		if (s[i] == 'C')
			n += 1;
		else if (s[i] == 'G')
			n += 2;
		else if (s[i] == 'T')
			n += 3;
		// 'A' contributes 0
	}
	return n;
}
// Same counting logic as the map<string, int> version above, but keyed by
// the integer hash instead of the string. It reuses the function name that
// LeetCode expects, so keep only one of the two definitions when compiling.
vector<string> findRepeatedDnaSequences(string s) {
	vector<string>   ResultString;
	map<int, int> MapStringCount;
	if (s.size() <= 10)
		return ResultString;
	for (size_t i = 0; i + 10 <= s.size(); ++i) {
		const string window = s.substr(i, 10);
		// Compute the key once per window instead of once per map access.
		const int key = myhashkey(window);
		if (++MapStringCount[key] == 2)
			ResultString.push_back(window);
	}
	return ResultString;
}
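
The integer-key version above still rebuilds the key from scratch for every window. One possible refinement, sketched here and not part of the original submissions, is to slide the key along the string: shift in two bits for the new nucleotide and mask off everything above the low 20 bits. The names findRepeatedDnaSequencesRolling and nucleotideCode are hypothetical, unordered_map is swapped in for map, and the input is assumed to contain only A, C, G and T.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: 2-bit code for a nucleotide (A=0, C=1, G=2, T=3).
// Assumption: the input contains only A, C, G and T.
static std::uint32_t nucleotideCode(char c) {
    switch (c) {
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return 0;  // 'A'
    }
}

// Sketch: same result as the versions above, but the 20-bit key is
// updated incrementally instead of being recomputed for each window.
std::vector<std::string> findRepeatedDnaSequencesRolling(const std::string& s) {
    const std::size_t kLen = 10;
    const std::uint32_t kMask = (1u << (2 * kLen)) - 1;  // low 20 bits
    std::vector<std::string> result;
    if (s.size() <= kLen) return result;

    std::unordered_map<std::uint32_t, int> seen;
    std::uint32_t key = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Shift in the new nucleotide and drop the one that left the window.
        key = ((key << 2) | nucleotideCode(s[i])) & kMask;
        if (i + 1 < kLen) continue;  // window not full yet
        // Record a window once, the second time its key is seen.
        if (++seen[key] == 2)
            result.push_back(s.substr(i + 1 - kLen, kLen));
    }
    return result;
}
```

Because four nucleotides map one-to-one onto 2-bit codes, two different 10-letter windows can never share a key, so no collision check is needed. substr is called only when a repeat is actually reported.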
