Repeated DNA Sequences(统计字符串出现次数)

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].
Solutions:

因为只出现4个字符,可以巧用2位二进制位来代表它们。10个字符可以用20位来代表。用哈希映射。

class Solution {
public:
	unsigned int getInt(char c) {
		switch (c) {
		case 'A': return 0;
		case 'C': return 1;
		case 'G': return 2;
		case 'T': return 3;
	//	default: return -1;
		}
	}
    vector<string> findRepeatedDnaSequences(string s) {
		vector<string> result;
        if(s.size() <= 10) {
			return result;
		}
		unsigned int toInt=0;
		int i=0,j=0;
		for( i=0; i<=9; ++i){
			//<< must be the first,or wrong
			toInt <<= 2;
			toInt |= getInt(s[i]);
			//toInt <<= 2;
		}
/*		int *mapArr=new int[1024*1024];//exceed space limitation
		memset(mapArr, 0, sizeof(int)*(1024*1024));		
		mapArr[toInt]++;
*/
		map<int, int> mapArr;
		mapArr[toInt]++;
		for( i=10; i<s.size(); ++i) {			
			toInt <<= 14;
			toInt >>= 12;
			toInt |= getInt(s[i]);
			mapArr[toInt]++;
			if(mapArr[toInt] > 1) {
				string temp=s.substr(i-9, 10);
				bool flag=false;
				for(vector<string>::iterator itr=result.begin(); itr!=result.end(); ++itr) {
					if(*itr == temp) {
						flag=true;
						break;
					}
				}
				if(flag == false) {
					result.push_back(temp);
				}
			}
		}
		return result;
    }
};



评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值