String hashing

Crazy Search (problem link)

Many people like to solve hard puzzles some of which may lead them to madness. One such puzzle could be finding a hidden prime number in a given text. Such number could be the number of different substrings of a given size that exist in the text. As you soon will discover, you really need the help of a computer and a good algorithm to solve such a puzzle.
Your task is to write a program that given the size, N, of the substring, the number of different characters that may occur in the text, NC, and the text itself, determines the number of different substrings of size N that appear in the text.

As an example, consider N=3, NC=4 and the text "daababac". The different substrings of size 3 that can be found in this text are: "daa"; "aab"; "aba"; "bab"; "bac". Therefore, the answer should be 5.

Input

The first line of input consists of two numbers, N and NC, separated by exactly one space. This is followed by the text where the search takes place. You may assume that the maximum number of substrings formed by the possible set of characters does not exceed 16 Millions.

Output

The program should output just an integer corresponding to the number of different substrings of size N found in the given text.

Sample Input

3 4
daababac

Sample Output

5

Hint

Huge input, scanf is recommended.

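In short: take every substring of length N of the text, in which at most NC distinct characters occur, and count how many different substrings there are. The approach is to use NC as a radix, convert each substring into a number in that base, and use a hash table to check whether the number has been seen before. Since the problem guarantees that the number of possible substrings (NC^N) does not exceed 16,000,000, a hash table of 16,000,000 entries is enough. Each character is also mapped to an integer to make the conversion easy.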

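    For example, mapping the characters of daababac from the problem to integers (d=1, a=2, b=3, c=4) gives 12232324, and the substrings then convert as follows:
    daa -> 122 -> 011 (subtract 1 from each digit, because the number is in base 4) -> 5 (the base-4 number evaluated in decimal, used as this substring's hash index);
    aab -> 223 -> 112 -> 22;
    aba -> 232 -> 121 -> 25;
    The time complexity is O(n) in the length of the text. The code, adapted from a reference solution, follows: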
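As a quick check of the conversions above, separate from the full program, here is a minimal sketch that evaluates one window as a base-NC number. The helper name `encodeWindow` is hypothetical, and the sketch assumes digits are handed out from 0 in order of first appearance (the "subtract 1" form of the mapping).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): value of the n characters
// starting at text[pos], read as a number in base `base`. Digits 0, 1, 2, ...
// are assigned to characters in order of first appearance in `text`.
unsigned encodeWindow(const std::string& text, std::size_t pos, int n, int base) {
    assert(n > 0 && base > 0);
    assert(pos + static_cast<std::size_t>(n) <= text.size());

    int digit[256];
    for (int& d : digit) d = -1;            // -1 marks "not seen yet"
    int next = 0;
    for (unsigned char ch : text) {
        if (digit[ch] < 0) digit[ch] = next++;
    }

    unsigned value = 0;
    for (int j = 0; j < n; ++j) {
        // Shift the accumulated value one place and append the next digit.
        value = value * base + digit[static_cast<unsigned char>(text[pos + j])];
    }
    return value;
}
```

With the sample text, `encodeWindow("daababac", 0, 3, 4)`, `encodeWindow("daababac", 1, 3, 4)` and `encodeWindow("daababac", 2, 3, 4)` give 5, 22 and 25, matching the hand computation.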

For this problem, declaring the hash table as unsigned long long exceeds the memory limit; unsigned int gets accepted.

 

#include <iostream>
#include <string.h>
using namespace std;

#define mem(a) memset(a, 0, sizeof(a))
typedef unsigned long long ull;
typedef unsigned int ui;

ui has[16000000 + 5];
ui a[128];
char str[1000000];


int main() {
	int n, base; // each substring has n characters; base is NC
	while (cin >> n >> base) {
		mem(str);
		mem(a);
		mem(has);
		cin >> str;
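		int num = 0;
		int k = 0;
		int temp = 1;
		int len = strlen(str);
		// Map each distinct character to 1..k in order of first appearance.
		for (int i = 0; i < len; i++) {
			if (a[str[i]] == 0) {
				a[str[i]] = ++k;
			}
			if (k == base) {
				break;
			}
		}
		// Value of the first window in base `base`; temp ends up as base^n.
		for (int i = 0; i < n; i++) {
			num = num * base + a[str[i]] - 1;
			temp *= base;
		}
		temp /= base; // base^(n-1): weight of the window's leading digit
		has[num] = 1;
		int cnt = 1;
		// Slide the window: drop the leading digit, shift one place, append the new digit.
		for (int i = 1; i <= len - n; i ++) {
			num = (num - (a[str[i - 1]] - 1) * temp) * base + a[str[i + n - 1]] - 1;
			if (!has[num]) {
				has[num] = 1;
				cnt++;
			}
		}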
		cout << cnt << endl;
	}
	return 0;
}
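The sliding update in the program above can also be isolated in a function, which makes it easy to check against the sample. This is only a sketch under stated assumptions: the name `countDistinct` is hypothetical, digits start at 0, NC^N is assumed to fit in an unsigned int, and `std::vector<bool>` (typically one bit per entry) stands in for the `unsigned int` table as one way to reduce memory further.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original program): number of distinct
// length-n substrings of `text`, assuming at most `base` distinct characters.
unsigned countDistinct(const std::string& text, int n, int base) {
    assert(n > 0 && base > 0);
    if (text.size() < static_cast<std::size_t>(n)) return 0;  // no window fits

    // Assign digits 0, 1, 2, ... to characters in order of first appearance.
    int digit[256];
    for (int& d : digit) d = -1;
    int next = 0;
    for (unsigned char ch : text) {
        if (digit[ch] < 0) digit[ch] = next++;
    }
    assert(next <= base);

    // First window's value; high = base^(n-1); tableSize = base^n.
    unsigned value = 0, high = 1, tableSize = 1;
    for (int j = 0; j < n; ++j) {
        value = value * base + digit[static_cast<unsigned char>(text[j])];
        tableSize *= base;
        if (j > 0) high *= base;
    }

    std::vector<bool> seen(tableSize, false);
    seen[value] = true;
    unsigned count = 1;

    // Slide: remove the leading digit, shift one place, append the new digit.
    for (std::size_t i = 1; i + n <= text.size(); ++i) {
        unsigned out = digit[static_cast<unsigned char>(text[i - 1])];
        unsigned in = digit[static_cast<unsigned char>(text[i + n - 1])];
        value = (value - out * high) * base + in;
        if (!seen[value]) {
            seen[value] = true;
            ++count;
        }
    }
    return count;
}
```

For the sample, `countDistinct("daababac", 3, 4)` returns 5; the window values visited are 5, 22, 25, 38, 25, 39.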

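The version below is done without hashing (it recomputes each substring's value directly); I don't know why it gets a runtime error.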

// not done with hashing
#include<iostream>
#include<string.h>
#include<cstdio>
#define mem(a) memset(a,0,sizeof a)
#define ull unsigned long long 
typedef unsigned int ui;
using namespace std;

ui has[16000000+13];
ui a[128];
char str[1000000];
int prime=233;// using this does not work either

int main()
{
    int n,m;
    while(~scanf("%d%d%s",&n,&m,str+1)) {
    	int k=0;
    	mem(a);
    	mem(has);
    	int cnt=0;
        int len=strlen(str+1);
        for(int i=1;i<=len;i++){
        	if(a[str[i]]==0)
            a[str[i]]=++k;
        }
	    for(int i=1;i<=len-n;i++){
			int ans=0;
			for(int j=0;j<=n;j++){
				ans=ans*m+a[str[i+j]];// replacing m with prime gives wrong results
			}
			if(!has[ans]){//has[ans]==0
				has[ans]=1;
				cnt++;
			}
		}
        printf("%d\n",cnt);
    }
    return 0;
}
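Judging only from the listing above, a likely cause of the runtime error is an out-of-range index into has[]. The inner loop runs j from 0 to n inclusive, so it folds n+1 characters into ans, and the digits are 1..k rather than 0..k-1; together these let ans grow well past NC^N (and past the range of int when prime is used as the base). The outer loop also stops at len-n, which with 1-based indexing skips the last window. A minimal sketch with those three points changed follows; the name `countDirect` is hypothetical and it has not been submitted to the judge.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical rewrite of the direct (non-rolling) version: every window's
// value is recomputed from scratch, which costs O(len * n) instead of O(len).
unsigned countDirect(const std::string& text, int n, int base) {
    assert(n > 0 && base > 0);
    if (text.size() < static_cast<std::size_t>(n)) return 0;

    int digit[256];
    for (int& d : digit) d = -1;
    int next = 0;
    for (unsigned char ch : text) {
        if (digit[ch] < 0) digit[ch] = next++;   // digits start at 0, not 1
    }
    assert(next <= base);

    unsigned tableSize = 1;
    for (int j = 0; j < n; ++j) tableSize *= base;   // base^n possible values
    std::vector<bool> seen(tableSize, false);

    unsigned count = 0;
    for (std::size_t i = 0; i + n <= text.size(); ++i) {   // includes the last window
        unsigned value = 0;
        for (int j = 0; j < n; ++j) {                      // exactly n characters
            value = value * base + digit[static_cast<unsigned char>(text[i + j])];
        }
        assert(value < tableSize);   // the index can no longer leave the table
        if (!seen[value]) {
            seen[value] = true;
            ++count;
        }
    }
    return count;
}
```

Keeping the base equal to NC is what bounds the index by NC^N; with an unrelated base such as 233 the value would need to be reduced modulo the table size, and collisions would then have to be handled.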

 
