Crazy Search (problem link)
Many people like to solve hard puzzles some of which may lead them to madness. One such puzzle could be finding a hidden prime number in a given text. Such number could be the number of different substrings of a given size that exist in the text. As you soon will discover, you really need the help of a computer and a good algorithm to solve such a puzzle.
Your task is to write a program that given the size, N, of the substring, the number of different characters that may occur in the text, NC, and the text itself, determines the number of different substrings of size N that appear in the text.
As an example, consider N=3, NC=4 and the text "daababac". The different substrings of size 3 that can be found in this text are: "daa"; "aab"; "aba"; "bab"; "bac". Therefore, the answer should be 5.
Input
The first line of input consists of two numbers, N and NC, separated by exactly one space. This is followed by the text where the search takes place. You may assume that the maximum number of substrings formed by the possible set of characters does not exceed 16 Millions.
Output
The program should output just an integer corresponding to the number of different substrings of size N found in the given text.
Sample Input
3 4
daababac
Sample Output
5
Hint
Huge input,scanf is recommended.
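Before optimizing, it can help to have a slow but obviously correct baseline to compare against on small inputs. The sketch below is not part of the original solution: it stores every window of length N in a `std::set`, and the function name `count_distinct_bruteforce` is made up for illustration. It would be too slow and too memory-hungry for the real limits.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Brute-force baseline: collect every length-n window of text in an ordered set.
// Each insert copies and compares up to n characters, so this is only meant
// for cross-checking small cases, not for submission.
std::size_t count_distinct_bruteforce(const std::string& text, std::size_t n) {
    assert(n > 0);                     // a window must contain at least one character
    if (text.size() < n) return 0;     // no complete window fits
    std::set<std::string> seen;
    for (std::size_t i = 0; i + n <= text.size(); ++i) {
        seen.insert(text.substr(i, n));
    }
    return seen.size();
}
```

Calling `count_distinct_bruteforce("daababac", 3)` reproduces the sample answer of 5.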
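The task is to count how many different substrings of length N occur in the text, given that at most NC different characters appear in it. The approach is to use NC as a radix: assign each character an integer, read each substring as a number in base NC, and use that number as an index into a hash table to check whether the substring has been seen before. Since the problem guarantees that the number of possible substrings does not exceed 16,000,000, a table of 16,000,000 entries is enough.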
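For example, numbering the characters of daababac in order of first appearance (d=1, a=2, b=3, c=4) gives 12232324, and the substrings convert as follows:
daa -> 122 -> 011 (subtract 1 from each digit, because base-4 digits run from 0 to 3) -> 5 (the base-4 number written in decimal, used as this substring's hash index);
aab -> 223 -> 112 -> 22;
aba -> 232 -> 121 -> 25;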
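The conversions above can be reproduced with a short sketch. It assigns digits starting from 0 (instead of numbering from 1 and subtracting 1), builds the first window's value, and then derives each following value from the previous one by dropping the leading digit and appending the new one. The helper name `window_codes` is hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the base-`base` value of every length-n window of text, left to right.
// Digits are assigned in order of first appearance: first new character -> 0, etc.
std::vector<unsigned> window_codes(const std::string& text, std::size_t n, unsigned base) {
    assert(n > 0 && base > 0);
    std::vector<unsigned> codes;
    if (text.size() < n) return codes;

    int digit[256];
    for (int& d : digit) d = -1;               // -1 marks "character not seen yet"
    unsigned next = 0;
    for (unsigned char c : text) {
        if (digit[c] < 0) digit[c] = static_cast<int>(next++);
    }
    assert(next <= base);                      // the text may use at most `base` characters

    unsigned code = 0;
    unsigned high = 1;                         // becomes base^(n-1), weight of the leading digit
    for (std::size_t i = 0; i < n; ++i) {
        code = code * base + static_cast<unsigned>(digit[static_cast<unsigned char>(text[i])]);
        if (i + 1 < n) high *= base;
    }
    codes.push_back(code);

    for (std::size_t i = n; i < text.size(); ++i) {
        unsigned out = static_cast<unsigned>(digit[static_cast<unsigned char>(text[i - n])]);
        unsigned in = static_cast<unsigned>(digit[static_cast<unsigned char>(text[i])]);
        code = (code - out * high) * base + in;  // drop leading digit, shift, append new digit
        codes.push_back(code);
    }
    return codes;
}
```

For `window_codes("daababac", 3, 4)` the values are 5, 22, 25, 38, 25, 39: the windows daa, aab, aba, bab, aba, bac. Five of the six values are distinct, which matches the sample output.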
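The time complexity is O(n). The code below is adapted from a reference solution:
Declaring the table as unsigned long long exceeds the memory limit for this problem; unsigned int passes.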
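#include <cstring>
#include <iostream>
using namespace std;
#define mem(a) memset(a, 0, sizeof(a))
typedef unsigned long long ull;
typedef unsigned int ui;
ui has[16000000 + 5];  // has[v] is nonzero once a substring with value v has been seen
ui a[128];             // a[c] is the 1-based digit of character c (0 means not assigned yet)
char str[1000000];
int main() {
    int n, base;  // each substring has n characters; base is NC
    while (cin >> n >> base) {
        mem(str);
        mem(a);
        mem(has);
        cin >> str;
        int num = 0;
        int k = 0;
        int temp = 1;
        int len = strlen(str);
        if (len < n) {  // no complete substring of length n exists
            cout << 0 << endl;
            continue;
        }
        // Assign digits 1..base to characters in order of first appearance.
        for (int i = 0; i < len; i++) {
            if (a[str[i]] == 0) {
                a[str[i]] = ++k;
            }
            if (k == base) {
                break;
            }
        }
        // Value of the first substring; temp ends up as base^(n-1).
        for (int i = 0; i < n; i++) {
            num = num * base + a[str[i]] - 1;
            temp *= base;
        }
        temp /= base;
        has[num] = 1;
        int cnt = 1;
        // Rolling update: drop the leading digit, shift, append the new digit.
        for (int i = 1; i <= len - n; i++) {
            num = (num - (a[str[i - 1]] - 1) * temp) * base + a[str[i + n - 1]] - 1;
            if (!has[num]) {
                has[num] = 1;
                cnt++;
            }
        }
        cout << cnt << endl;
    }
    return 0;
}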
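To make the memory note concrete: a table of 16,000,005 entries needs about 128 MB with 8-byte elements and about 64 MB with 4-byte elements. Because each slot only records seen or not seen, a smaller element type would also work. The sketch below is an alternative, not the code that was submitted; `kSlots`, `table_bytes` and `count_distinct_codes` are names invented here, and it relies on `std::vector<bool>` being bit-packed, which is the usual behaviour but is left to the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same slot count as the has[] array in the solution above.
constexpr std::size_t kSlots = 16000000 + 5;

// Bytes needed for a plain array with one element of the given size per slot.
constexpr std::size_t table_bytes(std::size_t slots, std::size_t bytes_per_slot) {
    return slots * bytes_per_slot;
}

// Counts distinct values using one flag per slot instead of one integer per slot.
std::size_t count_distinct_codes(const std::vector<unsigned>& codes, std::size_t slots) {
    std::vector<bool> seen(slots, false);
    std::size_t distinct = 0;
    for (unsigned code : codes) {
        assert(code < slots);  // a value outside the table would be an out-of-bounds access
        if (!seen[code]) {
            seen[code] = true;
            ++distinct;
        }
    }
    return distinct;
}
```

Here `table_bytes(kSlots, sizeof(unsigned int))` is 64,000,020 on platforms with a 4-byte `unsigned int`, and `table_bytes(kSlots, sizeof(unsigned long long))` is 128,000,040 where that type is 8 bytes.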
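The version below does not use the rolling update: it recomputes the value of every substring from scratch. My first attempt at it gave a runtime error, most likely because the inner loop ran n+1 times (j<=n) and the digits were not reduced by 1, so ans could exceed the size of has or overflow int. With those two points fixed it looks like this:
// Without the rolling update: every window is re-encoded from scratch.
#include <cstdio>
#include <cstring>
#include <iostream>
#define mem(a) memset(a, 0, sizeof a)
#define ull unsigned long long
typedef unsigned int ui;
using namespace std;
ui has[16000000 + 13];
ui a[128];
char str[1000000];
int prime = 233;  // using this fixed base instead of m does not work either
int main() {
    int n, m;
    // str is filled from index 1, so the text occupies str[1..len].
    while (scanf("%d%d%s", &n, &m, str + 1) == 3) {
        int k = 0;
        mem(a);
        mem(has);
        int cnt = 0;
        int len = strlen(str + 1);
        // Assign digits 1..m to characters in order of first appearance.
        for (int i = 1; i <= len; i++) {
            if (a[str[i]] == 0)
                a[str[i]] = ++k;
        }
        for (int i = 1; i + n - 1 <= len; i++) {  // window str[i .. i+n-1]
            int ans = 0;
            for (int j = 0; j < n; j++) {  // exactly n characters
                // Digits must lie in 0..m-1 so that ans < m^n.
                // Replacing m with prime lets ans grow past the table size.
                ans = ans * m + a[str[i + j]] - 1;
            }
            if (!has[ans]) {  // has[ans] == 0
                has[ans] = 1;
                cnt++;
            }
        }
        printf("%d\n", cnt);
    }
    return 0;
}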
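On the remark in the code above about replacing m with prime (233): the table index is only guaranteed to stay in range when the radix is NC, because the problem bounds the number of possible substrings, NC^N, by 16 million. With a fixed radix of 233 the largest index is 233^n - 1, which outgrows the table once n reaches 4. A small check, assuming a hypothetical helper named `code_fits_table`:

```cpp
#include <cassert>
#include <cstdint>

// True when every n-digit number in the given base is a valid index into a
// table with `slots` entries, that is, when base^n <= slots.
bool code_fits_table(std::uint64_t base, unsigned n, std::uint64_t slots) {
    assert(slots > 0);
    std::uint64_t power = 1;  // equals base^k after k iterations
    for (unsigned k = 0; k < n; ++k) {
        // Stop before multiplying if the product would already exceed slots;
        // this also keeps power from overflowing.
        if (base != 0 && power > slots / base) return false;
        power *= base;
    }
    return power <= slots;
}
```

With 16,000,013 slots, `code_fits_table(4, 3, 16000013)` and `code_fits_table(233, 3, 16000013)` both hold (233^3 is 12,649,337), while `code_fits_table(233, 4, 16000013)` does not.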