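Abstract: Suppose we are given a long string L of length N and a short string s of known length k. The task is to decide whether s occurs in L and, if it does, at which position.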
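Basic idea: iterate over every substring T of L that has length k (there are N-k+1 of them), compute its hash value, and compare it with the hash value of s. If the two agree, check the strings character by character, because equal hash values do not guarantee equal strings (although, for a well-chosen hash, the probability that two different strings share a hash value is small).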
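The filter-then-verify loop described above can be sketched as follows. This is only an illustration, not the program from this post: the names `sum_hash` and `find_with_hash_filter` are hypothetical, the toy hash (sum of byte values modulo 101) is an assumption chosen because it collides easily, and every window's hash is recomputed from scratch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy hash: sum of the byte values of s[0..k-1] modulo a small prime. */
static unsigned sum_hash(const char *s, size_t k)
{
    unsigned h = 0;
    for (size_t i = 0; i < k; i++)
        h += (unsigned char)s[i];
    return h % 101u;
}

/*
 * Returns the 0-based index of the first occurrence of pat in text, or -1.
 * If false_hits is not NULL, it receives the number of windows whose hash
 * matched the pattern's hash but whose characters did not.
 */
static long find_with_hash_filter(const char *text, const char *pat, size_t *false_hits)
{
    size_t n = strlen(text), k = strlen(pat), spurious = 0;
    long found = -1;

    if (k > 0 && k <= n) {
        unsigned target = sum_hash(pat, k);
        for (size_t i = 0; i + k <= n; i++) {      /* N-k+1 windows */
            if (sum_hash(text + i, k) != target)
                continue;                          /* cheap rejection by hash */
            if (memcmp(text + i, pat, k) == 0) {   /* confirm character by character */
                found = (long)i;
                break;
            }
            spurious++;                            /* same hash, different string */
        }
    }
    if (false_hits != NULL)
        *false_hits = spurious;
    return found;
}
```

Searching for "ab" in "xxbaxxab" with this sketch passes over the window "ba", which has the same sum as "ab" but different characters, before the real match is confirmed. That is exactly the case the character-by-character check exists for.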
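Details: the key is to compute the hash value quickly. A common approach for strings is to add up the ASCII codes of all characters and take the remainder modulo some prime. But if we compute the hash of every substring T from scratch, each comparison costs O(k), which is no more efficient than comparing every character directly. We need a way to obtain each hash value in constant time.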
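Two adjacent windows, s1 s2 … sk and s2 s3 … sk+1, share every character except the ones at their two ends. The contribution of the overlapping part is merely scaled by a constant factor, so the new hash value follows from the old one by correcting for the two ends: remove the contribution of the first character and add the new last character. This takes only constant time.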
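A minimal sketch of that constant-time update, assuming a polynomial hash with the base 8 and modulus 67 that the program in this post uses. The helper names (`direct_hash`, `top_weight`, `roll_hash`, `rolling_matches_direct`) are hypothetical and exist only for this illustration; the last one checks that rolling gives the same value as recomputing each window from scratch.

```c
#include <stddef.h>

enum { BASE = 8, MOD = 67 };  /* assumed constants; any base and prime modulus would do */

/* Hash of s[0..k-1] computed from scratch in O(k). */
static int direct_hash(const char *s, size_t k)
{
    int h = 0;
    for (size_t i = 0; i < k; i++)
        h = (h * BASE + (unsigned char)s[i]) % MOD;
    return h;
}

/* BASE^k mod MOD: the weight the outgoing character carries after the shift. */
static int top_weight(size_t k)
{
    int w = 1;
    for (size_t i = 0; i < k; i++)
        w = w * BASE % MOD;
    return w;
}

/* O(1) update: from the hash of s[i..i+k-1] to the hash of s[i+1..i+k]. */
static int roll_hash(int h, char out, char in, int weight)
{
    int v = BASE * h + (unsigned char)in - (unsigned char)out * weight % MOD;
    return (v % MOD + MOD) % MOD;  /* keep the result in [0, MOD) */
}

/* Returns 1 if rolling across every window agrees with recomputing from scratch. */
static int rolling_matches_direct(const char *text, size_t n, size_t k)
{
    if (k == 0 || k > n)
        return 1;
    int w = top_weight(k);
    int h = direct_hash(text, k);
    for (size_t i = 0; i + k < n; i++) {
        h = roll_hash(h, text[i], text[i + k], w);
        if (h != direct_hash(text + i + 1, k))
            return 0;
    }
    return 1;
}
```

The outgoing character is multiplied by BASE^k rather than BASE^(k-1) because the whole old hash has already been shifted by one factor of BASE before the subtraction.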
#include <stdio.h>
#include <string.h>

#define K 7      /* pattern length */
#define BASE 8   /* radix of the polynomial hash */
#define MOD 67   /* small prime modulus */

/* Polynomial hash of the K characters starting at s, reduced mod MOD. */
int Hash(const char *s)
{
    int value = 0;
    for (int i = 0; i < K; i++)
        value = (value * BASE + (unsigned char)s[i]) % MOD;
    return value;
}

int main(void)
{
    char S[] = "this is a gorgeous girl,but I do not like her for she is too poor ";
    int N = (int)strlen(S);   /* sizeof(S) would also count the trailing '\0' */

    /* x = BASE^K mod MOD: the weight of the outgoing character after the shift. */
    int x = 1;
    for (int i = 0; i < K; i++)
        x = x * BASE % MOD;

    /* The pattern is the K characters starting at S+5. */
    char string[K + 1];
    memcpy(string, S + 5, K);
    string[K] = '\0';
    printf("the string we want is : %s\n", string);

    int DesiredValue = Hash(string);
    int CurrentValue = Hash(S);   /* hash of the first window */

    /* i is the 1-based start position of the current window. */
    for (int i = 1; i <= N - K + 1; i++)
    {
        if (i > 1)
        {
            /* Slide one step: shift, add the incoming character, drop the outgoing one. */
            int in = (unsigned char)S[i + K - 2];
            int out = (unsigned char)S[i - 2];
            CurrentValue = (BASE * CurrentValue + in - out * x % MOD + MOD) % MOD;
        }
        if (CurrentValue == DesiredValue)
        {
            /* Hashes agree, so compare character by character. */
            if (memcmp(string, S + i - 1, K) == 0)
                printf("the goal has been found: the location is %d\n", i);
            else
                puts("the fake string with the same hashvalue");
        }
    }
    return 0;
}