String Matching with a Hash Function

Latest recommended article published on 2022-12-30 23:40:31

PYB不开心

Category: Data Structures. Tags: hash function

Article link: https://blog.csdn.net/pp634077956/article/details/48162647

Summary: Given a long string L of length N and a short string s of known length k, determine whether s occurs in L and, if so, at which position.
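Basic idea: slide over all N-k+1 substrings T of length k in L, compute the hash of each T, and compare it with the hash of s. Only when the two hash values agree are the strings compared character by character, because two different strings rarely share a hash value when the modulus is reasonably large.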
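Details: the key is to compute the hash value quickly. A common string hash accumulates the ASCII codes of the characters and takes the remainder modulo a prime. If every substring T is hashed from scratch in this way, each window costs O(k), which is no more efficient than comparing every position character by character. The scheme therefore has to be improved so that the hash of each new window is obtained in constant time.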
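
To make the cost argument concrete, here is a minimal sketch in C of the naive variant that rehashes every window from scratch. The names `window_hash` and `naive_hash_search` are illustrative assumptions, as is the `ops` counter, which exists only to expose the O(k) work per window. Base 8 and modulus 67 are borrowed from the program later in this post.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative constants (assumptions): radix 8, prime modulus 67. */
enum { BASE = 8, MOD = 67 };

/* Hash k characters from scratch; *ops counts the characters read. */
static int window_hash(const char *s, size_t k, long *ops)
{
    int h = 0;
    for (size_t i = 0; i < k; i++) {
        h = (h * BASE + (unsigned char)s[i]) % MOD;
        (*ops)++;
    }
    return h;
}

/* Return the 0-based index of the first occurrence of pat in text, or -1.
 * Every window is rehashed in O(k), so hashing alone can cost (n-k+1)*k steps. */
static long naive_hash_search(const char *text, const char *pat, long *ops)
{
    size_t n = strlen(text), k = strlen(pat);
    if (k == 0 || k > n)
        return -1;
    int target = window_hash(pat, k, ops);
    for (size_t i = 0; i + k <= n; i++) {
        if (window_hash(text + i, k, ops) == target &&
            memcmp(text + i, pat, k) == 0)
            return (long)i;
    }
    return -1;
}
```

Calling `naive_hash_search("abcdefgh", "def", &ops)` with `ops` initialised to 0 visits four windows before the match, so the counter grows by k for each of them. That per-window cost is exactly what the constant-time update is meant to remove.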
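Two adjacent windows, s1s2s3...sk and s2s3s4...s(k+1), overlap in k-1 characters. The contribution of the overlapping part is simply multiplied by a constant (the base of the hash), so only the two ends need to be corrected: remove the term of the departing character s1 and add the arriving character s(k+1). This takes constant time.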
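
Written out for a hash of the form H = (s1*B^(k-1) + s2*B^(k-2) + ... + sk) mod p, the correction is H_next = (B*H - s1*B^k + s(k+1)) mod p. The sketch below applies that update in C and checks it against a from-scratch hash at every window. The function names (`full_hash`, `pow_mod`, `roll`, `rolling_matches_full`) are assumptions made for illustration, and B = 8, p = 67 are the constants used by the program in this post.

```c
#include <stddef.h>
#include <string.h>

enum { BASE = 8, MOD = 67 };  /* illustrative constants (assumptions) */

/* From-scratch hash of k characters: sum of s[i] * BASE^(k-1-i), mod MOD. */
static int full_hash(const char *s, size_t k)
{
    int h = 0;
    for (size_t i = 0; i < k; i++)
        h = (h * BASE + (unsigned char)s[i]) % MOD;
    return h;
}

/* BASE^k mod MOD, computed with integers only. */
static int pow_mod(size_t k)
{
    int p = 1;
    while (k-- > 0)
        p = p * BASE % MOD;
    return p;
}

/* O(1) slide: drop `out` on the left, append `in` on the right.
 * next = (BASE*h - out*BASE^k + in) mod MOD, kept non-negative. */
static int roll(int h, unsigned char out, unsigned char in, int base_k)
{
    int next = (BASE * h + in - out * base_k % MOD) % MOD;
    return next < 0 ? next + MOD : next;
}

/* Returns 1 if rolling across text reproduces full_hash at every window. */
static int rolling_matches_full(const char *text, size_t k)
{
    size_t n = strlen(text);
    if (k == 0 || k > n)
        return 1;
    int base_k = pow_mod(k);
    int h = full_hash(text, k);
    for (size_t i = 1; i + k <= n; i++) {
        h = roll(h, (unsigned char)text[i - 1],
                 (unsigned char)text[i + k - 1], base_k);
        if (h != full_hash(text + i, k))
            return 0;
    }
    return 1;
}
```

The `next < 0` branch matters because the `%` operator in C can return a negative remainder when the left operand is negative.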

#include <stdio.h>
#include <string.h>

#define K 7      /* pattern length */
#define BASE 8   /* radix of the polynomial hash */
#define MOD 67   /* prime modulus */

/* Hash of the K characters starting at s: sum of s[i]*BASE^(K-1-i), mod MOD. */
int Hash(const char *s)
{
    int value = 0;
    for (int i = 0; i < K; i++)
        value = (value * BASE + (unsigned char)s[i]) % MOD;
    return value;
}

int main(void)
{
    char S[] = "this is a gorgeous girl,but I do not like her for she is too poor ";
    int N = (int)strlen(S);          /* strlen excludes the terminating '\0' */

    /* x = BASE^K mod MOD: weight of the character that leaves the window */
    int x = 1;
    for (int i = 0; i < K; i++)
        x = x * BASE % MOD;

    /* the pattern is the K characters starting at S+5 */
    char string[K + 1];
    memcpy(string, S + 5, K);
    string[K] = '\0';
    printf("the string we want is: %s\n", string);

    int DesiredValue = Hash(string);
    int CurrentValue = Hash(S);      /* hash of the first window */

    /* i is the 1-based start position of the current window */
    for (int i = 1; i <= N - K + 1; i++)
    {
        if (i > 1)
        {
            /* slide by one: new = BASE*old + incoming - outgoing*BASE^K (mod MOD) */
            int in = (unsigned char)S[i + K - 2];
            int out = (unsigned char)S[i - 2];
            CurrentValue = (BASE * CurrentValue + in - out * x % MOD) % MOD;
            if (CurrentValue < 0)
                CurrentValue += MOD;
        }
        if (CurrentValue == DesiredValue)
        {
            /* hashes agree: confirm character by character */
            if (memcmp(string, S + i - 1, K) == 0)
                printf("the goal has been found: the location is %d\n", i);
            else
                puts("the fake string with the same hashvalue");
        }
    }
    return 0;
}
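
The program above fixes the pattern length at compile time and prints every hit. As one possible generalization (a sketch, not the author's code), the same search can be wrapped in a function that accepts any pattern and returns the first match. `rk_find` is a hypothetical name. Base 8 and modulus 67 are kept from the program above, although a larger prime would typically produce fewer false hash hits.

```c
#include <stddef.h>
#include <string.h>

enum { BASE = 8, MOD = 67 };  /* constants taken from the program above */

/* Hypothetical helper: 0-based index of the first match of pat in text, or -1. */
static long rk_find(const char *text, const char *pat)
{
    size_t n = strlen(text), k = strlen(pat);
    if (k == 0 || k > n)
        return -1;

    int target = 0, h = 0, base_k = 1;
    for (size_t i = 0; i < k; i++) {
        target = (target * BASE + (unsigned char)pat[i]) % MOD;
        h = (h * BASE + (unsigned char)text[i]) % MOD;
        base_k = base_k * BASE % MOD;          /* ends as BASE^k mod MOD */
    }

    for (size_t i = 0; ; i++) {
        /* verify character by character only when the hashes agree */
        if (h == target && memcmp(text + i, pat, k) == 0)
            return (long)i;
        if (i + k >= n)
            break;                              /* no further window */
        /* slide the window one character to the right in O(1) */
        h = (BASE * h + (unsigned char)text[i + k]
             - (unsigned char)text[i] * base_k % MOD) % MOD;
        if (h < 0)
            h += MOD;
    }
    return -1;
}
```

With the sample text from the program, `rk_find(S, "is a go")` returns 5, which is the 0-based equivalent of the 1-based location 6 that the program reports.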