The Karp-Rabin String Matching Algorithm

有梦丶

Article link: https://blog.csdn.net/youmeng2/article/details/131603948

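Idea: use a hash function to map a string to an integer. Matching two strings then no longer needs a character-by-character comparison; it becomes a comparison between two integers, which takes O(1) time.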

Suppose the text is t[0, n) and the pattern is p[0, m), with n > m.

The hash function used (where b is a base chosen in advance):

$Hash(t[0, m)) = t[0] \cdot b^{m-1} + t[1] \cdot b^{m-2} + \cdots + t[m-1] \cdot b^{0}$

$Hash(p[0, m)) = p[0] \cdot b^{m-1} + p[1] \cdot b^{m-2} + \cdots + p[m-1] \cdot b^{0}$
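The polynomial above can be evaluated without computing any explicit power of b by using Horner's rule. The following is a minimal sketch, not the article's own code: the function name `poly_hash` and the `mod` parameter are assumptions made for illustration (the modulus keeps the value from overflowing, in the same way the constant M does in the full program later in this article).

```cpp
#include <cstdint>
#include <string>

// Polynomial hash of s[0, m) with base b, reduced modulo mod.
// Horner's rule: ((s[0]*b + s[1])*b + s[2])*b + ... , so no explicit powers of b.
// Assumption: mod is small enough that mod * b fits in 64 bits.
// poly_hash is a hypothetical helper name, not part of the original program.
std::uint64_t poly_hash(const std::string& s, std::uint64_t b, std::uint64_t mod) {
    std::uint64_t h = 0;
    for (unsigned char c : s)      // unsigned char keeps character values non-negative
        h = (h * b + c) % mod;
    return h;
}
```

For example, with b = 10 and a large modulus, `poly_hash("ab", 10, 1000000007)` evaluates 97 * 10 + 98, since 'a' is 97 and 'b' is 98 in ASCII.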

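Why compute the hash of only the first m characters of t? Computing one hash value takes O(m) time. If every substring's hash were recomputed from scratch with a loop during matching, each of the n - m + 1 alignments would cost O(m), and the overall complexity would rise to O(n * m), no better than the brute-force algorithm. If the hash is instead computed once and then updated with a rolling hash (O(1) per step) for each following substring of t, the expected running time stays at O(n + m), that is, linear. The worst case is still O(n * m) when many hash collisions force full comparisons.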

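A rolling hash works as follows. Take t[0, m) as an example: the next substring of length m is t[1, m+1). The two substrings differ only in t[0] and t[m], so the update removes the contribution of t[0], multiplies the middle part t[1 .. m-1] by b, and adds t[m]: $Hash(t[1, m+1)) = (Hash(t[0, m)) - t[0]*b^{m-1})*b + t[m]$. In general, this converts hash(t[i, i+m)) into hash(t[i+1, i+m+1)) in O(1) time.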
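To make the rolling step concrete, here is a small sketch that produces the hash of every length-m window of t, deriving each one from the previous one in O(1). The function name `window_hashes` and its parameters are assumptions for illustration only; it uses unsigned arithmetic and adds `mod` before subtracting, which is one possible way to avoid negative intermediate values.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hashes of all length-m windows of t (base b, modulo mod).
// window_hashes is a hypothetical helper name, not part of the original program.
// Assumption: mod is small enough that mod * b fits in 64 bits.
std::vector<std::uint64_t> window_hashes(const std::string& t, std::size_t m,
                                         std::uint64_t b, std::uint64_t mod) {
    std::vector<std::uint64_t> out;
    if (m == 0 || m > t.size()) return out;

    std::uint64_t high = 1;                      // b^(m-1) mod mod
    for (std::size_t i = 1; i < m; ++i) high = high * b % mod;

    std::uint64_t h = 0;                         // hash of the first window, O(m)
    for (std::size_t i = 0; i < m; ++i)
        h = (h * b + static_cast<unsigned char>(t[i])) % mod;
    out.push_back(h);

    for (std::size_t i = 0; i + m < t.size(); ++i) {
        std::uint64_t lead = static_cast<unsigned char>(t[i]) * high % mod;
        h = (h + mod - lead) % mod;              // drop t[i] * b^(m-1), staying non-negative
        h = (h * b + static_cast<unsigned char>(t[i + m])) % mod;  // shift and append t[i+m]
        out.push_back(h);
    }
    return out;
}
```

With t = "abcd", m = 2 and b = 10, the three windows "ab", "bc" and "cd" give 97 * 10 + 98, 98 * 10 + 99 and 99 * 10 + 100, which matches hashing each window from scratch.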

The code is as follows:

#include<iostream>
#include<cstring>
using namespace std;
// All three values can be changed: B is the base, M is the modulus applied to hash values, N is the buffer size for the strings
const int B = 256, M = 97, N = 100;
char p[N], t[N];

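void RabinKarp(char* t, char* p){
    int t_len = strlen(t), p_len = strlen(p);

    // Nothing to search for if the pattern is empty or longer than the text
    if(p_len == 0 || p_len > t_len)
        return;

    // h = b^(m - 1) mod M; taking the modulus at every step avoids overflow
    int h = 1;
    for(int i = 0; i < p_len - 1; i++)
        h = (B * h) % M;

    int t_hash = 0, p_hash = 0;
    // Hash of the pattern and of the first p_len characters of the text
    // (the cast to unsigned char keeps character values non-negative)
    for(int i = 0; i < p_len; i++){
        t_hash = ( B * t_hash + (unsigned char)t[i]) % M;
        p_hash = ( B * p_hash + (unsigned char)p[i]) % M;
    }

    for(int i = 0; i <= t_len - p_len; i++){

        // Equal hashes do not guarantee equal strings, because different strings can map
        // to the same hash value (a hash collision), so confirm with an O(m) comparison
        if(t_hash == p_hash && memcmp(t + i, p, p_len) == 0)
            cout << "find index i:" << i << endl;

        // Rolling update in O(1); not needed after the last window
        if(i < t_len - p_len){
            t_hash = ( (t_hash - (unsigned char)t[i] * h ) * B + (unsigned char)t[i + p_len]) % M;

            // The subtraction can make the result negative, so bring it back into [0, M)
            if(t_hash < 0)
                t_hash = t_hash + M;
        }
    }
}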

int main(){
    strcpy(t, "this is a test, but not just a test");
    strcpy(p, "test");

    RabinKarp(t, p);

    return 0;
}

Output:

find index i:10
find index i:31
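The `memcmp` check in the code above is needed because a modulus as small as 97 makes collisions easy to find. The sketch below reuses the same base and modulus (256 and 97, taken from B and M in the program) in a hypothetical helper called `small_hash`, and names two different strings of equal length that receive the same hash value.

```cpp
#include <string>

// Same scheme as the program above: base 256, modulus 97.
// small_hash is a hypothetical helper name used only for this illustration.
int small_hash(const std::string& s) {
    int h = 0;
    for (unsigned char c : s)
        h = (256 * h + c) % 97;
    return h;
}

// "ab": 'a' = 97 gives 97 % 97 = 0, then (256 * 0 + 98) % 97 = 1.
// "b$": 'b' = 98 gives 98 % 97 = 1, then (256 * 1 + 36) % 97 = 292 % 97 = 1.
// Both strings hash to 1 although they differ, so a hash match alone
// cannot be reported as a real match.
```

A larger modulus makes such collisions rarer but does not rule them out, which is why the comparison of the actual characters is kept.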