[Rabin-Karp] String Search: strStr II

Article link: https://blog.csdn.net/phdongou/article/details/109591008

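1. Why use Rabin-Karp?

1. The KMP algorithm is complicated and essentially solves only one class of problem: string search (finding one string inside another). Its time complexity is O(n+m), where n and m are the lengths of the two strings. It is also hard to see why the code is written the way it is, and writing it without mistakes is genuinely difficult, so the simpler Rabin-Karp algorithm is a good alternative.
2. A plain implementation with two nested for loops takes O(n*m) time, which approaches O(n^2) when m is comparable to n.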
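
For comparison, the nested-loop approach from point 2 might look like the sketch below. The function name naiveStrStr is made up for this illustration and is not part of the LintCode problem; the code only shows where the O(n*m) cost comes from.

```cpp
#include <cstring>

// Brute-force search (illustrative name, not from the original post):
// try every start position in source and compare target against it
// character by character. Worst case is O(n * m).
int naiveStrStr(const char* source, const char* target) {
    if (source == nullptr || target == nullptr) {
        return -1;
    }
    int n = static_cast<int>(std::strlen(source));
    int m = static_cast<int>(std::strlen(target));
    for (int i = 0; i + m <= n; ++i) {        // outer loop: O(n) start positions
        int k = 0;
        while (k < m && source[i + k] == target[k]) {  // inner loop: O(m) comparison
            ++k;
        }
        if (k == m) {
            return i;  // first occurrence starts at index i
        }
    }
    return -1;
}
```

An empty target matches at index 0 here, which agrees with the convention used by the solution in section 4.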

2. The basic idea of Rabin-Karp

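The plain approach searches a string source for another string target. Say source is abcde and target is cde. We first compare cde with abc, then slide the window from abc to bcd and compare bcd with cde, and so on. Sliding the window across source takes O(n) steps, and comparing a window such as bcd with cde character by character costs O(m), so the whole search is O(n*m). We want to optimize the comparison step and bring its cost down to O(1). To do that, we use a hash function to turn abc into a number and cde into a number, then compare those numbers as the window slides along source. Because different strings can share a hash value, a hash match still has to be confirmed with a character-by-character comparison.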
[Figure: the basic idea of Rabin-Karp]

3. Turning a string into a number with a hash function

abcde = (a*31^4 + b*31^3 + c*31^2 + d*31^1 + e*31^0)%10^6
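
The formula above can be evaluated left to right with Horner's rule, taking the remainder at every step so the intermediate value never overflows an int. The sketch below assumes plain ASCII input; the names polyHash, kMul and kMod are chosen for this illustration only.

```cpp
#include <string>

const int kMul = 31;        // multiplier used in the formula above
const int kMod = 1000000;   // modulus 10^6

// Polynomial hash of s: (s[0]*31^(len-1) + ... + s[len-1]*31^0) % 10^6.
// Assumes ASCII characters, so every char value is non-negative.
int polyHash(const std::string& s) {
    int h = 0;
    for (char c : s) {
        // h < 10^6, so h * 31 + c stays well inside the int range.
        h = (h * kMul + c) % kMod;
    }
    return h;
}
```

With ASCII codes a = 97, b = 98, c = 99, polyHash("abc") is 97*961 + 98*31 + 99 = 96354, and polyHash("abcde") comes out as 599395 after the modulo.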

Computing the hash when moving from abc to bcd

This step amounts to appending d and removing a, so the hash of bcd can be computed in O(1) time.

bcd = [(x*31 + d) % 10^6 - (a*31^3) % 10^6] % 10^6

Here x is the hash value of abc. Because the formula contains a subtraction, the result can be negative; if it is less than 0, add 10^6.
[Figure: computing the hash of bcd from the hash of abc]
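
As a rough check of the update rule, the sketch below rolls the hash of abc forward to bcd and compares it with a hash computed from scratch. The helper names (directHash, powerMod, rollHash) are invented for this example, and ASCII input is assumed.

```cpp
#include <string>

const int kRollMul = 31;        // multiplier from the formula
const int kRollMod = 1000000;   // modulus 10^6

// Hash computed from scratch, used here only as a reference value.
int directHash(const std::string& s) {
    int h = 0;
    for (char c : s) {
        h = (h * kRollMul + c) % kRollMod;
    }
    return h;
}

// 31^m % 10^6, the weight of the outgoing character after the shift.
int powerMod(int m) {
    int p = 1;
    for (int i = 0; i < m; ++i) {
        p = (p * kRollMul) % kRollMod;
    }
    return p;
}

// Slide the window by one character in O(1):
// append `incoming`, then remove the contribution of `outgoing`.
int rollHash(int oldHash, char outgoing, char incoming, int powerM) {
    int h = (oldHash * kRollMul + incoming) % kRollMod;
    h -= (outgoing * powerM) % kRollMod;
    if (h < 0) {
        h += kRollMod;  // the subtraction may go negative, so wrap around
    }
    return h;
}
```

For a window of length 3, rollHash(directHash("abc"), 'a', 'd', powerMod(3)) gives 97347, the same value as directHash("bcd").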

4. Example problem

LintCode 594. strStr II

https://www.lintcode.com/problem/strstr-ii/description
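Description:
Implement strStr with O(n + m) time complexity.
strStr returns the index of the first character of the first occurrence of the target string in the source string. The target string has length m and the source string has length n. If the target string does not occur in the source string, return -1.

Code: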

#include <cstring>  // strlen

class Solution {
public:
    /*
     * @param source: A source string
     * @param target: A target string
     * @return: An integer as index
     */
    #define BASE 1000000
    
    int strStr2(const char* source, const char* target) {
        if(source == NULL || target == NULL){
            return -1;
        }
        
        int n = strlen(source);
        int m = strlen(target);
        if(m == 0){
            return 0;
        }

        //hashvalue of target
        // cde
        // ^
        int targethash = 0;
        for(int i = 0; i < m; ++i){
            targethash = (targethash * 31 + target[i]) % BASE;
        }
        
        // 31^m
        int power_m = 1;
        for(int i = 0; i < m; ++i){
            power_m = (power_m * 31) % BASE;
        }
        
        //compare hashvalue of source and target
        // abcdef
        // cde
        //  cde
        //    ->
        int sourcehash = 0;
        for(int j = 0; j < n; ++j){
            sourcehash = (sourcehash * 31 + source[j]) % BASE;
            if(j < m - 1){
                continue;
            }
            if(j >= m){
                // After appending source[j] the window holds m + 1 characters.
                // The outgoing character source[j - m] now carries weight 31^m,
                // so subtract source[j - m] * 31^m (mod BASE) to drop it.
                sourcehash = sourcehash - (source[j - m] * power_m) % BASE;
                if(sourcehash < 0){
                    sourcehash = sourcehash + BASE;
                }
            }
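            // Equal hashes may still be a collision, so confirm the match
            // by comparing the window with target character by character.
            if(sourcehash == targethash){
                int k = 0;
                while(k < m && source[j - m + 1 + k] == target[k]){
                    ++k;
                }
                if(k == m){
                    return j - m + 1;
                }
            }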
        }
        return -1;
    }
};