LeetCode 686. Repeated String Match, 1. Rabin Karp's algorithm with Rolling Hash, 2. Brute Search

Article link: https://blog.csdn.net/weixin_42919606/article/details/104565130

Problem

Repeat string $A$ $n$ times to obtain $nA$, and determine whether there is an $n$ such that string $B$ is a substring of $nA$. If so, return the minimum such integer $n$; otherwise, return -1.

For example, A = "abcd", B = "cdabcdab" -> n = 3, because "abcdabcdabcd" contains B while "abcdabcd" does not.

Analysis

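Searching for substring $B$ in $nA$
- Brute search: $O(mn)$
- Rabin Karp's algorithm (hash recomputed for every window): $O(mn)$
- Rabin Karp's algorithm with rolling hash: $O(n + (1 + q) m)$, where $q$ is the number of windows whose hash equals the pattern hash and therefore have to be verified character by character. With few hash collisions and a search that stops at the first match, this is $O(m + n)$ in expectation.
  ($m$ and $n$ are the lengths of pattern $B$ and text $nA$, respectively)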
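Determining the minimum value of $n$ (here $|A|$ and $|B|$ denote string lengths)
- If $|A| \ge |B|$, then $B$ is either a substring of $A$ or spans the junction of two consecutive copies of $A$. Thus, we only need to try $n = 1$ and $n = 2$.
- If $|A| < |B|$, consider placing $B$ inside $nA$. If the head of $B$ lies in the second copy of $A$ or later, we can shift $B$ left by $|A|$ and find it in $(n - 1)A$. So for a minimal $n$ the head of $B$ lies in the first copy of $A$, and the end of $B$ lies in the last copy, which gives $|B| \le n|A| < 2|A| + |B|$. Only two values satisfy this: $n_0 = \lceil |B| / |A| \rceil$ and $n_0 + 1$. In the example above, $n_0 = 2$ fails and $n_0 + 1 = 3$ succeeds.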
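The bound on $n$ can be checked with a minimal sketch. It is not the implementation given later in this post: the class name `RepeatBound` and the method `minRepeats` are made up for illustration, and the built-in `String.contains` stands in for the pattern search step. `String.repeat` needs Java 11 or later.

```java
final class RepeatBound {
    /**
     * Returns the minimum n such that b is a substring of a repeated n times,
     * or -1 if no such n exists. Only n0 = ceil(|b| / |a|) and n0 + 1 are tried.
     */
    static int minRepeats(String a, String b) {
        if (a.isEmpty()) {
            return -1; // guard chosen for this sketch; LeetCode inputs are non-empty
        }
        int n0 = (b.length() + a.length() - 1) / a.length(); // ceil(|b| / |a|)
        for (int n = n0; n <= n0 + 1; n++) {
            if (a.repeat(n).contains(b)) {
                return n;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(minRepeats("abcd", "cdabcdab")); // 3
        System.out.println(minRepeats("abc", "cab"));       // 2
        System.out.println(minRepeats("abc", "acb"));       // -1
    }
}
```

Trying the two candidates in increasing order returns the minimum automatically.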

The overall complexity of this problem is dominated by the complexity of the pattern search algorithm.

Although Rabin Karp's algorithm with rolling hash is much more efficient, it is also easier to get wrong. Thus, brute search is a reasonable choice for this problem as well, and I implemented both methods below.

Notes on implementation

Use long instead of int in hash calculations. This also applies to $y = x^m \bmod p$, the intermediate factor used in the rolling hash calculation.
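In Java, the % operator computes the remainder, not the mathematical modulus, i.e. -2 % 5 == -2 whereas $-2 \bmod 5 = 3$. Therefore, you should make sure the result is non-negative, for example by adding p to a negative result before taking % again. The same holds in C and C++, but not in Python, where % with a positive modulus already returns a non-negative result.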
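The difference is easy to see in a small sketch. The class `RemainderDemo` and the helper `mod` are names made up for this illustration; `Math.floorMod` is the standard library method (Java 8 and later) that returns the mathematical modulus directly.

```java
final class RemainderDemo {
    /** Maps any long value into [0, p) for p > 0 by reducing, adding p, and reducing again. */
    static long mod(long value, long p) {
        return ((value % p) + p) % p;
    }

    public static void main(String[] args) {
        System.out.println(-2 % 5);               // -2 (remainder keeps the sign of the dividend)
        System.out.println(mod(-2, 5));           // 3
        System.out.println(Math.floorMod(-2, 5)); // 3
    }
}
```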
Take % as often as possible to keep the intermediate values small; by the modular arithmetic rules that follow, this does not affect the result.
Some rules in modular arithmetic:
1. $(A\%p + B\%p)\%p = (A + B)\% p$
2. $(A*B)\%p = (A\%p * B\%p)\%p$

Using these two rules, we apply % at intermediate steps while calculating y and H[i].
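As a sanity check of rule 2, here is a minimal sketch that computes $x^m \bmod p$ with a reduction after every multiplication and compares it with `BigInteger.modPow` from the standard library. The class name `ModRulesDemo` and the method `powMod` are illustrative, and the constants are the same $p$ and $x$ used in the implementation in this post.

```java
import java.math.BigInteger;

final class ModRulesDemo {
    static final long P = 1_000_000_007L;
    static final long X = 31L;

    /** Computes X^m mod P, reducing after every multiplication (rule 2). */
    static long powMod(int m) {
        long y = 1;
        for (int i = 0; i < m; i++) {
            // y < P before the multiplication, so y * X stays far below Long.MAX_VALUE.
            y = (y * X) % P;
        }
        return y;
    }

    public static void main(String[] args) {
        int m = 50; // 31^50 itself does not fit in a long
        long expected = BigInteger.valueOf(X)
                .modPow(BigInteger.valueOf(m), BigInteger.valueOf(P))
                .longValue();
        System.out.println(powMod(m) == expected); // true
    }
}
```

Without the intermediate reductions, the product would overflow long well before m reaches 50, which is why the reduction sits inside the loop.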

Save running time and space: some for loops can be combined, and avoid the substring method in Java.
Since Java 7 update 6, the substring method no longer shares the character array of the original string; it creates a new string and takes $O(m)$ time. Using substring wastes space and requires an extra for loop (you can hash the characters one by one instead of extracting a substring first).
The same applies to the substr method in C++, although it is not an essential problem.
Call PrecomputeHashes() only once. At first, I put it inside a for loop, which cost me a lot of debugging time.
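Since most of the bugs hide in the rolling update, one way to catch them early is to compare every rolled hash with a hash computed directly from the same window. The sketch below is a debugging aid under stated assumptions, not the implementation that follows: the class `RollingHashCheck` and its methods are illustrative names, it assumes $1 \le m \le n$, and it indexes into the text instead of calling substring.

```java
final class RollingHashCheck {
    static final long P = 1_000_000_007L;
    static final long X = 31L;

    /** Hash of text[start, start + m), computed directly without substring(). */
    static long directHash(String text, int start, int m) {
        long h = 0;
        for (int i = start + m - 1; i >= start; i--) {
            h = (h * X + text.charAt(i)) % P;
        }
        return h;
    }

    /** Hashes of all windows of length m, filled right to left with the rolling update. */
    static long[] rollingHashes(String text, int m) {
        int n = text.length();
        long[] h = new long[n - m + 1];
        h[n - m] = directHash(text, n - m, m);
        long y = 1; // X^m mod P
        for (int i = 0; i < m; i++) {
            y = (y * X) % P;
        }
        for (int i = n - m - 1; i >= 0; i--) {
            long raw = X * h[i + 1] + text.charAt(i) - (y * text.charAt(i + m)) % P;
            h[i] = ((raw % P) + P) % P; // raw may be negative, so normalize into [0, P)
        }
        return h;
    }

    /** Returns true if every rolled hash equals the directly computed one. */
    static boolean consistent(String text, int m) {
        long[] h = rollingHashes(text, m);
        for (int i = 0; i < h.length; i++) {
            if (h[i] != directHash(text, i, m)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(consistent("abcdabcdabcd", 8)); // true
    }
}
```

The direct check costs $O(mn)$, so it belongs in a test, not in the submitted solution.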

Java Implementation

Rabin Karp’s algorithm with rolling hash

class Solution {
    // Method 1 of 2: Rabin Karp's algorithm with rolling hash for LeetCode 686
    public int repeatedStringMatch(String A, String B) {
        int count = 1;
        String A_prime = A; // keep a reference to the original A

        // 1) |A| >= |B|: B is inside A, or B spans the junction of two copies of A
        if (A.length() >= B.length()) {
            if (Rabin_Karp(B,A))
                return 1;
            else {
                A = A_prime.repeat(2);
                if (Rabin_Karp(B,A))
                    return 2;
                else 
                    return -1;
            }
        }

        // 2) |A| < |B|: assume B is in nA.
        // If the head of B is in the second copy of A or later, shift B left and use n - 1.
        // So the head of B is in the first copy, and |B| <= n|A| < 2|A| + |B|;
        // the loop below tries every such n in increasing order.
        while (A.length() < B.length() + A_prime.length()) {
            count++;
            A = A_prime.repeat(count);
            if (A.length() >= B.length()) 
                if (Rabin_Karp(B, A)) 
                    return count;
        }
        return -1;
     }

    public static int p = 1000000007; // a commonly used large prime modulus
    public static int x = 31; // the same multiplier that Java's String.hashCode() uses

    // Takes start_index so that we can avoid the substring method,
    // which copies the characters into a new string since Java 7 update 6.
    public static boolean areEqual(String pattern, String text, int start_index) {
        for (int i = 0; i < pattern.length(); i++) {
            if (pattern.charAt(i) != text.charAt(i + start_index))
                return false;
        }
        return true;
    }

    public static long PolyHash(String s) {
        long h = 0; // int would overflow; always use long in this class
        for (int i = s.length() - 1; i >= 0; --i)
            h = (h*x + s.charAt(i)) % p;
        return h;
    }

    // Rolling Hash 
    public static long[] PrecomputeHashes(String pattern, String text) {
        int m = pattern.length(), n = text.length();
        long[] H = new long[n-m+1]; 
        // Initial value of H: the hash of the last window, text[n-m, n).
        // Instead of using substring and calling PolyHash, we repeat the PolyHash loop here.
        // This saves a for loop and an unnecessary new string created by substring().
        H[n-m] = 0;
        for (int i = n - 1; i >= n - m; i--)
            H[n-m] = (H[n-m]*x + text.charAt(i)) % p;

        // An intermediate factor for rolling: y = x^m % p, reduced after every multiplication
        long y = 1;
        for (int i = 1; i <= m; ++i)
            y = (y * x) % p;
        
        // H[i] is the hash of the window text[i, i+m)
        for (int i = n - m - 1; i >= 0; --i) {
            H[i] = x * H[i+1] + text.charAt(i) - (y * text.charAt(i+m)) % p;
            // H[i] can be negative here because of the subtraction; bring it into [0, p).
            while (H[i] < 0)
                H[i] += p; // in Java, -2 % 5 == -2, not 3
            H[i] = H[i] % p;
        }
        return H;
    }

    // Rabin-Karp's algorithm
    private static boolean Rabin_Karp(String pattern, String text) { 
        int m = pattern.length(), n = text.length();
        long pHash = PolyHash(pattern); 

        long H[] = PrecomputeHashes(pattern, text); 
        // do not put this in the for loop

        for (int i = 0; i <= n - m; i++) {
            if (pHash != H[i])
                continue;
            if (areEqual(pattern, text, i)) // check collision
                return true;
        }
        return false;
    }

}

Brute search method

class Solution {
    // Method 2 of 2: Brute search for LeetCode 686.
    // Although it takes O(mn) time, you are less likely to introduce bugs when implementing it
    // compared to Rabin Karp's algorithm with rolling hash.
    public int repeatedStringMatch(String A, String B) {
        int count = 1;
        String A_prime = A; // keep a reference to the original A

        // 1) |A| >= |B|: B is inside A, or B spans the junction of two copies of A
        if (A.length() >= B.length()) {
            if (pattern_search(B,A))
                return 1;
            else {
                A = A_prime.repeat(2);
                if (pattern_search(B,A))
                    return 2;
                else 
                    return -1;
            }
        }

        // 2) |A| < |B|: assume B is in nA.
        // If the head of B is in the second copy of A or later, shift B left and use n - 1.
        // So the head of B is in the first copy, and |B| <= n|A| < 2|A| + |B|;
        // the loop below tries every such n in increasing order.
        while (A.length() < B.length() + A_prime.length()) {
            count++;
            A = A_prime.repeat(count);
            if (A.length() >= B.length()) 
                if (pattern_search(B, A)) 
                    return count;
        }
        return -1;
     }

     boolean pattern_search(String pattern, String text) {
        for (int i = 0; i <= text.length() - pattern.length(); i++) {
            // the bound is i <= n - m, not i < n - m (n = text length, m = pattern length)
            for (int j = 0; j < pattern.length(); j++) {
                if (pattern.charAt(j) != text.charAt(i+j)) 
                    break;
                if (j == pattern.length() - 1) // a substring found
                    return true;
            }
        }
        return false;
     }

 }

Performance and Summary

Performance
- Rabin Karp’s algorithm with rolling hash: 77ms (beat 75%)
- Brute search: 1824ms (beat 5%)
Summary
An easy problem, but it is easy to make mistakes when implementing Rabin Karp's algorithm with rolling hash. I spent a lot of time debugging the rolling hash; the pitfalls I ran into are listed in the implementation notes.