LeetCode 686. Repeated String Match, 1. Rabin Karp's algorithm with Rolling Hash, 2. Brute Search

Problem

Repeat string $A$ $n$ times to obtain $nA$, and determine whether there is an $n$ such that string $B$ is a substring of $nA$. If so, return the minimum such integer $n$; otherwise, return -1.

For example, A = "abcd", B = "cdabcdab" -> n = 3

Analysis

  • Search for substring $B$ in $nA$
    • Brute search: $O(mn)$
    • Rabin Karp's algorithm: $O(mn)$
    • Rabin Karp's algorithm with rolling hash: $O(m + (1+q)n) = O(m + 2n)$
      (here $m$ and $n$ are the lengths of the pattern $B$ and the text $nA$, respectively)
  • Determine the minimum value of $n$ (below, $|A|$ and $|B|$ denote string lengths)
    • If $|A| \ge |B|$, then $B$ can be a substring of a single $A$, or $B$ can straddle the junction of two copies of $A$. Thus, we only need to try $n=1$ and $n=2$.
    • If $|A| < |B|$, consider placing $B$ inside $nA$. If the head of $B$ lies in the second copy of $A$, we can shift $B$ left by $|A|$ and find it in $(n-1)A$. So for a minimal $n$ the head of $B$ lies in the first copy of $A$, and the length of $nA$ must lie in $[|B|, |B| + 2|A|)$. At most two values of $n$ need to be checked: the smallest $n$ with $n|A| \ge |B|$, and that value plus one. In the example from the problem statement, $2A$ = "abcdabcd" does not contain $B$, but $3A$ does.
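The bound on $n$ can be sketched in a few lines. This is only an illustration, not one of the two solutions in this post: the class `RepeatBound` and the method `minRepeats` are hypothetical names, the substring search is delegated to the library call `StringBuilder.indexOf` instead of Rabin-Karp, and `a` is assumed to be non-empty. It relies on the observation that a match, if one exists, can always be shifted so that it starts inside the first copy of A.

```java
// Hypothetical helper illustrating the bound on n; not part of the solutions below.
final class RepeatBound {
    // Returns the smallest n such that b is a substring of a repeated n times, or -1.
    // Assumes a is non-empty.
    static int minRepeats(String a, String b) {
        // Smallest n with n * |a| >= |b| (ceiling division), and at least 1.
        int n = Math.max(1, (b.length() + a.length() - 1) / a.length());
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < n; i++) {
            text.append(a);
        }
        // First candidate: the shortest repetition that is long enough to hold b.
        if (text.indexOf(b) >= 0) {
            return n;
        }
        // Second candidate: one extra copy covers a match that starts late in the first copy.
        text.append(a);
        if (text.indexOf(b) >= 0) {
            return n + 1;
        }
        return -1;
    }
}
```

For A = "abcd" and B = "cdabcdab", the first candidate is n = 2 ("abcdabcd"), which fails, and the second candidate n = 3 succeeds.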

The overall complexity of this problem is the complexity of the pattern search algorithm.

Although Rabin Karp's algorithm with rolling hash is much more efficient, it is also much easier to get wrong. Brute search is therefore a reasonable choice for this problem as well, and I implemented both methods below.

Notes on implementation

  1. Use long instead of int in the hash calculation. This also applies to $y = x^m \bmod p$, the intermediate factor used in the rolling hash calculation.
  2. In Java, the % operator computes the remainder, not the modulus, i.e. -2 % 5 == -2 but $-2 \bmod 5 = 3$. Therefore, make sure the left operand of % is non-negative (for example by adding p first). The same holds in some other languages, such as C and C++.
  3. Take % as often as possible to keep the values small; this does not affect the result (see note 4).
  4. Some rules of modular arithmetic:
    1. $(A \% p + B \% p) \% p = (A + B) \% p$
    2. $(A * B) \% p = (A \% p * B \% p) \% p$

Using these two rules, we apply % at several points when calculating y and H[i].
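The remainder-versus-modulus pitfall and the two rules can be checked with a small sketch. The class `ModDemo` and its helpers `mod`, `addMod` and `mulMod` are hypothetical names introduced only for illustration; the sketch uses the standard library method `Math.floorMod`, which returns a non-negative result for a positive modulus, as an alternative to adding p by hand.

```java
// Illustrative only: remainder vs. modulus, and reducing operands before combining them.
final class ModDemo {
    static final long P = 1_000_000_007L;

    // Maps any long into [0, P), even when v is negative.
    static long mod(long v) {
        return Math.floorMod(v, P);
    }

    // (a + b) mod P, reducing both operands first (rule 1).
    static long addMod(long a, long b) {
        return mod(mod(a) + mod(b));
    }

    // (a * b) mod P, reducing both operands first (rule 2).
    // Each reduced operand is below P, so the product fits in a long.
    static long mulMod(long a, long b) {
        return mod(mod(a) * mod(b));
    }

    public static void main(String[] args) {
        System.out.println(-2 % 5);               // prints -2 (remainder)
        System.out.println(Math.floorMod(-2, 5)); // prints 3 (modulus)
        System.out.println(mulMod(P - 1, 31));    // prints 999999976
    }
}
```

With int instead of long, the product in `mulMod` would overflow well before the reduction is applied, which is the reason for note 1.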

  5. Save running time and space: some for loops can be combined, and the substring method should be avoided in Java.
    Since Java 7 update 6, substring no longer shares the original string's character array; it creates a new string, which takes $O(m)$ time. Using substring wastes space and requires another for loop (you can hash the characters one by one instead of extracting a substring first).
    This also applies to the substr method in C++. But it is not an essential problem.

  6. Call PrecomputeHashes() only once. At first I put the call inside a for loop, which cost me a lot of debugging time…
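Since most of the bugs mentioned in these notes show up as a wrong window hash, one way to catch them early is to compare the rolling hashes against hashes computed directly for every window. The following is a sketch under stated assumptions, not the implementation used below: `RollingHashCheck`, `directHash`, `windowHashes` and `consistent` are hypothetical names, the constants mirror the p and x used in this post, and 1 <= m <= text.length() is assumed.

```java
// Hypothetical self-check: rolling window hashes must equal directly computed ones.
final class RollingHashCheck {
    static final long P = 1_000_000_007L;
    static final long X = 31;

    // Direct hash of text[start, start + m): sum of text[start + j] * X^j, mod P.
    static long directHash(String text, int start, int m) {
        long h = 0;
        for (int i = start + m - 1; i >= start; i--) {
            h = (h * X + text.charAt(i)) % P;
        }
        return h;
    }

    // Hashes of all length-m windows, filled right to left with the rolling recurrence.
    static long[] windowHashes(String text, int m) {
        int n = text.length();
        long[] h = new long[n - m + 1];
        h[n - m] = directHash(text, n - m, m); // hash the last window without substring()
        long y = 1;                            // y = X^m mod P
        for (int i = 0; i < m; i++) {
            y = y * X % P;
        }
        for (int i = n - m - 1; i >= 0; i--) {
            long v = X * h[i + 1] + text.charAt(i) - y * text.charAt(i + m) % P;
            h[i] = Math.floorMod(v, P);        // v may be negative; floorMod normalizes it
        }
        return h;
    }

    // True if every rolling hash matches the direct hash of the same window.
    static boolean consistent(String text, int m) {
        long[] h = windowHashes(text, m);
        for (int i = 0; i < h.length; i++) {
            if (h[i] != directHash(text, i, m)) {
                return false;
            }
        }
        return true;
    }
}
```

Calling `RollingHashCheck.consistent("abcdabcdabcd", 8)` on a few inputs before wiring the hashes into the search loop is a cheap way to separate hashing bugs from search bugs.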

Java Implementation

  • Rabin Karp’s algorithm with rolling hash
class Solution {
    // Method 1 of 2: Rabin-Karp algorithm with rolling hash for LeetCode 686
    public int repeatedStringMatch(String A, String B) {
        int count = 1;
        String A_prime = A; // keep the original A; String.repeat() requires Java 11+

        // 1) |A| >= |B| -> B is inside one A, or straddles the junction of two copies of A
        if (A.length() >= B.length()) {
            if (Rabin_Karp(B,A))
                return 1;
            else {
                A = A_prime.repeat(2);
                if (Rabin_Karp(B,A))
                    return 2;
                else 
                    return -1;
            }
        }

        // 2) |A| < |B|: assume B is in nA
        // If the head of B is in the second copy of A, shift B left by |A| and use n - 1 instead.
        // Thus, for a minimal n, |B| <= |nA| < |B| + 2|A|: at most two values of n are searched.
        while (A.length() < B.length() + A_prime.length()) {
            count++;
            A = A_prime.repeat(count);
            if (A.length() >= B.length()) 
                if (Rabin_Karp(B, A)) 
                    return count;
        }
        return -1;
     }

    public static int p = 1000000007; // a commonly used large prime modulus
    public static int x = 31; // multiplier; Java's own String.hashCode() also uses 31

    // Takes a start_index so that we can compare in place instead of calling substring(),
    // which allocates a new string (since Java 7 update 6).
    public static boolean areEqual(String pattern, String text, int start_index) {
        for (int i = 0; i < pattern.length(); i++) {
            if (pattern.charAt(i) != text.charAt(i + start_index))
                return false;
        }
        return true;
    }

    public static long PolyHash(String s) {
        long h = 0; // int would overflow, always using long in this class
        for (int i = s.length() - 1; i >= 0; --i)
            h = (h*x + s.charAt(i)) % p;
        return h;
    }

    // Rolling Hash 
    public static long[] PrecomputeHashes(String pattern, String text) {
        int m = pattern.length(), n = text.length();
        long[] H = new long[n-m+1]; 
        // Initial value of H: the hash of the last window.
        // Instead of calling substring() and PolyHash(), we inline the PolyHash loop here.
        // This saves a for loop and an unnecessary new string created by substring().
        H[n-m] = 0;
        for (int i = n - 1; i >= n - m; i--)
            H[n-m] = (H[n-m]*x + text.charAt(i)) % p;

        // An intermediate factor for rolling: y = x^m % p, reduced after every multiplication
        long y = 1;
        for (int i = 1; i <= m; ++i)
            y = (y * x) % p;
        
        // H[i] is the hash of the window text[i, i+m)
        for (int i = n - m - 1; i >= 0; --i) {
            // y * text.charAt(i+m) is reduced mod p first (* and % bind tighter than -),
            // so the difference below can be negative but is always greater than -p.
            H[i] = x * H[i+1] + text.charAt(i) - y * text.charAt(i+m) % p;
            if (H[i] < 0)
                H[i] += p; // Java's % keeps the sign of the dividend: -2 % 5 == -2
            H[i] = H[i] % p;
        }
        return H;
    }

    // Rabin-Karp's algorithm
    private static boolean Rabin_Karp(String pattern, String text) { 
        int m = pattern.length(), n = text.length();
        long pHash = PolyHash(pattern); 

        long[] H = PrecomputeHashes(pattern, text);
        // compute the hashes once, outside the for loop below

        for (int i = 0; i <= n - m; i++) {
            if (pHash != H[i])
                continue;
            if (areEqual(pattern, text, i)) // check collision
                return true;
        }
        return false;
    }

}
  • Brute search method
class Solution {
    // Method 2 of 2: brute search for LeetCode 686.
    // Although it takes O(mn) time, it is less error-prone to implement
    // than the Rabin-Karp algorithm with rolling hash.
    public int repeatedStringMatch(String A, String B) {
        int count = 1;
        String A_prime = A; // keep the original A; String.repeat() requires Java 11+

        // 1) |A| >= |B| -> B is inside one A, or straddles the junction of two copies of A
        if (A.length() >= B.length()) {
            if (pattern_search(B,A))
                return 1;
            else {
                A = A_prime.repeat(2);
                if (pattern_search(B,A))
                    return 2;
                else 
                    return -1;
            }
        }

        // 2) |A| < |B|: assume B is in nA
        // If the head of B is in the second copy of A, shift B left by |A| and use n - 1 instead.
        // Thus, for a minimal n, |B| <= |nA| < |B| + 2|A|: at most two values of n are searched.
        while (A.length() < B.length() + A_prime.length()) {
            count++;
            A = A_prime.repeat(count);
            if (A.length() >= B.length()) 
                if (pattern_search(B, A)) 
                    return count;
        }
        return -1;
     }

     boolean pattern_search(String pattern, String text) {
        for (int i = 0; i <= text.length() - pattern.length(); i++) {
            // the bound is inclusive: i <= n - m, not i < n - m (n = text length, m = pattern length)
            for (int j = 0; j < pattern.length(); j++) {
                if (pattern.charAt(j) != text.charAt(i+j)) 
                    break;
                if (j == pattern.length() - 1) // a substring found
                    return true;
            }
        }
        return false;
     }

 }

Performance and Summary

  • Performance

    • Rabin Karp’s algorithm with rolling hash: 77ms (beat 75%)
    • Brute search: 1824ms (beat 5%)
  • Summary
    An easy problem, but it is easy to make mistakes when implementing Rabin Karp's algorithm with rolling hash. I spent a lot of time debugging the rolling hash; the pitfalls are listed in the implementation notes.
