LeetCode: Substring with Concatenation of All Words

  • Problem
    You are given a string, s, and a list of words, words, that are all of the same length. Find all starting indices of substrings in s that are a concatenation of each word in words exactly once, without any intervening characters.

    For example, given:
    s: "barfoothefoobarman"
    words: ["foo", "bar"]

    You should return the indices: [0,9].
    (order does not matter).


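  • Solution 1
    Notation: n = s.length(), m = words.size(), len = words.size() == 0 ? 0 : words[0].length()
    First, insert every word in words into a hash map. Then, for each position i in s, extracting the substring s[i, i + len) costs O(len), and checking whether it occurs in words takes O(1) on average.
    Next, find an interval [l, r) such that the substrings [l, l + len), [l + len, l + 2*len), ... all occur in words, no word is used more often than it occurs in words, and r - l = m * len. Then l is one of the answers.
    The time complexity of this algorithm is O(n * len).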
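
    The first step above (building the hash map, then an O(len) slice followed by an O(1) average lookup) can be sketched in isolation. This is only an illustration: the helper names countWords and isWordAt are hypothetical and do not appear in the solution below.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: how many times each word must appear in a match.
std::unordered_map<std::string, int> countWords(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> need;
    for (const std::string& w : words) ++need[w];
    return need;
}

// Hypothetical helper: is the slice s[i, i + len) one of the words?
// Building the slice costs O(len); the map lookup is O(1) on average.
bool isWordAt(const std::string& s, std::size_t i, std::size_t len,
              const std::unordered_map<std::string, int>& need) {
    assert(len > 0);
    if (i + len > s.size()) return false;  // slice would run past the end of s
    return need.count(s.substr(i, len)) > 0;
}
```

    With s = "barfoothefoobarman" and words = ["foo", "bar"], isWordAt is true at positions 0 and 3 and false at position 6 ("the").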

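#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
    // Returns s[beginIndex, beginIndex + len), or "" if that range is out of bounds.
    string substr(const string& s, int beginIndex, int len) {
        if (beginIndex < 0 || beginIndex + len > (int)s.length())
            return "";
        return s.substr(beginIndex, len);
    }
public:
    vector<int> findSubstring(string s, vector<string>& words) {
        int n = s.length();
        int len = words.size() == 0 ? 0 : words[0].length();
        int total = words.size() * len;  // length of one full concatenation
        vector<int> ret;
        // mp: required count of each word; tmp: counts inside the window [l, r)
        unordered_map<string, int> mp, tmp;
        for (int i = 0; i < (int)words.size(); i++)
            mp[words[i]]++;
        // A match can start at any offset modulo len, so try each offset in turn.
        for (int i = 0; i < len; i++) {
            tmp.clear();
            int l = i, r = i;
            while (r + len <= n) {
                // Grow the window while the next word stays within its required count.
                while (r + len <= n && tmp[substr(s, r, len)] + 1 <= mp[substr(s, r, len)]) {
                    tmp[substr(s, r, len)]++;
                    r += len;
                }
                // Skip l if the previous iteration already recorded it.
                if (r - l == total && (ret.empty() || ret.back() != l))
                    ret.push_back(l);
                // Take in the word that stopped the growth (unknown, or over its count).
                if (r + len <= n) {
                    tmp[substr(s, r, len)]++;
                    r += len;
                }

                // Drop words from the left, up to and including the one that is over its count.
                while (l + len <= r && tmp[substr(s, l, len)] <= mp[substr(s, l, len)]) {
                    tmp[substr(s, l, len)]--;
                    l += len;
                }
                if (l + len <= r) {
                    tmp[substr(s, l, len)]--;
                    l += len;
                }
                if (r - l == total)
                    ret.push_back(l);
            }
        }
        return ret;
    }
};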
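
    When experimenting with a sliding-window solution like the one above, a direct brute-force check is handy as a reference to compare against. The sketch below is not part of the original post: findSubstringBruteForce is a hypothetical name, and it simply tests every start index in O(n * m * len) time.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical reference implementation: test every start index directly.
std::vector<int> findSubstringBruteForce(const std::string& s,
                                         const std::vector<std::string>& words) {
    std::vector<int> result;
    if (words.empty() || words[0].empty()) return result;
    const std::size_t len = words[0].size();
    const std::size_t m = words.size();
    const std::size_t total = m * len;  // length of one full concatenation
    if (s.size() < total) return result;

    std::unordered_map<std::string, int> need;
    for (const std::string& w : words) ++need[w];

    for (std::size_t start = 0; start + total <= s.size(); ++start) {
        std::unordered_map<std::string, int> seen;
        std::size_t k = 0;
        for (; k < m; ++k) {
            std::string w = s.substr(start + k * len, len);
            auto it = need.find(w);
            // Stop on an unknown word or on a word used too many times.
            if (it == need.end() || ++seen[w] > it->second) break;
        }
        if (k == m) result.push_back(static_cast<int>(start));
    }
    return result;
}
```

    For the example in the problem statement this returns the indices 0 and 9, in increasing order.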
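  • Solution 2
    Use two hash functions, h1 and h2, where h1: string -> int and h2: [int] -> int.
    First apply h1 to every string in words, then apply h2 to the resulting list to obtain the final hash value of words.
    Then, for each window s[i, i + m * len), apply h1 to s[i, i + len), s[i + len, i + 2*len), ..., apply h2 to those values, and compare the result with the hash value of words. Equal hashes are treated as a match; anything else is a mismatch.
    For efficiency, choose hash functions that can be updated incrementally: once the hash of str is known, the hash of the new string str' (str with its first character removed and one character appended on the right) should be computable in O(1).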
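
    To make the O(1) update concrete, here is a minimal sketch of one possible h1: a polynomial hash over unsigned 64-bit integers that wraps around on overflow. It is an assumption for illustration only and differs from the double-based hash in the code below; the name windowHashes and the base 131 are arbitrary choices.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: hash of every window s[i, i + len), taken modulo 2^64
// through unsigned wrap-around. Each hash after the first is derived in O(1).
std::vector<std::uint64_t> windowHashes(const std::string& s, std::size_t len) {
    assert(len > 0);
    const std::uint64_t BASE = 131;  // arbitrary base, chosen for illustration
    std::vector<std::uint64_t> out;
    if (s.size() < len) return out;

    std::uint64_t top = 1;  // BASE^(len - 1), the weight of the leftmost character
    for (std::size_t k = 1; k < len; ++k) top *= BASE;

    std::uint64_t h = 0;
    for (std::size_t k = 0; k < len; ++k)
        h = h * BASE + static_cast<unsigned char>(s[k]);
    out.push_back(h);

    for (std::size_t i = 1; i + len <= s.size(); ++i) {
        // Remove s[i - 1] on the left, append s[i + len - 1] on the right.
        h = (h - static_cast<unsigned char>(s[i - 1]) * top) * BASE
            + static_cast<unsigned char>(s[i + len - 1]);
        out.push_back(h);
    }
    return out;
}
```

    Equal windows always get equal hashes, but different windows can collide, so a stricter version would verify each candidate match against the actual words.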

    Note: reposted from LeetCode.

    #include <cmath>
    #include <random>
    #include <string>
    #include <vector>
    using namespace std;

    class Solution {
    // The general idea:
    // Construct a hash function f for L, f: vector<string> -> int,
    // then use the return value of f to check whether a substring is a concatenation
    // of all words in L.
    // f has two levels. The first level is a hash function f1 for every single word in L.
    // f1 : string -> double
    // So with f1, L is converted into a vector of doubles.
    // Then another hash function f2 is defined to convert a vector of doubles into a single int.
    // Finally f(L) := f2(f1(L))
    // To obtain lower complexity, we require that f1 and f2 can be computed through a moving window.
    // The following corner case also needs to be considered:
    // f2(f1(["ab", "cd"])) != f2(f1(["ac", "bd"]))
    // There are many possible options for f2 and f1.
    // The following code only shows one possibility (probably not the best):
    // f1 is the function "hash" in the class,
    // f2([a1, a2, ... , an]) := int( decimal_part(log(a1) + log(a2) + ... + log(an)) * 1000000000 )
    public:
    // The complexity of this function is O(nW).
    double hash(double f, double code[], string &word) {
        double result = 0.;
        for (auto &c : word) result = result * f + code[c];
        return result;
    }
    vector<int> findSubstring(string S, vector<string> &L) {
        if (L.empty()) return {};  // L[0] is read below, so guard against an empty word list
        uniform_real_distribution<double> unif(0., 1.);
        default_random_engine seed;
        double code[128];
        for (auto &d : code) d = unif(seed);
        double f = unif(seed) / 5. + 0.8;
        double value = 0;
    
        // The complexity of the following for loop is O(L.size( ) * nW).
        for (auto &str : L) value += log(hash(f, code, str));
    
        int unit = 1e9;
        int key = (value-floor(value))*unit;
        int nS = S.size(), nL = L.size(), nW = L[0].size();
        double fn = pow(f, nW-1.);
        vector<int> result;
        if (nS < nW) return result;
        vector<double> values(nS-nW+1);
        string word(S.begin(), S.begin()+nW);
        values[0] = hash(f, code, word);
    
        // Use a moving window to hash every word of length nW in S to a double,
        // which is stored in the vector values.
        // The complexity of this step is O(nS).
        for (int i=1; i<=nS-nW; ++i) values[i] = (values[i-1] - code[S[i-1]]*fn)*f + code[S[i+nW-1]];
    
        // This for loop will run nW times, each iteration has a complexity O(nS/nW)
        // So the overall complexity is O(nW * (nS / nW)) = O(nS)
        for (int i=0; i<nW; ++i) {
            int j0=i, j1=i, k=0;
            double sum = 0.;
    
            // Use a moving window to hash every run of L.size() consecutive words of length nW in S.
            // This while loop terminates within nS/nW iterations, since j1 advances by nW each time,
            // so the complexity of this while loop is O(nS / nW).
            while(j1<=nS-nW) {
                sum += log(values[j1]);
                ++k;
                j1 += nW;
                if (k==nL) {
                    int key1 = (sum-floor(sum)) * unit;
                    if (key1==key) result.push_back(j0);
                    sum -= log(values[j0]);
                    --k;
                    j0 += nW;
                }
            }
        }
        return result;
    }
    };