LeetCode: Substring with Concatenation of All Words

Latest recommended article published on 2021-01-05 17:33:28

_hehe_

Article link: https://blog.csdn.net/wty__/article/details/62228680

Problem
You are given a string s and a list of words, words, that are all of the same length. Find all starting indices of substrings in s that are a concatenation of each word in words exactly once, in any order, and without any intervening characters.

For example, given:
s: "barfoothefoobarman"
words: ["foo", "bar"]

You should return the indices: [0,9].
(order does not matter).
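
To make the definition concrete, here is a minimal brute-force sketch that tests every start index independently. The function name findSubstringBrute is hypothetical and not part of the original post; it is meant as a slow reference for checking the faster solutions, assuming all words have the same length.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Brute-force reference (hypothetical helper, not from the original post):
// for every start index, cut the window into words.size() pieces of equal
// length and check that the pieces are exactly the given words.
std::vector<int> findSubstringBrute(const std::string& s,
                                    const std::vector<std::string>& words) {
    std::vector<int> result;
    if (words.empty() || words[0].empty()) return result;
    const std::size_t len = words[0].size();
    const std::size_t total = len * words.size();
    if (s.size() < total) return result;

    // need[w]: how many times w must appear in a match.
    std::unordered_map<std::string, int> need;
    for (const auto& w : words) ++need[w];

    for (std::size_t start = 0; start + total <= s.size(); ++start) {
        std::unordered_map<std::string, int> seen;
        bool ok = true;
        for (std::size_t k = 0; k < words.size() && ok; ++k) {
            std::string piece = s.substr(start + k * len, len);
            auto it = need.find(piece);
            // Fail if the piece is not a word or is used too many times.
            if (it == need.end() || ++seen[piece] > it->second) ok = false;
        }
        if (ok) result.push_back(static_cast<int>(start));
    }
    return result;
}
```

This costs roughly O(n * m * len) time, which is what the solutions below improve on.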

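Solution 1
Notation: n = s.length(), m = words.size(), len = words.size() == 0 ? 0 : words[0].length()
First, put every word in words into a hash map together with its count. Then, for each position i in s, extracting the substring s[i, i + len) costs O(len), after which one expected O(1) lookup tells whether it is a word in words.
Next, look for intervals [l, r) in which the consecutive substrings [l, l + len), [l + len, l + 2*len), ... are all words in words, no word appears more often than it does in words, and r - l = m * len. Each such l is one of the answers.
Every word boundary of a match is congruent to l modulo len, so it is enough to slide such a window over s once for each of the len starting offsets. Each pass handles O(n / len) words at O(len) apiece, so the algorithm runs in O(n * len) time overall.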

#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
    // Returns s[beginIndex, beginIndex + len), or "" if the range is out of bounds.
    string substr(const string& s, int beginIndex, int len) {
        if (beginIndex < 0 || beginIndex + len > (int)s.length())
            return "";
        return s.substr(beginIndex, len);
    }
public:
    vector<int> findSubstring(string s, vector<string>& words) {
        int len = words.empty() ? 0 : words[0].length();
        int n = s.length(), total = words.size() * len;
        vector<int> ret;
        // mp: required count of each word; tmp: counts inside the window [l, r).
        unordered_map<string, int> mp, tmp;
        for (const string& w : words)
            mp[w]++;
        // Word boundaries of a match are congruent to i modulo len, so scan each offset once.
        for (int i = 0; i < len; i++) {
            tmp.clear();
            int l = i, r = i;
            while (r + len <= n) {
                // Extend the window while the next word still fits within its required count.
                while (r + len <= n && tmp[substr(s, r, len)] + 1 <= mp[substr(s, r, len)]) {
                    tmp[substr(s, r, len)]++;
                    r += len;
                }
                if (r - l == total && (ret.empty() || ret.back() != l))
                    ret.push_back(l);
                // Take in the word that stopped the extension; it is now over its count.
                if (r + len <= n) {
                    tmp[substr(s, r, len)]++;
                    r += len;
                }

                // Drop words from the left, up to and including the over-counted one.
                while (l + len <= r && tmp[substr(s, l, len)] <= mp[substr(s, l, len)]) {
                    tmp[substr(s, l, len)]--;
                    l += len;
                }
                if (l + len <= r) {
                    tmp[substr(s, l, len)]--;
                    l += len;
                }
                if (r - l == total)
                    ret.push_back(l);
            }
        }
        return ret;
    }
};

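Solution 2
Use two hash functions, h1 and h2, where h1: string -> int and h2: [int] -> int.
First apply h1 to every string in words, then apply h2 to the resulting list; this gives the final hash of words. Since the words may appear in any order, h2 must not depend on the order of its inputs.
Then, for each window s[i, i + m * len), apply h1 to s[i, i + len), s[i + len, i + 2*len), ..., and apply h2 to those values to get the window's hash. Compare it with the hash of words: equal hashes are treated as a match, anything else as a mismatch. The answer is therefore only probabilistically correct, because a hash collision would be reported as a match.
For efficiency, choose hash functions that can be updated incrementally: given h2(h1(str)), the hash of the string str' obtained by deleting the first character of str and appending one character on the right should be computable in O(1).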
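
The incremental requirement can be illustrated at the character level with a polynomial rolling hash. The sketch below is only one possible choice of h1, not the one used in the code that follows: the name windowHashes and the constant BASE are assumptions made for illustration, and arithmetic is done modulo 2^64 through unsigned wraparound.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hashes of every length-len window of s, using a polynomial rolling hash.
// BASE is an arbitrary odd constant chosen for this sketch; unsigned
// overflow wraps, so all arithmetic is implicitly modulo 2^64.
std::vector<std::uint64_t> windowHashes(const std::string& s, std::size_t len) {
    const std::uint64_t BASE = 1000003ULL;
    std::vector<std::uint64_t> out;
    if (len == 0 || s.size() < len) return out;

    // top = BASE^(len-1), the weight of the leftmost character of a window.
    std::uint64_t top = 1;
    for (std::size_t k = 1; k < len; ++k) top *= BASE;

    // Hash of the first window, computed directly in O(len).
    std::uint64_t h = 0;
    for (std::size_t k = 0; k < len; ++k)
        h = h * BASE + static_cast<unsigned char>(s[k]);
    out.push_back(h);

    for (std::size_t i = 1; i + len <= s.size(); ++i) {
        // Remove s[i-1] on the left, shift, append s[i+len-1]: O(1) per step.
        h = (h - static_cast<unsigned char>(s[i - 1]) * top) * BASE
            + static_cast<unsigned char>(s[i + len - 1]);
        out.push_back(h);
    }
    return out;
}
```

The same add-one, remove-one pattern applies one level up: when the window of m words moves right by one word, h2 only needs to add the hash of the new word and remove the hash of the word that left.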

Note: the code below is taken from LeetCode.

#include <cmath>
#include <random>
#include <string>
#include <vector>
using namespace std;

class Solution {
// The general idea:
// Construct a hash function f for L, f: vector<string> -> int,
// then use the return value of f to check whether a substring is a concatenation
// of all words in L.
// f has two levels. The first level is a hash function f1 for every single word in L.
// f1 : string -> double
// So with f1, L is converted into a vector of doubles.
// Then another hash function f2 is defined to convert a vector of doubles into a single int.
// Finally f(L) := f2(f1(L))
// To obtain lower complexity, we require that f1 and f2 can be computed through a moving window.
// The following corner case also needs to be considered:
// f2(f1(["ab", "cd"])) != f2(f1(["ac", "bd"]))
// There are many possible options for f2 and f1.
// The following code only shows one possibility (probably not the best):
// f1 is the function "hash" in the class,
// f2([a1, a2, ... , an]) := int( decimal_part(log(a1) + log(a2) + ... + log(an)) * 1000000000 )
// Note that equal hash values are taken as a match without further verification,
// and the floating-point sums are compared after rounding, so the result is not guaranteed to be exact.
public:
// The complexity of this function is O(nW).
double hash(double f, double code[], string &word) {
    double result = 0.;
    for (auto &c : word) result = result * f + code[c];
    return result;
}
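vector<int> findSubstring(string S, vector<string> &L) {
    // Guard against an empty word list before L[0] is accessed below.
    if (L.empty() || L[0].empty()) return {};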
    uniform_real_distribution<double> unif(0., 1.);
    default_random_engine seed;
    double code[128];
    for (auto &d : code) d = unif(seed);
    double f = unif(seed) / 5. + 0.8;
    double value = 0;

    // The complexity of the following for loop is O(L.size( ) * nW).
    for (auto &str : L) value += log(hash(f, code, str));

    int unit = 1e9;
    int key = (value-floor(value))*unit;
    int nS = S.size(), nL = L.size(), nW = L[0].size();
    double fn = pow(f, nW-1.);
    vector<int> result;
    if (nS < nW) return result;
    vector<double> values(nS-nW+1);
    string word(S.begin(), S.begin()+nW);
    values[0] = hash(f, code, word);

    // Use a moving window to hash every word with length nW in S to a float number, 
    // which is stored in vector values[]
    // The complexity of this step is O(nS).
    for (int i=1; i<=nS-nW; ++i) values[i] = (values[i-1] - code[S[i-1]]*fn)*f + code[S[i+nW-1]];

    // This for loop will run nW times, each iteration has a complexity O(nS/nW)
    // So the overall complexity is O(nW * (nS / nW)) = O(nS)
    for (int i=0; i<nW; ++i) {
        int j0=i, j1=i, k=0;
        double sum = 0.;

        // Use a moving window to hash every L.size() continuous words with length nW in S.
        // This while loop will terminate within nS/nW iterations since the increment of j1 is nW,
        // So the complexity of this while loop is O(nS / nW).
        while(j1<=nS-nW) {
            sum += log(values[j1]);
            ++k;
            j1 += nW;
            if (k==nL) {
                int key1 = (sum-floor(sum)) * unit;
                if (key1==key) result.push_back(j0);
                sum -= log(values[j0]);
                --k;
                j0 += nW;
            }
        }
    }
    return result;
}
};