Problem
You are given a string, s, and a list of words, words, that are all of the same length. Find all starting indices of substring(s) in s that are a concatenation of each word in words exactly once and without any intervening characters.
For example, given:
s: "barfoothefoobarman"
words: ["foo", "bar"]
You should return the indices: [0,9].
(order does not matter).
Solution 1
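Notation: n = s.length(), m = words.size(), len = words.size() == 0 ? 0 : words[0].length()
First, add every word in words to a hash map that maps each word to the number of times it occurs. Then, for each position i in s, extracting the substring s[i, i + len) costs O(len), after which an expected O(1) lookup tells whether that substring is one of the words.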
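The lookup step described above can be sketched as follows. This is only an illustration: the helper names countWords and wordCountAt are hypothetical and do not appear in the solution code, and the sketch uses find() so that the lookup never inserts new keys into the map.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: count how many times each word occurs in `words`.
std::unordered_map<std::string, int> countWords(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;
    for (const std::string& w : words) ++counts[w];
    return counts;
}

// Hypothetical helper: how many times the piece s[i, i + len) occurs in `words`.
// Returns 0 if the piece is not a word or would run past the end of s.
int wordCountAt(const std::string& s, std::size_t i, std::size_t len,
                const std::unordered_map<std::string, int>& counts) {
    if (len == 0 || i + len > s.size()) return 0;
    // O(len) to extract the piece, expected O(1) to look it up.
    auto it = counts.find(s.substr(i, len));
    return it == counts.end() ? 0 : it->second;
}
```

With the example input, wordCountAt(s, 0, 3, counts) looks up "bar" and wordCountAt(s, 6, 3, counts) looks up "the", which is not in words.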
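Next, look for a half-open interval [l, r) such that the pieces [l, l + len), [l + len, l + 2*len), ... are all words in words, no word is used more often than it occurs in words, and r - l = m * len. Every such l is one of the answers.
The overall time complexity of this algorithm is O(n * len): there are len possible starting offsets, and for each offset the two ends of the window only move forward across s in steps of len, each step costing O(len) to extract a substring.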
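As a reference point, the window condition can also be tested directly at every start index l. The sketch below is a brute-force check, not the sliding-window code of this solution; the function name findSubstringBruteForce is made up for illustration, and the running time is O(n * m * len) rather than O(n * len). It can be handy for cross-checking the faster version on small inputs.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Brute-force reference (hypothetical name): test the window condition at every l.
std::vector<int> findSubstringBruteForce(const std::string& s,
                                         const std::vector<std::string>& words) {
    std::vector<int> result;
    if (words.empty() || words[0].empty()) return result;
    const std::size_t len = words[0].size();
    const std::size_t total = words.size() * len;  // required r - l = m * len
    if (s.size() < total) return result;

    // Required number of occurrences of each word.
    std::unordered_map<std::string, int> need;
    for (const std::string& w : words) ++need[w];

    for (std::size_t l = 0; l + total <= s.size(); ++l) {
        std::unordered_map<std::string, int> seen;  // counts inside [l, l + total)
        bool ok = true;
        for (std::size_t k = l; k < l + total; k += len) {
            const std::string piece = s.substr(k, len);
            auto it = need.find(piece);
            // Fail if the piece is not a word or is used too many times.
            if (it == need.end() || ++seen[piece] > it->second) {
                ok = false;
                break;
            }
        }
        if (ok) result.push_back(static_cast<int>(l));
    }
    return result;
}
```

For s = "barfoothefoobarman" and words = ["foo", "bar"], this returns [0, 9], matching the example in the problem statement.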
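#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
    // Returns s[beginIndex, beginIndex + len), or "" if that range falls outside s.
    std::string substr(std::string& s, int beginIndex, int len) {
        if (beginIndex < 0 || beginIndex + len > s.length())
            return "";
        return s.substr(beginIndex, len);
    }
public:
    vector<int> findSubstring(string s, vector<string>& words) {
        int len = words.size() == 0 ? 0 : words[0].length();
        std::vector<int> ret;
        // mp: required count of each word; tmp: counts inside the current window [l, r).
        unordered_map<string, int> mp, tmp;
        for (int i = 0; i < words.size(); i++)
            mp[words[i]]++;
        // Try each of the len possible alignments of the word boundaries.
        for (int i = 0; i < len; i++) {
            tmp.clear();
            int l = i, r = i;
            while (r + len <= s.length()) {
                // Extend r while the next piece does not exceed its required count.
                while (r + len <= s.length() && tmp[substr(s, r, len)] + 1 <= mp[substr(s, r, len)]) {
                    tmp[substr(s, r, len)]++;
                    r += len;
                }
                if (r - l == words.size() * len && (ret.size() == 0 || ret[ret.size() - 1] != l))
                    ret.push_back(l);
                // Take in the piece that stopped the extension; it now exceeds its count.
                if (r + len <= s.length()) {
                    tmp[substr(s, r, len)]++;
                    r += len;
                }
                // Advance l up to the first piece whose count is exceeded ...
                while (l + len <= r && tmp[substr(s, l, len)] <= mp[substr(s, l, len)]) {
                    tmp[substr(s, l, len)]--;
                    l += len;
                }
                // ... and drop that piece too, so the window is valid again.
                if (l + len <= r) {
                    tmp[substr(s, l, len)]--;
                    l += len;
                }
                if (r - l == words.size() * len)
                    ret.push_back(l);
            }
        }
        return ret;
    }
};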
Solution 2
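Use two levels of hashing, denoted h1 and h2, where h1: string -> int and h2: [int] -> int.
First apply h1 to every string in words, then apply h2 to the resulting values to obtain the final hash of words. Because the words may appear in any order, h2 must not depend on the order of its inputs.
Then, for the window s[i, i + m * len), use h1 to hash the pieces s[i, i + len), s[i + len, i + 2*len), ..., and apply h2 to those values to obtain the hash of the window. Compare it with the hash of words: if the two are equal the window is treated as a match, otherwise it is rejected.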
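To make the two-level scheme concrete, here is a minimal sketch under stated assumptions: std::hash stands in for h1, and h2 is a plain wrapping sum. Neither choice comes from the original post, and the helper hashWords is a hypothetical name. The point is only that a commutative combination gives the same value for any word order and lets one word be swapped out with a subtraction and an addition.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// h1 (assumption: std::hash as a stand-in): hash a single word.
std::uint64_t h1(const std::string& word) {
    return static_cast<std::uint64_t>(std::hash<std::string>{}(word));
}

// h2 (assumption: wrapping sum): combine word hashes independently of their order.
// Unsigned addition wraps modulo 2^64, so it is commutative and exactly reversible.
std::uint64_t h2(const std::vector<std::uint64_t>& wordHashes) {
    std::uint64_t sum = 0;
    for (std::uint64_t h : wordHashes) sum += h;
    return sum;
}

// Hypothetical helper: hash of a whole word list, h2 applied to the h1 values.
std::uint64_t hashWords(const std::vector<std::string>& words) {
    std::vector<std::uint64_t> hs;
    for (const std::string& w : words) hs.push_back(h1(w));
    return h2(hs);
}
```

A plain sum is easy to reason about but collides more readily than stronger mixes, so equal hashes only suggest a match; a careful implementation would either verify candidates or accept a small error probability.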
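For efficiency, choose hash functions that can be updated incrementally: once h2(h1(str)) is known, the hash of the new string str' obtained by dropping the leftmost character of str and appending one character on the right should be computable in O(1). Note: the following code is reproduced from LeetCode; there the two hash functions are called f1 and f2.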
#include <cmath>
#include <random>
#include <string>
#include <vector>
using namespace std;

class Solution {
    // The general idea:
    // Construct a hash function f for L, f: vector<string> -> int,
    // then use the return value of f to check whether a substring is a concatenation
    // of all words in L.
    // f has two levels. The first level is a hash function f1 for every single word in L.
    // f1 : string -> double
    // So with f1, L is converted into a vector of doubles.
    // Then another hash function f2 is defined to convert a vector of doubles into a single int.
    // Finally f(L) := f2(f1(L))
    // To obtain lower complexity, we require that f1 and f2 can be computed through a moving window.
    // The following corner case also needs to be considered:
    // f2(f1(["ab", "cd"])) != f2(f1(["ac", "bd"]))
    // There are many possible options for f2 and f1.
    // The following code only shows one possibility (probably not the best):
    // f1 is the function "hash" in the class,
    // f2([a1, a2, ... , an]) := int( decimal_part(log(a1) + log(a2) + ... + log(an)) * 1000000000 )
public:
    // The complexity of this function is O(nW).
    // code[] is indexed by character, so 7-bit ASCII input is assumed.
    double hash(double f, double code[], string &word) {
        double result = 0.;
        for (auto &c : word) result = result * f + code[c];
        return result;
    }
    vector<int> findSubstring(string S, vector<string> &L) {
        // Guard against an empty word list or empty words (L[0] is read below).
        if (L.empty() || L[0].empty()) return vector<int>();
        uniform_real_distribution<double> unif(0., 1.);
        default_random_engine seed;
        double code[128];
        for (auto &d : code) d = unif(seed);
        double f = unif(seed) / 5. + 0.8;
        double value = 0;
        // The complexity of the following for loop is O(L.size() * nW).
        for (auto &str : L) value += log(hash(f, code, str));
        int unit = 1e9;
        int key = (value-floor(value))*unit;
        int nS = S.size(), nL = L.size(), nW = L[0].size();
        double fn = pow(f, nW-1.);
        vector<int> result;
        if (nS < nW) return result;
        vector<double> values(nS-nW+1);
        string word(S.begin(), S.begin()+nW);
        values[0] = hash(f, code, word);
        // Use a moving window to hash every word with length nW in S to a floating-point number,
        // which is stored in vector values[].
        // The complexity of this step is O(nS).
        for (int i=1; i<=nS-nW; ++i)
            values[i] = (values[i-1] - code[S[i-1]]*fn)*f + code[S[i+nW-1]];
        // This for loop will run nW times, each iteration has a complexity O(nS/nW).
        // So the overall complexity is O(nW * (nS / nW)) = O(nS).
        for (int i=0; i<nW; ++i) {
            int j0=i, j1=i, k=0;
            double sum = 0.;
            // Use a moving window to hash every L.size() continuous words with length nW in S.
            // This while loop will terminate within nS/nW iterations since the increment of j1 is nW,
            // so the complexity of this while loop is O(nS / nW).
            while(j1<=nS-nW) {
                sum += log(values[j1]);
                ++k;
                j1 += nW;
                if (k==nL) {
                    int key1 = (sum-floor(sum)) * unit;
                    if (key1==key) result.push_back(j0);
                    sum -= log(values[j0]);
                    --k;
                    j0 += nW;
                }
            }
        }
        return result;
    }
};
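The moving-window update used for values[] above can be isolated in a small sketch. To keep the arithmetic exact, this version assumes a 64-bit integer polynomial hash with wraparound instead of the floating-point hash of the LeetCode code; the names directHash and rollingHashes and the base 131 are illustrative choices, not part of the original.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Polynomial hash of s[pos, pos + len), computed from scratch in O(len).
std::uint64_t directHash(const std::string& s, std::size_t pos, std::size_t len,
                         std::uint64_t base) {
    std::uint64_t h = 0;
    for (std::size_t k = 0; k < len; ++k)
        h = h * base + static_cast<unsigned char>(s[pos + k]);
    return h;
}

// Hashes of all length-len windows of s; each one is derived from the previous in O(1).
std::vector<std::uint64_t> rollingHashes(const std::string& s, std::size_t len,
                                         std::uint64_t base = 131) {
    std::vector<std::uint64_t> out;
    if (len == 0 || s.size() < len) return out;
    std::uint64_t top = 1;  // base^(len-1): weight of the leftmost character
    for (std::size_t k = 1; k < len; ++k) top *= base;
    out.push_back(directHash(s, 0, len, base));
    for (std::size_t i = 1; i + len <= s.size(); ++i) {
        std::uint64_t h = out.back();
        h -= static_cast<unsigned char>(s[i - 1]) * top;            // drop the leftmost character
        h = h * base + static_cast<unsigned char>(s[i + len - 1]);  // append the new character
        out.push_back(h);
    }
    return out;
}
```

Because unsigned arithmetic wraps modulo 2^64, every rolled value equals the hash computed from scratch for the same window, so equal windows such as the two occurrences of "bar" in the example string always get equal hashes.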