LeetCode - 139. Word Break
Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words.
Note:
The same word in the dictionary may be reused multiple times in the segmentation.
You may assume the dictionary does not contain duplicate words.
Example 1:
Input: s = “leetcode”, wordDict = [“leet”, “code”]
Output: true
Example 2:
Input: s = “applepenapple”, wordDict = [“apple”, “pen”]
Output: true
Explanation: Return true because “applepenapple” can be segmented as “apple pen apple”.
Note that you are allowed to reuse a dictionary word.
Example 3:
Input: s = “catsandog”, wordDict = [“cats”, “dog”, “sand”, “and”, “cat”]
Output: false
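In other words: can the given string s be assembled from the words in a vector, where each word may be used any number of times?
Allocate a dp array of size n + 1, where dp[i] indicates whether the first i characters of s can be assembled. Clearly dp[0] = true, and every other entry starts as false. For each position i, if there is a word length j between 1 and i such that dp[i - j] is true and s.substr(i - j, j) is in the dictionary, then the segmentation that produced dp[i - j] can be extended with s.substr(i - j, j) to cover the first i characters, so dp[i] = true.
When computing dp[i] there is no need to try every j from 0 to i. For example, if the shortest word in the dictionary has length 4, the start positions i, i - 1, i - 2 and i - 3 (lengths 0 through 3) can be skipped. The code below also caps j at the length of the longest word. (beats 100%)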
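#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

bool wordBreak(string s, vector<string>& wordDict) {
    unordered_set<string> dic;
    const size_t n = s.length();
    size_t max_len = 0, min_len = SIZE_MAX;
    vector<bool> dp(n + 1, false);  // dp[i]: can the first i characters be segmented?
    dp[0] = true;
    for (const string& w : wordDict) {
        dic.insert(w);
        max_len = max(max_len, w.length());
        min_len = min(min_len, w.length());
    }
    // j is the length of the candidate last word; only lengths in [min_len, max_len] can match
    for (size_t i = min_len; i <= n; ++i)
        for (size_t j = min_len; j <= i && j <= max_len; ++j)
            if (dp[i - j] && dic.find(s.substr(i - j, j)) != dic.end()) {
                dp[i] = true;
                break;
            }
    return dp[n];
}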
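To make the table concrete, here is a minimal sketch that fills the same kind of dp array but returns the whole table instead of only its last entry. The helper name `prefixTable` is hypothetical and not part of the original solution; it assumes the dictionary contains only non-empty words, as the problem statement guarantees.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the full table, where dp[i] says whether
// the first i characters of s can be built from dictionary words.
std::vector<bool> prefixTable(const std::string& s,
                              const std::vector<std::string>& wordDict) {
    std::unordered_set<std::string> dict(wordDict.begin(), wordDict.end());
    // With an empty dictionary minLen stays above s.size(), so no loop runs.
    std::size_t minLen = s.size() + 1, maxLen = 0;
    for (const std::string& w : wordDict) {
        minLen = std::min(minLen, w.size());
        maxLen = std::max(maxLen, w.size());
    }
    std::vector<bool> dp(s.size() + 1, false);
    dp[0] = true;  // the empty prefix needs no words
    for (std::size_t i = minLen; i <= s.size(); ++i) {
        // Only lengths inside [minLen, maxLen] can match a dictionary word.
        for (std::size_t len = minLen; len <= i && len <= maxLen; ++len) {
            if (dp[i - len] && dict.count(s.substr(i - len, len))) {
                dp[i] = true;
                break;
            }
        }
    }
    return dp;
}
```

For s = "leetcode" with the words "leet" and "code", only dp[0], dp[4] and dp[8] end up true: position 4 is reached by "leet", and position 8 by appending "code" to it.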
LeetCode - 140. Word Break II
Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, add spaces in s to construct a sentence where each word is a valid dictionary word. Return all such possible sentences.
The same word in the dictionary may be reused multiple times in the segmentation.
Example 1:
Input: s = “catsanddog” wordDict = [“cat”, “cats”, “and”, “sand”, “dog”]
Output: [ “cats and dog”, “cat sand dog” ]
Example 2:
Input: s = “pineapplepenapple” wordDict = [“apple”, “pen”, “applepen”, “pine”, “pineapple”]
Output: [ “pine apple pen apple”, “pineapple pen apple”, “pine applepen apple” ]
Example 3:
Input: s = “catsandog” wordDict = [“cats”, “dog”, “sand”, “and”, “cat”]
Output: []
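This problem is the previous one with a different return value: instead of reporting whether s can be segmented, it returns every possible segmentation. The approach is DFS + DP. Code below (beats 100%):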
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

size_t max_len = 0, min_len = SIZE_MAX;
unordered_set<string> dict;

// Extend the partial sentence in retain with every dictionary word that starts at idx.
void dfs(const string& s, vector<string>& retain, size_t idx, vector<string>& res) {
    if (idx == s.size()) {
        string tmp;
        for (const string& str : retain)
            tmp += str + " ";
        tmp.pop_back();  // drop the trailing space
        res.push_back(tmp);
        return;
    }
    for (size_t len = min_len; len <= max_len && idx + len <= s.size(); len++)
        if (dict.find(s.substr(idx, len)) != dict.end()) {
            retain.push_back(s.substr(idx, len));
            dfs(s, retain, idx + len, res);
            retain.pop_back();
        }
}

vector<string> wordBreak(string s, vector<string>& wordDict) {
    const size_t N = s.length();
    vector<bool> dp(N + 1, false);
    dp[0] = true;
    for (const auto& w : wordDict) {
        dict.insert(w);
        max_len = max(max_len, w.length());
        min_len = min(min_len, w.length());
    }
    for (size_t i = min_len; i <= N; i++)
        for (size_t j = min_len; j <= i && j <= max_len; ++j)  // j: length of the candidate word; write j <= i, never i - j >= 0
            if (dp[i - j] && dict.find(s.substr(i - j, j)) != dict.end()) {
                dp[i] = true;
                break;
            }
    vector<string> res, retain;
    if (dp[N])  // only enumerate when at least one segmentation exists
        dfs(s, retain, 0, res);
    return res;
}
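In the solution above the dp table only gates the search: if dp[N] is false the DFS never starts, but once it starts it can still walk into branches that never reach the end of s. One possible variation, which is not what the original code does, is to compute reachability from the right so that the DFS only steps to positions from which the rest of the string is known to be breakable. The names `collect`, `canFinish` and `wordBreakAll` below are hypothetical, and the sketch assumes non-empty dictionary words.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Appends to res every sentence that spells s[idx..), prefixed by the words in path.
void collect(const std::string& s, std::size_t idx,
             const std::unordered_set<std::string>& dict,
             const std::vector<bool>& canFinish,
             std::vector<std::string>& path,
             std::vector<std::string>& res) {
    if (idx == s.size()) {
        std::string sentence;
        for (const std::string& w : path) {
            if (!sentence.empty()) sentence += ' ';
            sentence += w;
        }
        res.push_back(sentence);
        return;
    }
    for (std::size_t len = 1; idx + len <= s.size(); ++len) {
        // Skip cuts whose remaining suffix cannot be segmented.
        if (!canFinish[idx + len]) continue;
        const std::string word = s.substr(idx, len);
        if (dict.count(word)) {
            path.push_back(word);
            collect(s, idx + len, dict, canFinish, path, res);
            path.pop_back();
        }
    }
}

std::vector<std::string> wordBreakAll(const std::string& s,
                                      const std::vector<std::string>& wordDict) {
    std::unordered_set<std::string> dict(wordDict.begin(), wordDict.end());
    const std::size_t n = s.size();
    // canFinish[i] is true when the suffix s[i..n) can be segmented.
    std::vector<bool> canFinish(n + 1, false);
    canFinish[n] = true;
    for (std::size_t i = n; i-- > 0;) {
        for (std::size_t len = 1; i + len <= n; ++len) {
            if (canFinish[i + len] && dict.count(s.substr(i, len))) {
                canFinish[i] = true;
                break;
            }
        }
    }
    std::vector<std::string> res, path;
    if (canFinish[0]) collect(s, 0, dict, canFinish, path, res);
    return res;
}
```

This sketch drops the min_len / max_len window for brevity. Whether the extra pruning pays off depends on the input, since the table costs the same to build either way.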
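One thing worth noting in the dp loop of wordBreak: because i and j are size_t, the bound has to be written as j <= i. A check like i - j >= 0 does not work, since a size_t value is always >= 0.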
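The size_t pitfall can be seen in isolation with a tiny sketch. The function names `difference` and `lengthFits` are made up for illustration only.

```cpp
#include <cstddef>
#include <limits>

// Unsigned subtraction wraps around modulo 2^N, so the result is never negative.
std::size_t difference(std::size_t i, std::size_t j) {
    return i - j;
}

// The safe form of the bound check: compare the operands directly.
bool lengthFits(std::size_t i, std::size_t j) {
    return j <= i;
}
```

Here `difference(2, 3)` yields the largest size_t value rather than -1. A guard written as i - j >= 0 would therefore always pass, and dp[i - j] would then index far outside the array.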
LeetCode - 472. Concatenated Words
Given a list of words (without duplicates), please write a program that returns all concatenated words in the given list of words. A concatenated word is defined as a string that is comprised entirely of at least two shorter words in the given array.
Example:
Input: [“cat”,“cats”,“catsdogcats”,“dog”,“dogcatsdog”,“hippopotamuses”,“rat”,“ratcatdogcat”]
Output: [“catsdogcats”,“dogcatsdog”,“ratcatdogcat”]
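That is: from a string array words, find every word that can be formed by concatenating two or more words from words.
Use a recursive DFS to decide whether a string can be assembled from other strings. Every word can trivially be formed from itself, so cur_words records how many strings have been used so far, and only a count > 1 returns true. Some solutions online erase the word itself from the set first and insert it back after the check. I did not do that, for two reasons:
- Erasing and inserting still cost time, whereas the one extra check against the word itself adds very little.
- With cur_words available, any variant that requires at least some number of component strings can be handled the same way.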
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

unordered_set<string> dict;
size_t max_len = 0, min_len = SIZE_MAX, start;

// cur_words counts how many dictionary words have been consumed so far.
bool dfs_check(const string& word, int cur_words) {
    if (word.empty() && cur_words > 1) return true;
    for (size_t i = start; i <= min(word.size(), max_len); ++i)
        if (dict.find(word.substr(0, i)) != dict.end()
            && dfs_check(word.substr(i), cur_words + 1))
            return true;
    return false;
}

vector<string> findAllConcatenatedWordsInADict(vector<string>& words) {
    for (const string& w : words) {
        dict.insert(w);
        max_len = max(max_len, w.length());
        min_len = min(min_len, w.length());
    }
    start = max(min_len, size_t(1));  // the input may contain the empty string ""
    vector<string> res;
    for (const string& w : words)
        if (dfs_check(w, 0))
            res.push_back(w);
    return res;
}
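The second reason given above, that the counter makes any "at least k words" variant easy, can be sketched as follows. The function name `canBuildFrom` and the parameter `minParts` are hypothetical. The logic mirrors dfs_check, but takes the dictionary and the threshold as arguments instead of globals and omits the min_len / max_len window.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Returns true if word[pos..) can be split into dictionary words so that the
// total number of pieces, counting the `used` ones already consumed, is at
// least minParts.
bool canBuildFrom(const std::string& word, std::size_t pos,
                  const std::unordered_set<std::string>& dict,
                  std::size_t used, std::size_t minParts) {
    if (pos == word.size()) return used >= minParts;
    for (std::size_t len = 1; pos + len <= word.size(); ++len) {
        if (dict.count(word.substr(pos, len)) &&
            canBuildFrom(word, pos + len, dict, used + 1, minParts)) {
            return true;
        }
    }
    return false;
}
```

Calling it with minParts = 2 corresponds to the cur_words > 1 test in the solution above. With the example dictionary, "ratcatdogcat" also passes with minParts = 4 (rat, cat, dog, cat) but not with minParts = 5.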