[DP][DFS] LeetCode - Word Break I & II, Concatenated Words

Latest recommended article published 2022-01-09 13:29:47

Bob__yuan

Categories: LeetCode Algorithm, Programming Problems. Tags: DP, DFS, LeetCode

Article link: https://blog.csdn.net/Bob__yuan/article/details/100154576

LeetCode - 139. Word Break

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words.

Note:
The same word in the dictionary may be reused multiple times in the segmentation.
You may assume the dictionary does not contain duplicate words.

Example 1:
Input: s = "leetcode", wordDict = ["leet", "code"]
Output: true

Example 2:
Input: s = "applepenapple", wordDict = ["apple", "pen"]
Output: true
Explanation: Return true because "applepenapple" can be segmented as "apple pen apple".
Note that you are allowed to reuse a dictionary word.

Example 3:
Input: s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"]
Output: false

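In other words, build the given string s by concatenating words from a vector; each word may be used any number of times.
Allocate a dp array of size n + 1, where dp[i] says whether the first i characters of s can be built. dp[0] = true (the empty prefix) and every other entry starts as false. For each i, look for a word length j (j <= i) such that dp[i - j] is true and s.substr(i - j, j) is in the dictionary: the first i - j characters can already be built, and appending the word s.substr(i - j, j) builds the first i characters, so dp[i] = true.
When computing dp[i], there is no need to try every length from 0 to i. For example, if the shortest dictionary word has length 4, the split points i, i - 1, i - 2, i - 3 (word lengths 0 to 3) can never match, and likewise lengths longer than the longest dictionary word can be skipped. (beats 100%)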
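
Written as a recurrence (D is the dictionary, s[i-j, i) is s.substr(i - j, j), and min_len / max_len are the shortest and longest word lengths), the transition described above is:

$$
dp[0] = \text{true}, \qquad
dp[i] = \bigvee_{j=\text{min\_len}}^{\min(i,\ \text{max\_len})} \Big( dp[i-j] \ \wedge\ s[i-j,\, i) \in D \Big)
$$

An empty OR is false, so every dp[i] with i < min_len stays false.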

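#include <algorithm>
#include <climits>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

bool wordBreak(string s, vector<string>& wordDict) {
    unordered_set<string> dic;
    const size_t n = s.length();
    size_t max_len = 0, min_len = INT_MAX;  // longest / shortest dictionary word
    vector<bool> dp(n + 1, false);          // dp[i]: can the first i characters be built?
    dp[0] = true;                           // the empty prefix is always buildable
    for (const string& w : wordDict) {
        dic.insert(w);
        max_len = max(max_len, w.length());
        min_len = min(min_len, w.length());
    }
    for (size_t i = min_len; i <= n; ++i)
        // j: length of the last word; only lengths in [min_len, max_len] can match
        for (size_t j = min_len; j <= i && j <= max_len; ++j)
            if (dp[i - j] && dic.find(s.substr(i - j, j)) != dic.end()) {
                dp[i] = true;
                break;
            }
    return dp[n];
}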
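
To see what the dp array actually holds, here is a small variant that returns the whole table instead of only dp[n]. The function name wordBreakTable is hypothetical; the transition is the same as in wordBreak, with the dictionary built directly from the vector and min_len starting above any usable length so an empty wordDict simply leaves everything false.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

// Hypothetical helper: same bottom-up DP as wordBreak, but returns the full dp table.
vector<bool> wordBreakTable(const string& s, const vector<string>& wordDict) {
    unordered_set<string> dic(wordDict.begin(), wordDict.end());
    size_t max_len = 0, min_len = s.size() + 1;  // starts above any length that could fit in s
    for (const string& w : wordDict) {
        max_len = max(max_len, w.length());
        min_len = min(min_len, w.length());
    }
    vector<bool> dp(s.size() + 1, false);
    dp[0] = true;  // empty prefix
    for (size_t i = min_len; i <= s.size(); ++i)
        for (size_t j = min_len; j <= i && j <= max_len; ++j)  // j: length of the last word
            if (dp[i - j] && dic.count(s.substr(i - j, j))) {
                dp[i] = true;
                break;
            }
    return dp;
}
```

For s = "leetcode" with {"leet", "code"}, only dp[0], dp[4] and dp[8] come out true: "leet" ends at position 4 and "code" extends it to 8.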

LeetCode - 140. Word Break II

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, add spaces in s to construct a sentence where each word is a valid dictionary word. Return all such possible sentences.
The same word in the dictionary may be reused multiple times in the segmentation.

Example 1:
Input: s = "catsanddog", wordDict = ["cat", "cats", "and", "sand", "dog"]
Output: ["cats and dog", "cat sand dog"]

Example 2:
Input: s = "pineapplepenapple", wordDict = ["apple", "pen", "applepen", "pine", "pineapple"]
Output: ["pine apple pen apple", "pineapple pen apple", "pine applepen apple"]

Example 3:
Input: s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"]
Output: []

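This problem is the previous one with a different return value: instead of whether s can be segmented, return every possible segmentation. It uses DFS + DP: the DP from 139 first checks whether any segmentation exists at all, and only then does the DFS enumerate every sentence. The code is below (beats 100%):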

#include <algorithm>
#include <climits>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
    // Members rather than globals, so state does not leak between test cases.
    size_t max_len = 0, min_len = INT_MAX;
    unordered_set<string> dict;

    // retain: words chosen so far; idx: start of the not-yet-segmented suffix
    void dfs(const string& s, vector<string>& retain, size_t idx, vector<string>& res) {
        if (idx == s.size()) {
            string tmp;
            for (const string& str : retain)
                tmp += str + " ";
            tmp.pop_back();  // drop trailing space; retain is non-empty since s is non-empty
            res.push_back(tmp);
            return;
        }
        for (size_t len = min_len; len <= max_len && idx + len <= s.size(); len++)
            if (dict.find(s.substr(idx, len)) != dict.end()) {
                retain.push_back(s.substr(idx, len));
                dfs(s, retain, idx + len, res);
                retain.pop_back();
            }
    }

public:
    vector<string> wordBreak(string s, vector<string>& wordDict) {
        const size_t N = s.length();
        vector<bool> dp(N + 1, false);
        dp[0] = true;
        for (const auto& w : wordDict) {
            dict.insert(w);
            max_len = max(max_len, w.length());
            min_len = min(min_len, w.length());
        }
        for (size_t i = min_len; i <= N; i++)
            for (size_t j = min_len; j <= i && j <= max_len; ++j)  // j: word length; must test j <= i, never i - j >= 0!
                if (dp[i - j] && dict.find(s.substr(i - j, j)) != dict.end()) {
                    dp[i] = true;
                    break;
                }

        vector<string> res, retain;
        if (dp[N])  // enumerate sentences only if at least one segmentation exists
            dfs(s, retain, 0, res);
        return res;
    }
};
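
The dp check above only decides whether to start the DFS at all; once it starts, the DFS can still walk into prefixes whose remaining suffix can never be finished. One way to push the DP into the search, sketched here as an alternative and not as the author's code, is a suffix table can[i] ("s[i..] can be fully segmented") computed right to left, so the DFS only descends into pieces that are guaranteed to lead to a sentence. The class name WordBreakIISketch is hypothetical, and for brevity it tries every piece length instead of restricting to [min_len, max_len].

```cpp
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

// Hypothetical sketch: DFS pruned by a suffix DP.
class WordBreakIISketch {
public:
    vector<string> wordBreak(const string& s, const vector<string>& wordDict) {
        dict_ = unordered_set<string>(wordDict.begin(), wordDict.end());
        const size_t n = s.size();
        can_.assign(n + 1, false);
        can_[n] = true;                         // the empty suffix is trivially segmentable
        for (size_t i = n; i-- > 0;)            // i = n - 1 down to 0
            for (size_t len = 1; i + len <= n; ++len)
                if (can_[i + len] && dict_.count(s.substr(i, len))) {
                    can_[i] = true;
                    break;
                }
        vector<string> res, path;
        if (can_[0]) dfs(s, 0, path, res);
        return res;
    }

private:
    unordered_set<string> dict_;
    vector<bool> can_;  // can_[i]: can s[i..] be split into dictionary words?

    void dfs(const string& s, size_t idx, vector<string>& path, vector<string>& res) {
        if (idx == s.size()) {
            string sentence;
            for (const string& w : path) sentence += (sentence.empty() ? "" : " ") + w;
            res.push_back(sentence);
            return;
        }
        for (size_t len = 1; idx + len <= s.size(); ++len) {
            if (!can_[idx + len]) continue;     // prune: the rest could never be finished
            string piece = s.substr(idx, len);
            if (!dict_.count(piece)) continue;
            path.push_back(piece);
            dfs(s, idx + len, path, res);
            path.pop_back();
        }
    }
};
```

Because shorter pieces are tried first, "catsanddog" yields "cat sand dog" before "cats and dog"; the problem accepts the sentences in any order.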

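Worth noting: in the j loop of wordBreak, the bound must be written as j <= i. A size_t guard of i - j >= 0 does not work, because a size_t is always >= 0; when j > i, the unsigned subtraction wraps around to a huge value instead of going negative.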
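
A tiny illustration of the pitfall, using hypothetical helper names: unsigned arithmetic is defined modulo 2^N, so 2 - 3 on size_t is the largest size_t value, and the "i - j >= 0" guard is true for every input (compilers typically warn that the comparison is always true).

```cpp
#include <cstddef>
#include <limits>

// Unsigned subtraction wraps instead of going negative (well-defined, not UB).
constexpr std::size_t wrapped_difference(std::size_t i, std::size_t j) { return i - j; }

// Broken guard: an unsigned value is never < 0, so this always returns true.
constexpr bool unsafe_guard(std::size_t i, std::size_t j) { return i - j >= 0; }

// Correct guard, as used in the loops: compare before subtracting.
constexpr bool safe_guard(std::size_t i, std::size_t j) { return j <= i; }

static_assert(wrapped_difference(2, 3) == std::numeric_limits<std::size_t>::max(),
              "2 - 3 wraps around to SIZE_MAX");
```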

LeetCode - 472. Concatenated Words

Given a list of words (without duplicates), please write a program that returns all concatenated words in the given list of words. A concatenated word is defined as a string that is comprised entirely of at least two shorter words in the given array.

Example:
Input: ["cat", "cats", "catsdogcats", "dog", "dogcatsdog", "hippopotamuses", "rat", "ratcatdogcat"]
Output: ["catsdogcats", "dogcatsdog", "ratcatdogcat"]

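In other words, find every word in the array words that can be formed by concatenating two or more words from words.
DFS checks recursively whether a string can be built from other strings. Since every word can trivially be built from itself alone, cur_words counts how many words the current split uses, and only cur_words > 1 returns true. Some solutions online instead erase the word itself from the set before checking and insert it back afterwards; that is not done here, for two reasons:

1. Erasing and inserting also cost time, while one extra match against the word itself adds very little.
2. With the cur_words counter, any required minimum number of pieces can be handled, not just "at least two".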
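
The second point can be illustrated with a hedged sketch. The class ConcatSketch, its method findWithMinPieces and the min_pieces parameter are hypothetical, not part of the problem or the original post; the counter plays the same role as the author's cur_words, and min_pieces = 2 reproduces problem 472. Empty strings are skipped so every piece has length at least 1, and like the original DFS it does no memoization, so it is exponential in the worst case.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

// Hypothetical generalization: words splittable into at least min_pieces dictionary words.
class ConcatSketch {
public:
    vector<string> findWithMinPieces(const vector<string>& words, int min_pieces) {
        dict_.clear();
        max_len_ = 0;
        for (const string& w : words)
            if (!w.empty()) {                  // skip "" so a piece is never zero-length
                dict_.insert(w);
                max_len_ = max(max_len_, w.size());
            }
        min_pieces_ = min_pieces;
        vector<string> res;
        for (const string& w : words)
            if (!w.empty() && check(w, 0, 0)) res.push_back(w);
        return res;
    }

private:
    unordered_set<string> dict_;
    size_t max_len_ = 0;
    int min_pieces_ = 2;

    // pos: start of the unmatched suffix; pieces: dictionary words matched so far
    bool check(const string& word, size_t pos, int pieces) {
        if (pos == word.size()) return pieces >= min_pieces_;
        for (size_t len = 1; len <= max_len_ && pos + len <= word.size(); ++len)
            if (dict_.count(word.substr(pos, len)) && check(word, pos + len, pieces + 1))
                return true;
        return false;
    }
};
```

On the example input, min_pieces = 2 returns the three expected words, while min_pieces = 4 keeps only "ratcatdogcat" (rat + cat + dog + cat); the other two split into at most three pieces.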

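#include <algorithm>
#include <climits>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
    // Members rather than globals, so state does not leak between test cases.
    unordered_set<string> dict;
    size_t max_len = 0, min_len = INT_MAX, start = 1;

    // cur_words: how many dictionary words have been matched so far
    bool dfs_check(const string& word, int cur_words) {
        if (word.empty() && cur_words > 1) return true;  // fully matched using at least 2 words
        for (size_t i = start; i <= min(word.size(), max_len); ++i)
            if (dict.find(word.substr(0, i)) != dict.end()
                && dfs_check(word.substr(i), cur_words + 1))
                return true;
        return false;
    }

public:
    vector<string> findAllConcatenatedWordsInADict(vector<string>& words) {
        for (const string& w : words) {
            dict.insert(w);
            max_len = max(max_len, w.length());
            min_len = min(min_len, w.length());
        }
        start = max(min_len, size_t(1));  // words may contain "", and a zero-length prefix would recurse forever
        vector<string> res;
        for (const string& w : words)
            if (dfs_check(w, 0))
                res.push_back(w);
        return res;
    }
};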
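
For comparison, the same check can reuse the bottom-up DP from 139 instead of an unmemoized DFS. This is a swapped-in technique, not the author's method, and concatenatedWordsDP is a hypothetical name. Words are processed from shortest to longest and each one is run through the Word Break DP against the words already seen; since the input has no duplicates, the word itself is never in that set, so dp[n] == true implies at least two pieces. Each word then costs O(L^2) substring lookups instead of a possibly exponential search.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

// Hypothetical alternative: Word Break I DP applied to each word, shortest first.
vector<string> concatenatedWordsDP(vector<string> words) {
    sort(words.begin(), words.end(),
         [](const string& a, const string& b) { return a.size() < b.size(); });
    unordered_set<string> seen;  // words already processed (never contains the current word)
    vector<string> res;
    for (const string& w : words) {
        if (w.empty()) continue;  // "" is neither a concatenated word nor a useful piece
        const size_t n = w.size();
        vector<bool> dp(n + 1, false);
        dp[0] = true;
        for (size_t i = 1; i <= n; ++i)
            for (size_t j = 0; j < i; ++j)  // j: split point, w[j, i) is the last piece
                if (dp[j] && seen.count(w.substr(j, i - j))) {
                    dp[i] = true;
                    break;
                }
        if (dp[n]) res.push_back(w);
        seen.insert(w);
    }
    return res;
}
```

The result comes back in length order ("dogcatsdog", "catsdogcats", "ratcatdogcat" for the example), which the problem allows since any order is accepted.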