Problem:
Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words.
Note:
- The same word in the dictionary may be reused multiple times in the segmentation.
- You may assume the dictionary does not contain duplicate words.
Example 1:
Input: s = "leetcode", wordDict = ["leet", "code"]
Output: true
Explanation: Return true because "leetcode" can be segmented as "leet code".
Example 2:
Input: s = "applepenapple", wordDict = ["apple", "pen"]
Output: true
Explanation: Return true because "applepenapple" can be segmented as "apple pen apple". Note that you are allowed to reuse a dictionary word.
Example 3:
Input: s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"]
Output: false
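Another problem that gave me a real headache.
The first approach is the plain brute-force one: recursion (recursion feels like the classic tool for substring problems). If a prefix of the string is in the dictionary, we then need to decide whether the remainder can also be segmented. So we write a helper function that takes the current position as a parameter. Because the dictionary is given as a vector, we copy it into a hash set to speed up lookups. In the recursive function, if the position has reached the end of the string, the match succeeds; otherwise we try every substring starting at that position, and whenever one is in the dictionary we recurse to check whether the rest can be broken.
On its own this is slow, because the recursion recomputes the same positions many times. So we add a memo array that records whether the suffix starting at each index can be segmented: it is initialized to -1, set to 1 if the suffix can be segmented, and set to 0 if it cannot. Once an index has a memo entry, we return it directly instead of recomputing.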
Runtime: 20 ms, faster than 30.84% of C++ online submissions for Word Break.
Memory Usage: 15.7 MB, less than 26.41% of C++ online submissions for Word Break.
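#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
public:
    // Returns whether the suffix s[index, end) can be segmented.
    // s is passed by const reference so it is not copied on every recursive call.
    bool helper(const string& s, unordered_set<string>& dict, int index, vector<int>& memo) {
        if (index == s.size()) {
            return true; // reached the end: everything before it was matched
        }
        if (memo[index] != -1) {
            return memo[index]; // already computed: 1 = breakable, 0 = not
        }
        for (int i = index; i < s.size(); i++) {
            // s[index, i] is a closed range, so its length is i - index + 1
            if (dict.count(s.substr(index, i - index + 1)) && helper(s, dict, i + 1, memo)) {
                memo[index] = 1;
                return true;
            }
        }
        memo[index] = 0;
        return false;
    }

    bool wordBreak(string s, vector<string>& wordDict) {
        unordered_set<string> dict(wordDict.begin(), wordDict.end());
        vector<int> memo(s.size(), -1); // -1 means "not computed yet"
        return helper(s, dict, 0, memo);
    }
};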
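To make the "repeated computation" point concrete, here is a minimal sketch that runs the recursion with and without the memo and counts helper calls. The names canBreakNaive, canBreakMemo, and the calls counter are hypothetical and exist only for this comparison; they are not part of the submitted solution.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: plain recursion with no memo.
// `calls` counts invocations so the repeated work becomes visible.
bool canBreakNaive(const std::string& s, const std::unordered_set<std::string>& dict,
                   std::size_t index, long& calls) {
    assert(index <= s.size());
    ++calls;
    if (index == s.size()) return true;
    for (std::size_t i = index; i < s.size(); ++i) {
        if (dict.count(s.substr(index, i - index + 1)) &&
            canBreakNaive(s, dict, i + 1, calls)) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: the same recursion with the -1 / 0 / 1 memo from the notes.
bool canBreakMemo(const std::string& s, const std::unordered_set<std::string>& dict,
                  std::size_t index, std::vector<int>& memo, long& calls) {
    assert(index <= s.size());
    ++calls;
    if (index == s.size()) return true;
    if (memo[index] != -1) return memo[index] == 1;
    for (std::size_t i = index; i < s.size(); ++i) {
        if (dict.count(s.substr(index, i - index + 1)) &&
            canBreakMemo(s, dict, i + 1, memo, calls)) {
            memo[index] = 1;
            return true;
        }
    }
    memo[index] = 0;
    return false;
}
```

On an input such as "aaaaaaaaab" with the dictionary {"a", "aa"}, both versions return false, but the unmemoized one revisits the same indices many times, so its call count grows much faster than the memoized one as the string gets longer.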
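2020.10.6 Java DP solution
Let dp[i] mean that the substring s[0, i) can be broken, so the whole problem reduces to whether dp[len] is true, i.e. whether s[0, len) can be broken. We therefore declare an array of length len + 1 to hold dp[0] through dp[len]. For each dp[i], we split s[0, i) into two parts at some position j: dp[i] is true if there is a j with dp[j] == true (s[0, j) can be broken) && s[j, i) is in the word list. The main thing to watch is the boundary conditions (dp[0] is true for the empty prefix). After that we can happily write the code.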
Runtime: 6 ms, faster than 65.16% of Java online submissions for Word Break.
Memory Usage: 39.2 MB, less than 72.98% of Java online submissions for Word Break.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Solution {
    public boolean wordBreak(String s, List<String> wordDict) {
        Set<String> wordSet = new HashSet<>(wordDict);
        boolean[] dp = new boolean[s.length() + 1];
        dp[0] = true; // the empty prefix is trivially breakable
        for (int i = 0; i < dp.length; i++) {
            for (int j = 0; j < i; j++) {
                // dp[j]: [0, j) can be broken
                // s.substring(j, i): [j, i) is in set
                if (dp[j] && wordSet.contains(s.substring(j, i))) {
                    dp[i] = true;
                    break;
                }
            }
        }
        return dp[dp.length - 1];
    }
}
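To see what the dp array actually holds, here is a small sketch, written in C++ to match the other notes, that returns the whole table instead of only the last entry. The function name breakTable is hypothetical and is used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: builds the same dp table as the solution above
// and returns all of it, so each dp[i] can be inspected.
std::vector<bool> breakTable(const std::string& s,
                             const std::unordered_set<std::string>& words) {
    std::vector<bool> dp(s.size() + 1, false);
    dp[0] = true;  // empty prefix
    for (std::size_t i = 1; i <= s.size(); ++i) {
        for (std::size_t j = 0; j < i; ++j) {
            // dp[j]: s[0, j) can be broken; s.substr(j, i - j): s[j, i)
            if (dp[j] && words.count(s.substr(j, i - j))) {
                dp[i] = true;
                break;
            }
        }
    }
    return dp;
}
```

For s = "leetcode" with ["leet", "code"], only dp[0], dp[4], and dp[8] are true. For s = "catsandog" with the dictionary from Example 3, the true entries are dp[0], dp[3], dp[4], and dp[7]; dp[9] stays false because neither "og" nor any longer tail ending at 9 combines with a breakable prefix.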
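Below are my earlier C++ notes:
I read through the DP solution: dp[i] indicates whether s[0, 1, ..., i - 1] can be segmented. To decide dp[i], we need an inner loop that tries every split position inside that prefix. A few things to watch out for. First, the dp array must have size s.size() + 1, because the empty string has to be included; substring DP problems all seem to follow this pattern. The outer for loop must also run up to dp's size, not s's size. In the inner loop, the substring length is i - j with no + 1; I had not fully thought this through at the time, but it follows from the range being half-open: s[j, i) covers indices j through i - 1, which is i - j characters. Also, once the if condition is true we can break immediately rather than keep checking. Below are the time and memory figures before and after adding the break: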
Runtime: 16 ms, faster than 47.49% of C++ online submissions for Word Break.
Memory Usage: 14.1 MB, less than 52.83% of C++ online submissions for Word Break.
Runtime: 8 ms, faster than 76.77% of C++ online submissions for Word Break.
Memory Usage: 14.4 MB, less than 43.40% of C++ online submissions for Word Break.
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
public:
    bool wordBreak(string s, vector<string>& wordDict) {
        unordered_set<string> dict(wordDict.begin(), wordDict.end());
        vector<int> dp(s.size() + 1, 0); // should be size + 1 to consider empty string
        dp[0] = true; // make empty string to be true
        for (int i = 0; i < dp.size(); i++) { // use dp.size(), not s.size()
            for (int j = 0; j < i; j++) {
                if (dp[j] && dict.count(s.substr(j, i - j))) { // substr len is i - j, no + 1
                    dp[i] = true;
                    break; // one valid split is enough
                }
            }
        }
        return dp.back();
    }
};
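The i - j length is easy to check in isolation. The sketch below assumes a hypothetical helper named slice that extracts the half-open range s[j, i) the same way the inner loop does; the second argument of std::string::substr is a length, not an end index.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the half-open slice s[j, i).
// It covers indices j, j + 1, ..., i - 1, which is exactly i - j characters.
std::string slice(const std::string& s, std::size_t j, std::size_t i) {
    assert(j <= i && i <= s.size());
    return s.substr(j, i - j);  // (start position, length)
}
```

With s = "leetcode", slice(s, 0, 4) is "leet" and slice(s, 4, 8) is "code". The memoized recursion uses i - index + 1 instead because there i is the last index included in the substring (a closed range), whereas here i is one past the end.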