LeetCode 17.13 Re-Space (恢复空格)
Problem
Oh, no! You have accidentally removed all spaces, punctuation, and capitalization in a lengthy document. A sentence like "I reset the computer. It still didn't boot!" became "iresetthecomputeritstilldidntboot". You'll deal with the punctuation and capitalization later; right now you need to re-insert the spaces. Most of the words are in a dictionary but a few are not. Given a dictionary (a list of strings) and the document (a string), design an algorithm to unconcatenate the document in a way that minimizes the number of unrecognized characters. Return the number of unrecognized characters.
Note: This problem is slightly different from the original one in the book; you only need to return the number of unrecognized characters.
Input:
dictionary = ["looked","just","like","her","brother"]
sentence = "jesslookedjustliketimherbrother"
Output: 7
Explanation: After unconcatenating, we get "jess looked just like tim her brother", which contains 7 unrecognized characters.
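Solution 1
Approach
The problem asks for the minimum number of unmatched characters, which immediately suggests dynamic programming. We start with a brute-force version.
Create an array dp[] to record the results and scan sentence from front to back. dp[i] is the minimum number of unrecognized characters among the first i characters of the sentence, and dp[0]=0 because the empty prefix contains no unrecognized characters.
Next comes the state transition. The first i characters, i.e. the substring [0,i) of the sentence, may consist of a prefix [0,j) followed by a dictionary word occupying [j,i), which gives dp[i]=dp[j] with j<i. Alternatively, no dictionary word ends at position i, and we take the result for the first i-1 characters plus the unmatched i-th character, i.e. dp[i]=dp[i-1]+1. Note that even when a matching word exists, we cannot tell in advance which choice leaves the fewest unrecognized characters, so the minimum has to be taken on every round. When a word is found in the dictionary, the state transition is: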
$$dp[i] = \min(dp[i],\ dp[j])$$
When no word is found, it is:
$$dp[i] = dp[i-1] + 1$$
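To make the recurrence concrete, here is a minimal sketch that returns the whole dp array instead of only the last entry, so the values can be inspected on a tiny input. The class name DpTableDemo and the method dpTable are made up for this illustration, and the one-word dictionary is an assumption chosen to keep the trace short.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class DpTableDemo {
    // Hypothetical helper: returns the full dp array, where dp[i] is the minimum
    // number of unrecognized characters among the first i characters of sentence.
    static int[] dpTable(Set<String> words, String sentence) {
        int n = sentence.length();
        int[] dp = new int[n + 1]; // dp[0] = 0: the empty prefix has nothing unrecognized
        for (int i = 1; i <= n; i++) {
            dp[i] = dp[i - 1] + 1; // treat the i-th character as unrecognized
            for (int j = 0; j < i; j++) {
                if (words.contains(sentence.substring(j, i))) {
                    dp[i] = Math.min(dp[i], dp[j]); // sentence[j, i) is a dictionary word
                }
            }
        }
        return dp;
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<>(Arrays.asList("her"));
        System.out.println(Arrays.toString(dpTable(words, "timher")));
        // prints [0, 1, 2, 3, 4, 5, 3]
    }
}
```

The value climbs by one per character until "her" ends at i=6, where dp[6] falls back to dp[3]=3, the cost of the unrecognized "tim".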
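Code
import java.util.HashSet;
import java.util.Set;

class Solution {
    public int respace(String[] dictionary, String sentence) {
        // Put the dictionary into a hash set for O(1) average-time lookups.
        Set<String> dic = new HashSet<>();
        for (String s : dictionary) {
            dic.add(s);
        }
        int n = sentence.length();
        // dp[i]: minimum number of unrecognized characters in the first i characters.
        int[] dp = new int[n + 1];
        dp[0] = 0;
        for (int i = 1; i <= n; i++) {
            dp[i] = dp[i - 1] + 1; // first assume the i-th character is unrecognized
            for (int j = 0; j < i; j++) {
                if (dic.contains(sentence.substring(j, i))) {
                    // sentence[j, i) is a word: keep the smaller of the two options
                    dp[i] = Math.min(dp[j], dp[i]);
                }
            }
        }
        return dp[n];
    }
}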
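Solution 2
Approach
The brute-force solution spends many lookups on substrings that cannot possibly be dictionary words. A prefix tree (Trie) avoids most of these pointless comparisons. Here the dictionary words are inserted into the Trie in reverse order, and during lookup the characters of sentence are likewise traversed from back to front. See the figure (an animation borrowed from the LeetCode solution to help my own understanding; animations really make this easy to follow, and I want to learn to draw them too!)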
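The reverse insertion and backward traversal described above can be sketched in isolation. This is only an illustration, assuming lowercase a-z input; the names ReverseTrieDemo, insertReversed, and wordStartsEndingAt are hypothetical and not part of the solution itself. For a fixed end position, one backward walk finds every dictionary word that ends there, and it stops as soon as no word has the current suffix.

```java
import java.util.ArrayList;
import java.util.List;

class ReverseTrieDemo {
    static class Node {
        Node[] next = new Node[26]; // assumes lowercase 'a'..'z' only
        boolean isEnd;
    }

    // Inserts the word back to front, so a root-to-node path spells the word reversed.
    static void insertReversed(Node root, String word) {
        Node cur = root;
        for (int k = word.length() - 1; k >= 0; k--) {
            int c = word.charAt(k) - 'a';
            if (cur.next[c] == null) {
                cur.next[c] = new Node();
            }
            cur = cur.next[c];
        }
        cur.isEnd = true;
    }

    // Returns every start index j such that sentence[j, end) is a dictionary word.
    static List<Integer> wordStartsEndingAt(Node root, String sentence, int end) {
        List<Integer> starts = new ArrayList<>();
        Node cur = root;
        for (int j = end - 1; j >= 0; j--) {
            cur = cur.next[sentence.charAt(j) - 'a'];
            if (cur == null) {
                break; // no dictionary word ends with this suffix, so stop early
            }
            if (cur.isEnd) {
                starts.add(j);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        Node root = new Node();
        insertReversed(root, "her");
        insertReversed(root, "brother");
        System.out.println(wordStartsEndingAt(root, "timherbrother", 13));
        // prints [10, 6]: "her" starts at index 10 and "brother" at index 6
    }
}
```

Because both words end in "her", they share the first three Trie nodes, and a single backward pass reports both matches.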
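Code
class Solution {
    public class TrieNode {
        public TrieNode[] next; // one child per lowercase letter
        public boolean isEnd;   // true if a (reversed) dictionary word ends here

        public TrieNode() {
            next = new TrieNode[26];
            isEnd = false;
        }

        // Insert the word in reverse order, last character first.
        public void insert(String s) {
            TrieNode curPos = this;
            for (int i = s.length() - 1; i >= 0; --i) {
                int t = s.charAt(i) - 'a';
                if (curPos.next[t] == null) {
                    curPos.next[t] = new TrieNode();
                }
                curPos = curPos.next[t];
            }
            curPos.isEnd = true;
        }
    }

    public int respace(String[] dictionary, String sentence) {
        int n = sentence.length();
        TrieNode root = new TrieNode();
        for (String word : dictionary) {
            root.insert(word);
        }
        // dp[i]: minimum number of unrecognized characters in the first i characters.
        int[] dp = new int[n + 1];
        dp[0] = 0;
        for (int i = 1; i <= n; i++) {
            dp[i] = dp[i - 1] + 1; // first assume the i-th character is unrecognized
            TrieNode curPos = root;
            // Walk backward from the i-th character; j is the 1-based start of the candidate word.
            for (int j = i; j >= 1; --j) {
                int t = sentence.charAt(j - 1) - 'a';
                if (curPos.next[t] == null) {
                    break; // no dictionary word ends with this suffix
                } else if (curPos.next[t].isEnd) {
                    // characters j..i form a word, so only the first j-1 characters cost anything
                    dp[i] = Math.min(dp[i], dp[j - 1]);
                }
                if (dp[i] == 0) {
                    break; // cannot do better than zero
                }
                curPos = curPos.next[t];
            }
        }
        return dp[n];
    }
}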
Complexity analysis
Time complexity: $O(|dictionary| + n^2)$, where $|dictionary|$ is the total number of characters in the dictionary and $n = sentence.length$. The time to build the Trie depends on the total number of characters in the words, i.e. $|dictionary|$, so that step costs $O(|dictionary|)$. The dp array has n+1 states, and each state transition takes $O(n)$ time in the worst case, so the dynamic programming step costs $O(n^2)$.
Space complexity: $O(|dictionary| \cdot S + n)$, where S is the size of the character set. Here the characters are lowercase letters, so S=26. The asymptotic upper bound can be derived as follows: if the Trie has $|node|$ nodes and the character set has size S, the Trie costs $O(|node| \cdot S)$ space. The number of nodes, not counting the root, never exceeds the total number of characters in the dictionary, so $O(|node| \cdot S) = O(|dictionary| \cdot S)$. The dp array costs $O(n)$ space.
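As a quick check on the node-count bound, the following sketch builds the reversed-word Trie for the example dictionary and counts its nodes. The class TrieSizeDemo and the method countNodes are hypothetical names used only for this illustration, and lowercase a-z input is assumed.

```java
class TrieSizeDemo {
    static class Node {
        Node[] next = new Node[26]; // S = 26 child slots per node
    }

    // Builds the reversed-word Trie and returns the number of nodes, root included.
    static int countNodes(String[] dictionary) {
        Node root = new Node();
        int nodes = 1; // the root
        for (String word : dictionary) {
            Node cur = root;
            for (int k = word.length() - 1; k >= 0; k--) {
                int c = word.charAt(k) - 'a';
                if (cur.next[c] == null) {
                    cur.next[c] = new Node();
                    nodes++; // at most one new node per dictionary character
                }
                cur = cur.next[c];
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        String[] dictionary = {"looked", "just", "like", "her", "brother"};
        int totalChars = 0;
        for (String w : dictionary) {
            totalChars += w.length();
        }
        System.out.println(countNodes(dictionary) + " nodes for " + totalChars + " characters");
        // prints "22 nodes for 24 characters"
    }
}
```

The count comes out below the character total because "her" and "brother" share the reversed path r, e, h; each node still reserves 26 child slots, which is where the factor S comes from.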