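Oh no! You accidentally deleted all the spaces and punctuation from a long article and turned every uppercase letter into lowercase. A sentence like "I reset the computer. It still didn’t boot!" has become "iresetthecomputeritstilldidntboot". Before you can restore the punctuation and capitalization, you first have to split the text back into words. You do have a thick dictionary, `dictionary`, but some words are not in it. Given the article as `sentence`, design an algorithm that splits it so that the number of unrecognized characters is as small as possible, and return that number.
Note: this problem is slightly modified from the original. You only need to return the number of unrecognized characters.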
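Example:
Input:
dictionary = ["looked","just","like","her","brother"]
sentence = "jesslookedjustliketimherbrother"
Output: 7
Explanation: after splitting, the sentence becomes "jess looked just like tim her brother", which contains 7 unrecognized characters ("jess" and "tim").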
Hash table
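dp[i] is the minimum number of unrecognized characters in the first i characters of sentence, i.e. the prefix ending at sentence[i-1]. The base case is dp[0] = 0.
dp[i] is first initialized to dp[i-1] + 1, which treats sentence[i-1] as an unrecognized character.
Then iterate over the dictionary. If the substring of length wl that ends at position i equals a dictionary word, that word can be recognized, so dp[i] = min(dp[i], dp[i-wl]). The answer is dp[n], where n is the length of sentence.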
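The two update rules can be combined into a single recurrence. In this sketch, D stands for `dictionary`, s for `sentence`, and s[a..b] for the inclusive substring from a to b; this notation is introduced here for illustration:

```latex
dp[0] = 0, \qquad
dp[i] = \min\Bigl( dp[i-1] + 1,\;
        \min_{\substack{w \in D,\ |w| \le i \\ s[i-|w|\,..\,i-1] = w}} dp\bigl[i - |w|\bigr] \Bigr),
\quad 1 \le i \le n
```

The answer is dp[n]. If no dictionary word ends at position i, the inner minimum is empty, and only the dp[i-1] + 1 term applies.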
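#include <algorithm>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int respace(vector<string>& dictionary, string sentence) {
        int n = sentence.length();
        if (n == 0)
            return 0;
        if (dictionary.empty())
            return n;  // no word can be recognized
        // dp[i]: minimum unrecognized characters in the first i characters.
        // vector<int> replaces new int[] + memset and avoids the memory leak.
        vector<int> dp(n + 1, 0);
        for (int i = 1; i <= n; i++) {
            // Default: sentence[i-1] is an unrecognized character
            dp[i] = dp[i - 1] + 1;
            for (const auto& word : dictionary) {  // const& avoids copying each word
                int wordlen = word.length();
                // If sentence[i-wordlen, i) equals word, that word is recognized
                if (i - wordlen >= 0 &&
                    sentence.compare(i - wordlen, wordlen, word) == 0)
                    dp[i] = min(dp[i], dp[i - wordlen]);
            }
        }
        return dp[n];
    }
};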
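The "hash table" tag suggests a variant that the solution above does not actually use. The following is a minimal sketch under that assumption. It keeps the same DP, but stores the dictionary in a `std::unordered_set` and checks only substring lengths that occur in the dictionary, so it no longer compares every word at every position. The function name `respaceWithHashSet` is hypothetical and is not part of the original problem's interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: same DP as Solution::respace, but dictionary lookups
// go through a hash set instead of comparing against every word.
int respaceWithHashSet(const std::vector<std::string>& dictionary,
                       const std::string& sentence) {
    // Hash set of dictionary words for average O(1) membership tests
    // (each test still hashes a substring of length len).
    std::unordered_set<std::string> words(dictionary.begin(), dictionary.end());
    // Only substring lengths that occur in the dictionary need to be checked.
    std::unordered_set<std::size_t> lengths;
    for (const auto& w : dictionary) lengths.insert(w.size());

    const std::size_t n = sentence.size();
    // dp[i]: minimum unrecognized characters in the first i characters
    std::vector<int> dp(n + 1, 0);
    for (std::size_t i = 1; i <= n; ++i) {
        dp[i] = dp[i - 1] + 1;  // treat sentence[i-1] as unrecognized
        for (std::size_t len : lengths) {
            if (len == 0 || len > i) continue;  // skip empty words and overruns
            if (words.count(sentence.substr(i - len, len)))
                dp[i] = std::min(dp[i], dp[i - len]);
        }
    }
    return dp[n];
}
```

On the example above, `respaceWithHashSet({"looked","just","like","her","brother"}, "jesslookedjustliketimherbrother")` should return 7, the same as `Solution::respace`. This pays off mainly when the dictionary has many words but only a few distinct lengths. A trie built over reversed words is another common choice for the same lookup.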