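Oh no! You accidentally removed all the spaces and punctuation from a long article, and turned every uppercase letter into lowercase. A sentence like "I reset the computer. It still didn't boot!" has become "iresetthecomputeritstilldidntboot". Before you can deal with punctuation and capitalization, you first have to break the text into words. Of course, you have a thick dictionary, dictionary, but some words are not in it. Given the article as sentence, design an algorithm that splits it so that the number of unrecognized characters is minimized, and return that number.
Note: this problem is slightly modified from the original; you only need to return the number of unrecognized characters.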
Example:
Input:
dictionary = ["looked","just","like","her","brother"]
sentence = "jesslookedjustliketimherbrother"
Output: 7
Explanation: After splitting, the text is "jess looked just like tim her brother", with 7 unrecognized characters in total.
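Trie + dynamic programming. The trie really helps here and cuts the complexity considerably: the backward scan from each position stops as soon as no dictionary word can match, and no substrings have to be built or hashed.
Define dp[i] as the minimum number of unrecognized characters among the first i characters, and compute the dp values from front to back. By default dp[i] = dp[i - 1] + 1 (the i-th character is unrecognized); whenever characters j..i form a dictionary word, dp[i] = min(dp[i], dp[j - 1]).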
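To make the recurrence concrete before the trie comes in, here is a minimal sketch of the same DP using a plain HashSet lookup. The class name RespaceBaseline is hypothetical and is not part of the solution below. It tests every substring ending at position i, which is exactly the work the trie version avoids.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical baseline for comparison: same dp definition, no trie.
class RespaceBaseline {
    static int respace(String[] dictionary, String sentence) {
        Set<String> words = new HashSet<>(Arrays.asList(dictionary));
        int n = sentence.length();
        // dp[i] = minimum number of unrecognized characters among the first i characters.
        // dp[0] stays 0: the empty prefix has nothing unrecognized.
        int[] dp = new int[n + 1];
        for (int i = 1; i <= n; i++) {
            // Default: the i-th character is unrecognized.
            dp[i] = dp[i - 1] + 1;
            // Try every substring made of characters j..i (1-indexed).
            for (int j = 1; j <= i; j++) {
                if (words.contains(sentence.substring(j - 1, i))) {
                    dp[i] = Math.min(dp[i], dp[j - 1]);
                }
            }
        }
        return dp[n];
    }

    public static void main(String[] args) {
        String[] dictionary = {"looked", "just", "like", "her", "brother"};
        // Prints 7 for the example above.
        System.out.println(respace(dictionary, "jesslookedjustliketimherbrother"));
    }
}
```

Each substring call copies characters and each contains call hashes them, so the inner loop costs more than a single array lookup per step; the trie walk replaces both with one child-pointer lookup and can stop early.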
package OfferKiller;/*
@author Oblak
@date 2022/7/6
@description
*/
import java.util.*;
public class Main17_13 {
    public static void main(String[] args) {
        // Input format: the word count n, then n dictionary words, then the sentence.
        Scanner sc = new Scanner(System.in);
        int n = sc.nextInt();
        String[] dictionary = new String[n];
        for (int i = 0; i < n; i++) dictionary[i] = sc.next();
        String sentence = sc.next();
        System.out.println(respace(dictionary, sentence));
    }

    public static int respace(String[] dictionary, String sentence) {
        // Build a trie of the dictionary words, each inserted in reverse.
        Trie_Tree tr = new Trie_Tree();
        for (String str : dictionary) {
            tr.insert(str);
        }
        // dp[i] = minimum number of unrecognized characters among the first i characters
        int[] dp = new int[sentence.length() + 1];
        dp[0] = 0;
        for (int i = 1; i <= sentence.length(); i++) {
            // Default: treat the i-th character as unrecognized.
            dp[i] = dp[i - 1] + 1;
            Trie_Tree cur = tr;
            // Walk backward from the i-th character, following the reversed trie.
            for (int j = i; j >= 1; j--) {
                Trie_Tree next = cur.children[sentence.charAt(j - 1) - 'a'];
                // No dictionary word ends with this suffix, so going further back cannot help.
                if (next == null) break;
                // Characters j..i (1-indexed) form a dictionary word, so dp[i] can drop to dp[j - 1].
                if (next.isEnd) {
                    dp[i] = Math.min(dp[i], dp[j - 1]);
                }
                cur = next;
            }
        }
        return dp[sentence.length()];
    }
}
class Trie_Tree {
    Trie_Tree[] children; // one slot per lowercase letter 'a'..'z'
    boolean isEnd;        // true if a dictionary word terminates at this node
    String value;         // the full word stored at its terminal node

    public Trie_Tree() {
        children = new Trie_Tree[26];
    }

    public void insert(String word) {
        Trie_Tree cur = this;
        // Note: the word is inserted in reverse, from its last character to its first.
        for (int i = word.length() - 1; i >= 0; i--) {
            int c = word.charAt(i) - 'a';
            if (cur.children[c] == null) {
                cur.children[c] = new Trie_Tree();
            }
            cur = cur.children[c];
        }
        cur.isEnd = true;
        cur.value = word;
    }
}
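The reverse insertion is what lets the DP walk backward from position i. As a rough illustration, the sketch below lists the dictionary words that end at a given position of the sentence by following a reversed trie. The names SuffixTrieDemo, Node, build, and wordsEndingAt are hypothetical and not part of the solution above, and the code assumes lowercase letters only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of suffix matching with a reversed trie.
class SuffixTrieDemo {
    // Minimal trie node; assumes characters 'a'..'z' only.
    static class Node {
        Node[] children = new Node[26];
        boolean isEnd;
    }

    // Inserts every word from its last character to its first.
    static Node build(String[] dictionary) {
        Node root = new Node();
        for (String word : dictionary) {
            Node cur = root;
            for (int i = word.length() - 1; i >= 0; i--) {
                int c = word.charAt(i) - 'a';
                if (cur.children[c] == null) cur.children[c] = new Node();
                cur = cur.children[c];
            }
            cur.isEnd = true;
        }
        return root;
    }

    // Returns every dictionary word that is a suffix of the first `end` characters.
    static List<String> wordsEndingAt(Node root, String sentence, int end) {
        List<String> found = new ArrayList<>();
        Node cur = root;
        for (int j = end; j >= 1; j--) {
            cur = cur.children[sentence.charAt(j - 1) - 'a'];
            if (cur == null) break; // no word has this suffix: stop early
            if (cur.isEnd) found.add(sentence.substring(j - 1, end));
        }
        return found;
    }

    public static void main(String[] args) {
        Node root = build(new String[]{"looked", "just", "like", "her", "brother"});
        String sentence = "jesslookedjustliketimherbrother";
        System.out.println(wordsEndingAt(root, sentence, 10));                // [looked]
        System.out.println(wordsEndingAt(root, sentence, sentence.length())); // [her, brother]
    }
}
```

In the second call both "her" and "brother" are found in a single backward pass, which is why the DP loop keeps going after the first match instead of stopping there.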