LeetCode Interview Question 17.13. Re-Space

KpLn_HJL

Category: OJ Problem Notes

Article link: https://blog.csdn.net/sinat_41679123/article/details/107221137

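Problem

Oh no! You accidentally deleted all the spaces and punctuation from a long article and turned every uppercase letter into lowercase. A sentence like "I reset the computer. It still didn't boot!" has become "iresetthecomputeritstilldidntboot". Before handling punctuation and capitalization, you first need to break it back into words. Of course, you have a thick dictionary (dictionary), but some words are missing from it. Given the article as sentence, design an algorithm that breaks it up so that the number of unrecognized characters is as small as possible, and return that number.

Note: this problem is slightly modified from the original; you only need to return the number of unrecognized characters.

Example:

Input:
dictionary = ["looked","just","like","her","brother"]
sentence = "jesslookedjustliketimherbrother"
Output: 7
Explanation: After splitting, the sentence is "jess looked just like tim her brother", which has 7 unrecognized characters.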
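A quick way to check the arithmetic in the example is to sum the lengths of the pieces that are not dictionary words. The helper name count_unrecognized is my own, for illustration only; it scores one given split and does not search for the best split.

```python
from typing import List


def count_unrecognized(words: List[str], dictionary: List[str]) -> int:
    """Sum the lengths of the pieces in a split that are not dictionary words."""
    vocab = set(dictionary)
    return sum(len(w) for w in words if w not in vocab)


dictionary = ["looked", "just", "like", "her", "brother"]
split = "jess looked just like tim her brother".split()
print(count_unrecognized(split, dictionary))  # 7  ("jess" = 4, "tim" = 3)
```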

Constraints:

0 <= len(sentence) <= 1000
The total number of characters in dictionary does not exceed 150000.
You may assume that dictionary and sentence contain only lowercase letters.

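Approach

I had no idea at first; the hint said recursion could be used.

Recursive version:
For every word in dictionary, find the index of its first occurrence in sentence, then recurse on sentence with that occurrence removed.
I estimated the time complexity at roughly $O(n_{\text{sentence}} \cdot n_{\text{dictionary}}) \approx 10^8$, and it timed out. (Each call branches once per matching word, so the real worst case is exponential.)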
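For comparison, a recursive formulation that does work recurses on suffixes instead of deleting words: for the remaining text sentence[start:], either sentence[start] is an unrecognized character, or some dictionary word starts at start. Memoizing on start makes this polynomial. This is a sketch of my own, not the author's code; respace_memo and best are hypothetical names, and I assume dictionary words are non-empty (empty ones are filtered out).

```python
from functools import lru_cache
from typing import List


def respace_memo(dictionary: List[str], sentence: str) -> int:
    vocab = {w for w in dictionary if w}       # assumption: ignore empty words
    lengths = sorted({len(w) for w in vocab})
    n = len(sentence)

    @lru_cache(maxsize=None)
    def best(start: int) -> int:
        """Fewest unrecognized characters in sentence[start:]."""
        if start == n:
            return 0
        # Option 1: sentence[start] is an unrecognized character.
        result = 1 + best(start + 1)
        # Option 2: a dictionary word starts at `start`; jump past it.
        for length in lengths:
            end = start + length
            if end > n:
                break                          # lengths are sorted ascending
            if sentence[start:end] in vocab:
                result = min(result, best(end))
        return result

    # Fill the cache from the right so every call finds its children cached,
    # which keeps the recursion shallow even for a 1000-character sentence.
    for start in range(n, -1, -1):
        best(start)
    return best(0)
```

Filling the cache right to left is essentially the DP below run backwards; it is done here only to avoid Python's default recursion limit.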

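Then I read the DP version from the solutions:
Let dp[i] be the minimum number of unrecognized characters in the first i characters of sentence. Then:

If the i-th character together with some characters before it forms a substring that is in the dictionary, then dp[i] = dp[j], where j is the index at which that substring starts. Several such substrings may exist, in which case take the smallest dp[j].
If the i-th character cannot form a dictionary word with any substring before it, then dp[i] = dp[i - 1] + 1.
Even in the first case, dp[i - 1] + 1 is still a candidate (leaving the i-th character unrecognized can be cheaper), so the code takes the minimum over both options.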
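The two cases fold into a single recurrence. Here s is sentence, D is the dictionary, n = len(s), and s[j:i] uses Python slice notation (characters j through i - 1); this notation is mine, and an empty inner minimum is taken as $+\infty$.

$$
dp[0] = 0, \qquad
dp[i] = \min\Bigl( dp[i-1] + 1,\; \min_{\substack{0 \le j < i \\ s[j:i] \in D}} dp[j] \Bigr), \quad 1 \le i \le n
$$

The answer is $dp[n]$.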

The time complexity is $O(n_{\text{sentence}}^2) \approx 10^6$ substring checks, not counting the cost of each check.
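In the DP code in the Code section, dictionary stays a Python list, so every in test scans the whole list. A sketch of two cheap improvements using only the standard library: convert the dictionary to a set, and only try substrings no longer than the longest word. respace_set is a hypothetical name, and I have not timed this against the numbers quoted in this post.

```python
from typing import List


def respace_set(dictionary: List[str], sentence: str) -> int:
    vocab = set(dictionary)                      # average O(1) membership test
    max_len = max((len(w) for w in vocab), default=0)
    n = len(sentence)
    dp = [0] * (n + 1)                           # dp[i]: answer for sentence[:i]
    for i in range(1, n + 1):
        dp[i] = dp[i - 1] + 1                    # sentence[i-1] unrecognized
        # Only substrings no longer than the longest word can be in vocab.
        for j in range(max(0, i - max_len), i):
            if sentence[j:i] in vocab:
                dp[i] = min(dp[i], dp[j])
    return dp[n]
```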

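DP + Trie version:
Looking at the DP, most of the time is spent checking whether the substrings ending at position i are in the dictionary, so a Trie can speed this up.

This is noticeably faster: the DP version ran in 8436 ms, and adding the Trie brought it down to 3592 ms.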
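The Trie version in the Code section still starts a fresh search from the root for every (begin_index, index) pair. A common refinement, sketched here under my own naming (ReversedTrieNode and respace_reversed_trie are hypothetical), inserts each word reversed and walks left from position i one character at a time, so the scan for a given i stops as soon as no dictionary word can end with the current suffix. I have not measured it on LeetCode.

```python
from typing import Dict, List


class ReversedTrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "ReversedTrieNode"] = {}
        self.is_word = False                     # a (reversed) word ends here


def respace_reversed_trie(dictionary: List[str], sentence: str) -> int:
    # Insert every word backwards so we can match substrings ending at i.
    root = ReversedTrieNode()
    for word in dictionary:
        node = root
        for ch in reversed(word):
            node = node.children.setdefault(ch, ReversedTrieNode())
        node.is_word = True

    n = len(sentence)
    dp = [0] * (n + 1)                           # dp[i]: answer for sentence[:i]
    for i in range(1, n + 1):
        dp[i] = dp[i - 1] + 1                    # sentence[i-1] unrecognized
        node = root
        # Walk left; after step j the path spells sentence[j:i] reversed.
        for j in range(i - 1, -1, -1):
            node = node.children.get(sentence[j])
            if node is None:
                break                            # no word ends with sentence[j:i]
            if node.is_word:
                dp[i] = min(dp[i], dp[j])
    return dp[n]
```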

Code

Recursive version (not accepted):

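from typing import List


class Solution:
    def respace(self, dictionary: List[str], sentence: str) -> int:
        # Record the first occurrence of every dictionary word found in sentence.
        value_index = {}
        for word in dictionary:
            index = sentence.find(word)
            if index != -1:
                value_index[word] = index
        # No word matches: every remaining character is unrecognized.
        if not value_index:
            return len(sentence)
        min_missing = len(sentence)
        # Remove each matched word and recurse on the re-joined remainder.
        # Note: the re-joined string can form "words" across the gap.
        for word, index in value_index.items():
            rest = sentence[:index] + sentence[index + len(word):]
            min_missing = min(min_missing, self.respace(dictionary, rest))
        return min_missing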
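Besides being slow, the recursion above can return a wrong answer, because deleting a word from the middle and gluing the two sides together can create a dictionary word that never appeared contiguously. The small case below shows it: in "abc" with dictionary ["b", "ac"], only "b" is a real word, so the answer should be 2, but removing "b" leaves "ac". The function names here are hypothetical; respace_recursive copies the logic above as a plain function, and respace_dp is a set-based version of the DP for reference.

```python
from typing import List


def respace_recursive(dictionary: List[str], sentence: str) -> int:
    # Same logic as the recursive attempt: delete the first occurrence of a
    # matching word, then recurse on the re-joined remainder.
    found = {}
    for word in dictionary:
        index = sentence.find(word)
        if index != -1:
            found[word] = index
    if not found:
        return len(sentence)
    best = len(sentence)
    for word, index in found.items():
        rest = sentence[:index] + sentence[index + len(word):]
        best = min(best, respace_recursive(dictionary, rest))
    return best


def respace_dp(dictionary: List[str], sentence: str) -> int:
    vocab = set(dictionary)
    dp = [0] * (len(sentence) + 1)      # dp[i]: answer for sentence[:i]
    for i in range(1, len(sentence) + 1):
        dp[i] = dp[i - 1] + 1
        for j in range(i):
            if sentence[j:i] in vocab:
                dp[i] = min(dp[i], dp[j])
    return dp[-1]


print(respace_recursive(["b", "ac"], "abc"))  # 0  (wrong: "ac" is glued together)
print(respace_dp(["b", "ac"], "abc"))         # 2  ("a" and "c" unrecognized)
```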

DP version:

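from typing import List


class Solution:
    def respace(self, dictionary: List[str], sentence: str) -> int:
        # dp[i]: fewest unrecognized characters in sentence[:i]; at most i.
        dp = [0] + [index + 1 for index in range(len(sentence))]
        for index in range(len(sentence)):
            # Check every substring sentence[begin_index:index + 1] ending here.
            # (dictionary is a list, so each `in` test is a linear scan.)
            for begin_index in range(index, -1, -1):
                if sentence[begin_index: index + 1] in dictionary:
                    dp[index + 1] = min(dp[index + 1], dp[begin_index])
            # Or leave sentence[index] as an unrecognized character.
            dp[index + 1] = min(dp[index + 1], dp[index] + 1)
        return dp[-1]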

DP + Trie:

from typing import List


class TrieNode:
    def __init__(self):
        self.node_dict = {}     # child character -> TrieNode
        self.end_flag = False   # True if a dictionary word ends at this node


class TrieTree:
    def __init__(self, dictionary: List[str]):
        self.root = TrieNode()
        for word in dictionary:
            node = self.root
            for each_char in word:
                if each_char not in node.node_dict:
                    node.node_dict[each_char] = TrieNode()
                node = node.node_dict[each_char]
            node.end_flag = True

    def search(self, word: str) -> bool:
        # True only if `word` is exactly one of the inserted words.
        node = self.root
        for each_char in word:
            if each_char not in node.node_dict:
                return False
            node = node.node_dict[each_char]
        return node.end_flag


class Solution:
    def respace(self, dictionary: List[str], sentence: str) -> int:
        trie = TrieTree(dictionary)
        # dp[i]: fewest unrecognized characters in sentence[:i].
        dp = [0] + [index + 1 for index in range(len(sentence))]
        for index in range(len(sentence)):
            for begin_index in range(index, -1, -1):
                # Trie lookup replaces the linear `in dictionary` scan.
                if trie.search(sentence[begin_index: index + 1]):
                    dp[index + 1] = min(dp[index + 1], dp[begin_index])
            dp[index + 1] = min(dp[index + 1], dp[index] + 1)
        return dp[-1]