Trie (Dictionary Tree)

Latest recommended article published 2022-06-11 21:39:58

Rnan-prince


Article link: https://blog.csdn.net/qq_19446965/article/details/105061591


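A trie (字典树, "dictionary tree") is also called a prefix tree.

The root node holds no character, and every other node holds exactly one character. Concatenating the characters along the path from the root to a node gives the string that node represents. The children of any one node all hold different characters.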
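A minimal sketch of this definition, assuming plain nested dicts as nodes (an illustrative choice; the template in this post uses a TrieNode class) and omitting word-end marks. `build` and `paths` are hypothetical helper names. It shows that words sharing a prefix share a path, and that each node's string is the concatenation of characters from the root.

```python
from typing import Dict, List


def build(words: List[str]) -> Dict:
    """Build a trie as nested dicts: each key is one character, each value a child node."""
    root: Dict = {}  # the root itself holds no character
    for word in words:
        node = root
        for c in word:
            # setdefault reuses an existing child, so siblings never repeat a character
            node = node.setdefault(c, {})
    return root


def paths(node: Dict, prefix: str = "") -> List[str]:
    """List the string of every non-root node: the characters on its path from the root."""
    out = []
    for c, child in node.items():
        out.append(prefix + c)
        out.extend(paths(child, prefix + c))
    return out


trie = build(["tea", "ten", "to"])
print(list(trie))   # ['t']: all three words share the root's single child
print(paths(trie))  # ['t', 'te', 'tea', 'ten', 'to']
```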

Trie (prefix tree) template:

Create a TrieNode class to represent a node of the trie, with two attributes: children and is_word.

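class TrieNode:

    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_word = False  # True if some inserted word ends at this node


class Trie:

    def __init__(self):
        self.root = TrieNode()  # the root holds no character

    def insert(self, word):
        node = self.root
        for c in word:
            # create the child for c only if it does not exist yet
            if c not in node.children:
                node.children[c] = TrieNode()
            node = node.children[c]

        node.is_word = True

    def find(self, word):
        # return the node reached by following word from the root, or None
        node = self.root
        for c in word:
            node = node.children.get(c)
            if node is None:
                return None
        return node

    def search(self, word):
        # word is present only if its path exists and ends at a word node
        node = self.find(word)
        return node is not None and node.is_word

    def startsWith(self, prefix):
        # any existing path means some inserted word starts with prefix
        return self.find(prefix) is not None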

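Applications of the prefix tree:

1. Word Search

Given a 2D grid and a word, determine whether the word exists in the grid.

The word must be built from the letters of sequentially adjacent cells, where "adjacent" cells are horizontally or vertically neighboring. The same cell may not be used more than once.

Example: board = [ ['A','B','C','E'], ['S','F','C','S'], ['A','D','E','E'] ]

Given word = "ABCCED", return true
Given word = "SEE", return true
Given word = "ABCB", return false

Source: LeetCode
Link: https://leetcode-cn.com/problems/word-search/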

def exist(board, word):
    # neighbor offsets: left, right, up, down
    points = [(0, -1), (0, 1), (-1, 0), (1, 0)]

    def search(x, y, path):
        # prune as soon as the current path is not a prefix of word
        if path not in prefix_set:
            return False

        if path == word:
            return True

        for delta_x, delta_y in points:
            new_x, new_y = x + delta_x, y + delta_y
            if not (0 <= new_x < len(board) and 0 <= new_y < len(board[0])):
                continue
            if (new_x, new_y) not in visited:
                visited.add((new_x, new_y))
                if search(new_x, new_y, path + board[new_x][new_y]):
                    return True
                visited.remove((new_x, new_y))  # backtrack
        return False

    if board is None or len(board) == 0:
        return False

    # every non-empty prefix of word, used for pruning
    prefix_set = set()
    for i in range(len(word)):
        prefix_set.add(word[:i + 1])

    for i in range(len(board)):
        for j in range(len(board[0])):
            visited = set([(i, j)])
            if search(i, j, board[i][j]):
                return True
    return False

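2. Word Search II

Given a 2D grid board and a list of dictionary words words, find all words that appear both in the grid and in the dictionary.

Each word must be built from the letters of sequentially adjacent cells, where "adjacent" cells are horizontally or vertically neighboring. The same cell may not be used more than once within a single word.

Example:

Input: words = ["oath","pea","eat","rain"] and board = [ ['o','a','a','n'], ['e','t','a','e'], ['i','h','k','r'], ['i','f','l','v'] ]

Output: ["eat","oath"]

Source: LeetCode
Link: https://leetcode-cn.com/problems/word-search-ii/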

from typing import List


def findWords(board: List[List[str]], words: List[str]) -> List[str]:
    points = [(0, -1), (0, 1), (-1, 0), (1, 0)]

    # not memoized: the outcome depends on the mutable `visited` set,
    # so caching by (x, y, path) alone would be incorrect
    def search(x, y, path):
        # prune: no word starts with this path
        if path not in prefix_set:
            return

        if path in word_set:
            result.add(path)

        for delta_x, delta_y in points:
            new_x, new_y = x + delta_x, y + delta_y
            if not (0 <= new_x < len(board) and 0 <= new_y < len(board[0])):
                continue
            if (new_x, new_y) not in visited:
                visited.add((new_x, new_y))
                search(new_x, new_y, path + board[new_x][new_y])
                visited.remove((new_x, new_y))  # backtrack

    if board is None or len(board) == 0:
        return []

    word_set = set(words)
    # every non-empty prefix of every word, used for pruning
    prefix_set = set()
    for word in words:
        for i in range(len(word)):
            prefix_set.add(word[:i + 1])

    result = set()
    for i in range(len(board)):
        for j in range(len(board[0])):
            visited = set([(i, j)])
            search(i, j, board[i][j])
    return list(result)
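The solution above prunes with a set of all prefixes, hashing a growing string at every step. A common alternative is to walk an actual trie alongside the DFS, so each step only looks up one child. The sketch below is such a variant (not the original author's code); `find_words_trie` is a hypothetical name, and it assumes board cells never contain the sentinel `"$"` (used to store a finished word) or the temporary visited marker `"#"`.

```python
from typing import Dict, List

END = "$"  # hypothetical sentinel key: the full word is stored under it at its last node


def find_words_trie(board: List[List[str]], words: List[str]) -> List[str]:
    # build the trie as nested dicts
    root: Dict = {}
    for word in words:
        node = root
        for c in word:
            node = node.setdefault(c, {})
        node[END] = word

    rows, cols = len(board), len(board[0]) if board else 0
    found = set()

    def dfs(x: int, y: int, parent: Dict) -> None:
        c = board[x][y]
        node = parent.get(c)
        if node is None:        # no word continues with this letter: prune
            return
        if END in node:
            found.add(node[END])
        board[x][y] = "#"       # mark the cell as used on the current path
        for dx, dy in ((0, -1), (0, 1), (-1, 0), (1, 0)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < rows and 0 <= ny < cols and board[nx][ny] != "#":
                dfs(nx, ny, node)
        board[x][y] = c         # backtrack: restore the cell

    for i in range(rows):
        for j in range(cols):
            dfs(i, j, root)
    return sorted(found)


board = [['o', 'a', 'a', 'n'], ['e', 't', 'a', 'e'], ['i', 'h', 'k', 'r'], ['i', 'f', 'l', 'v']]
print(find_words_trie(board, ["oath", "pea", "eat", "rain"]))  # ['eat', 'oath']
```

Marking cells in place avoids a separate visited set; the board is restored before the function returns.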

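A "magic" Trie implementation

This works because defaultdict only calls its default factory when a missing key is accessed. By that time the name Trie is already bound, so the lambda can refer to itself.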

https://www.iambigboss.top/post/59424_1_1.html

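Trie is a function; calling it returns a defaultdict whose default factory is Trie itself.
dict.__getitem__ takes two arguments: the dict object and the key.
At the first step, the dict is trie and the key is the first character c of word; trie[c] returns a new dict.
That new dict is the child node for character c.
Repeating this for every character, reduce ends at the node for the last character of word; if that node is a leaf, it is an empty dict.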

import collections
from functools import reduce

Trie = lambda: collections.defaultdict(Trie)
trie = Trie()  # the root of the trie
for word in words:
    reduce(dict.__getitem__, word, trie)
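A small runnable sketch of this idea: build the self-referential defaultdict trie with reduce, then mark word ends with an extra key. The sentinel `END = True` follows the convention of the Replace Words solution in this post; `contains` is a hypothetical helper name used only for illustration.

```python
import collections
from functools import reduce

Trie = lambda: collections.defaultdict(Trie)  # every missing key creates another Trie
END = True  # sentinel key meaning "a word ends here"; cannot clash with 1-char string keys

trie = Trie()
for word in ["app", "apple", "bat"]:
    # reduce walks (creating as needed) one level per character and returns the last node
    reduce(dict.__getitem__, word, trie)[END] = True


def contains(root, word):
    """Hypothetical helper: exact-word lookup that never creates nodes."""
    node = root
    for c in word:
        if c not in node:  # `in` does not trigger the default factory
            return False
        node = node[c]
    return END in node


print(contains(trie, "app"), contains(trie, "ap"), contains(trie, "bat"))  # True False True
print(sorted(trie))  # ['a', 'b']
```

Lookups use `in` before indexing, because `node[c]` on a missing key would silently add an empty child.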

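1. Replace Words

In English, there is a concept called a root, which can be followed by other letters to form a longer word, called a successor. For example, the root an followed by other forms the word another. Given a dictionary made of many roots and a sentence, replace every successor in the sentence with the root that forms it. If a successor can be formed by more than one root, replace it with the shortest one.

Input: dictionary = ["cat", "bat", "rat"]
sentence = "the cattle was rattled by the battery"
Output: "the cat was rat by the bat"

Source: LeetCode
Link: https://leetcode-cn.com/problems/replace-words/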

import collections
from functools import reduce
from typing import List


def replaceWords(roots: List[str], sentence: str) -> str:
    Trie = lambda: collections.defaultdict(Trie)
    tri = Trie()
    END = True  # sentinel key; the value stored under it is the root itself

    for root in roots:
        reduce(dict.__getitem__, root, tri)[END] = root

    def replace(word):
        cur = tri
        for c in word:
            # stop if c has no child here, or if the shortest root already ends here
            if c not in cur or END in cur:
                return cur.get(END, word)
            cur = cur[c]
        return word

    sentence = sentence.split(" ")
    res = [replace(word) for word in sentence]
    return " ".join(res)

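To match suffixes instead of prefixes (a reversed, or "suffix", trie), insert each root reversed, i.e. root[::-1], and scan each word from its last character.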
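A minimal sketch of that suffix variant, assuming the task becomes "replace each word with the shortest root that is a suffix of it". `replace_suffixes` is a hypothetical name; otherwise it reuses the defaultdict trie and the END sentinel from replaceWords.

```python
import collections
from functools import reduce
from typing import List

END = True  # sentinel key; the value stored under it is the original (unreversed) root


def replace_suffixes(roots: List[str], sentence: str) -> str:
    """Hypothetical variant: replace each word by the shortest root that is a SUFFIX of it."""
    Trie = lambda: collections.defaultdict(Trie)
    trie = Trie()
    for root in roots:
        reduce(dict.__getitem__, root[::-1], trie)[END] = root  # insert reversed

    def replace(word: str) -> str:
        cur = trie
        for c in reversed(word):  # scan from the last character
            if c not in cur or END in cur:
                return cur.get(END, word)
            cur = cur[c]
        return cur.get(END, word)

    return " ".join(replace(w) for w in sentence.split(" "))


print(replace_suffixes(["ing", "ed"], "walking jumped run"))  # ing ed run
```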

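2. Short Encoding of Words

Given a list of words, we encode it as a reference string S and an index list A: reading S from each index up to the next '#' recovers one of the words. Return the length of the shortest possible S.

Input: words = ["time", "me", "bell"]
Output: 10
Explanation: S = "time#bell#", indexes = [0, 2, 5].

Source: LeetCode
Link: https://leetcode-cn.com/problems/short-encoding-of-words/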

import collections
from functools import reduce


def minimumLengthEncoding(words):
    # deduplicate first: identical words would end at the same leaf and be counted twice
    words = list(set(words))
    Trie = lambda: collections.defaultdict(Trie)
    trie = Trie()
    # the last node on each word's reversed path; for "time" this is the node for 't' (we insert from the end)
    nodes = [reduce(dict.__getitem__, word[::-1], trie) for word in words]
    # a node with no children is a leaf: that word is not a suffix of any other word,
    # so it must be written out, contributing len(word) + 1 (for the '#')
    return sum(len(word) + 1 for i, word in enumerate(words) if len(nodes[i]) == 0)
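As a cross-check of the trie's logic, here is a trie-free sketch (a different technique from the one in this post): a word contributes len(word) + 1 exactly when it is not a proper suffix of another word, so we can discard every proper suffix of every word from a set and sum what remains. `minimum_length_encoding_set` is a hypothetical name.

```python
from typing import List


def minimum_length_encoding_set(words: List[str]) -> int:
    """Trie-free check: drop every word that is a proper suffix of another word."""
    good = set(words)  # deduplicate, as the trie version does
    for word in words:
        for k in range(1, len(word)):
            good.discard(word[k:])  # word[k:] is a proper suffix of word
    # each surviving word is written once, followed by '#'
    return sum(len(word) + 1 for word in good)


print(minimum_length_encoding_set(["time", "me", "bell"]))  # 10
```

This costs O(sum of len(word)^2) for the slicing, while the reversed trie touches each character once.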
