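A trie is also known as a prefix tree.
The root node holds no character, and every other node holds exactly one character; concatenating the characters along the path from the root to a node gives the string that node represents; the children of any one node all hold distinct characters.
Template for a Trie (prefix tree):
Define a TrieNode class to represent a node in the Trie, with two attributes: children and is_word.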
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_word = False  # True if a word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for c in word:
            if c not in node.children:
                node.children[c] = TrieNode()
            node = node.children[c]
        node.is_word = True

    def find(self, word):
        # Return the node reached by following word, or None if the path breaks off.
        node = self.root
        for c in word:
            node = node.children.get(c)
            if node is None:
                return None
        return node

    def search(self, word):
        node = self.find(word)
        return node is not None and node.is_word

    def startsWith(self, prefix):
        return self.find(prefix) is not None
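A minimal usage sketch of the template above. It restates the two classes in compact form only so that the block runs on its own; the sample words ("app", "apple", "bat") are made up for illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> child TrieNode
        self.is_word = False  # True if a word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for c in word:
            if c not in node.children:
                node.children[c] = TrieNode()
            node = node.children[c]
        node.is_word = True

    def find(self, word):
        node = self.root
        for c in word:
            node = node.children.get(c)
            if node is None:
                return None
        return node

    def search(self, word):
        node = self.find(word)
        return node is not None and node.is_word

    def startsWith(self, prefix):
        return self.find(prefix) is not None


trie = Trie()
for w in ["app", "apple", "bat"]:  # illustrative sample words
    trie.insert(w)

print(trie.search("app"))       # True: "app" was inserted as a whole word
print(trie.search("appl"))      # False: "appl" is only a prefix
print(trie.startsWith("appl"))  # True: "apple" starts with "appl"
print(trie.startsWith("c"))     # False: no inserted word starts with "c"
```

The difference between search and startsWith is only the is_word check on the node that find returns.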
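Applications of prefix trees:
1. Word Search
Given a 2D grid and a word, determine whether the word exists in the grid.
The word must be formed, in letter order, from letters in adjacent cells, where "adjacent" cells are those that are horizontally or vertically neighboring. The letter in a given cell may not be used more than once.
Example: board = [ ['A','B','C','E'], ['S','F','C','S'], ['A','D','E','E'] ]
Given word = "ABCCED", return true
Given word = "SEE", return true
Given word = "ABCB", return false
Source: LeetCode
Link: https://leetcode-cn.com/problems/word-search/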
def exist(board, word):
    points = [(0, -1), (0, 1), (-1, 0), (1, 0)]

    def search(x, y, path):
        # Prune: stop as soon as path is no longer a prefix of word.
        if path not in prefix_set:
            return False
        if path == word:
            return True
        for delta_x, delta_y in points:
            new_x, new_y = x + delta_x, y + delta_y
            if not (0 <= new_x < len(board) and 0 <= new_y < len(board[0])):
                continue
            if (new_x, new_y) not in visited:
                visited.add((new_x, new_y))
                if search(new_x, new_y, path + board[new_x][new_y]):
                    return True
                visited.remove((new_x, new_y))
        return False

    if board is None or len(board) == 0:
        return False

    # All prefixes of word; this set plays the role of the trie here.
    prefix_set = set()
    for i in range(len(word)):
        prefix_set.add(word[:i + 1])

    for i in range(len(board)):
        for j in range(len(board[0])):
            visited = set([(i, j)])
            if search(i, j, board[i][j]):
                return True
    return False
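The solution above prunes the search with a set holding every prefix of word, which stands in for a trie. A small sketch of just that step; build_prefix_set is a helper name introduced here for illustration and is not part of the original code.

```python
def build_prefix_set(word):
    """Return every non-empty prefix of word, including word itself."""
    return {word[:i + 1] for i in range(len(word))}


prefixes = build_prefix_set("SEE")
print(sorted(prefixes, key=len))  # ['S', 'SE', 'SEE']
print("SF" in prefixes)           # False: a path spelling "SF" is abandoned at once
```

A path through the grid is extended only while the string it spells is still in this set, so dead ends are cut off after a single mismatching letter.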
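2. Word Search II
Given a 2D grid board and a list of dictionary words, words, find all words that appear both in the grid and in the dictionary.
Each word must be formed, in letter order, from letters in adjacent cells, where "adjacent" cells are those that are horizontally or vertically neighboring. The letter in a given cell may not be used more than once within a single word.
Example:
Input: words = ["oath","pea","eat","rain"] and board = [ ['o','a','a','n'], ['e','t','a','e'], ['i','h','k','r'], ['i','f','l','v'] ]
Output: ["eat","oath"]
Source: LeetCode
Link: https://leetcode-cn.com/problems/word-search-ii/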
from typing import List


def findWords(board: List[List[str]], words: List[str]) -> List[str]:
    points = [(0, -1), (0, 1), (-1, 0), (1, 0)]

    # Not memoized: the outcome also depends on visited, so caching on
    # (x, y, path) could skip valid paths.
    def search(x, y, path):
        # Prune: stop as soon as path is not a prefix of any word.
        if path not in prefix_set:
            return
        if path in word_set:
            result.add(path)
        for delta_x, delta_y in points:
            new_x, new_y = x + delta_x, y + delta_y
            if not (0 <= new_x < len(board) and 0 <= new_y < len(board[0])):
                continue
            if (new_x, new_y) not in visited:
                visited.add((new_x, new_y))
                search(new_x, new_y, path + board[new_x][new_y])
                visited.remove((new_x, new_y))

    if board is None or len(board) == 0:
        return []

    word_set = set(words)
    # All prefixes of all words; this set plays the role of the trie here.
    prefix_set = set()
    for word in words:
        for i in range(len(word)):
            prefix_set.add(word[:i + 1])

    result = set()
    for i in range(len(board)):
        for j in range(len(board[0])):
            visited = set([(i, j)])
            search(i, j, board[i][j])
    return list(result)
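The solution above keeps the prefixes in a flat set. For comparison, here is a sketch of the same backtracking search driven by an actual trie built from nested dicts. This is a variant, not the original author's code: the function name find_words_with_trie and the "#" end-of-word sentinel are assumptions made for this example, and the sentinel assumes no board cell contains "#".

```python
from typing import List


def find_words_with_trie(board: List[List[str]], words: List[str]) -> List[str]:
    END = "#"  # assumed sentinel key marking the end of a word; must not appear on the board

    # Build a trie of nested dicts: each key is a character, each value a child node.
    trie = {}
    for word in words:
        node = trie
        for c in word:
            node = node.setdefault(c, {})
        node[END] = word  # store the full word at its final node

    if not board or not board[0]:
        return []

    rows, cols = len(board), len(board[0])
    result = set()
    visited = set()

    def search(x, y, node):
        child = node.get(board[x][y])
        if child is None:
            return  # no word continues with this letter: prune
        if END in child:
            result.add(child[END])
        visited.add((x, y))
        for dx, dy in ((0, -1), (0, 1), (-1, 0), (1, 0)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < rows and 0 <= ny < cols and (nx, ny) not in visited:
                search(nx, ny, child)
        visited.remove((x, y))  # backtrack

    for i in range(rows):
        for j in range(cols):
            search(i, j, trie)
    return sorted(result)


board = [
    ['o', 'a', 'a', 'n'],
    ['e', 't', 'a', 'e'],
    ['i', 'h', 'k', 'r'],
    ['i', 'f', 'l', 'v'],
]
print(find_words_with_trie(board, ["oath", "pea", "eat", "rain"]))  # ['eat', 'oath']
```

Passing the current trie node down the recursion avoids building a new path string at every step; the pruning rule is otherwise the same as with the prefix set.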
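A neat way to implement a Trie
The default factory registered with a defaultdict is only called when a missing key is first accessed, so the factory can be defined in terms of itself.
https://www.iambigboss.top/post/59424_1_1.html
- Trie is a function; calling it returns a defaultdict
- dict.__getitem__ takes two arguments: the first is the dict object, the second is the key
- Initially the dict object is trie and the key is c, the first character of word; trie[c] returns a new dict
- The new dict represents the child node of the current node reached through character c
- And so on, until we reach a leaf node, which is empty by default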
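import collections
from functools import reduce

Trie = lambda: collections.defaultdict(Trie)
trie = Trie()  # the root of the trie
for word in words:  # words: the list of strings to insert
    reduce(dict.__getitem__, word, trie)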
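To make the resulting structure visible, here is a small runnable sketch of the defaultdict trick described above. The sample words and the to_plain helper are introduced only for this illustration.

```python
import collections
from functools import reduce

Trie = lambda: collections.defaultdict(Trie)
trie = Trie()

for word in ["an", "at"]:  # illustrative sample words
    # Walks trie['a']['n'] (then trie['a']['t']), creating missing nodes on the way.
    reduce(dict.__getitem__, word, trie)


def to_plain(node):
    """Convert nested defaultdicts to plain dicts so they print and compare cleanly."""
    return {key: to_plain(child) for key, child in node.items()}


print(to_plain(trie))  # {'a': {'n': {}, 't': {}}}
```

Both words share the node for 'a', and the two leaves are empty dicts, matching the description in the list above.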
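1. Replace Words
In English, we have a concept called a root, which can be followed by some other word to form a longer word; we call that word a successor. For example, the root an, followed by other, forms the new word another. Now, given a dictionary made up of many roots and a sentence, replace every successor in the sentence with the root that forms it. If a successor can be formed by more than one root, replace it with the shortest one.
Input: dict = ["cat", "bat", "rat"]
sentence = "the cattle was rattled by the battery"
Output: "the cat was rat by the bat"
Source: LeetCode
Link: https://leetcode-cn.com/problems/replace-words/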
import collections
from functools import reduce
from typing import List


def replaceWords(roots: List[str], sentence: str) -> str:
    Trie = lambda: collections.defaultdict(Trie)
    tri = Trie()
    END = True  # key marking the end of a root; its value is the root itself
    for root in roots:
        reduce(dict.__getitem__, root, tri)[END] = root

    def replace(word):
        cur = tri
        for c in word:
            # Either this character is not among the current node's children,
            # or we have already reached the shortest matching root.
            if c not in cur or END in cur:
                return cur.get(END, word)
            cur = cur[c]
        return word

    sentence = sentence.split(" ")
    res = [replace(word) for word in sentence]
    return " ".join(res)
For a suffix tree, insert each string reversed instead, i.e. root[::-1].
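A sketch of the reversed-insertion idea, under the assumption that the goal is to find the shortest stored suffix of a word. The helper names build_suffix_trie and shortest_suffix, and the sample suffixes, are hypothetical and not from the original.

```python
import collections
from functools import reduce

Trie = lambda: collections.defaultdict(Trie)
END = True  # key marking the end of a stored string


def build_suffix_trie(suffixes):
    trie = Trie()
    for s in suffixes:
        # Insert the string reversed, so a walk from the root reads it back to front.
        reduce(dict.__getitem__, s[::-1], trie)[END] = s
    return trie


def shortest_suffix(trie, word):
    """Return the shortest stored string that word ends with, or None."""
    cur = trie
    for c in reversed(word):
        if END in cur:
            return cur[END]
        if c not in cur:
            return None
        cur = cur[c]
    return cur.get(END)


suffix_trie = build_suffix_trie(["ing", "ed"])  # illustrative sample suffixes
print(shortest_suffix(suffix_trie, "walking"))  # ing
print(shortest_suffix(suffix_trie, "walked"))   # ed
print(shortest_suffix(suffix_trie, "walk"))     # None
```

The lookup mirrors the replace function above; the only change is that both insertion and lookup read the strings from the last character to the first.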
2. Short Encoding of Words
Given a list of words, we encode the list as an index string S and a list of indexes A.
Input: words = ["time", "me", "bell"]
Output: 10
Explanation: S = "time#bell#", indexes = [0, 2, 5].
Source: LeetCode
Link: https://leetcode-cn.com/problems/short-encoding-of-words/
def minimumLengthEncoding(words):
    import collections
    from functools import reduce

    words = list(set(words))  # deduplicate, so a repeated word is not counted twice
    Trie = lambda: collections.defaultdict(Trie)
    trie = Trie()
    # The last node reached for each word. Words are inserted reversed, so for
    # "time" this is the node for the letter t.
    nodes = [reduce(dict.__getitem__, word[::-1], trie) for word in words]
    # A node with no children is a leaf. When a word's last node is a leaf, the
    # word is not a suffix of any other word, so we add len(word) + 1 for it.
    # This is why the words are deduplicated at the start.
    return sum(len(word) + 1 for i, word in enumerate(words) if len(nodes[i]) == 0)
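As a cross-check on the trie solution above, the same answer can be computed without a trie by discarding every word that is a proper suffix of another word. This is a different, set-based technique, and the function name minimum_length_encoding_by_set is introduced here for illustration.

```python
def minimum_length_encoding_by_set(words):
    remaining = set(words)
    for word in words:
        # Every proper suffix of word is already covered by word itself.
        for k in range(1, len(word)):
            remaining.discard(word[k:])
    # Each surviving word contributes its letters plus one '#'.
    return sum(len(word) + 1 for word in remaining)


print(minimum_length_encoding_by_set(["time", "me", "bell"]))  # 10
```

On the sample input, "me" is dropped as a suffix of "time", leaving "time#" and "bell#" for a total length of 10.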