Chapter 10 Trie
10-1 What Is a Trie
10-2 Trie Basics
10-3 Querying a Trie
10-4 Prefix Queries in a Trie
10-5 Tries and Simple Pattern Matching
10-6 Tries and String Mapping
10-7 More Trie-Related Topics
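10-1 What Is a Trie
Trie (prefix tree): the binary search tree, heap, and segment tree covered earlier are all binary trees, whereas a Trie is a multi-way tree. Typical uses are contact lists and dictionaries.
A Trie splits each string into individual characters and stores them one per level, starting from the root. Walking down from the root, every node marked as the end of a word spells out one stored word. That node is often a leaf, but it does not have to be: "pan" ends in the middle of the path for "panda". In the example figure, each node has 26 pointers to next nodes, one per English letter.
Other languages (Chinese, Spanish, and so on) and other contexts (case sensitivity, URLs, email addresses, and so on) need a different character set, so in general:
- Each node holds some number of pointers to next nodes, keyed by character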
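import java.util.Map;
class Node {
    boolean isWord;              // true if a stored word ends at this node
    Map<Character, Node> next;   // child node for each possible next character
}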
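When the character set really is fixed at the 26 lowercase letters, the map can be swapped for a plain array indexed by letter. The following is only a minimal sketch under that assumption; `ArrayNode` and `childOrCreate` are hypothetical names that do not appear in the course code.

```java
// Hypothetical fixed-alphabet node: assumes every character is in 'a'..'z'.
class ArrayNode {
    boolean isWord;                        // true if a stored word ends here
    ArrayNode[] next = new ArrayNode[26];  // slot i holds the child for letter 'a' + i

    // Returns the child for c, creating it first if it does not exist yet.
    ArrayNode childOrCreate(char c) {
        int i = c - 'a';
        if (next[i] == null)
            next[i] = new ArrayNode();
        return next[i];
    }
}
```

The array gives constant-time child lookup but reserves 26 slots per node even when most are empty, which is one reason the notes use a map instead.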
10-2 Trie Basics
import java.util.TreeMap;

public class Trie {

    private class Node {
        public boolean isWord;                  // true if a stored word ends here
        public TreeMap<Character, Node> next;   // children keyed by character

        public Node(boolean isWord) {
            this.isWord = isWord;
            next = new TreeMap<>();
        }

        public Node() {
            this(false);
        }
    }

    private Node root;
    private int size;

    public Trie() {
        root = new Node();
        size = 0;
    }

    // Returns the number of words stored in the Trie
    public int getSize() {
        return size;
    }

    // Adds a new word to the Trie
    public void add(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (cur.next.get(c) == null)
                cur.next.put(c, new Node());
            cur = cur.next.get(c);
        }
        // Count the word only if it was not already stored
        if (!cur.isWord) {
            cur.isWord = true;
            size++;
        }
    }
}
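To see how words that share a prefix also share nodes, the sketch below repeats the same `add` logic and adds a node counter. `PrefixSharingTrie` and `countNodes` are hypothetical names introduced for this illustration only.

```java
import java.util.TreeMap;

// Hypothetical demo class: same add logic as the Trie above, plus a node counter.
class PrefixSharingTrie {
    private static class Node {
        boolean isWord;
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();
    private int size = 0;

    public int getSize() {
        return size;
    }

    public void add(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (cur.next.get(c) == null)
                cur.next.put(c, new Node());
            cur = cur.next.get(c);
        }
        if (!cur.isWord) {
            cur.isWord = true;
            size++;
        }
    }

    // Counts every node in the trie, including the root.
    public int countNodes() {
        return countNodes(root);
    }

    private int countNodes(Node node) {
        int count = 1;
        for (Node child : node.next.values())
            count += countNodes(child);
        return count;
    }

    public static void main(String[] args) {
        PrefixSharingTrie trie = new PrefixSharingTrie();
        trie.add("pan");
        trie.add("panda");
        trie.add("pancake");
        trie.add("panda");                      // duplicate: size stays at 3
        System.out.println(trie.getSize());     // 3
        System.out.println(trie.countNodes());  // 10: the root plus 9 character nodes
    }
}
```

The three words contain 15 characters in total, but the shared "pan" path is stored once, so only 9 character nodes are created.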
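10-3 Querying a Trie
A Trie can only store a set of strings, whereas a set built on a binary search tree can store any comparable element. In exchange, the cost of adding or looking up a word depends only on the length of that word, not on how many words are stored: a word of length w needs w steps down the tree (each step also pays one TreeMap lookup among that node's children).
// Added to class Trie: checks whether word is stored in the Trie
public boolean contains(String word) {
    Node cur = root;
    for (int i = 0; i < word.length(); i++) {
        char c = word.charAt(i);
        if (cur.next.get(c) == null)
            return false;
        cur = cur.next.get(c);
    }
    // Reaching the last node is not enough: it must be marked as the end of a word
    return cur.isWord;
}
The video for this section compares the Trie with a BST. Rewatch it if needed, and budget time accordingly.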
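10-4 Prefix Queries in a Trie
// Added to class Trie: checks whether any stored word starts with prefix
public boolean isPrefix(String prefix) {
    Node cur = root;
    for (int i = 0; i < prefix.length(); i++) {
        char c = prefix.charAt(i);
        if (cur.next.get(c) == null)
            return false;
        cur = cur.next.get(c);
    }
    // The path exists, so at least one stored word passes through this node
    return true;
}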
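The only difference between the word query and the prefix query is the final check on `isWord`. The sketch below puts the two side by side in a compact, self-contained class. `PrefixQueryTrie` and the private helper `walk` are hypothetical names, and `computeIfAbsent` is used in place of the get/put pair from the notes.

```java
import java.util.TreeMap;

// Hypothetical compact trie used only to contrast contains and isPrefix.
class PrefixQueryTrie {
    private static class Node {
        boolean isWord;
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();

    public void add(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++)
            cur = cur.next.computeIfAbsent(word.charAt(i), k -> new Node());
        cur.isWord = true;
    }

    // Follows s from the root; returns the node reached, or null if the path breaks.
    private Node walk(String s) {
        Node cur = root;
        for (int i = 0; i < s.length() && cur != null; i++)
            cur = cur.next.get(s.charAt(i));
        return cur;
    }

    // True only if the path exists and ends on a node marked as a word.
    public boolean contains(String word) {
        Node end = walk(word);
        return end != null && end.isWord;
    }

    // True if the path exists at all.
    public boolean isPrefix(String prefix) {
        return walk(prefix) != null;
    }

    public static void main(String[] args) {
        PrefixQueryTrie trie = new PrefixQueryTrie();
        trie.add("panda");
        System.out.println(trie.contains("pan"));   // false: "pan" was never added
        System.out.println(trie.isPrefix("pan"));   // true: "panda" starts with "pan"
        System.out.println(trie.isPrefix("panda")); // true: a word is a prefix of itself
        System.out.println(trie.isPrefix("pat"));   // false: no path p-a-t
    }
}
```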
Next is a related LeetCode problem:
LeetCode 208: Implement Trie (Prefix Tree)
The implementation is as follows:
import java.util.TreeMap;

public class Trie208 {

    private class Node {
        public boolean isWord;
        public TreeMap<Character, Node> next;

        public Node(boolean isWord) {
            this.isWord = isWord;
            next = new TreeMap<>();
        }

        public Node() {
            this(false);
        }
    }

    private Node root;

    public Trie208() {
        root = new Node();
    }

    // Adds a new word to the Trie
    public void insert(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (cur.next.get(c) == null)
                cur.next.put(c, new Node());
            cur = cur.next.get(c);
        }
        cur.isWord = true;
    }

    // Checks whether word is stored in the Trie
    public boolean search(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (cur.next.get(c) == null)
                return false;
            cur = cur.next.get(c);
        }
        return cur.isWord;
    }

    // Checks whether any stored word starts with prefix
    public boolean startsWith(String prefix) {
        Node cur = root;
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            if (cur.next.get(c) == null)
                return false;
            cur = cur.next.get(c);
        }
        return true;
    }
}
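10-5 Tries and Simple Pattern Matching
LeetCode 211: Add and Search Word - Data structure design
- Implementation notes:
Node: the definition of a Trie node.
addWord: scan the characters of word from first to last. For each character, check whether the current node's next map already has a child for it, and create one if it does not. Then move to that child and repeat.
search (the key part): a "." in word matches any single letter, so at a "." every child of the current node has to be tried. In the worst case word consists entirely of ".", and the search has to traverse the whole Trie, so we write a private recursive function match.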
import java.util.TreeMap;

public class WordDictionary {

    private class Node {
        public boolean isWord;
        public TreeMap<Character, Node> next;

        public Node(boolean isWord) {
            this.isWord = isWord;
            next = new TreeMap<>();
        }

        public Node() {
            this(false);
        }
    }

    private Node root;

    /** Initialize your data structure here. */
    public WordDictionary() {
        root = new Node();
    }

    /** Adds a word into the data structure. */
    public void addWord(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (cur.next.get(c) == null)
                cur.next.put(c, new Node());
            cur = cur.next.get(c);
        }
        cur.isWord = true;
    }

    /** Returns if the word is in the data structure. A word could contain the dot character '.' to represent any one letter. */
    public boolean search(String word) {
        return match(root, word, 0);
    }

    private boolean match(Node node, String word, int index) {
        if (index == word.length())
            return node.isWord;

        char c = word.charAt(index);
        if (c != '.') {
            if (node.next.get(c) == null)
                return false;
            return match(node.next.get(c), word, index + 1);
        }
        else {
            for (char nextChar : node.next.keySet())
                if (match(node.next.get(nextChar), word, index + 1))
                    return true;
            return false;
        }
    }
}
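A short run makes the behavior of the "." wildcard concrete. The sketch below is a compact, self-contained variant of the class above, written only for this demo. `DotMatchDemo` is a hypothetical name, and it iterates over `next.values()` rather than `keySet()`, which is a small swap and not the notes' exact code.

```java
import java.util.TreeMap;

// Hypothetical compact variant of WordDictionary, for a runnable demo.
class DotMatchDemo {
    private static class Node {
        boolean isWord;
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();

    public void addWord(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++)
            cur = cur.next.computeIfAbsent(word.charAt(i), k -> new Node());
        cur.isWord = true;
    }

    public boolean search(String word) {
        return match(root, word, 0);
    }

    private boolean match(Node node, String word, int index) {
        if (index == word.length())
            return node.isWord;

        char c = word.charAt(index);
        if (c != '.') {
            Node child = node.next.get(c);
            return child != null && match(child, word, index + 1);
        }
        // '.' matches any single character: try every child of this node.
        for (Node child : node.next.values())
            if (match(child, word, index + 1))
                return true;
        return false;
    }

    public static void main(String[] args) {
        DotMatchDemo dict = new DotMatchDemo();
        dict.addWord("bad");
        dict.addWord("dad");
        dict.addWord("mad");
        System.out.println(dict.search("pad"));  // false
        System.out.println(dict.search("bad"));  // true
        System.out.println(dict.search(".ad"));  // true
        System.out.println(dict.search("b.."));  // true
        System.out.println(dict.search(".."));   // false: no stored word has length 2
    }
}
```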
10-6 Tries and String Mapping
LeetCode 677: Map Sum Pairs
import java.util.TreeMap;

public class MapSum {

    private class Node {
        public int value;
        public TreeMap<Character, Node> next;

        public Node(int value) {
            this.value = value;
            next = new TreeMap<>();
        }

        public Node() {
            this(0);
        }
    }

    private Node root;

    /** Initialize your data structure here. */
    public MapSum() {
        root = new Node();
    }

    public void insert(String key, int val) {
        Node cur = root;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (cur.next.get(c) == null)
                cur.next.put(c, new Node());
            cur = cur.next.get(c);
        }
        cur.value = val;
    }

    public int sum(String prefix) {
        Node cur = root;
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            if (cur.next.get(c) == null)
                return 0;
            cur = cur.next.get(c);
        }
        return sum(cur);
    }

    private int sum(Node node) {
        int res = node.value;
        for (char c : node.next.keySet())
            res += sum(node.next.get(c));
        return res;
    }
}
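Here the Trie acts as a map: each node carries a value instead of an isWord flag, and `sum` adds up the whole subtree under the prefix. The sketch below is a compact, self-contained copy of that idea with a short run. `MapSumDemo` is a hypothetical name used only for this example.

```java
import java.util.TreeMap;

// Hypothetical compact copy of MapSum, for a runnable demo.
class MapSumDemo {
    private static class Node {
        int value;  // 0 for nodes where no key ends
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();

    // Stores val under key; inserting an existing key overwrites its value.
    public void insert(String key, int val) {
        Node cur = root;
        for (int i = 0; i < key.length(); i++)
            cur = cur.next.computeIfAbsent(key.charAt(i), k -> new Node());
        cur.value = val;
    }

    // Sums the values of all keys that start with prefix.
    public int sum(String prefix) {
        Node cur = root;
        for (int i = 0; i < prefix.length(); i++) {
            cur = cur.next.get(prefix.charAt(i));
            if (cur == null)
                return 0;
        }
        return sum(cur);
    }

    // Sums the values stored in the subtree rooted at node.
    private int sum(Node node) {
        int res = node.value;
        for (Node child : node.next.values())
            res += sum(child);
        return res;
    }

    public static void main(String[] args) {
        MapSumDemo map = new MapSumDemo();
        map.insert("apple", 3);
        System.out.println(map.sum("ap"));  // 3
        map.insert("app", 2);
        System.out.println(map.sum("ap"));  // 5
        map.insert("apple", 5);             // overwrites the old value 3
        System.out.println(map.sum("ap"));  // 7
    }
}
```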
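10-7 More Trie-Related Topics
To be honest, I did not take very careful notes on this chapter. This is my first pass at these notes, and I will study this part again when I review.
Trie deletion
Limitation of the Trie: the biggest problem is space!
Compressed trie
Ternary search trie
String pattern recognition, suffix trees, substring search (KMP, Boyer-Moore, Rabin-Karp), file compression, pattern matching, compiler theory, DNA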
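The notes list Trie deletion without an implementation. One common way to do it, shown here only as a sketch and not as the course's method, is to unmark the last node and then prune nodes on the way back up when they no longer end a word and have no children. `RemovableTrie` and `remove` are hypothetical names.

```java
import java.util.TreeMap;

// Hypothetical sketch of trie deletion; not taken from the course code.
class RemovableTrie {
    private static class Node {
        boolean isWord;
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();
    private int size = 0;

    public int getSize() {
        return size;
    }

    public void add(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++)
            cur = cur.next.computeIfAbsent(word.charAt(i), k -> new Node());
        if (!cur.isWord) {
            cur.isWord = true;
            size++;
        }
    }

    public boolean contains(String word) {
        Node cur = root;
        for (int i = 0; i < word.length() && cur != null; i++)
            cur = cur.next.get(word.charAt(i));
        return cur != null && cur.isWord;
    }

    // Removes word if it is stored; returns true when something was removed.
    public boolean remove(String word) {
        if (!contains(word))
            return false;
        remove(root, word, 0);
        size--;
        return true;
    }

    // Returns true if node has become useless (no word ends here, no children),
    // which tells the caller to drop its link to node.
    private boolean remove(Node node, String word, int index) {
        if (index == word.length()) {
            node.isWord = false;
        } else {
            char c = word.charAt(index);
            // The child exists because contains(word) already succeeded.
            if (remove(node.next.get(c), word, index + 1))
                node.next.remove(c);
        }
        return !node.isWord && node.next.isEmpty();
    }
}
```

For example, after adding "pan" and "panda", removing "panda" prunes only the "d" and "a" nodes, and "pan" stays in the Trie because its last node is still marked as a word.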