Trie: Definition, Properties, Java Implementation, and Complexity Analysis of the Dictionary Tree (AKA Word Lookup Tree)

akihiro_the_coder

Article link: https://blog.csdn.net/weixin_44465604/article/details/104796189

In a trie built from letters, each word is a path.

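Definition:

Trie: a tree of characters, and we make it out of a dictionary.
A trie is a tree made of characters, and it can be thought of as a dictionary. A lookup walks the tree one character at a time, starting from the first char of the String being searched. The structure exists to retrieve data, and its basic operations mirror those of a binary search tree, such as search and insert.

Side note: watching a video introduction to tries immediately reminded me of Word Ladder, a LeetCode problem I did long ago ("Given two words (beginWord and endWord), and a dictionary’s word list, find the length of shortest transformation sequence from beginWord to endWord"). When I checked the solutions, I found that very few of them use a trie; BFS is enough for most.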

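Basic properties of a trie:

A trie is a kind of search tree: it has nodes and edges, and some links may be null.
Every node except the root has exactly one parent. A node may link to several children; a leaf's links are all null.
In a trie for an English dictionary, each node in theory has 26 edges, one per letter of the alphabet. Links to absent children are null and are simply left out when the trie is drawn.
Each path spells out a sequence of letters. Storing words this way saves space because words with a common prefix share the nodes of that prefix, and lookup is efficient because it takes at most one step per character of the word rather than a scan of the whole word list.
A node whose value is null has no corresponding string in the trie. It exists only to make search easier, by serving as an intermediate step on the path to longer strings (the three search cases below rely on this).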

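Search:

Searching a trie for a string is guided by the characters of that string. Each node holds links for every character that can come next, so to check whether a string is in the tree we start at the root and follow one link per character, level by level.
A search ends in one of three ways:

The node reached by the last character of the string holds a non-null value: the string is in the trie (a search hit), and we return the value stored in that node.
The node reached by the last character holds a null value: the string is not in the trie (a search miss).
The search runs into a null link before reaching the last character: the string is not in the trie either (also a search miss).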
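
The three outcomes above can be sketched roughly as follows. This is a minimal illustration rather than the book's code: the names `SearchCasesDemo`, `Node`, `get`, and `put` are made up for this example, and it assumes each node stores an `Integer` value where null means "no key ends here".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names, chosen only for this illustration.
class SearchCasesDemo {
    static class Node {
        Integer value;                                  // null: no key ends at this node
        Map<Character, Node> next = new HashMap<>();    // links to child nodes
    }

    // Returns the value stored for key, or null on a search miss.
    static Integer get(Node root, String key) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            node = node.next.get(key.charAt(i));
            if (node == null) {
                return null;    // case 3: null link before the key was used up
            }
        }
        // case 1 if the value is non-null (hit), case 2 if it is null (miss)
        return node.value;
    }

    // Creates the path for key if needed and stores value at its last node.
    static void put(Node root, String key, int value) {
        Node node = root;
        for (char c : key.toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node());
        }
        node.value = value;
    }

    public static void main(String[] args) {
        Node root = new Node();
        put(root, "shells", 3);
        System.out.println(get(root, "shells")); // case 1: hit
        System.out.println(get(root, "shell"));  // case 2: node exists, value is null
        System.out.println(get(root, "shore"));  // case 3: no link for 'o' after "sh"
    }
}
```

Here "shell" is the interesting case: its path exists only because "shells" was inserted, so the search reaches a node but finds no value there.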

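Insertion:

To insert into a trie we must first search, just as with a binary search tree.
Recap: insertion into a BST (code from a data structures course assignment)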

	/*** MUTATORS ***/
	/**
	 * Inserts a new node in the tree
	 * 
	 * @param data the data to insert
	 */
	public void insert(T data) {
		if (root == null) {
            root = new Node(data);
        } else {
            insert(data, root);
        }
	}

	/**
	 * Helper method that inserts a new value in the tree
	 * 
	 * @param data the data to insert
	 * @param node the current node in the search for the correct location in which
	 *             to insert
	 */
	private void insert(T data, Node node) {
		if(data.compareTo(node.data) < 0 || data.equals(node.data)) {
			if(node.left == null) {
				node.left = new Node (data);
			}else {
				insert(data, node.left);
			}
		}else {
			if(node.right == null) {
				node.right = new Node (data);
			}else {
				insert(data, node.right);
			}
		}
	}

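As the BST insert method shows, the first step of an insertion is finding where to make the change.

Inserting into a trie:

The string is new: inserting it is like coining a new word in a big dictionary. We find the first "fork" and create new node(s) there, which adds one new word to the tree.
In short, start at the root and follow the existing path that matches the target string for as long as possible; once it runs out, branch off and create nodes for the remaining letters.
The path for the string already exists in the trie: "we set that node’s value to the value to be associated with the key". This brings in the trie's node representation.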
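
To make the two insertion cases concrete, here is a small sketch. The names `InsertCasesDemo`, `Node`, and `put` are hypothetical, and the `Integer` value per node is an assumption borrowed from the quoted sentence. `put` returns how many nodes it had to create, so we can see where the "fork" happens.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names, chosen only for this illustration.
class InsertCasesDemo {
    static class Node {
        Integer value;                                  // null: no key ends at this node
        Map<Character, Node> next = new HashMap<>();
    }

    // Inserts key with value and returns the number of nodes newly created.
    static int put(Node root, String key, int value) {
        Node node = root;
        int created = 0;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            Node child = node.next.get(c);
            if (child == null) {            // past the fork: open a new branch
                child = new Node();
                node.next.put(c, child);
                created++;
            }
            node = child;                   // otherwise reuse the shared prefix
        }
        node.value = value;                 // set (or overwrite) the key's value
        return created;
    }

    public static void main(String[] args) {
        Node root = new Node();
        System.out.println(put(root, "sea", 1));   // 3: s, e, a are all new
        System.out.println(put(root, "sells", 2)); // 3: "se" is shared; l, l, s are new
        System.out.println(put(root, "sea", 9));   // 0: path exists, value overwritten
    }
}
```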

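Node representation:
In a trie, every node has R links. For an English word trie R = 26 (as discussed above), so each node can be seen as an array of links. Letters and words are stored implicitly: a node does not hold a whole string, a letter is implied by which link leads to the node, and a string exists as a path through the tree.
(Figure omitted: the visualization of these node properties from Algorithms, 4th Edition.)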
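
As a rough sketch of the array view of a node: the code below assumes words contain only lowercase a to z, and the names `ArrayNodeDemo`, `Node`, `insert`, and `contains` are made up for illustration. Note that no node stores a letter; the letter is implied by the array index `c - 'a'`.

```java
// Hypothetical names; assumes every word consists of lowercase 'a'..'z' only.
class ArrayNodeDemo {
    static final int R = 26;                // alphabet size

    static class Node {
        boolean endOfWord;                  // true if a word ends at this node
        Node[] next = new Node[R];          // next[i] is the child for letter 'a' + i
    }

    static void insert(Node root, String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int index = word.charAt(i) - 'a';   // the letter lives in the index
            if (node.next[index] == null) {
                node.next[index] = new Node();
            }
            node = node.next[index];
        }
        node.endOfWord = true;
    }

    static boolean contains(Node root, String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.next[word.charAt(i) - 'a'];
            if (node == null) {
                return false;               // ran into a null link
            }
        }
        return node.endOfWord;
    }

    public static void main(String[] args) {
        Node root = new Node();
        insert(root, "by");
        System.out.println(contains(root, "by"));  // true
        System.out.println(contains(root, "b"));   // false: a prefix, not a stored word
        System.out.println(contains(root, "bye")); // false: null link after "by"
    }
}
```

The trade-off against a map is space: every node reserves 26 slots even if it has one child, in exchange for a plain array index per step.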

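Java implementation:

First, just as we would for a binary tree, create a class for the trie node.
Q: Why use a HashMap?
A: Keys in a HashMap are unique, and the children of a trie node are unique per letter as well (for example, the letter A appears as a key at most once in a node). A HashMap also stores exactly one value per key, so each letter maps to exactly one child node, which is what we need. An array works too.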

/*
 * TrieNode class with basic methods
 */

import java.util.HashMap;

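public class TrieNode {
	// The letter on the link leading to this node (unset for the root)
	private char c;
	// Child nodes, keyed by the next letter
	private HashMap <Character, TrieNode> children = new HashMap<>();
	// True if a word ends at this node; such a node may still have children
	private boolean isLeaf;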
	
	public TrieNode() {
	}
	
	public TrieNode(char c) {
		this.c = c;
	}
	
	public HashMap <Character, TrieNode> getChildren(){
		return children;
	}
	
	public void setChildren(HashMap<Character, TrieNode> children) {
		this.children = children;
	}
	
	public boolean isLeaf(){
		return isLeaf;
	}
	
	public void setLeaf(boolean isLeaf) {
		this.isLeaf = isLeaf;
	}
}

Next comes the implementation of the Trie class itself:

/*
 * Trie class
 */
import java.util.HashMap;

public class Trie {
	private TrieNode root;
	
	public Trie() {
		root = new TrieNode();
	}
	/*
	 * The insert method adds words into the
	 *  Trie character by character.
	 */
	public void insert(String word) {
		// Start with the root's map of children
		HashMap<Character, TrieNode> children = root.getChildren();
		// Walk the word from its first letter
		for(int i = 0; i < word.length(); i++) {
			char c = word.charAt(i);
			TrieNode node;
			if(children.containsKey(c)){
				// The letter already exists at this level: follow it
				node = children.get(c);
			} else {
				// The letter is not in this level's map: create a node
				// and add the letter to the tree
				node = new TrieNode (c);
				children.put(c, node);
			}
			// Move children down to the next level
			children = node.getChildren();
			// After the last letter, mark its node as the end of a word
			if(i == word.length() - 1) {
				node.setLeaf(true);
			}
		}
	}
	
	public boolean search(String word) {
		// Start at the first level
		HashMap<Character, TrieNode> children = root.getChildren();
		TrieNode node = null;
		// Walk the string letter by letter: move down one level on a match,
		// leave the loop on a mismatch
		for(int i = 0; i < word.length(); i++) {
			char c = word.charAt(i);
			if(children.containsKey(c)) {
				node = children.get(c);
				children = node.getChildren();
			} else {
				// node = null acts as a flag for "not found"
				node = null;
				break;
			}
		}
		// The word is in the dictionary only if the whole path was found and
		// its last node is marked as the end of a word (that node may still
		// have children, e.g. "app" when "apple" is also stored)
		return node != null && node.isLeaf();
	}
}

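An important property of tries:

The shape of a trie does not depend on the order in which keys are inserted or deleted. In a binary search tree, different insertion orders produce differently shaped trees; in a trie they do not.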
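
One way to convince yourself of this is to build two tries from the same words in different orders and compare them node by node. The sketch below does that; `ShapeDemo`, `build`, and `sameShape` are hypothetical names introduced for this example only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names, chosen only for this illustration.
class ShapeDemo {
    static class Node {
        boolean endOfWord;
        Map<Character, Node> next = new HashMap<>();
    }

    // Builds a trie by inserting the words in the order given.
    static Node build(String... words) {
        Node root = new Node();
        for (String word : words) {
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.endOfWord = true;
        }
        return root;
    }

    // True if both subtrees have the same links and end-of-word marks.
    static boolean sameShape(Node a, Node b) {
        if (a.endOfWord != b.endOfWord) {
            return false;
        }
        if (!a.next.keySet().equals(b.next.keySet())) {
            return false;
        }
        for (Map.Entry<Character, Node> e : a.next.entrySet()) {
            if (!sameShape(e.getValue(), b.next.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Node first = build("she", "sells", "sea");
        Node second = build("sea", "she", "sells");
        System.out.println(sameShape(first, second));               // true
        System.out.println(sameShape(build("she"), build("sea")));  // false
    }
}
```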

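Time complexity

Building a trie: O(W*L), where W is the number of words and L is the average word length. Words are inserted one at a time, and each word goes in one letter at a time (assuming each child lookup takes constant time).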
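
The W*L bound can be checked by counting, under the assumption that one step means processing one letter of one word. In this sketch (`BuildCostDemo` and `build` are hypothetical names) the step count is simply the sum of the word lengths, while the number of nodes created is smaller because prefixes are shared.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names, chosen only for this illustration.
class BuildCostDemo {
    static class Node {
        boolean endOfWord;
        Map<Character, Node> next = new HashMap<>();
    }

    // Builds a trie and returns {letter steps taken, nodes created}.
    static int[] build(String... words) {
        Node root = new Node();
        int steps = 0;
        int created = 0;
        for (String word : words) {
            Node node = root;
            for (int i = 0; i < word.length(); i++) {
                char c = word.charAt(i);
                Node child = node.next.get(c);
                if (child == null) {
                    child = new Node();
                    node.next.put(c, child);
                    created++;
                }
                node = child;
                steps++;                    // one step per letter of each word
            }
            node.endOfWord = true;
        }
        return new int[] {steps, created};
    }

    public static void main(String[] args) {
        int[] cost = build("she", "sells", "sea", "shells");
        System.out.println(cost[0]); // 17 = 3 + 5 + 3 + 6, i.e. W * L with W = 4, L = 4.25
        System.out.println(cost[1]); // 11 nodes, fewer than 17 thanks to shared prefixes
    }
}
```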

Ref: Algorithms, 4th Edition (Robert Sedgewick and Kevin Wayne)
