[No. 62] LeetCode HOT 100: 208. Implement Trie (Prefix Tree)

Latest recommended article published on 2024-07-20 22:34:59

悬浮海

Article link: https://blog.csdn.net/wang_luwei/article/details/136602423

208. Implement Trie (Prefix Tree)

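A Trie (pronounced like "try"), or prefix tree, is a tree data structure used to efficiently store and retrieve the keys in a set of strings. It has many applications, such as autocomplete and spell checking.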

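Implement the Trie class:

Trie() initializes the prefix tree object.
void insert(String word) inserts the string word into the prefix tree.
boolean search(String word) returns true if the string word is in the prefix tree (that is, it was inserted before the search), and false otherwise.
boolean startsWith(String prefix) returns true if some previously inserted string word has prefix as a prefix, and false otherwise.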

Example:

Input
["Trie", "insert", "search", "search", "startsWith", "insert", "search"]
[[], ["apple"], ["apple"], ["app"], ["app"], ["app"], ["app"]]
Output
[null, null, true, false, true, null, true]

Explanation
Trie trie = new Trie();
trie.insert("apple");
trie.search("apple"); // returns true
trie.search("app"); // returns false
trie.startsWith("app"); // returns true
trie.insert("app");
trie.search("app"); // returns true

Constraints:

1 <= word.length, prefix.length <= 2000
word and prefix consist only of lowercase English letters
insert, search, and startsWith are called at most 3 * 10^4 times in total

Solution

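// Time complexity: O(1) for initialization and O(n) for every other operation, where n is the length of the string being inserted or queried.
// Space complexity: O(|T|·Σ), where |T| is the total length of all inserted strings and Σ is the alphabet size (Σ = 26 in this problem).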
class Trie {

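    // A Trie, also called a prefix tree or dictionary tree, is a rooted tree in which each node has these fields:
    // children: an array of pointers to child nodes. In this problem its length is 26, the number of lowercase English letters, so children[0] corresponds to 'a' and children[25] to 'z'.
    // isEnd: a boolean marking whether this node is the end of an inserted string.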

    Trie[] children;
    boolean isEnd;

    public Trie() {
        children = new Trie[26];
        isEnd = false;
    }
    
    public void insert(String word) {
        // Start at the root, which is this object
        Trie node = this;
        // Walk through the characters of the word
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            int index = c - 'a';
            if (node.children[index] == null) {
                // No child for this letter yet, so create one first
                node.children[index] = new Trie();
            }
            node = node.children[index];
        }
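        // After the last character, mark the final node as the end of a word
        node.isEnd = true;

        // This feels a little like a hash table: where a hash table is an array plus linked lists or red-black trees, a Trie is arrays nested inside arrays, one level per character. Take the word "app":
        // Level 1: 'a' maps to index 0 of the root's array; a new Trie object is created in that slot, with isEnd still false.
        // Level 2: in the array of the level-1 object, a new Trie object is created in the slot for 'p'.
        // Level 3: in the array of the level-2 object, another Trie object is created in the slot for 'p'; the word ends here, so this object's isEnd is set to true.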
    }
    
    public boolean search(String word) {
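        // Look up the whole word: searchPrefix walks down the tree and returns null as soon as a link is missing.
        // Otherwise it returns the node of the last character; the word was inserted only if that node is non-null and its isEnd is true.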
        Trie node = searchPrefix(word);
        return node != null && node.isEnd;
    }
    
    public boolean startsWith(String prefix) {
        // For a prefix, isEnd does not matter; it is enough that a node exists at every level
        return searchPrefix(prefix) != null;
    }

    private Trie searchPrefix(String prefix) {
        Trie node = this;
        // Walk through the characters of the prefix
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            int index = c - 'a';
            if (node.children[index] == null) {
                return null;
            }
            node = node.children[index];
        }

        return node;
        
    }
}

/**
 * Your Trie object will be instantiated and called as such:
 * Trie obj = new Trie();
 * obj.insert(word);
 * boolean param_2 = obj.search(word);
 * boolean param_3 = obj.startsWith(prefix);
 */

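A detailed explanation of the prefix tree

A Trie is an atypical multi-way tree. The multi-way part is easy to understand: each node may have several branches.

Why atypical? Because its nodes are designed differently from those of an ordinary multi-way tree. A node of an ordinary multi-way tree looks like this: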

struct TreeNode {
    VALUETYPE value;    // node value
    TreeNode[] children;    // links to child nodes
};

A Trie node, by contrast, looks like this (assuming only the characters 'a' to 'z' occur):

struct TrieNode {
    boolean isEnd; // whether a string ends at this node
    TrieNode[] next; // letter mapping table
};

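This is where the letter mapping table next shows its value: TrieNode next[26] holds a link for every character that may follow the current node, so a parent node alone tells us the values of all its children. The character itself is never stored in a node; it is implied by the index of the link that leads to it.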
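
To make the mapping concrete, here is a minimal sketch, assuming a direct Java rendering of the TrieNode above. The helper childLetters and the class NextTableDemo are hypothetical names introduced only for this illustration. It recovers the letters of a node's children purely from which slots of next are non-null.

```java
// Java rendering of the TrieNode sketched above (same two fields).
class TrieNode {
    boolean isEnd;                       // whether a string ends at this node
    TrieNode[] next = new TrieNode[26];  // slot i stands for the letter 'a' + i
}

class NextTableDemo {
    // Hypothetical helper: lists the letters that can follow `node`,
    // derived only from the positions of its non-null links.
    static String childLetters(TrieNode node) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < node.next.length; i++) {
            if (node.next[i] != null) {
                sb.append((char) ('a' + i));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TrieNode root = new TrieNode();
        root.next['s' - 'a'] = new TrieNode();
        root.next['a' - 'a'] = new TrieNode();
        System.out.println(childLetters(root)); // prints "as"
    }
}
```

Note that neither child node stores 's' or 'a'; the parent's array index is the only place the letter lives.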

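Let's look at an example.

Imagine a Trie containing the three words "sea", "sells", and "she". What does it look like?

Its actual structure is this:
[Figure: the Trie for "sea", "sells", and "she" with every link drawn, including the empty ones]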

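A Trie usually contains a large number of empty links, so they are normally left out when drawing one. To make it easier to follow, we can draw it like this:
[Figure: the same Trie drawn without the empty links]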
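
Since the figures are not reproduced here, the following sketch builds the same three-word Trie and counts its nodes and empty links. It assumes a node type with the same fields as TrieNode; the names WordNode, SeaSellsSheDemo, countNodes, and countEmptyLinks are hypothetical and exist only for this illustration.

```java
// Minimal node type for this sketch; it mirrors the TrieNode fields above.
class WordNode {
    boolean isEnd;
    WordNode[] next = new WordNode[26];
}

class SeaSellsSheDemo {
    // Inserts a lowercase word, creating nodes only where no link exists yet.
    static void insert(WordNode root, String word) {
        WordNode node = root;
        for (char c : word.toCharArray()) {
            int index = c - 'a';
            if (node.next[index] == null) {
                node.next[index] = new WordNode();
            }
            node = node.next[index];
        }
        node.isEnd = true;
    }

    // Counts every node in the subtree rooted at `node`, including `node` itself.
    static int countNodes(WordNode node) {
        int count = 1;
        for (WordNode child : node.next) {
            if (child != null) {
                count += countNodes(child);
            }
        }
        return count;
    }

    // Counts the null slots, i.e. the empty links that drawings usually omit.
    static int countEmptyLinks(WordNode node) {
        int empty = 0;
        for (WordNode child : node.next) {
            empty += (child == null) ? 1 : countEmptyLinks(child);
        }
        return empty;
    }

    public static void main(String[] args) {
        WordNode root = new WordNode();
        for (String w : new String[] {"sea", "sells", "she"}) {
            insert(root, w);
        }
        System.out.println(countNodes(root));      // 9: the root plus 8 letter nodes
        System.out.println(countEmptyLinks(root)); // 226: 9 * 26 slots minus 8 used links
    }
}
```

The shared prefixes "s" and "se" are stored once, which is why 11 letters of input need only 8 letter nodes, while the 226 empty slots show why drawings leave the empty links out.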

Common Trie operations

Next, let's implement some common operations on a Trie.

Defining the Trie class

class Trie {
    boolean isEnd;
    Trie[] children; // plays the role of next in TrieNode above

    public Trie() {
        children = new Trie[26];
        isEnd = false;
    }
}

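Insert
Description: insert a word into the Trie.

Implementation: this is much like building a linked list. Starting from the root, match the characters of word one by one against the existing child links. Once the chain has no node for the next character, keep creating new nodes until the last character of word has been inserted, then set isEnd = true on the final node to mark it as the end of a word.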

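    public void insert(String word) {
        // Start at the root, which is this object
        Trie node = this;
        // Walk through the characters of the word
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            int index = c - 'a';
            if (node.children[index] == null) {
                // No child for this letter yet, so create one first
                node.children[index] = new Trie();
            }
            // Move down one level
            node = node.children[index];
        }
        // After the last character, mark the final node as the end of a word
        node.isEnd = true;

        // This feels a little like a hash table: where a hash table is an array plus linked lists or red-black trees, a Trie is arrays nested inside arrays, one level per character. Take the word "app":
        // Level 1: 'a' maps to index 0 of the root's array; a new Trie object is created in that slot, with isEnd still false.
        // Level 2: in the array of the level-1 object, a new Trie object is created in the slot for 'p'.
        // Level 3: in the array of the level-2 object, another Trie object is created in the slot for 'p'; the word ends here, so this object's isEnd is set to true.
    }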

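Search
Description: check whether the word exists in the Trie.

Implementation: starting from the root, keep matching downward. If a link is null, return false. If the last character is matched, the result is simply node.isEnd.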

    public boolean search(String word) {
        // Look up the whole word: searchPrefix walks down the tree and returns null as soon as a link is missing.
        // Otherwise it returns the node of the last character; the word was inserted only if that node is non-null and its isEnd is true.
        Trie node = searchPrefix(word);
        return node != null && node.isEnd;
    }

    private Trie searchPrefix(String prefix) {
        Trie node = this;
        // Walk through the characters of the prefix
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            int index = c - 'a';
            if (node.children[index] == null) {
                return null;
            }
            // Move down one level and keep searching
            node = node.children[index];
        }
        return node;
    }

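Prefix matching
Description: check whether the Trie contains any word that starts with prefix.

Implementation: similar to search, except that isEnd of the last node does not need to be checked. Nodes are only ever created by insert, so if the last character of prefix can be reached, some inserted word must have it as a prefix.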

    public boolean startsWith(String prefix) {
        // For a prefix, isEnd does not matter; it is enough that a node exists at every level
        return searchPrefix(prefix) != null;
    }

    private Trie searchPrefix(String prefix) {
        Trie node = this;
        // Walk through the characters of the prefix
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            int index = c - 'a';
            if (node.children[index] == null) {
                return null;
            }
            // Move down one level and keep searching
            node = node.children[index];
        }
        return node;
    }

We have now implemented the basic Trie operations, which should give a better feel for how a Trie works. The complete code is in the Solution section above.
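
Autocomplete was mentioned at the start as a typical application. Because searchPrefix returns the node where the prefix ends, one possible extension is to collect every word stored beneath that node. The sketch below is not part of the LeetCode problem or of the solution above; the names PrefixTrie, wordsWithPrefix, and collect are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PrefixTrie {
    PrefixTrie[] children = new PrefixTrie[26];
    boolean isEnd;

    void insert(String word) {
        PrefixTrie node = this;
        for (int i = 0; i < word.length(); i++) {
            int index = word.charAt(i) - 'a';
            if (node.children[index] == null) {
                node.children[index] = new PrefixTrie();
            }
            node = node.children[index];
        }
        node.isEnd = true;
    }

    // Same walk as searchPrefix above: the node where `prefix` ends, or null.
    PrefixTrie searchPrefix(String prefix) {
        PrefixTrie node = this;
        for (int i = 0; i < prefix.length(); i++) {
            int index = prefix.charAt(i) - 'a';
            if (node.children[index] == null) {
                return null;
            }
            node = node.children[index];
        }
        return node;
    }

    // Hypothetical extension: all inserted words starting with `prefix`,
    // in alphabetical order because children are visited from 'a' to 'z'.
    List<String> wordsWithPrefix(String prefix) {
        List<String> result = new ArrayList<>();
        PrefixTrie start = searchPrefix(prefix);
        if (start != null) {
            collect(start, new StringBuilder(prefix), result);
        }
        return result;
    }

    // Depth-first walk; `path` always spells the word from the root to `node`.
    private static void collect(PrefixTrie node, StringBuilder path, List<String> out) {
        if (node.isEnd) {
            out.add(path.toString());
        }
        for (int i = 0; i < 26; i++) {
            if (node.children[i] != null) {
                path.append((char) ('a' + i));
                collect(node.children[i], path, out);
                path.setLength(path.length() - 1); // backtrack
            }
        }
    }
}

class PrefixTrieDemo {
    public static void main(String[] args) {
        PrefixTrie trie = new PrefixTrie();
        for (String w : new String[] {"sells", "she", "sea"}) {
            trie.insert(w);
        }
        System.out.println(trie.wordsWithPrefix("se")); // [sea, sells]
        System.out.println(trie.wordsWithPrefix("sh")); // [she]
        System.out.println(trie.wordsWithPrefix("x"));  // []
    }
}
```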

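Summary
From the explanation and code above, we can summarize a few properties of a Trie:

The shape of a Trie does not depend on the order in which words are inserted or deleted; for any given set of words, the shape is unique.
Searching for or inserting a word of length L accesses the next array at most L+1 times, regardless of how many words the Trie contains.
Every Trie node keeps a full alphabet table, which costs a lot of space. If the height of the Trie is n and the alphabet size is m, then in the worst case, where no words in the Trie share a prefix yet, the space complexity is O(m^n).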
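
The first property can be checked with a small experiment. The sketch below serializes a Trie into a text form and compares the result for two insertion orders of the same words. The names ShapeTrie, shape, ShapeDemo, and build, as well as the text format itself, are assumptions made for this illustration.

```java
class ShapeTrie {
    ShapeTrie[] children = new ShapeTrie[26];
    boolean isEnd;

    void insert(String word) {
        ShapeTrie node = this;
        for (int i = 0; i < word.length(); i++) {
            int index = word.charAt(i) - 'a';
            if (node.children[index] == null) {
                node.children[index] = new ShapeTrie();
            }
            node = node.children[index];
        }
        node.isEnd = true;
    }

    // Hypothetical helper: a canonical text form of this subtree. Each child is
    // written as its letter, a '*' if a word ends there, then its own subtree
    // in parentheses. Children are always visited from 'a' to 'z'.
    String shape() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 26; i++) {
            ShapeTrie child = children[i];
            if (child != null) {
                sb.append((char) ('a' + i));
                if (child.isEnd) {
                    sb.append('*');
                }
                sb.append('(').append(child.shape()).append(')');
            }
        }
        return sb.toString();
    }
}

class ShapeDemo {
    // Builds a Trie by inserting the words in the given order.
    static ShapeTrie build(String... words) {
        ShapeTrie root = new ShapeTrie();
        for (String w : words) {
            root.insert(w);
        }
        return root;
    }

    public static void main(String[] args) {
        String a = build("sea", "sells", "she").shape();
        String b = build("she", "sells", "sea").shape();
        System.out.println(a);
        System.out.println(a.equals(b)); // true: same words, same shape
    }
}
```

Both orders produce the same string because insert only ever adds the nodes a word needs and never rearranges existing ones.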

Finally, regarding where a Trie fits best, remember one phrase: build the tree once, query it many times.