[LeetCode] Trie (Dictionary Tree) Algorithms

小朱小朱绝不服输

Last modified 2022-06-01 15:16:27

First published 2022-05-16 21:20:59

Article link: https://blog.csdn.net/weixin_44052055/article/details/124807544

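This post works through examples to explain how to build a Trie (dictionary tree / prefix tree), including the classic insert and search operations, and how to apply it to different problems (short encoding of words, re-spacing a sentence, wildcard matching, and a magic dictionary). It uses the Trie's prefix property to optimize dynamic programming and brute-force algorithms and make the solutions more efficient.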

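The earlier post on the Trie (dictionary tree & prefix tree) covered how to build the data structure; this post practices Trie algorithms through worked problems.

A quick recap of the Trie:
[Figure: Trie diagram. Source: samarua, link below]

What you can learn from this post:

The classic way to build a Trie, with the matching insert and search templates. 208. Implement Trie (Prefix Tree)
Using the Trie construction process to skip words that are suffixes of other words, by inserting words in reverse. 820. Short Encoding of Words
Exploiting the prefix (suffix) property of the Trie to optimize a brute-force algorithm: dp + Trie. Interview Question 17.13. Re-Space
Trie matching with wildcards: a recursive search. 211. Design Add and Search Words Data Structure
Matching that allows, and requires, changing exactly one character: a recursive search. 676. Implement Magic Dictionary

Without further ado, let's get started.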

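208. Implement Trie (Prefix Tree)

1. Problem Description

LeetCode link: 208. Implement Trie (Prefix Tree)
[Image: problem statement]

2. Approach

This is the classic Trie construction problem; words contain only the 26 lowercase letters. Start with the data structure definition. The earlier Trie post showed two ways to build it; here we use the TrieNode-based one.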

class TrieNode {
    boolean isWord;
    TrieNode[] children = new TrieNode[26];
}

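Two key points:

A node with isWord set to true is one of the red nodes in the figure above. For example, given the two strings "cat" and "catch", the nodes for the characters t and h are red (isWord = true).
Each node holds a TrieNode[] array of length 26: the index represents a character (char - 'a'), and the value at that index is the reference to the child node. Without the a-z restriction, an array of 26 no longer works and a hash map is used instead.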
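
The hash-map alternative mentioned in the second point can be sketched as follows. This is a minimal illustration, not code from the original post: the class name MapTrie and the helper walk are hypothetical, and children are keyed by Java char values, so any character can appear in a word.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Trie whose nodes store children in a HashMap
// instead of a fixed TrieNode[26] array, so keys are not limited to a-z.
class MapTrie {
    private static class Node {
        boolean isWord;
        Map<Character, Node> children = new HashMap<>();
    }

    private final Node root = new Node();

    // Follow the characters of word, creating missing nodes on the way.
    public void insert(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.children.computeIfAbsent(c, k -> new Node());
        }
        cur.isWord = true;
    }

    // True only if the whole word was inserted earlier.
    public boolean search(String word) {
        Node end = walk(word);
        return end != null && end.isWord;
    }

    // True if any inserted word starts with prefix.
    public boolean startsWith(String prefix) {
        return walk(prefix) != null;
    }

    // Returns the node reached by following s, or null if the path breaks.
    private Node walk(String s) {
        Node cur = root;
        for (char c : s.toCharArray()) {
            cur = cur.children.get(c);
            if (cur == null) {
                return null;
            }
        }
        return cur;
    }
}
```

The trade-off is the usual one: the array gives constant-time indexing with a fixed 26 slots per node, while the map only allocates entries for children that exist but pays hashing and boxing overhead.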

3. Reference Code

class Trie {
    class TrieNode {
        boolean isWord;
        TrieNode[] children = new TrieNode[26];
    }
    TrieNode root;

    public Trie() {
        root = new TrieNode();  // build the Trie with an empty root node
    }
    
    // Insert: follow the characters of word downward from the root.
    // If the next child is null, create a new node; if it already exists, cur simply moves down.
    public void insert(String word) {
        TrieNode cur = root;
        for (int i = 0; i < word.length(); i++) {
            int ch = word.charAt(i) - 'a';
            if (cur.children[ch] == null) {
                cur.children[ch] = new TrieNode();
            }
            cur = cur.children[ch];
        }
        cur.isWord = true;  // the word is fully inserted; cur now points at the node that ends it
    }

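    // Search: starting from the root, try to follow the characters of word downward.
    // If we hit null, word is not on any path in the Trie, so return false.
    // If we walk all of word, check whether cur ends a word: if so, return true; otherwise word is only a prefix, so return false.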
    public boolean search(String word) {
        TrieNode cur = root;
        for (int i = 0; i < word.length(); i++) {
            int ch = word.charAt(i) - 'a';
            if (cur.children[ch] == null) {
                return false;
            }
            cur = cur.children[ch];
        }
        return cur.isWord;
    }
    
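    // Prefix check: like search, try to follow prefix downward from the root.
    // If we hit null, prefix is not on any path in the Trie, so return false.
    // If we walk all of it, return true; we do not care whether cur ends a word (isWord).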
    public boolean startsWith(String prefix) {
        TrieNode cur = root;
        for (int i = 0; i < prefix.length(); i++) {
            int ch = prefix.charAt(i) - 'a';
            if (cur.children[ch] == null) {
                return false;
            }
            cur = cur.children[ch];
        }
        return true;
    }
}

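Note: a variant of search, the recursive match

>>> The classic search uses a cur pointer (reference) and follows the characters of word along a single path.
>>> It can also be written recursively, checking at each step whether one node matches: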

public boolean search(String word) {
	return match(word, root, 0);
}

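/* match method
// Basic idea: use word and start to get the current character, then check whether it matches the current node,
// i.e. whether node.children[c] is non-null.
// (start plays the role of i in the for (i) loop of the iterative version; it walks through word.)
*/
public boolean match(String word, TrieNode node, int start) {  // these three parameters are the template; worth memorizing
    if (start == word.length()) {
        return node.isWord;  // (★)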
    }
    int c = word.charAt(start) - 'a';
    return node.children[c] != null && match(word, node.children[c], start + 1);
}

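820. Short Encoding of Words

1. Problem Description

LeetCode link: 820. Short Encoding of Words
[Image: problem statement]

2. Approach

Use the Trie construction process to skip words that are suffixes of other words.

A Trie fits this problem because the encoding is driven entirely by shared suffixes, which a Trie built from reversed words captures directly.

Once such a [reversed] Trie is built, it is easy to see that the length of the encoded string is the sum of (length + 1) over all words, after dropping every word that is a suffix of another word.

This makes sense: with "abcd#", for example, the suffix words "bcd", "cd", and "d" are already covered, so only the longest word "abcd" counts toward the total length.

The core idea:

Each time a "new word" is inserted into the Trie, add that word's length + 1 (for the #).
Note that not every insertion should add the word's length.
Instead, first sort words by length and insert longer words before shorter ones. If an insertion has to create a new node, we treat the word as a "new word".

Some examples:

Insert "cba", then "dba": the suffixes overlap, but a new node is still created, so "dba" counts as a "new word" and the final string can only be "cba#dba#".
Insert "ba", then "dcba": both insertions create new nodes, so we overcount: (2+1) + (4+1) = 8, while the actual encoding is "dcba#", of length 5.
Insert "dcba", then "ba": the longer word goes first, so the second insertion creates no new node: 4+1 = 5. Correct!

So the foundation is still the Trie node definition and the insert operation.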
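
To make the effect of insertion order concrete, the three examples above can be replayed with a small sketch. The class SuffixTrieDemo and its two methods are hypothetical names introduced for illustration; the counting rule (add length + 1 whenever an insertion creates a node) is the one described above.

```java
import java.util.Arrays;

// Hypothetical demo: counts the encoding length with a reversed Trie,
// once in the caller's order and once after sorting longest-first.
class SuffixTrieDemo {
    private static class Node {
        Node[] children = new Node[26];
    }

    // Inserts each word in reverse, in the given order, and adds
    // (length + 1) for every insertion that created at least one node.
    static int encodeInGivenOrder(String[] words) {
        Node root = new Node();
        int total = 0;
        for (String word : words) {
            Node cur = root;
            boolean isNew = false;
            for (int i = word.length() - 1; i >= 0; i--) {
                int ch = word.charAt(i) - 'a';
                if (cur.children[ch] == null) {
                    cur.children[ch] = new Node();
                    isNew = true;
                }
                cur = cur.children[ch];
            }
            if (isNew) {
                total += word.length() + 1;
            }
        }
        return total;
    }

    // Sorts a copy longest-first, then applies the same counting rule.
    static int encodeLongestFirst(String[] words) {
        String[] sorted = words.clone();
        Arrays.sort(sorted, (a, b) -> b.length() - a.length());
        return encodeInGivenOrder(sorted);
    }
}
```

Calling encodeInGivenOrder on {"ba", "dcba"} overcounts and yields 8, while encodeLongestFirst on the same input yields 5, which is why the sort comes before the insertions.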

3. Reference Code

import java.util.Arrays;

class Solution {
    class TrieNode {
        boolean isWord;
        TrieNode[] children = new TrieNode[26];
    }
    class Trie {
        TrieNode root;
        public Trie() {
            root = new TrieNode();
        }
        // Insert the word into the Trie in reverse. While inserting, also detect whether the word is "new": return length + 1 if it is, otherwise 0
        public int insert(String word) {
            TrieNode cur = root;
            boolean isNew = false;
            for (int i = word.length() - 1; i >= 0; i--) {  // insert in reverse
                int ch = word.charAt(i) - 'a';
                if (cur.children[ch] == null) {
                    cur.children[ch] = new TrieNode();
                    isNew = true;
                }
                cur = cur.children[ch];
            }
            cur.isWord = true;
            return isNew ? word.length() + 1 : 0;
        }
    }
    // Sort by length, longest first, then insert every word
    public int minimumLengthEncoding(String[] words) {
        int res = 0;
        Arrays.sort(words, (s1, s2) -> (s2.length() - s1.length()));
        Trie trie = new Trie();
        for (String word : words) {
            res += trie.insert(word);
        }
        return res;
    }
}

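Interview Question 17.13. Re-Space

1. Problem Description

LeetCode link: Interview Question 17.13. Re-Space
[Image: problem statement]

2. Approach

Exploit the prefix (suffix) property of the Trie to optimize a brute-force algorithm.

Given a string, match as many dictionary words as possible, i.e. minimize the number of unmatched characters.

A greedy strategy does not work, so we use dynamic programming.

dp[i] is the minimum number of unmatched characters in the first i characters of the string.

Suppose the first i characters are already handled. For the first i + 1 characters, the minimum number of unmatched characters comes from two cases:

The (i + 1)-th character is unmatched: dp[i + 1] = dp[i] + 1, i.e. the unmatched count grows by 1.
Scan the earlier positions: if the substring that starts at (0-based) index j and ends at the (i + 1)-th character is in the dictionary, update dp[i + 1] = min(dp[i + 1], dp[j]).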
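
Written as one recurrence, the two cases combine as follows. The symbols s (the sentence, 0-indexed, with s[j..i] the substring from index j through index i inclusive) and D (the set of dictionary words) are notation introduced here for illustration:

```latex
dp[0] = 0, \qquad
dp[i+1] = \min\Bigl( dp[i] + 1,\;
  \min_{\substack{0 \le j \le i \\ s[j..i] \in D}} dp[j] \Bigr)
```

The inner minimum ranges over an empty set when no dictionary word ends at the (i + 1)-th character, in which case only the first term applies. The answer is dp[n], where n is the length of s.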

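Approach 1: brute-force dp

The dp performs O(n^2) dictionary lookups, where n is the length of the string to match. (Each lookup also builds and hashes a substring, so its cost grows with the substring length.)

Approach 2: dp + Trie

In the solution above, computing dp[i + 1] requires scanning j over the earlier positions and checking, one by one, whether the substring that starts at j and ends at the (i + 1)-th character is in the dictionary.

A Trie speeds this step up: it can report which words end at the (i + 1)-th character (insert the words in reverse when building the Trie).

The time complexity is O(m + n^2), where m is the total length of the dictionary words and n is the length of the string to match. Why is it still n^2? In the worst case the transition at every position walks back over all earlier characters. In most cases the walk stops after far fewer than n steps, so approach 2 ends up much faster than approach 1.

insert: inserts word into the Trie in reverse [template]
search: finds the words in sentence that end at sentence[end] (there may be more than one) and returns their start indices [★ key]

if (cur.children[c] == null) {  // walk backward from the end; stop as soon as the suffix is no longer valid
    break;                      // these two lines are the Trie's optimization over the original algorithm
}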
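
A small sketch may help show what this search returns and where the early break fires. The class SuffixSearchDemo, its constructor, and the method name startsOfWordsEndingAt are hypothetical; the logic mirrors the insert and search functions described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of the reversed-Trie lookup used by the dp.
class SuffixSearchDemo {
    private static class Node {
        boolean isWord;
        Node[] children = new Node[26];
    }

    private final Node root = new Node();

    // Inserts every dictionary word in reverse.
    SuffixSearchDemo(String... dictionary) {
        for (String word : dictionary) {
            Node cur = root;
            for (int i = word.length() - 1; i >= 0; i--) {
                int ch = word.charAt(i) - 'a';
                if (cur.children[ch] == null) {
                    cur.children[ch] = new Node();
                }
                cur = cur.children[ch];
            }
            cur.isWord = true;
        }
    }

    // Walks backward from sentence[end] and collects the start index of
    // every dictionary word that ends exactly at end.
    List<Integer> startsOfWordsEndingAt(String sentence, int end) {
        List<Integer> starts = new ArrayList<>();
        Node cur = root;
        for (int i = end; i >= 0; i--) {
            cur = cur.children[sentence.charAt(i) - 'a'];
            if (cur == null) {
                break;  // no dictionary word has this suffix; stop early
            }
            if (cur.isWord) {
                starts.add(i);
            }
        }
        return starts;
    }
}
```

With the dictionary {"her", "brother", "other"} and the sentence "mybrother", a lookup at end = 8 returns [6, 4, 2] and then stops at 'y', while a lookup at end = 1 stops on the very first character and returns an empty list.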

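3. Reference Code

Approach 1: brute-force dp

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class Solution {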
    public int respace(String[] dictionary, String sentence) {
        Set<String> dict = new HashSet<>(Arrays.asList(dictionary));
        int n = sentence.length();
        int[] dp = new int[n + 1];
        for (int i = 1; i <= n; i++) {
            dp[i] = dp[i - 1] + 1;
            for (int j = 0; j < i; j++) {
                if (dict.contains(sentence.substring(j, i))) {
                    dp[i] = Math.min(dp[i], dp[j]);
                }
            }
        }
        return dp[n];       
    }
}

Approach 2: dp + Trie

import java.util.ArrayList;
import java.util.List;

class Solution {
    class TrieNode {
        boolean isWord;
        TrieNode[] children = new TrieNode[26];
    }
    class Trie {
        TrieNode root;
        public Trie() {
            root = new TrieNode();
        }
        public void insert(String word) {
            TrieNode cur = root;
            for (int i = word.length() - 1; i >= 0; i--) {  // insert in reverse
                int ch = word.charAt(i) - 'a';
                if (cur.children[ch] == null) {
                    cur.children[ch] = new TrieNode();
                }
                cur = cur.children[ch];
            }
            cur.isWord = true;
        }
        // Find the words in sentence that end at sentence[end] (there may be more than one) and return their start indices [★ key]
        public List<Integer> search(String sentence, int end) {
            List<Integer> list = new ArrayList<>();
            TrieNode cur = root;
            for (int i = end; i >= 0; i--) {
                int ch = sentence.charAt(i) - 'a';
                if (cur.children[ch] == null) { // walk backward from the end; stop as soon as the suffix is no longer valid
                    break;
                }
                cur = cur.children[ch];
                if (cur.isWord) {
                    list.add(i);
                }
            }
            return list;
        }
    }
    public int respace(String[] dictionary, String sentence) {
        int n = sentence.length();
        int[] dp = new int[n + 1];
        Trie trie = new Trie();
        for (String word : dictionary) {
            trie.insert(word);
        }
        for (int i = 1; i <= n; i++) {
            dp[i] = dp[i - 1] + 1;
            for (int j : trie.search(sentence, i - 1)) {
                dp[i] = Math.min(dp[i], dp[j]);
            }
        }
        return dp[n];
    }
}

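211. Design Add and Search Words Data Structure

1. Problem Description

LeetCode link: 211. Design Add and Search Words Data Structure
[Image: problem statement]

2. Approach

Trie matching with wildcards: a recursive search.

The recursive match variant of search was introduced above: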

public boolean search(String word) {
	return match(word, root, 0);
}

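/* match method
// Basic idea: use word and start to get the current character, then check whether it matches the current node,
// i.e. whether node.children[c] is non-null.
// (start plays the role of i in the for (i) loop of the iterative version; it walks through word.)
*/
public boolean match(String word, TrieNode node, int start) {  // these three parameters are the template; worth memorizing
    if (start == word.length()) {
        return node.isWord;  // (★)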
    }

    int c = word.charAt(start) - 'a';
    return node.children[c] != null && match(word, node.children[c], start + 1);
}

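Check whether the current character is the wildcard:

If it is not the wildcard, use the same recursion as before:

if (word.charAt(index) != '.') {  // not the wildcard
    int ch = word.charAt(index) - 'a';
    return node.children[ch] != null && match(word, node.children[ch], index + 1);
}

If it is the wildcard, recursively try each of the 26 possible children: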

for (int i = 0; i < 26; i++) {
    if (node.children[i] != null && match(word, node.children[i], index + 1)) {
        return true;
    }
}

3. Reference Code

class WordDictionary {
    class TrieNode {
        boolean isWord;
        TrieNode[] children = new TrieNode[26];
    }
    TrieNode root;

    public WordDictionary() {
        root = new TrieNode();
    }
    
    public void addWord(String word) {  // insert template
        TrieNode cur = root;
        for (int i = 0; i < word.length(); i++) {
            int ch = word.charAt(i) - 'a';
            if (cur.children[ch] == null) {
                cur.children[ch] = new TrieNode();
            }
            cur = cur.children[ch];
        }
        cur.isWord = true;
    }
    
    public boolean search(String word) {
        return match(word, root, 0);
    }
    public boolean match(String word, TrieNode node, int index) {
        if (index == word.length()) {  // base case
            return node.isWord;
        }
        if (word.charAt(index) != '.') {  // not the wildcard
            int ch = word.charAt(index) - 'a';
            return node.children[ch] != null && match(word, node.children[ch], index + 1);
        } else { // wildcard: recurse into each of the 26 possible children
            for (int i = 0; i < 26; i++) {
                if (node.children[i] != null && match(word, node.children[i], index + 1)) {
                    return true;
                }
            }
            return false;
        }
    }
}
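
As a quick check of the wildcard behavior, here is a compact, self-contained sketch of the same recursion together with a few queries. The class name WildcardTrieDemo is hypothetical, and the word list (bad, dad, mad) is only an illustrative data set.

```java
// Hypothetical compact version of the wildcard search, for experimentation.
class WildcardTrieDemo {
    private static class Node {
        boolean isWord;
        Node[] children = new Node[26];
    }

    private final Node root = new Node();

    void addWord(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++) {
            int ch = word.charAt(i) - 'a';
            if (cur.children[ch] == null) {
                cur.children[ch] = new Node();
            }
            cur = cur.children[ch];
        }
        cur.isWord = true;
    }

    boolean search(String pattern) {
        return match(pattern, root, 0);
    }

    private boolean match(String pattern, Node node, int index) {
        if (index == pattern.length()) {
            return node.isWord;
        }
        char c = pattern.charAt(index);
        if (c != '.') {
            // Ordinary letter: exactly one child can continue the match.
            Node next = node.children[c - 'a'];
            return next != null && match(pattern, next, index + 1);
        }
        // Wildcard: any existing child may continue the match.
        for (Node child : node.children) {
            if (child != null && match(pattern, child, index + 1)) {
                return true;
            }
        }
        return false;
    }
}
```

After adding "bad", "dad", and "mad", the queries ".ad" and "b.." succeed, "pad" fails because no word starts with p, and "b." fails because the pattern ends on a node that does not close a word.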

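676. Implement Magic Dictionary

1. Problem Description

LeetCode link: 676. Implement Magic Dictionary
[Image: problem statement]

2. Approach

Matching that allows, and requires, changing exactly one character: a recursive search.

A tricky case: when the Trie contains both "hello" and "hallo", a template-style search("hello") returns false, although the expected answer is true (changing 'e' to 'a' gives "hallo").

The root of the problem: search is usually written by first computing the child index from word. That sends the walk straight down the hello path to the end, and it returns false because no letter was changed.
So drop the usual Trie search template here and loop over all 26 children instead.

The logic is:

First find a letter that is viable, then check whether that viable letter is word.charAt(start),
rather than starting from word.charAt(start) and checking whether that letter is viable (viable means it is a valid node in the Trie).

Loop over the 26 characters once to see whether one character can be replaced.

3. Reference Code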

class MagicDictionary {
    class TrieNode {
        boolean isWord;
        TrieNode[] children = new TrieNode[26];
    }
    TrieNode root;

    public MagicDictionary() {
        root = new TrieNode();
    }
    
    public void buildDict(String[] dictionary) {  // insert template
        for (String word : dictionary) {
            TrieNode cur = root;
            for (int i = 0; i < word.length(); i++) {
                int ch = word.charAt(i) - 'a';
                if (cur.children[ch] == null) {
                    cur.children[ch] = new TrieNode();
                }
                cur = cur.children[ch];
            }
            cur.isWord = true;
        }
    }
    
    public boolean search(String searchWord) {
        return match(searchWord, root, 0, true);
    }
    public boolean match(String searchWord, TrieNode node, int index, boolean flag) {
        if (index == searchWord.length()) {
            return node.isWord && !flag;  // exactly one character must have been changed
        }
        for (int i = 0; i < 26; i++) {
            if (node.children[i] != null) {
                if (searchWord.charAt(index) - 'a' == i && match(searchWord, node.children[i], index + 1, flag)) {
                    return true;
                }
                if (searchWord.charAt(index) - 'a' != i && flag && match(searchWord, node.children[i], index + 1, false)) {
                    return true;
                }
            }
        }
        return false;
    }
}
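
The "hello" / "hallo" case from the approach section can be checked with a compact sketch of the same idea. The class name MagicDemo and the parameter name canChange (playing the role of flag above) are hypothetical choices made for readability.

```java
// Hypothetical compact version of the one-change search, for experimentation.
class MagicDemo {
    private static class Node {
        boolean isWord;
        Node[] children = new Node[26];
    }

    private final Node root = new Node();

    MagicDemo(String... dictionary) {
        for (String word : dictionary) {
            Node cur = root;
            for (int i = 0; i < word.length(); i++) {
                int ch = word.charAt(i) - 'a';
                if (cur.children[ch] == null) {
                    cur.children[ch] = new Node();
                }
                cur = cur.children[ch];
            }
            cur.isWord = true;
        }
    }

    boolean search(String word) {
        return match(word, root, 0, true);
    }

    // canChange is true while the single allowed change is still unused.
    private boolean match(String word, Node node, int index, boolean canChange) {
        if (index == word.length()) {
            return node.isWord && !canChange;  // the change must have been used
        }
        int target = word.charAt(index) - 'a';
        for (int i = 0; i < 26; i++) {
            Node child = node.children[i];
            if (child == null) {
                continue;
            }
            if (i == target) {
                // Same letter: keep going without spending the change.
                if (match(word, child, index + 1, canChange)) {
                    return true;
                }
            } else if (canChange && match(word, child, index + 1, false)) {
                // Different letter: spend the single change here.
                return true;
            }
        }
        return false;
    }
}
```

With the dictionary {"hello", "hallo"}, search("hello") is true because the walk can branch to the 'a' child at the second letter. With only {"hello"} in the dictionary, the same query is false because no change is possible along any path.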