9. Trie (Prefix Tree)

书山压力大EEE

Original post: https://blog.csdn.net/weixin_41207499/article/details/81057799

I. What Is a Trie

Motivation

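Microsoft once ran into slow search while building an address book (CPUs were much weaker back then), and an intern solved the problem with a Trie.

A map (dictionary) can also implement address-book lookup, but far less efficiently: a tree-based map needs O(log n) string comparisons per lookup, while a Trie lookup takes time proportional only to the length of the query string, no matter how many entries are stored.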

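Structure of a Trie

Take storing words as an example:
Each node's next is a map from a character to a child node, so every path from the root spells out a string.
Every leaf node marks the end of a word, but not every word ends at a leaf: the branch starting with p stores both pan and panda, and pan ends at an internal node.
That is why each node carries a boolean isWord, which records whether the path from the root to this node forms a complete word.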
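As a rough illustration of the pan/panda case, here is a minimal sketch, assuming a stripped-down node class like the one in the next section. The class name TrieShapeDemo and the helper nodeFor are hypothetical, chosen only for this example. It inserts both words and inspects three nodes: pan is marked as a word yet still has a child, so it is not a leaf.

```java
import java.util.TreeMap;

// Hypothetical standalone class, for illustration only
public class TrieShapeDemo {

    // One trie node: a word-end flag plus a map from character to child node
    static class Node {
        boolean isWord;
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    // Walk (creating nodes as needed) along word, then mark the last node as a word end
    static void add(Node root, String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new Node());
        }
        cur.isWord = true;
    }

    // Hypothetical helper: return the node reached by following prefix, or null if the path is missing
    static Node nodeFor(Node root, String prefix) {
        Node cur = root;
        for (char c : prefix.toCharArray()) {
            cur = cur.next.get(c);
            if (cur == null) return null;
        }
        return cur;
    }

    public static void main(String[] args) {
        Node root = new Node();
        add(root, "pan");
        add(root, "panda");

        Node pan = nodeFor(root, "pan");
        Node panda = nodeFor(root, "panda");
        Node pa = nodeFor(root, "pa");

        // A leaf is a node with no children
        System.out.println("pan:   isWord=" + pan.isWord + ", leaf=" + pan.next.isEmpty());
        System.out.println("panda: isWord=" + panda.isWord + ", leaf=" + panda.next.isEmpty());
        System.out.println("pa:    isWord=" + pa.isWord + ", leaf=" + pa.next.isEmpty());
    }
}
```

Running main prints isWord=true, leaf=false for pan, isWord=true, leaf=true for panda, and isWord=false, leaf=false for pa.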

II. Trie Basics

Create a new project named Trie and write Trie.java following the logic from the previous section:

import java.util.TreeMap;


public class Trie {

    private class Node{
        public boolean isWord;
        public TreeMap<Character, Node> next;

        public Node(boolean isWord){
            this.isWord = isWord;
            next = new TreeMap<>();
        }

        public Node(){
            this(false);
        }
    }



    private Node root;
    private int size;

    public Trie(){
        root = new Node();
        size=0;
    }

    // Return the number of words stored in the Trie
    public int getSize(){
        return size;
    }

    // Add a new word to the Trie
    public void add(String word){
        Node cur = root;
        for(int i=0; i < word.length(); i++){
            char c = word.charAt(i);
            if(cur.next.get(c) == null){
                cur.next.put(c, new Node());
            }
            cur = cur.next.get(c);
        }
        if(!cur.isWord){   // count the word only if it was not already stored
            cur.isWord = true;
            size++;
        }

    }
}
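A quick way to check the size bookkeeping in add is a compact standalone copy of the same logic (the class name TrieAddDemo is hypothetical). Re-adding an existing word leaves size unchanged, while adding pan after panda increases it even though no node is created, because only the isWord flag flips.

```java
import java.util.TreeMap;

// Hypothetical standalone copy of the add logic above, for illustration only
public class TrieAddDemo {

    static class Node {
        boolean isWord;
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();
    private int size = 0;

    int getSize() { return size; }

    // Same logic as Trie.add: create missing nodes, then count the word only the first time
    void add(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (cur.next.get(c) == null) {
                cur.next.put(c, new Node());
            }
            cur = cur.next.get(c);
        }
        if (!cur.isWord) {
            cur.isWord = true;
            size++;
        }
    }

    public static void main(String[] args) {
        TrieAddDemo trie = new TrieAddDemo();
        trie.add("panda");
        System.out.println(trie.getSize()); // 1
        trie.add("panda");                  // duplicate: no new node, flag already set
        System.out.println(trie.getSize()); // 1
        trie.add("pan");                    // no new node, but the flag on the 'n' node flips to true
        System.out.println(trie.getSize()); // 2
    }
}
```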

III. Querying a Trie

The query method is similar to add.
Trie.java

...

    // Return whether word is stored in the Trie
    public boolean contains(String word){
        Node cur = root;
        for(int i = 0; i < word.length(); i++){
            char c = word.charAt(i);
            if(cur.next.get(c) == null){
                return false;
            }
            cur = cur.next.get(c);
        }


        return cur.isWord;   //防止 比如有panda这个单词  但没有pan这个单词的情况
    }
}
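The loop in contains can also be written recursively, consuming one character per call. This is an equivalent alternative formulation rather than the author's code, and TrieContainsDemo is a hypothetical name. The three calls in main cover the cases distinguished above: a stored word, a path that exists but is not marked as a word, and a path that breaks off.

```java
import java.util.TreeMap;

// Hypothetical standalone class: recursive variant of contains, for illustration only
public class TrieContainsDemo {

    static class Node {
        boolean isWord;
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();

    void add(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new Node());
        }
        cur.isWord = true;
    }

    boolean contains(String word) {
        return contains(root, word, 0);
    }

    // Match word[index..] starting at node
    private boolean contains(Node node, String word, int index) {
        if (index == word.length()) {
            return node.isWord;          // whole word consumed: is this node marked as a word end?
        }
        Node child = node.next.get(word.charAt(index));
        if (child == null) {
            return false;                // the path breaks off: word is not stored
        }
        return contains(child, word, index + 1);
    }

    public static void main(String[] args) {
        TrieContainsDemo trie = new TrieContainsDemo();
        trie.add("panda");
        System.out.println(trie.contains("panda"));  // true
        System.out.println(trie.contains("pan"));    // false: path exists, isWord is false
        System.out.println(trie.contains("pandas")); // false: path breaks off after "panda"
    }
}
```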

Copy the binary search tree set from the earlier chapter into the project and compare the two:

.
├── Trie.iml
├── pride-and-prejudice.txt
└── src
    ├── BST.java     // binary search tree
    ├── BSTSet.java  // set backed by the binary search tree
    ├── FileOperation.java   // reads the text file into a word list
    ├── Main.java
    ├── Set.java       // set interface
    └── Trie.java

Main.java

import java.util.ArrayList;

public class Main {

    public static void main(String[] args) {

        System.out.println("Pride and Prejudice");

        ArrayList<String> words = new ArrayList<>();
        if(FileOperation.readFile("pride-and-prejudice.txt", words)){

            long startTime = System.nanoTime();

            BSTSet<String> set = new BSTSet<>();
            for(String word: words)
                set.add(word);

            for(String word: words)
                set.contains(word);   // result is discarded; this only exercises the lookup

            long endTime = System.nanoTime();

            double time = (endTime - startTime) / 1000000000.0;

            System.out.println("Total different words: " + set.getSize());
            System.out.println("BSTSet: " + time + " s");

            // ---

            startTime = System.nanoTime();

            Trie trie = new Trie();
            for(String word: words)
                trie.add(word);

            for(String word: words)
                trie.contains(word);

            endTime = System.nanoTime();

            time = (endTime - startTime) / 1000000000.0;

            System.out.println("Total different words: " + trie.getSize());
            System.out.println("Trie: " + time + " s");
        }
    }
}

Output:

Pride and Prejudice
Total different words: 6530
BSTSet: 0.574562945 s
Total different words: 6530
Trie: 0.257158053 s

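In this run the Trie is roughly twice as fast. The cost of a Trie add or contains depends only on the length of the word being processed, not on the number of words stored, whereas each BSTSet operation performs O(log n) string comparisons.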
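Wall-clock timings vary between machines and runs, so a minimal sketch can make this claim more concrete by counting work instead of measuring time. It assumes a hypothetical word list (word0, word1, ...), counts string comparisons in a java.util.TreeSet (a red-black tree, standing in for BSTSet) through an instrumented Comparator, and counts the child links a Trie lookup follows. LookupCostDemo and its helpers are hypothetical names. Each Trie step is itself a TreeMap lookup among at most one child per distinct character, so its cost is bounded by the alphabet size rather than by n.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical standalone class, for illustration only
public class LookupCostDemo {

    static long comparisons = 0;

    // Natural String order, instrumented to count how often the TreeSet compares two strings
    static final Comparator<String> COUNTING = (a, b) -> {
        comparisons++;
        return a.compareTo(b);
    };

    static class Node {
        boolean isWord;
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    static void add(Node root, String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new Node());
        }
        cur.isWord = true;
    }

    // Number of child links a Trie lookup follows for word (stops early if the path breaks)
    static int trieSteps(Node root, String word) {
        Node cur = root;
        int steps = 0;
        for (char c : word.toCharArray()) {
            cur = cur.next.get(c);
            if (cur == null) break;
            steps++;
        }
        return steps;
    }

    // Hypothetical word list: "word0", "word1", ..., "word{n-1}"
    static List<String> makeWords(int n) {
        List<String> words = new ArrayList<>();
        for (int i = 0; i < n; i++) words.add("word" + i);
        return words;
    }

    // Average string comparisons per TreeSet.contains call, over all n stored words
    static double avgTreeSetComparisons(int n) {
        List<String> words = makeWords(n);
        TreeSet<String> set = new TreeSet<>(COUNTING);
        set.addAll(words);
        comparisons = 0;                 // count only the lookups, not the insertions
        for (String w : words) set.contains(w);
        return (double) comparisons / n;
    }

    static int trieStepsFor(int n, String query) {
        Node root = new Node();
        for (String w : makeWords(n)) add(root, w);
        return trieSteps(root, query);
    }

    public static void main(String[] args) {
        for (int n : new int[]{1_000, 100_000}) {
            System.out.printf("n=%d: TreeSet avg comparisons=%.1f, Trie steps for \"word7\"=%d%n",
                    n, avgTreeSetComparisons(n), trieStepsFor(n, "word7"));
        }
    }
}
```

The Trie step count for word7 stays at 5 for both sizes, while the average TreeSet comparison count should grow with n, roughly logarithmically.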

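IV. Prefix Queries on a Trie

Some strings are not stored as words in the Trie but are prefixes of stored words, e.g. pan when only panda has been added. isPrefix checks whether any stored word starts with prefix; unlike contains, it does not look at isWord.
Trie.java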

...

    // Return whether any word in the Trie starts with prefix
    public boolean isPrefix(String prefix){
        Node cur = root;
        for(int i = 0; i < prefix.length(); i++){
            char c = prefix.charAt(i);
            if(cur.next.get(c) == null){
                return false;
            }
            cur = cur.next.get(c);
        }
        
        return true;
    }
}
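Since contains and isPrefix walk the same path and differ only in whether isWord is checked at the end, one possible refactoring shares the walk in a helper. This is a sketch; the class TriePrefixDemo and the helper walk are hypothetical names.

```java
import java.util.TreeMap;

// Hypothetical standalone class, for illustration only
public class TriePrefixDemo {

    static class Node {
        boolean isWord;
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();

    void add(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new Node());
        }
        cur.isWord = true;
    }

    // Hypothetical helper: follow s from the root; return its last node, or null if the path breaks off
    private Node walk(String s) {
        Node cur = root;
        for (int i = 0; i < s.length() && cur != null; i++) {
            cur = cur.next.get(s.charAt(i));
        }
        return cur;
    }

    // A stored word needs the path AND an isWord flag at its end
    boolean contains(String word) {
        Node node = walk(word);
        return node != null && node.isWord;
    }

    // A prefix only needs the path to exist
    boolean isPrefix(String prefix) {
        return walk(prefix) != null;
    }

    public static void main(String[] args) {
        TriePrefixDemo trie = new TriePrefixDemo();
        trie.add("panda");
        System.out.println(trie.contains("pan")); // false
        System.out.println(trie.isPrefix("pan")); // true
        System.out.println(trie.isPrefix("pat")); // false
    }
}
```

With this logic (and with isPrefix above), the empty prefix always returns true, even for an empty Trie, because the walk ends at the root.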

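V. Trie and Simple Pattern Matching

LeetCode problem 211

Design a data structure that supports the following two operations:

void addWord(word)
bool search(word)
search(word) can search for a literal word or a simple pattern string containing only the letters a-z and the character '.'. A '.' matches any single letter.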

Example:

addWord("bad")
addWord("dad")
addWord("mad")
search("pad") -> false
search("bad") -> true
search(".ad") -> true
search("b..") -> true
Note:

You may assume that all words consist of lowercase letters a-z.

Interface:
class WordDictionary {

    /** Initialize your data structure here. */
    public WordDictionary() {
        
    }
    
    /** Adds a word into the data structure. */
    public void addWord(String word) {
        
    }
    
    /** Returns if the word is in the data structure. A word could contain the dot character '.' to represent any one letter. */
    public boolean search(String word) {
        
    }
}

/**
 * Your WordDictionary object will be instantiated and called as such:
 * WordDictionary obj = new WordDictionary();
 * obj.addWord(word);
 * boolean param_2 = obj.search(word);
 */

Implementation:

import java.util.TreeMap;

public class WordDictionary {

    private class Node{

        public boolean isWord;
        public TreeMap<Character, Node> next;

        public Node(boolean isWord){
            this.isWord = isWord;
            next = new TreeMap<>();
        }

        public Node(){
            this(false);
        }
    }

    private Node root;

    /** Initialize your data structure here. */
    public WordDictionary() {
        root = new Node();
    }

    /** Adds a word into the data structure. */
    public void addWord(String word) {

        Node cur = root;
        for(int i = 0 ; i < word.length() ; i ++){
            char c = word.charAt(i);
            if(cur.next.get(c) == null)
                cur.next.put(c, new Node());
            cur = cur.next.get(c);
        }
        cur.isWord = true;
    }

    /** Returns if the word is in the data structure. A word could contain the dot character '.' to represent any one letter. */
    public boolean search(String word) {
        return match(root, word, 0);
    }

    // Return whether word[index..] can be matched starting from node
    private boolean match(Node node, String word, int index){

        if(index == word.length())
            return node.isWord;

        char c = word.charAt(index);

        if(c != '.'){   // an ordinary letter: follow the single matching child
            if(node.next.get(c) == null)
                return false;
            return match(node.next.get(c), word, index + 1);
        }
        else{   // '.' matches any letter: try every child and succeed if any branch matches
            for(Node child: node.next.values())    // values() returns all child nodes stored in the TreeMap
                if(match(child, word, index + 1))
                    return true;
            return false;
        }
    }
}
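The recursive match above explores every child at each '.'. As an alternative to the original solution, the same search can be done iteratively by keeping the set of nodes reachable after each character of the pattern, a breadth-first formulation. The class name WordDictionaryBfs is hypothetical; main replays the LeetCode example from above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical standalone class: iterative (breadth-first) variant, for illustration only
public class WordDictionaryBfs {

    static class Node {
        boolean isWord;
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();

    public void addWord(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new Node());
        }
        cur.isWord = true;
    }

    // Keep every node that the pattern read so far can reach
    public boolean search(String word) {
        List<Node> frontier = new ArrayList<>();
        frontier.add(root);
        for (char c : word.toCharArray()) {
            List<Node> nextFrontier = new ArrayList<>();
            for (Node node : frontier) {
                if (c == '.') {
                    nextFrontier.addAll(node.next.values()); // '.' follows every child
                } else {
                    Node child = node.next.get(c);
                    if (child != null) nextFrontier.add(child);
                }
            }
            if (nextFrontier.isEmpty()) return false; // no stored word matches this far
            frontier = nextFrontier;
        }
        // The pattern matches if some reached node ends a stored word
        for (Node node : frontier) {
            if (node.isWord) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        WordDictionaryBfs dict = new WordDictionaryBfs();
        dict.addWord("bad");
        dict.addWord("dad");
        dict.addWord("mad");
        System.out.println(dict.search("pad")); // false
        System.out.println(dict.search("bad")); // true
        System.out.println(dict.search(".ad")); // true
        System.out.println(dict.search("b..")); // true
    }
}
```

Because each node is reachable by exactly one path from the root, the frontier never holds duplicates, so its size is bounded by the number of nodes at that depth.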

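VI. Trie and String Mapping

LeetCode problem 677:

Implement a MapSum class with two methods, insert and sum.

For insert, you are given a (string, integer) key-value pair. The string is the key and the integer is the value. If the key already exists, the existing pair is replaced by the new one.

For sum, you are given a string representing a prefix, and you return the sum of the values of all keys that start with that prefix.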

Example 1:

Input: insert("apple", 3), Output: Null
Input: sum("ap"), Output: 3
Input: insert("app", 2), Output: Null
Input: sum("ap"), Output: 5

Interface:
class MapSum {

    /** Initialize your data structure here. */
    public MapSum() {
        
    }
    
    public void insert(String key, int val) {
        
    }
    
    public int sum(String prefix) {
        
    }
}

/**
 * Your MapSum object will be instantiated and called as such:
 * MapSum obj = new MapSum();
 * obj.insert(key,val);
 * int param_2 = obj.sum(prefix);
 */

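Approach:

Replace the original isWord flag with the value associated with the key, defaulting to 0, so a node that does not end a key contributes nothing. sum(prefix) first walks down to the node for prefix (returning 0 if the path breaks off), then adds up the values of that node and its entire subtree.

Solution: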

import java.util.TreeMap;

public class MapSum {

    private class Node{
        public int value;   // value of the key ending at this node, or 0 if no key ends here (replaces isWord)
        public TreeMap<Character, Node> next;

        public Node(int value){
            this.value = value;
            next = new TreeMap<>();
        }

        public Node() {this(0);}
    }

    private Node root;



    /** Initialize your data structure here. */
    public MapSum() {
        root = new Node();
    }

    public void insert(String word, int val) {
        Node cur = root;
        for(int i = 0; i < word.length(); i++){
            char c = word.charAt(i);
            if(cur.next.get(c) == null){
                cur.next.put(c, new Node());
            }
            cur = cur.next.get(c);
        }
        cur.value = val;
    }

    public int sum(String prefix) {
        Node cur = root;
        for(int i = 0; i < prefix.length(); i ++){
            char c = prefix.charAt(i);
            if(cur.next.get(c) == null){
                return 0;
            }
            cur = cur.next.get(c);
        }
        return sum(cur);

    }

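    // Add up the values of node and every node in its subtree
    private int sum(Node node){
        // No explicit base case is needed: for a leaf, next is empty, so the loop
        // below does not run and the method simply returns node.value
//        if(node.next.size() == 0){
//            return node.value;
//        }
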
        int res = node.value;
        for(char c:node.next.keySet()){
            res += sum(node.next.get(c));
        }
        return res;
    }
}
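The sum above visits the entire subtree under the prefix on every call. A common variant, shown here as a sketch rather than the author's solution, stores in each node the total of all values whose keys pass through it, so sum only walks the prefix. To support overwriting a key, it remembers each key's current value in a HashMap and propagates only the difference. The class name MapSumPrefixTotals is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical standalone class: MapSum with per-node prefix totals, for illustration only
public class MapSumPrefixTotals {

    static class Node {
        int total;   // sum of the values of all keys whose path passes through this node
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();
    private final Map<String, Integer> values = new HashMap<>(); // current value of each key

    public void insert(String key, int val) {
        // If the key already exists, only the difference needs to be propagated
        int delta = val - values.getOrDefault(key, 0);
        values.put(key, val);
        Node cur = root;
        cur.total += delta;
        for (char c : key.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new Node());
            cur.total += delta;
        }
    }

    // The answer is already stored at the prefix's node, so only the prefix is walked
    public int sum(String prefix) {
        Node cur = root;
        for (char c : prefix.toCharArray()) {
            cur = cur.next.get(c);
            if (cur == null) return 0;
        }
        return cur.total;
    }

    public static void main(String[] args) {
        MapSumPrefixTotals m = new MapSumPrefixTotals();
        m.insert("apple", 3);
        System.out.println(m.sum("ap")); // 3
        m.insert("app", 2);
        System.out.println(m.sum("ap")); // 5
        m.insert("apple", 1);            // overwrite: delta is 1 - 3 = -2
        System.out.println(m.sum("ap")); // 3
    }
}
```

The trade-off is extra memory for the HashMap and one int per node, in exchange for sum costing time proportional to the prefix length only.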
