Trie 数据结构源码分析(源码来源 rt.jar)

最新推荐文章于 2022-09-05 15:27:44 发布

书生杨阳

最新推荐文章于 2022-09-05 15:27:44 发布

阅读量341

点赞数 1

分类专栏：技术学习

本文链接：https://blog.csdn.net/Petershusheng/article/details/80754189

版权

技术学习专栏收录该内容

17 篇文章 0 订阅

订阅专栏

先上图:
如下是 Trie 数的数据结构:
这里写图片描述
Trie 树的作用:统计，排序和保存字符串
快速查找某个单词key 所对应的 value 值(类似 Map 的功能)

(扩展的时候,可以判断某个 key 是否存在, 统计某个前缀被多少个单词占用(文章后面提到))

优点:
做查询的时候, 性能不会随着总的数据量增加而导致查询效率下降(空间换时间的思想)
(其他树形查找算法, 如二叉树查找, 性能会随着总的数据量增长而下降,而 Trie 树查找时的比较次数只跟 key 的长度有关系)

node 节点数据结构:

  /**
   * The node representation for the trie.
   * @xsl.usage internal
   */
  class Node
  {

    /**
     * Constructor, creates a Node[ALPHA_SIZE].
     */
    Node()
    {
      m_nextChar = new Node[ALPHA_SIZE];
      m_Value = null;
    }

    /** The next nodes.   */
    Node m_nextChar[];

    /** The value.   */
    Object m_Value;
  }

put()方法:

 /**
   * Put an object into the trie for lookup.
   *
   * @param key must be a 7-bit ASCII string,只处理128个字符
   * @param value any java object.
   *
   * @return The old object that matched key, or null.
   */
  public Object put(String key, Object value)
  {

    final int len = key.length();
    if (len > m_charBuffer.length)
    {
        // make the biggest buffer ever needed in get(String)
        //m_charBuffer用在 get 的时候用于装载key的字符数组,排除掉 key 过长的情况
        m_charBuffer = new char[len];
    }
    //指向根节点
    Node node = m_Root;

    for (int i = 0; i < len; i++)
    {
    //检查该节点的m_nextChar数组中是否包含key.charAt(i)该字符
      Node nextNode = node.m_nextChar[Character.toUpperCase(key.charAt(i))];

      if (nextNode != null)
      {
        node = nextNode;
      }
      else
      {
      //向 node 节点插入剩余的字符
        for (; i < len; i++)
        {
          Node newNode = new Node();
          // put this value into the tree with a case insensitive key
          //不区分大小写是为了在 get 的时候能忽略大小写
          node.m_nextChar[Character.toUpperCase(key.charAt(i))] = newNode;
          node.m_nextChar[Character.toLowerCase(key.charAt(i))] = newNode;
          node = newNode;
        }
        break;
      }
    }
    //把最终节点的旧的值return
    Object ret = node.m_Value;
    //node 节点的 value 指向传入的 value
    node.m_Value = value;

    return ret;
  }

get()方法:

public Object get(final String key)
{

    final int len = key.length();

    /* If the name is too long, we won't find it, this also keeps us
     * from overflowing m_charBuffer
     */
    if (m_charBuffer.length < len)
        return null;

    Node node = m_Root;
    switch (len)
    {
        // case 0 looks silly, but the generated bytecode runs
        // faster for lookup of elements of length 2 with this in
        // and a fair bit faster.  Don't know why.
        case 0 :
            {
                return null;
            }
//直接查询跟节点 Node
        case 1 :
            {
                final char ch = key.charAt(0);
                if (ch < ALPHA_SIZE)
                {
                    node = node.m_nextChar[ch];
                    if (node != null)
                        return node.m_Value;
                }
                return null;
            }
                    default :
            {
                key.getChars(0, len, m_charBuffer, 0);
                // copy string into array
                for (int i = 0; i < len; i++)
                {
                    final char ch = m_charBuffer[i];
                    //非 ASCII码不处理
                    if (ALPHA_SIZE <= ch)
                    {
                        return null;
                    }

                    node = node.m_nextChar[ch];
                    if (node == null)
                        return null;
                }
                return node.m_Value;
            }
    }
}

源码的数据结构是没法统计多少个单词占用了相同的前缀的.

要实现该功能, 则需要在 Node 节点中加入一个属性(如:sum), 每 put一个单词到 Trie 树时, 所经过的字符都+1 ; 当需要统计某个前缀被多少个单词共用时, 直接读取该前缀的最后一个字符的sum 字段即可.

要使用 Trie 树判断某个 key 是否在树上, 可以在 Node 节点中加入一个boolean 标志位, 在插入每个单词的时候, 最后一个字符所在的 node 把标志位设置为 true 即可.(就实现了 set 的功能)

如果要实现 Trie 树排序的功能, 只需要先将所有 key 存入 Trie 树, 然后通过深度遍历的方式读取即可实现排序.

从源码看, Node根节点是存储数据的 , Trie对象有个属性m_Root指向最上层的 node 节点(根节点).

测试代码:

    public static void main(String[] args) {
        Trie trie = new Trie();
        trie.put("adef", "yangyang");
        trie.put("abg", "xiaochen");
        Object tem = trie.get("ABG");
        System.out.println(trie);
    }