Trie


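    A trie, also called a dictionary tree, is very efficient, so it is widely used for string lookup, prefix matching, and similar tasks. That efficiency is paid for with space.

I. How a Trie Works

    A trie is built from a set of strings and stores the prefixes they share, which lowers the cost of lookups.

    The example below is a trie built from English words. Each node has 26 child slots, one per letter of the English alphabet (assuming words consist only of lowercase letters).

    The struct holding a trie node's information can then be declared as: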

#define MAX 26

typedef struct TrieNode         // Trie node declaration
{
    bool isStr;                 // marks whether a word ends at this node
    struct TrieNode *next[MAX]; // child branches
} Trie;

    Here next is an array of pointers to the node's children.

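    For example, given the strings "abc", "ab", "bd", and "dda", build a trie from this sequence. The resulting tree is:

    [Figure: trie built from "abc", "ab", "bd", "dda"; nodes where a word ends are shown in red]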

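    The root of a trie holds no information. The first string is "abc" and its first letter is 'a', so in the root the entry of next at index 'a'-97 (that is, index 0) is not NULL; the other letters are handled the same way. In the resulting trie, red nodes mark places where a word ends. Looking up a word such as "abc" takes O(len) steps, where len is the length of the string being looked up. Comparing the word against the stored strings one by one takes O(len*n), where n is the number of strings. Lookup in a trie is therefore much faster.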

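    This speed is paid for with space. Assuming 4-byte pointers and ignoring alignment padding, each node occupies (26*4+1) bytes = 105 bytes, so the 8 non-root nodes of this trie take 105*8 bytes = 840 bytes (945 bytes if the root is counted as well), whereas storing the strings for one-by-one comparison needs only (3+2+2+3) bytes = 10 bytes.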
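As a rough check of the node count and space figures above, the following C sketch builds the trie for the four example strings and counts its nodes. The helper names (new_node, count_nodes, destroy, example_node_count) are illustrative and not part of the original program, and input is assumed to contain only the letters a-z.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX 26

typedef struct TrieNode {
    bool isStr;                  /* true if a word ends at this node */
    struct TrieNode *next[MAX];  /* one slot per lowercase letter */
} Trie;

/* Allocate a node with every child slot NULL and isStr false. */
static Trie *new_node(void)
{
    Trie *node = malloc(sizeof *node);
    if (node == NULL)
        abort();                 /* out of memory: give up in this sketch */
    node->isStr = false;
    for (int i = 0; i < MAX; i++)
        node->next[i] = NULL;
    return node;
}

/* Insert s, creating nodes along the path as needed. */
void insert(Trie *root, const char *s)
{
    Trie *p = root;
    for (; *s != '\0'; s++) {
        assert(*s >= 'a' && *s <= 'z');   /* assumption: lowercase only */
        if (p->next[*s - 'a'] == NULL)
            p->next[*s - 'a'] = new_node();
        p = p->next[*s - 'a'];
    }
    p->isStr = true;
}

/* Count this node plus every node below it. */
size_t count_nodes(const Trie *node)
{
    size_t total = 1;
    for (int i = 0; i < MAX; i++)
        if (node->next[i] != NULL)
            total += count_nodes(node->next[i]);
    return total;
}

/* Free the subtree rooted at node (children first, then the node). */
void destroy(Trie *node)
{
    for (int i = 0; i < MAX; i++)
        if (node->next[i] != NULL)
            destroy(node->next[i]);
    free(node);
}

/* Build the example trie and return its node count, root included. */
size_t example_node_count(void)
{
    const char *words[] = {"abc", "ab", "bd", "dda"};
    Trie *root = new_node();
    size_t total;

    for (size_t i = 0; i < sizeof words / sizeof words[0]; i++)
        insert(root, words[i]);
    total = count_nodes(root);
    destroy(root);
    return total;
}
```

example_node_count() returns 9 here: 8 letter nodes plus the root. Multiplying that by sizeof(Trie) gives the footprint on a given platform, which is usually slightly more than 26 pointers plus one byte because of alignment padding.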

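II. Trie Operations

    A trie has three main operations: insertion, lookup, and deletion. Deleting a single node is rarely needed in practice, so only deleting the whole tree is considered here.

1. Insertion

  Let str be the string to insert and root the root of the trie. Start with i=0 and p=root.

  1) Take str[i] and check whether p->next[str[i]-97] is NULL. If it is, create a node temp, point p->next[str[i]-97] at temp, and then set p to temp;

   if it is not NULL, set p=p->next[str[i]-97];

  2) Increment i, take the next str[i], and repeat step 1) until the terminator '\0' is reached. Then set isStr to true in the current node p.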

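2. Lookup

  Let str be the string to look up and root the root of the trie. Start with i=0 and p=root.

  1) Take str[i] and check whether p->next[str[i]-97] is NULL. If it is, return false; otherwise set p=p->next[str[i]-97] and move on to the next character.

  2) Repeat step 1) until the terminator '\0' is reached. If the current node p is not NULL and its isStr is true, return true; otherwise return false.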
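The same walk can also answer prefix queries, one of the uses named in the introduction: if every character of the prefix can be followed, some stored word starts with it, whether or not isStr is set at the final node. The sketch below is one way to write this in C; walk and has_prefix are hypothetical names, and keys are assumed to be lowercase a-z.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX 26

typedef struct TrieNode {
    bool isStr;                  /* true if a word ends at this node */
    struct TrieNode *next[MAX];  /* one slot per lowercase letter */
} Trie;

/* Allocate a node with every child slot NULL and isStr false. */
Trie *new_node(void)
{
    Trie *node = malloc(sizeof *node);
    if (node == NULL)
        abort();                 /* out of memory: give up in this sketch */
    node->isStr = false;
    for (int i = 0; i < MAX; i++)
        node->next[i] = NULL;
    return node;
}

/* Insert s, creating nodes along the path as needed. */
void insert(Trie *root, const char *s)
{
    Trie *p = root;
    for (; *s != '\0'; s++) {
        assert(*s >= 'a' && *s <= 'z');   /* assumption: lowercase only */
        if (p->next[*s - 'a'] == NULL)
            p->next[*s - 'a'] = new_node();
        p = p->next[*s - 'a'];
    }
    p->isStr = true;
}

/* Follow s from root; return the node reached, or NULL if the path breaks. */
static const Trie *walk(const Trie *root, const char *s)
{
    const Trie *p = root;
    for (; p != NULL && *s != '\0'; s++) {
        if (*s < 'a' || *s > 'z')
            return NULL;         /* character the trie cannot contain */
        p = p->next[*s - 'a'];
    }
    return p;
}

/* Whole-word lookup: the path must exist and end on a marked node. */
bool search(const Trie *root, const char *s)
{
    const Trie *p = walk(root, s);
    return p != NULL && p->isStr;
}

/* Prefix lookup: the path only has to exist. */
bool has_prefix(const Trie *root, const char *prefix)
{
    return walk(root, prefix) != NULL;
}
```

With the four example strings inserted, "dd" is a prefix (of "dda") but not a stored word, so has_prefix accepts it while search rejects it.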

3. Deletion

  Deletion can be done recursively: free every child subtree of a node, then free the node itself.

Test program:

/* Trie (dictionary tree) 2011.10.10 */

#include <cstdio>
#include <cstdlib>
#define MAX 26
using namespace std;

typedef struct TrieNode                 // Trie node declaration
{
    bool isStr;                         // marks whether a word ends at this node
    struct TrieNode *next[MAX];         // child branches
} Trie;

void insert(Trie *root, const char *s)  // insert word s into the trie
{
    if (root == NULL || *s == '\0')
        return;
    int i;
    Trie *p = root;
    while (*s != '\0')
    {
        if (*s < 'a' || *s > 'z')       // only lowercase letters are supported
            return;
        if (p->next[*s - 'a'] == NULL)  // create the node if it does not exist
        {
            Trie *temp = (Trie *)malloc(sizeof(Trie));
            if (temp == NULL)           // out of memory: stop inserting
                return;
            for (i = 0; i < MAX; i++)
            {
                temp->next[i] = NULL;
            }
            temp->isStr = false;
            p->next[*s - 'a'] = temp;
            p = p->next[*s - 'a'];
        }
        else
        {
            p = p->next[*s - 'a'];
        }
        s++;
    }
    p->isStr = true;                    // mark that a word ends here
}

int search(Trie *root, const char *s)   // check whether a word is already in the trie
{
    Trie *p = root;
    while (p != NULL && *s != '\0')
    {
        if (*s < 'a' || *s > 'z')       // characters outside a-z cannot be in the trie
            return 0;
        p = p->next[*s - 'a'];
        s++;
    }
    return (p != NULL && p->isStr == true); // the word exists only if the end mark is set
}

void del(Trie *root)                    // free the heap memory used by the whole trie
{
    int i;
    if (root == NULL)
        return;
    for (i = 0; i < MAX; i++)
    {
        if (root->next[i] != NULL)
        {
            del(root->next[i]);
        }
    }
    free(root);
}

int main(int argc, char *argv[])
{
    int i;
    int n, m;                           // n: number of words used to build the trie; m: number of words to look up
    char s[100];
    Trie *root = (Trie *)malloc(sizeof(Trie));
    if (root == NULL)
        return 1;
    for (i = 0; i < MAX; i++)
    {
        root->next[i] = NULL;
    }
    root->isStr = false;
    if (scanf("%d", &n) != 1)
        n = 0;
    getchar();
    for (i = 0; i < n; i++)             // build the trie first
    {
        scanf("%99s", s);               // width limit keeps s from overflowing
        insert(root, s);
    }
    while (scanf("%d", &m) == 1)        // stop on EOF or non-numeric input
    {
        for (i = 0; i < m; i++)         // look up each word
        {
            scanf("%99s", s);
            if (search(root, s) == 1)
                printf("YES\n");
            else
                printf("NO\n");
        }
        printf("\n");
    }
    del(root);                          // freeing the memory matters
    return 0;
}

Practice problems:

http://acm.hdu.edu.cn/showproblem.php?pid=1671

http://acm.hdu.edu.cn/showproblem.php?pid=1075

http://acm.hdu.edu.cn/showproblem.php?pid=1251

 

Trie


  1. What is a trie:

    You've probably already seen kinds of trees that store things more efficiently, such as a binary search tree. Here, we will examine another variant of a tree, called a trie.


    Aside: The name trie comes from its use for retrieval. It is pronounced like "try" by some, like "tree" (the pronunciation of "trie" in "retrieval") by others. Here, we will discuss a particular implementation of a trie, which may be somewhat different than how it is described elsewhere.

    We use a trie to store pieces of data that have a key (used to identify the data) and possibly a value (which holds any additional data associated with the key).

    Here, we will use data whose keys are strings.

    Suppose we want to store a bunch of name/age pairs for a set of people (we'll consider names to be a single string here).

    Here are some pairs:

    amy	56
    ann	15
    emma	30
    rob	27
    roger	52
    

    Now, how will we store these name/value pairs in a trie? A trie allows us to share prefixes that are common among keys. Again, our keys are names, which are strings.

    Let's start off with amy. We'll build a tree with each character in her name in a separate node. There will also be one node under the last character in her name (i.e., under y). In this final node, we'll put the nul character (\0) to represent the end of the name. This last node is also a good place to store the age for amy.

          .     <- level 0 (root)
          |
          a     <- level 1
          |
          m     <- level 2
          |
          y     <- level 3
          |
        \0 56   <- level 4
    

    Note that each level in the trie holds a certain character in the string amy. The first character of a string key in the trie is always at level 1, the second character at level 2, etc.

    Now, when we go to add ann, we do the same thing; however, we have already stored the letter a at level 1, so we don't need to store it again. We just reuse that node with a as the first character. Under a (at level 1), however, there is only a second character of m. Since ann has a second character of n, we'll have to add a new branch for the rest of ann, giving:

         .
         |
         a
       /   \
      m     n
      |     |
      y     n
      |     |
    \0 56 \0 15
    


    Note: Again, ann's data (an age of 15) is stored in her last node.

    Now, let's add emma. Remember e is the first character and should go at level 1. Since there is no node with character e at level 1, we'll have to add it. In addition, we'll have to add nodes for all the other characters of emma under the e. The first m will be a child of the e, the next m will be below the first m, etc., giving:

              .
          /       \
         a         e
       /   \       |
      m     n      m
      |     |      |
      y     n      m
      |     |      |
    \0 56 \0 15    a
                   |
                 \0 30
    

    Now, let's add the last two names, namely rob and roger, giving:

                  .
          /       |      \
         a        e       r
       /   \      |       |
      m     n     m       o
      |     |     |     /   \
      y     n     m    b     g
      |     |     |    |     |
    \0 56 \0 15   a  \0 27   e
                  |          |
                \0 30        r
                             |
                           \0 52
    

    Because the key for each piece of data is a sequence of characters, we will sometimes refer to that sequence as the keys (plural) for that data. For example, ann's data is referenced using the keys ann (in that order).

    To better understand how a trie works, answer the following questions.

    • What would the trie look like if we now added anne with age 67? How about ro with age 23?
    • Would the trie look different if we added the names in a different order, say: rob, ann, emma, roger, amy?
    • Is this a binary tree, ternary tree or what? In other words, each node has at most how many children?

  2. Trie operations:

    Here are the operations that we will concern ourselves with for this trie. You may need others for a particular use of the trie.

    • Add:

      We've already given examples of adding.

    • IsMember:

      See if data with a certain string key is in the trie.

      For example,  IsMember(trie, "amy")  should report a true value and  IsMember(trie, "anna")  should report a false value.

      We can imagine other variations where we do something with the value (like return it) once we find something with the matching key.

    • Remove:

      Remove something from the trie, given its key.


    We may want more operations depending on how we'll use the trie.

    Since our trie holds data with string keys, which of the operations need a key and value, and which just need keys?

  3. IsMember algorithm:

    Remember that a trie is a special kind of tree. Since a trie organizes its data via the keys (as specified above), it is easy to find whether a particular key is present.

    Finding a key can be done with iteration (looping).

    Here is an outline of such an algorithm. It looks in a particular trie and determines whether data with a particular string key is present.

    IsMember(trie, key) [iterative]

    1. Search top level for node that
       matches first character in key
    2. If none,
         return false
       Else,
    3. If the matched character is \0,
         return true
       Else,
    4. Move to subtrie that matched this character
    5. Advance to next character in key*
    6. Go to step 1
    


    * I.e., the new search key becomes the old one without its first character.

    The algorithm moves down the tree (to a subtree) at step 4. Thus, the top level in step 1 actually may refer to any level in the tree depending on what subtree the algorithm is currently at.

  4. Trie implementation:

    Now, let's think about how to actually implement a trie of name/age pairs in C.

    As usual, we'll put the data structure in its own module by producing the source files trie.h and trie.c.

    The functions needed for our trie are the operations we mentioned:

    TrieAdd()
    TrieIsMember()
    TrieRemove()
    

    However, we also need additional functions for setup and cleanup:

    TrieCreate()
    TrieDestroy()
    

    Now, before we ponder the details of the trie functions, what must we decide on?

  5. Organization of data types for a trie:

    Let's think about the data types for a trie and how to divide them between the interface (in trie.h) and the implementation (in trie.c) using ADTs and CDTs.

    We'll start with the type of a value. Since our values are ages, we have the following:

    typedef int trieValueT;
    

    Since the type of values is something that people using the trie need to know, it goes in the interface (trie.h).

    Next, we decided that keys will always be strings. However, we will not construct elements that are made up of strings and values. The reason is that we do not store entire string keys in nodes of the trie. Remember, we store only the individual characters of the string key in the nodes.

    Thus, the type of a node begins as:

    typedef struct trieNodeTag {
      char key;
      trieValueT value;
      ...
    } trieNodeT;
    

    Since it is only a detail of the implementation, it goes in trie.c.


    Note: We could make the trie more generic, by allowing it to handle keys that are any type of array, i.e., arrays of things other than characters. For other types of arrays, we'd have to determine how to represent the end-of-key, which we currently do with the nul character (\0).

    For now, we'll just hardcode the use of a character for the key stored at each node, and a string (i.e., an array of characters) for the entire key (or sequence of keys) associated with each piece of data.


    Now we need to complete the type of a node.  How will we construct a tree whose nodes can have several children? One way is to have the children of a node be part of a linked list of nodes.

    Structure
    If we view siblings at a level as being linked in a list, then the trie we saw above now could be viewed structurally as:
          |
          a --------- e ----- r
          |           |       |
          m --- n     m       o
          |     |     |       |
          y     n     m       b ----- g
          |     |     |       |       |
        \0 56 \0 15   a     \0 27     e
                      |               |
                    \0 30             r
                                      |
                                    \0 52
    

    First, the associated nodes at a given level form a linked list (e.g., aer at level 1). Note, however, that each level may have more than one linked list. For example, at the second level, m and n form their own list (as they are associated with a at the first level). Likewise, m (as it is associated with e at the first level) forms its own linked list. And finally, o, which is associated with r at the first level, forms its own list.

    Thus, each node (e.g., a at level 1) has a link to the next node at that level and a link to a list of its children. To implement this structure, we will need two pointers in a node, giving:

    typedef struct trieNodeTag {
      char key;
      trieValueT value;
      struct trieNodeTag *next, *children;
    } trieNodeT;
    


    Note: The value part of a node is unused in most cases since we only store the value in the node with the nul character (\0) as a key. If a value was something that was large, we would have to consider being smarter about our design.
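    To make the two links concrete, here is a small sketch of one way to maintain a sibling list: it follows the next pointers looking for a character and appends a new node at the tail when none matches. FindOrAppend is a hypothetical helper, not one of the trie functions named in this tutorial, and the choice to append at the tail (rather than insert at the front) is an assumption made so that siblings stay in insertion order.

```c
#include <assert.h>
#include <stdlib.h>

typedef int trieValueT;

typedef struct trieNodeTag {
  char key;
  trieValueT value;
  struct trieNodeTag *next, *children;
} trieNodeT;

/*
 * Hypothetical helper: return the node holding `key` in the sibling
 * list *list, appending a new node at the tail if none exists.
 * Returns NULL only if memory runs out.
 */
trieNodeT *FindOrAppend(trieNodeT **list, char key)
{
  assert(list != NULL);

  /* Follow the `next` links; stop at a match or at the NULL tail link. */
  while (*list != NULL) {
    if ((*list)->key == key)
      return *list;
    list = &(*list)->next;
  }

  /* No match: hang a fresh node off the tail link. */
  *list = malloc(sizeof **list);
  if (*list != NULL) {
    (*list)->key = key;
    (*list)->value = 0;
    (*list)->next = NULL;
    (*list)->children = NULL;
  }
  return *list;
}
```

    Calling FindOrAppend on a level-1 list with 'a', 'e' and 'r' produces the chain a, e, r shown above, and calling it on the children field of the a node with 'm' and 'n' produces that node's own child list.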

    The only types left are those that keep track of the trie. Based on our choice for the structure of the trie implementation, we see we'll need a pointer to the top level's first node.

    Since this pointer has to do with the implementation of the trie, we put it in the concrete type struct trieCDT:

    typedef struct trieCDT {
      trieNodeT *root;
    } trieCDT;
    

    In the interface, we must fill in what the abstract type is as follows:

    typedef struct trieCDT *trieADT;
    

    Finally, we have:

    trie.h                          trie.c
    ------                          ------
                                    #include "trie.h"

                                    typedef struct trieNodeTag {
                                      char key;
                                      trieValueT value;
    typedef int trieValueT;           struct trieNodeTag *next,
                                                         *children;
                                    } trieNodeT;

    typedef struct trieCDT          typedef struct trieCDT {
            *trieADT;                 trieNodeT *root;
                                    } trieCDT;
    

  6. Using a trie:

    Now that we've decided on the data types for a trie, we can imagine how our trie will be used:

    trieADT trie;
    
    trie = TrieCreate();
    
    TrieAdd(trie, "amy", 56);
    TrieAdd(trie, "ann", 15);
    
    if (TrieIsMember(trie, "amy"))
      ...
    

    When someone needs a trie, they define a trieADT variable and set it up with TrieCreate().


    Note: Since we don't store entire string keys and values together (per our discussion above), you might pass a key and a value separately to TrieAdd().

  7. Filling in trie functions:

    Let's now consider the prototype for our TrieIsMember() function:

    int TrieIsMember(trieADT trie, char keys[]);
    

    It must take the trie in which to look for data and the string key (i.e., a sequence of character keys) used to find that data. In addition, it needs to return a true or false value based on whether it finds the key or not.

    Here is an implementation based on the algorithm we already discussed:

    int TrieIsMember(trieADT trie, char keys[])
    {
      /* Start at the top level. */
      trieNodeT *level = trie->root;
    
      /* Start at beginning of key. */
      int i = 0;
    
      for (;;) {
        trieNodeT *found = NULL;
        trieNodeT *curr;
    
        for (curr = level; curr != NULL; curr = curr->next) {
          /*
           * Want a node at this level to match
           * the current character in the key.
           */
          if (curr->key == keys[i]) {
            found = curr;
            break;
          }
        }
    
        /*
         * If either no nodes at this level or none
         * with next character in key, then key not
         * present.
         */
        if (found == NULL)
          return 0;
    
        /* If we matched end of key, it's there! */
        if (keys[i] == '\0')
          return 1;
    
        /* Go to next level. */
        level = found->children;
    
        /* Advance in string key. */
        i++;
      }
    }
    

    Fill in the prototypes for the rest of the trie functions:

    return-type TrieCreate(parameters);
    return-type TrieDestroy(parameters);
    return-type TrieAdd(parameters);
    int         TrieIsMember(trieADT trie, char keys[]);
    return-type TrieRemove(parameters);
    ...
    
    and then implement them.
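    One possible way to fill in some of these functions is sketched below. The prototypes chosen here (TrieCreate taking no arguments, TrieAdd returning 1 on success and 0 on allocation failure, const char keys[] parameters) are assumptions rather than the tutorial's own answers, FreeLevel is a hypothetical helper, and TrieRemove is left out.

```c
#include <assert.h>
#include <stdlib.h>

typedef int trieValueT;

typedef struct trieNodeTag {
  char key;
  trieValueT value;
  struct trieNodeTag *next, *children;
} trieNodeT;

typedef struct trieCDT {
  trieNodeT *root;
} trieCDT;

typedef struct trieCDT *trieADT;

/* Create an empty trie; returns NULL if allocation fails. */
trieADT TrieCreate(void)
{
  trieADT trie = malloc(sizeof *trie);
  if (trie != NULL)
    trie->root = NULL;
  return trie;
}

/* Free a sibling list and, recursively, every subtrie below it. */
static void FreeLevel(trieNodeT *level)
{
  while (level != NULL) {
    trieNodeT *next = level->next;
    FreeLevel(level->children);
    free(level);
    level = next;
  }
}

void TrieDestroy(trieADT trie)
{
  if (trie == NULL)
    return;
  FreeLevel(trie->root);
  free(trie);
}

/* Add keys with value, or update the value if keys is already present. */
int TrieAdd(trieADT trie, const char keys[], trieValueT value)
{
  trieNodeT **level;   /* head link of the sibling list being searched */
  int i;

  assert(trie != NULL && keys != NULL);
  level = &trie->root;

  for (i = 0; ; i++) {
    trieNodeT *curr = *level;

    /* Look for a sibling holding the current character. */
    while (curr != NULL && curr->key != keys[i])
      curr = curr->next;

    if (curr == NULL) {
      /* Not found: push a new node onto the front of this list. */
      curr = malloc(sizeof *curr);
      if (curr == NULL)
        return 0;
      curr->key = keys[i];
      curr->value = 0;
      curr->children = NULL;
      curr->next = *level;
      *level = curr;
    }

    if (keys[i] == '\0') {
      curr->value = value;   /* the \0 node carries the value */
      return 1;
    }
    level = &curr->children;
  }
}

/* Same algorithm as the TrieIsMember above, with a const key. */
int TrieIsMember(trieADT trie, const char keys[])
{
  trieNodeT *level = trie->root;
  int i;

  for (i = 0; ; i++) {
    trieNodeT *curr = level;
    while (curr != NULL && curr->key != keys[i])
      curr = curr->next;
    if (curr == NULL)
      return 0;
    if (keys[i] == '\0')
      return 1;
    level = curr->children;
  }
}
```

    Because new siblings are pushed onto the front of a list, the order of nodes within a level may differ from the diagrams above; membership tests are unaffected, since every lookup scans the whole sibling list.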

  8. A more generic trie:

    We can easily redesign the trie so that it can use keys that are different kinds of arrays.


BU CAS CS - Trie 
Copyright © 1993-2000 by Robert I. Pitts <rip at bu dot edu>. All Rights Reserved.