Trie (Dictionary Tree)


dskit


Article link: https://blog.csdn.net/dskit/article/details/4973441


Source: http://www.cppblog.com/abilitytao/archive/2009/04/21/80598.html

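A trie, also called a dictionary tree or word-search tree, is a tree structure for storing large numbers of strings. Its advantage is that it saves storage by sharing the common prefixes of the strings. The trie is a fairly simple data structure and easy to understand, but simplicity has its price. The trie's drawback is that it consumes a great deal of memory. Building the tree with the left-child, right-sibling representation might help somewhat. Its basic properties can be summarized as follows: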

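1. The root contains no character; every other node contains exactly one character.

2. Concatenating the characters on the path from the root to a node gives the string that node represents.

3. All children of a node contain different characters.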
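The introduction suggests that a left-child, right-sibling layout might reduce the trie's memory use. The sketch below illustrates that idea. Each node stores one pointer to its first child and one to its next sibling, instead of 26 child pointers. The cost is that finding a child becomes a linear scan over that node's children, at most 26 of them. The names `LcrsNode`, `findChild`, `lcrsInsert` and `lcrsContains` are hypothetical and do not come from the original article.

```cpp
#include <cassert>
#include <string>

// Left-child, right-sibling trie node: memory grows with the number of
// actual edges rather than 26 pointers per node.
struct LcrsNode {
    char ch;            // character on the edge leading into this node
    bool isWord;        // true if a stored word ends at this node
    LcrsNode* child;    // first child
    LcrsNode* sibling;  // next node with the same parent
    explicit LcrsNode(char c) : ch(c), isWord(false), child(nullptr), sibling(nullptr) {}
    ~LcrsNode() { delete child; delete sibling; }  // frees children and later siblings
};

// Linear scan of parent's children for character c.
LcrsNode* findChild(LcrsNode* parent, char c) {
    for (LcrsNode* p = parent->child; p != nullptr; p = p->sibling)
        if (p->ch == c) return p;
    return nullptr;
}

void lcrsInsert(LcrsNode* root, const std::string& word) {
    LcrsNode* cur = root;
    for (char c : word) {
        LcrsNode* next = findChild(cur, c);
        if (next == nullptr) {
            next = new LcrsNode(c);
            next->sibling = cur->child;  // push the new child onto the front of the sibling list
            cur->child = next;
        }
        cur = next;
    }
    cur->isWord = true;
}

bool lcrsContains(LcrsNode* root, const std::string& word) {
    LcrsNode* cur = root;
    for (char c : word) {
        cur = findChild(cur, c);
        if (cur == nullptr) return false;
    }
    return cur->isWord;
}
```

Usage sketch: create `LcrsNode root('\0');` and call `lcrsInsert(&root, "abc");`. After that, `lcrsContains(&root, "ab")` is false, because no word ends at the node for "ab".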

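Its basic operations are search, insertion and deletion, although deletion is needed less often. Here I only implement deleting the whole tree; deleting a single word is also simple.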
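Deleting a single word is left out of the demos. Here is a minimal sketch of it, assuming lowercase letters. Like the `num` field in the Demo below, each node counts the stored words whose path runs through it. When that count drops to zero, the node and its whole subtree belong to no other word and can be freed in one step. Unlike the demo, duplicate inserts are ignored so the counts stay exact. `WNode`, `wordContains`, `wordInsert` and `wordRemove` are hypothetical names.

```cpp
#include <cassert>
#include <string>

struct WNode {
    int pass = 0;            // stored words whose path runs through this node
    bool terminal = false;   // a stored word ends exactly here
    WNode* son[26] = {};     // children for 'a'..'z'; nullptr when absent
    ~WNode() { for (WNode* c : son) delete c; }   // frees the whole subtree
};

bool wordContains(const WNode* root, const std::string& s) {
    const WNode* cur = root;
    for (char ch : s) {
        assert(ch >= 'a' && ch <= 'z');
        cur = cur->son[ch - 'a'];
        if (cur == nullptr) return false;
    }
    return cur->terminal;
}

// Inserts s once; a duplicate is ignored so the pass counts stay exact.
void wordInsert(WNode* root, const std::string& s) {
    if (wordContains(root, s)) return;
    WNode* cur = root;
    for (char ch : s) {
        assert(ch >= 'a' && ch <= 'z');
        WNode*& next = cur->son[ch - 'a'];
        if (next == nullptr) next = new WNode;
        next->pass++;
        cur = next;
    }
    cur->terminal = true;
}

// Removes s and returns true, or returns false if s is not stored.
bool wordRemove(WNode* root, const std::string& s) {
    if (!wordContains(root, s)) return false;
    WNode* cur = root;
    for (char ch : s) {
        WNode*& next = cur->son[ch - 'a'];
        if (--next->pass == 0) {   // no other stored word uses this branch
            delete next;           // frees this node and everything below it
            next = nullptr;
            return true;
        }
        cur = next;
    }
    cur->terminal = false;         // the path is still shared by other words
    return true;
}
```

Suppose "abc", "ab" and "abd" are stored. Removing "abc" frees only the final `c` node. Removing "ab" afterwards just clears a terminal flag, because "abd" still uses that path.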

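To search for a dictionary entry:

(1) Start the search at the root.

(2) Take the first letter of the keyword, select the subtree corresponding to that letter, and continue the search in that subtree.

(3) In that subtree, take the second letter of the keyword and again select the corresponding subtree.

(4) Repeat in the same way for the remaining letters.

(5) When all letters of the keyword have been consumed at some node, read the information attached to that node; the search is complete. The other operations work similarly.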

Demo

```cpp
/*
   Name: Trie
   Author: MaiK
   Description: basic Trie operations (insert, find, delete the whole tree)
*/
#include <iostream>
#include <cstring>
using namespace std;

const int sonnum = 26, base = 'a';   // lowercase letters 'a'..'z' only

struct Trie {
    int num;                  // how many inserted words pass through this node, i.e. share this prefix
    bool terminal;            // true if an inserted word ends at this node
    struct Trie *son[sonnum]; // child nodes, one per letter
};

Trie *NewTrie() // create a new node
{
    Trie *temp = new Trie;
    temp->num = 1;
    temp->terminal = false;
    for (int i = 0; i < sonnum; i++)
        temp->son[i] = NULL;
    return temp;
}

void Insert(Trie *pnt, const char *s, int len) // insert a new word into the trie
{
    Trie *temp = pnt;
    for (int i = 0; i < len; i++)
    {
        if (temp->son[s[i] - base] == NULL)
            temp->son[s[i] - base] = NewTrie();   // new branch: num starts at 1
        else
            temp->son[s[i] - base]->num++;        // one more word shares this prefix
        temp = temp->son[s[i] - base];
    }
    temp->terminal = true;
}

void Delete(Trie *pnt) // delete the whole tree
{
    if (pnt != NULL)
    {
        for (int i = 0; i < sonnum; i++)
            if (pnt->son[i] != NULL)
                Delete(pnt->son[i]);
        delete pnt;   // the caller's pointer now dangles; reset it to NULL there
    }
}

Trie *Find(Trie *pnt, const char *s, int len) // try to find the word; returns its node or NULL
{
    Trie *temp = pnt;
    for (int i = 0; i < len; i++)
        if (temp->son[s[i] - base] != NULL)
            temp = temp->son[s[i] - base];
        else
            return NULL;
    return temp;   // s is a stored word only if temp->terminal is true; otherwise it is just a prefix
}
```

The following material is from: http://blog.chinaunix.net/u2/65170/showart_1073487.html

Demo

```cpp
// trie.cpp
#include <iostream>
#include <cstring>
using namespace std;

const int num_chars = 26;   // letters a-z; uppercase input maps onto the same branches

class Trie {
public:
    Trie();
    Trie(const Trie&) = delete;              // copying is not supported
    Trie& operator=(const Trie&) = delete;
    virtual ~Trie();
    int trie_search(const char* word, char* entry) const;  // returns 1 and copies the entry if word is stored
    int insert(const char* word, const char* entry);       // returns 0 if word is already stored or has a non-letter
    int remove(const char* word, char* entry);             // declared only; not implemented in this demo
protected:
    struct Trie_node
    {
        char* data;                    // entry for the word ending here; NULL if no word ends at this node
        Trie_node* branch[num_chars];  // child pointers, one per letter
        Trie_node();                   // constructor: no data, no children
        ~Trie_node();                  // frees the data and the whole subtree
    };
    Trie_node* root;                   // root of the tree (NULL until the first insert)
};

Trie::Trie_node::Trie_node()
{
    data = NULL;
    for (int i = 0; i < num_chars; i++)
        branch[i] = NULL;
}

Trie::Trie_node::~Trie_node()
{
    delete[] data;
    for (int i = 0; i < num_chars; i++)
        delete branch[i];
}

Trie::Trie() : root(NULL)
{
}

Trie::~Trie()
{
    delete root;   // recursively frees every node
}

int Trie::trie_search(const char* word, char* entry) const
{
    int position = 0;              // number of characters matched so far
    int char_code;
    Trie_node *location = root;    // start at the root
    while (location != NULL && *word != 0)
    {
        if (*word >= 'A' && *word <= 'Z')
            char_code = *word - 'A';
        else if (*word >= 'a' && *word <= 'z')
            char_code = *word - 'a';
        else return 0;
        location = location->branch[char_code];   // descend to the child for this letter
        position++;
        word++;
    }
    // found only if the whole path exists and a word ends at the final node
    if (location != NULL && location->data != NULL)
    {
        strcpy(entry, location->data);
        return 1;
    }
    else return 0;
}

int Trie::insert(const char* word, const char* entry)
{
    int result = 1, position = 0;
    if (root == NULL) root = new Trie_node;   // create the root on the first insert
    int char_code;
    Trie_node *location = root;
    while (location != NULL && *word != 0)
    {
        if (*word >= 'A' && *word <= 'Z')
            char_code = *word - 'A';
        else if (*word >= 'a' && *word <= 'z')
            char_code = *word - 'a';
        else return 0;
        if (location->branch[char_code] == NULL)
            location->branch[char_code] = new Trie_node;   // create the missing child
        location = location->branch[char_code];           // move down one level
        position++;
        word++;
    }
    if (location->data != NULL)
        result = 0;                   // word already stored; keep the old entry
    else {
        location->data = new char[strlen(entry) + 1];   // store a copy of the entry
        strcpy(location->data, entry);
    }
    return result;
}

int main()
{
    Trie t;
    char entry[100];
    t.insert("aa", "DET");
    t.insert("abacus", "NOUN");
    t.insert("abalone", "NOUN");
    t.insert("abandon", "VERB");
    t.insert("abandoned", "ADJ");
    t.insert("abashed", "ADJ");
    t.insert("abate", "VERB");
    t.insert("this", "PRON");
    if (t.trie_search("this", entry))
        cout << "'this' was found. pos: " << entry << endl;
    if (t.trie_search("abate", entry))
        cout << "'abate' is found. pos: " << entry << endl;
    if (t.trie_search("baby", entry))
        cout << "'baby' is found. pos: " << entry << endl;
    else
        cout << "'baby' does not exist at all!" << endl;
    if (t.trie_search("aa", entry))
        cout << "'aa' was found. pos: " << entry << endl;
    return 0;
}
```

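A trie is a character tree; its core idea is trading space for time.
Here is a simple example.
You are given 100000 words, each at most 10 characters long. For each word, we want to determine whether it has appeared before and, if so, the position of its first occurrence.
This can of course be solved with hashing, but here I will introduce the trie, which is more useful in some respects. For example, suppose that for a given word I want to ask whether any of its prefixes has appeared. That is awkward with hashing, while a trie still handles it easily.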
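Take the prefix question just mentioned. With an ordinary string hash set, each prefix of the word has to be built and hashed separately. A trie checks all of them in one walk from the root. Here is a minimal sketch, assuming lowercase words and reading "has a prefix appeared" as "some previously inserted word is a prefix of the query, the query itself included". `PrefixTrie` and `hasStoredPrefix` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct PrefixTrie {
    struct Node {
        bool isWord = false;               // a stored word ends here
        std::unique_ptr<Node> next[26];    // children for 'a'..'z'
    };
    Node root;

    void insert(const std::string& w) {
        Node* cur = &root;
        for (char c : w) {
            assert(c >= 'a' && c <= 'z');
            std::unique_ptr<Node>& child = cur->next[c - 'a'];
            if (!child) child = std::make_unique<Node>();
            cur = child.get();
        }
        cur->isWord = true;
    }

    // True if some stored word is a prefix of w (w itself counts).
    // A single walk from the root passes through the node of every prefix of w.
    bool hasStoredPrefix(const std::string& w) const {
        const Node* cur = &root;
        if (cur->isWord) return true;      // the empty word was stored
        for (char c : w) {
            assert(c >= 'a' && c <= 'z');
            cur = cur->next[c - 'a'].get();
            if (!cur) return false;        // no stored word continues this way
            if (cur->isWord) return true;
        }
        return false;
    }
};
```

With "abc" stored, `hasStoredPrefix("abcd")` is true. `hasStoredPrefix("ab")` is false because no stored word ends at or above the node for "ab".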
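Back to the example. The most naive method checks, for each word, whether it occurs among the words before it. That takes O(n^2) time, which is clearly unacceptable for n = 100000. Let us think about it differently. Suppose the word being queried is abcd. Among the earlier words, I obviously need not consider those starting with b, c, d, f and so on; I only need to check whether abcd exists among the words starting with a. Likewise, among the words starting with a, I only need to consider those whose second letter is b, and so on. In this way a tree model gradually takes shape.
Suppose we have the seven words b, abc, abd, bcd, abcd, efg and hii, and build a trie from them.

For each node, the path from the root to it spells out a word. If the node is marked red, that word exists; otherwise it does not.
So for a given word, I just walk from the root along its letters to the corresponding node. Checking whether that node is marked red tells me whether the word has appeared. Marking the node red is equivalent to inserting the word.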
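The sketch below builds the trie for those seven words and prints it, one node per line. Each line is indented by depth, and `*` stands in for the red mark. It assumes lowercase words; `TNode`, `markWord`, `appeared` and `dump` are hypothetical names.

```cpp
#include <cassert>
#include <string>

struct TNode {
    bool red = false;        // "marked red" in the text: a word ends here
    TNode* son[26] = {};     // children for 'a'..'z'
    ~TNode() { for (TNode* c : son) delete c; }
};

// Walk from the root along w, creating missing nodes; marking the last node red inserts w.
void markWord(TNode* root, const std::string& w) {
    TNode* cur = root;
    for (char c : w) {
        assert(c >= 'a' && c <= 'z');
        TNode*& next = cur->son[c - 'a'];
        if (!next) next = new TNode;
        cur = next;
    }
    cur->red = true;
}

// w has appeared iff its node exists and is red.
bool appeared(const TNode* root, const std::string& w) {
    const TNode* cur = root;
    for (char c : w) {
        assert(c >= 'a' && c <= 'z');
        cur = cur->son[c - 'a'];
        if (!cur) return false;
    }
    return cur->red;
}

// Appends one line per node in alphabetical depth-first order:
// two spaces of indentation per level, the node's letter, and '*' if it is red.
void dump(const TNode* node, int depth, std::string& out) {
    for (int i = 0; i < 26; i++) {
        const TNode* child = node->son[i];
        if (!child) continue;
        out += std::string(2 * depth, ' ');
        out += static_cast<char>('a' + i);
        if (child->red) out += '*';
        out += '\n';
        dump(child, depth + 1, out);
    }
}
```

Insert the seven words, then call `dump(&root, 0, out)`. The string `out` then holds:

```
a
  b
    c*
      d*
    d*
b*
  c
    d*
e
  f
    g*
h
  i
    i*
```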
In this way, query and insertion are done together, in time proportional only to the word length, which in this example is at most 10.
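For the example problem, the query and the insertion can share one walk. Each word's end node records where the word first appeared. This is a minimal sketch, assuming lowercase words and 1-based positions, with 0 meaning "not seen before". `FNode` and `firstOccurrences` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct FNode {
    int first = 0;           // 1-based position where this word first appeared; 0 = not seen yet
    FNode* son[26] = {};     // children for 'a'..'z'
    ~FNode() { for (FNode* c : son) delete c; }
};

// For each word, in input order, returns 0 if it has not appeared before,
// otherwise the 1-based position of its first occurrence. The query and the
// insertion share a single walk, so each word costs O(its length).
std::vector<int> firstOccurrences(const std::vector<std::string>& words) {
    FNode root;
    std::vector<int> result;
    for (std::size_t i = 0; i < words.size(); i++) {
        FNode* cur = &root;
        for (char c : words[i]) {
            assert(c >= 'a' && c <= 'z');
            FNode*& next = cur->son[c - 'a'];
            if (!next) next = new FNode;     // extend the path while walking
            cur = next;
        }
        result.push_back(cur->first);        // query: 0 means "new word"
        if (cur->first == 0)
            cur->first = static_cast<int>(i) + 1;   // insert: remember the first position
    }
    return result;
}
```

For example, `firstOccurrences({"abc", "ab", "abc", "b", "ab", "abc"})` returns `{0, 0, 1, 0, 2, 1}`.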
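Note that level i of a trie can hold on the order of 26^i nodes. To save space, we therefore allocate nodes dynamically as linked nodes, or simulate dynamic allocation with an array. Either way, the number of nodes never exceeds the number of words × the word length.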
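The array-simulated allocation could look like the sketch below. Nodes are rows in a table sized once from the bound above: the root plus at most words × length other nodes. Children are row indices rather than pointers. This assumes lowercase words. `ArrayTrie` and its `maxWords`/`maxLen` parameters are hypothetical. Index 0 doubles as "no child", because the root is never anyone's child.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Array-backed trie: rows of a preallocated table stand in for dynamically allocated nodes.
class ArrayTrie {
public:
    // Capacity: the root plus at most maxWords * maxLen further nodes.
    ArrayTrie(int maxWords, int maxLen)
        : child_(1 + static_cast<std::size_t>(maxWords) * maxLen),
          isEnd_(child_.size(), false) {}

    void insert(const std::string& w) {
        int cur = 0;                                // start at the root row
        for (char c : w) {
            assert(c >= 'a' && c <= 'z');
            int& next = child_[cur][c - 'a'];
            if (next == 0) {
                assert(used_ < static_cast<int>(child_.size()));  // capacity bound respected
                next = used_++;                     // "allocate" the next free row
            }
            cur = next;
        }
        isEnd_[cur] = true;
    }

    bool contains(const std::string& w) const {
        int cur = 0;
        for (char c : w) {
            assert(c >= 'a' && c <= 'z');
            cur = child_[cur][c - 'a'];
            if (cur == 0) return false;             // 0 means "no child"
        }
        return isEnd_[cur];
    }

    int nodesUsed() const { return used_; }

private:
    std::vector<std::array<int, 26>> child_;  // child_[row][letter] = child row, 0 if none
    std::vector<bool> isEnd_;                 // a word ends at this row
    int used_ = 1;                            // rows handed out so far (row 0 is the root)
};
```

For the example above, `ArrayTrie t(100000, 10)` would reserve 1,000,001 rows of 26 `int`s. With 4-byte `int` that is about 104 MB, which shows why the text warns that tries are memory-hungry.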

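Password Cracking
【Problem description】
Tim has been so busy with schoolwork lately that he has forgotten the password to his own computer. Fortunately, he recorded the password in a very special way when he set it. He wrote the password, together with some fake passwords, in one notebook. To be able to pick the correct password out of these strings, he wrote a very long string in another notebook. The correct password is the one that occurs the most times in this long string. For example, take the string ababa with candidate passwords abab and aba. The correct password is aba, because aba occurs twice in the string.
Now you have both of Tim's notebooks. Write a program to help Tim find the correct password.
【Input】
The input has two parts. The first part consists of several lines, each containing one password. Every password is at most 255 characters long and consists of lowercase letters. A blank line follows. The second part is a very long string of lowercase letters terminated by '.'.
【Output】
The output file is Pass.out. It contains exactly one line with one integer: the number of times the correct password occurs in the string. If this number is 0, output "No find".
【Sample】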
Pass.in:
ab
abc
bdc
abcd

abcabcabcdbdabcbabdbcabdbdbdbd.

Pass.out:
6
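One possible approach is sketched below. Insert every password into a trie. Then start a walk at each position of the long string, and increment a counter at every password node the walk reaches. Each walk stops after at most 255 steps (the longest password), so the total work is about text length × 255. The sketch assumes that overlapping occurrences count, as in the ababa example, and that passwords are lowercase. `PwNode`, `bestCount` and `solve` are hypothetical names. Opening Pass.in and writing Pass.out is left to a `main`.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct PwNode {
    bool isPassword = false;   // a password ends at this node
    long long hits = 0;        // occurrences of that password counted so far
    PwNode* son[26] = {};
    ~PwNode() { for (PwNode* c : son) delete c; }
};

// Largest number of (possibly overlapping) occurrences of any one password in text; 0 if none occurs.
long long bestCount(const std::vector<std::string>& passwords, const std::string& text) {
    PwNode root;
    std::vector<PwNode*> ends;                 // node where each password ends
    for (const std::string& p : passwords) {
        PwNode* cur = &root;
        for (char c : p) {
            assert(c >= 'a' && c <= 'z');
            PwNode*& next = cur->son[c - 'a'];
            if (!next) next = new PwNode;
            cur = next;
        }
        cur->isPassword = true;
        ends.push_back(cur);
    }
    for (std::size_t i = 0; i < text.size(); i++) {    // one walk per starting position
        PwNode* cur = &root;
        for (std::size_t j = i; j < text.size(); j++) {
            cur = cur->son[text[j] - 'a'];
            if (!cur) break;                   // no password continues with this letter
            if (cur->isPassword) cur->hits++;
        }
    }
    long long best = 0;
    for (const PwNode* e : ends) best = std::max(best, e->hits);
    return best;
}

// Reads the input format above: one password per line, a blank line, then the
// text up to the terminating '.'; characters other than 'a'..'z' in the text are skipped.
std::string solve(std::istream& in) {
    std::vector<std::string> passwords;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();   // tolerate CRLF input
        if (line.empty()) break;                                      // blank line ends the list
        passwords.push_back(line);
    }
    std::string text;
    char c;
    while (in.get(c) && c != '.')
        if (c >= 'a' && c <= 'z') text += c;
    long long best = bestCount(passwords, text);
    return best == 0 ? "No find" : std::to_string(best);
}
```

On the sample, `solve` returns "6". In that text, ab occurs six times, abc four times, abcd once and bdc never.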