Maintaining 10 Billion URLs (Radix Tree) - CSDN Blog

http://s.sousb.com/2011/04/19/%E7%BB%B4%E6%8A%A4100%E4%BA%BF%E4%B8%AAurl/

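Problem: a URL such as http://www.baidu.com/s?wd=baidu has attributes. Some are fixed-length (for example, the time the system discovered it) and some are variable-length (for example, its description). Implement a system that:
a. stores and maintains 10 billion URLs and their attributes;
b. supports adding, deleting and modifying URLs and their attributes;
c. checks whether a URL is in the system and returns its information;
d. quickly selects all URLs under a given site.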

Hint: because the data volume is large, the data may have to be stored on several machines.

Analysis: this is a Baidu written-test question. It is fairly hard, so I can only offer a few points I have understood.

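First, the URLs must be partitioned across X machines. A hash function hash(hostname(url)) can assign each URL to one of the X machines. This serves two purposes: the data is stored in a distributed way, and all URLs of the same site end up on the same machine.
Second, how should each machine organize its data? One option is a database. Here is an alternative: keep the URLs directly in memory and organize them as a tree. The most commonly used tree for strings is the trie, but its space is dictated by the longest URL, which makes it clearly unsuitable here. Many URLs also share common parts (such as paths). A variant of the trie, the radix tree, therefore saves a great deal of space without hurting efficiency.
Finally, with this storage model in place, I will not work through the answers to questions a-d one by one here.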

Radix tree

From Wikipedia, the free encyclopedia


In computer science, a radix tree (also patricia trie or radix trie or compact prefix tree) is a space-optimized trie data structure where each node with only one child is merged with its child. The result is that every internal node has at least two children. Unlike in regular tries, edges can be labeled with sequences of elements as well as single elements. This makes them much more efficient for small sets (especially if the strings are long) and for sets of strings that share long prefixes.

As an optimization, edge labels can be stored in constant size by using two pointers to a string (for the first and last elements). ^[1]

Note that although the examples in this article show strings as sequences of characters, the type of the string elements can be chosen arbitrarily (for example, as a bit or byte of the string representation when using multibyte character encodings or Unicode).


Applications

As mentioned, radix trees are useful for constructing associative arrays with keys that can be expressed as strings. They find particular application in the area of IP routing, where the ability to contain large ranges of values with a few exceptions is particularly suited to the hierarchical organization of IP addresses.^[2] They are also used for inverted indexes of text documents in information retrieval.

Operations

Radix trees support insertion, deletion, and searching operations. Insertion adds a new string to the trie while trying to minimize the amount of data stored. Deletion removes a string from the trie. Searching operations include exact lookup, find predecessor, find successor, and find all strings with a prefix. All of these operations are O(k) where k is the maximum length of all strings in the set. This list may not be exhaustive.

Lookup

(Figure: finding a string in a Patricia trie)

The lookup operation determines if a string exists in a trie. Most operations modify this approach in some way to handle their specific tasks. For instance, the node where a string terminates may be of importance. This operation is similar to tries except that some edges consume multiple elements.

The following pseudo code assumes that these classes exist.

Edge
{
  Node targetNode
  string label
}

Node
{
  Array of Edges edges
  function isLeaf()
}

function lookup(string x)
{
  // Begin at the root with no elements found
  Node traverseNode := root;
  int elementsFound := 0;
  
  // Traverse until a leaf is found or it is not possible to continue
  while (traverseNode != null && !traverseNode.isLeaf() && elementsFound < x.length)
  {
    // Get the next edge to explore based on the elements not yet found in x
    Edge nextEdge := select edge from traverseNode.edges where edge.label is a prefix of x.suffix(elementsFound)
      // x.suffix(elementsFound) returns the last (x.length - elementsFound) elements of x
  
    // Was an edge found?
    if (nextEdge != null)
    {
      // Set the next node to explore
      traverseNode := nextEdge.targetNode;
    
      // Increment elements found based on the label stored at the edge
      elementsFound += nextEdge.label.length;
    }
    else
    {
      // Terminate loop
      traverseNode := null;
    }
  }
  
  // A match is found if we arrive at a leaf node and have used up exactly x.length elements
  return (traverseNode != null && traverseNode.isLeaf() && elementsFound == x.length);
}

Insertion

To insert a string, we search the tree until we can make no further progress. At this point we either add a new outgoing edge labeled with all remaining elements in the input string, or if there is already an outgoing edge sharing a prefix with the remaining input string, we split it into two edges (the first labeled with the common prefix) and proceed. This splitting step ensures that no node has more children than there are possible string elements.

Several cases of insertion are shown below, though more may exist. Note that r simply represents the root. It is assumed that edges can be labelled with empty strings to terminate strings where necessary and that the root has no incoming edge.

Insert 'water' at the root
Insert 'slower' while keeping 'slow'
Insert 'test' which is a prefix of 'tester'
Insert 'team' while splitting 'test' and creating a new edge label 'st'
Insert 'toast' while splitting 'te' and moving previous strings a level lower

Deletion

To delete a string x from a tree, we first locate the leaf representing x. Then, assuming x exists, we remove the corresponding leaf node. If the parent of our leaf node has only one other child, then that child's incoming label is appended to the parent's incoming label and the child is removed.

Additional Operations

Find all strings with common prefix: Returns an array of strings which begin with the same prefix.
Find predecessor: Locates the largest string less than a given string, by lexicographic order.
Find successor: Locates the smallest string greater than a given string, by lexicographic order.

History

Donald R. Morrison first described what he called "Patricia trees" in 1968;^[3] the name comes from the acronym PATRICIA, which stands for "Practical Algorithm To Retrieve Information Coded In Alphanumeric". Gernot Gwehenberger independently invented and described the data structure at about the same time.^[4]

Comparison to other data structures

(In the following comparisons, it is assumed that the keys are of length k and the data structure contains n members.)

Unlike balanced trees, radix trees permit lookup, insertion, and deletion in O(k) time rather than O(log n). This may not seem like an advantage, since normally k ≥ log n. However, in a balanced tree every comparison is a string comparison requiring O(k) worst-case time, and many of these are slow in practice due to long common prefixes (in the case where comparisons begin at the start of the string). In a trie, all comparisons require constant time, but it takes m comparisons to look up a string of length m. Radix trees can perform these operations with fewer comparisons, and require many fewer nodes.

Radix trees also share the disadvantages of tries, however: as they can only be applied to strings of elements or elements with an efficiently reversible mapping to strings, they lack the full generality of balanced search trees, which apply to any data type with a total ordering. A reversible mapping to strings can be used to produce the required total ordering for balanced search trees, but not the other way around. This can also be problematic if a data type only provides a comparison operation, but not a (de)serialization operation.

Hash tables are commonly said to have expected O(1) insertion and deletion times, but this is only true when considering computation of the hash of the key to be a constant time operation. When hashing the key is taken into account, hash tables have expected O(k) insertion and deletion times, but may take longer in the worst-case depending on how collisions are handled. Radix trees have worst-case O(k) insertion and deletion. The successor/predecessor operations of radix trees are also not implemented by hash tables.

Variants

A common extension of radix trees uses two colors of nodes, 'black' and 'white'. To check if a given string is stored in the tree, the search starts from the top and follows the edges of the input string until no further progress can be made. If the search-string is consumed and the final node is a black node, the search has failed; if it is white, the search has succeeded. This enables us to add a large range of strings with a common prefix to the tree, using white nodes, then remove a small set of "exceptions" in a space-efficient manner by inserting them using black nodes.

The HAT-trie is a radix tree based cache-conscious data structure that offers efficient string storage and retrieval, and ordered iterations. Performance, with respect to both time and space, is comparable to the cache-conscious hashtable.^[5]^[6] See HAT trie implementation notes at [1]

Using a Radix Tree for Data Routing in Key-Value Stores

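Introduction: as is well known, key-value stores such as NoSQL systems and Memcached route data with hash tables. Handling hash collisions and choosing the hash table size are troublesome problems.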

With a radix tree we can route data keyed by uint32_t values as well. The idea comes from the design of the Linux kernel's IP routing table.

Simplified, a traditional hash table can be abstracted into the following interface:

 
void        Hash_create( size_t Max );
int         Hash_insert( uint32_t hash_value , value_type value );
value_type *Hash_get( uint32_t hash_value );
int         Hash_delete( uint32_t hash_value );

The functions do what their names say: create a hash table, insert, get and delete.

Abstracting the same functionality, a radix tree can offer the same kind of interface:

int   mc_radix_hash_ini( mc_radix_t *t , size_t nodenum );
int   mc_radix_hash_insert( mc_radix_t *t , unsigned int hashvalue , void *data , size_t size );
int   mc_radix_hash_del( mc_radix_t *t , unsigned int hashvalue );
void *mc_radix_hash_get( mc_radix_t *t , unsigned int hashvalue );

A brief introduction to the radix tree:

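The radix tree used here is essentially a traditional binary tree. The difference lies in how it is searched: each bit of the key (for example, an unsigned int) decides which child to follow.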

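Take a binary number such as 1000101010101010010101010010101010 (an arbitrary example). Insertion starts at the root: on a 0 it goes to the left child, on a 1 to the right child. Nodes are created during insertion and removed during deletion. If calling malloc that often is a concern, use pooling and pre-allocate many nodes up front. This post takes that approach.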

typedef struct _node_t
{
    char             zo ;          // zero or one
    int              used_num ;    // how many inserted keys pass through this node
    struct _node_t  *parent ;
    struct _node_t  *left ;
    struct _node_t  *right ;
    void            *data ;        // for nodes array list finding next empty node
    int              index ;
} mc_radix_node_t ;

The node structure is defined above.

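zo can be ignored. parent, left and right are the parent, left-child and right-child pointers, as their names say. data points to the stored data (while a node sits in the pool, it links to the next free node). index is the node's subscript in the node-pool array.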

The tree structure is defined as follows:

typedef struct _radix_t
{
    mc_radix_nodes_array_t *nodes ;
    mc_radix_node_t        *root ;

    mc_slab_t              *slab ;

    /*
    pthread_mutex_t         lock ;
    */
    int                     magic ;
    int                     totalnum ;
    size_t                  pool_nodenum ;

    mc_item_queue           queue ;
} mc_radix_t ;

Ignore the structure of nodes for now; here it is just a pointer to the node pool.

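root points to the root node. slab is the memory allocator used for storing data, in case you want memory management to reduce allocation overhead (see the chapter on the slab allocator).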

magic tells whether the tree has been initialized, totalnum is the number of leaf nodes, and pool_nodenum is the number of nodes in the node pool.

queue is the (LRU) queue of the stored data items.

Each bit is tested with a mask macro that has exactly one bit set, following the 8-4-2-1 weighting:

 
#define U01_MASK    0x80000000
#define U02_MASK    0x40000000
#define U03_MASK    0x20000000
#define U04_MASK    0x10000000
...
#define U31_MASK    0x00000002
#define U32_MASK    0x00000001

Each bit is tested this way. There are better methods; this one is used only for simplicity and speed.

 
unsigned int MASKARRAY[32] = {
    U01_MASK,U02_MASK,U03_MASK,U04_MASK,U05_MASK,U06_MASK,U07_MASK,U08_MASK,
    U09_MASK,U10_MASK,U11_MASK,U12_MASK,U13_MASK,U14_MASK,U15_MASK,U16_MASK,
    U17_MASK,U18_MASK,U19_MASK,U20_MASK,U21_MASK,U22_MASK,U23_MASK,U24_MASK,
    U25_MASK,U26_MASK,U27_MASK,U28_MASK,U29_MASK,U30_MASK,U31_MASK,U32_MASK
};

We also provide a few static helper functions for the radix tree, which are not exported.

Initialize the node pool:

static int mc_radix_nodes_ini( mc_radix_nodes_array_t *par_nodearray , size_t par_maxnum )

Get a node from the pool:

static mc_radix_node_t *mc_get_radix_node( mc_radix_nodes_array_t *par_nodearray )

Return a node to the pool:

static void mc_free_radix_node( mc_radix_nodes_array_t *par_nodearray , mc_radix_node_t *par_free_node )

Initializing the radix tree:

int mc_radix_hash_ini( mc_radix_t *t , size_t nodenum )
{
    /* init the node pool */
    t->nodes = (mc_radix_nodes_array_t *)malloc( sizeof(mc_radix_nodes_array_t) ); /* space for the node pool */
    if( t->nodes == NULL )
        return -1 ;
    t->slab = mc_slab_create();                   /* slab allocator for the item data */
    mc_radix_nodes_ini( t->nodes , nodenum );     /* initialize the pooled nodes */
    t->magic = MC_MAGIC ;
    t->totalnum = 0 ;
    t->pool_nodenum = nodenum ;
    t->root = NULL ;

    /* empty LRU queue: head = most recently used, pear = tail (least recently used) */
    t->queue.head = NULL ;
    t->queue.pear = NULL ;
    t->queue.max_num = nodenum ;
    t->queue.cur_num = 0 ;
    return 0 ;
}

/* mc_del_item() is defined further below */
static void mc_del_item( mc_radix_t *t , mc_radix_node_t *cur );

int mc_radix_hash_insert( mc_radix_t *t , unsigned int hashvalue , void *data , size_t size )
{
    unsigned int i ;
    mc_radix_node_t *cur , **next , *p ;
    mc_slot_t *l_slot ;

    /* LRU: all leaves are chained into a doubly linked queue (left = towards the
       head, right = towards the tail). New and updated items go to the head;
       once the queue is PERCENT full, a (1-PERCENT) share is evicted from the
       tail (t->queue.pear). A leaf (depth 32) has no children, so its left and
       right pointers are free to serve as the queue links. */
    if( t->queue.cur_num >= (t->queue.max_num)*PERCENT )
    {
        for( i = 0 ; i < (t->queue.max_num)*(1-PERCENT) && t->queue.pear != NULL ; i++ )
            mc_del_item( t , t->queue.pear );
    }

    /* assumes mc_get_radix_node() hands out nodes with all fields zeroed */
    if( t->root == NULL )
    {
        t->root = mc_get_radix_node( t->nodes ) ;
        if( t->root == NULL )
            return -1 ;
    }

    /* probe the tree bit by bit: 1 -> right child, 0 -> left child */
    cur = t->root ;
    for( i = 0 ; i < 32 ; i++ )
    {
        next = ( hashvalue & MASKARRAY[i] ) ? &cur->right : &cur->left ;
        if( *next == NULL )
        {
            *next = mc_get_radix_node( t->nodes ) ;
            if( *next == NULL )
            {
                fprintf(stderr,"mc_get_radix_node error\n");
                return -1;
            }
            (*next)->parent = cur ;
        }
        cur = *next ;
    }

    l_slot = mc_slot_alloc( t->slab , size ) ;
    if( l_slot == NULL )
        return -1 ;
    memcpy( l_slot->star , data , size );

    if( cur->data != NULL )
    {
        /* key already present: drop the old value and take the leaf out of the queue */
        mc_slot_free( cur->data );
        if( cur->left != NULL ) cur->left->right = cur->right ; else t->queue.head = cur->right ;
        if( cur->right != NULL ) cur->right->left = cur->left ; else t->queue.pear = cur->left ;
        t->queue.cur_num-- ;
    }
    else
    {
        /* new key: every node on the path now holds one more key */
        for( p = cur->parent ; p != NULL ; p = p->parent )
            p->used_num++ ;
        t->totalnum++ ;
    }
    cur->data = l_slot ;

    /* put the leaf at the head of the LRU queue */
    cur->left = NULL ;
    cur->right = t->queue.head ;
    if( t->queue.head != NULL )
        t->queue.head->left = cur ;
    else
        t->queue.pear = cur ;
    t->queue.head = cur ;
    t->queue.cur_num++ ;
    return 1;
}

Deleting a node: the key is the hashvalue, as the name suggests.

int mc_radix_hash_del( mc_radix_t *t , unsigned int hashvalue )
{
    mc_radix_node_t *cur , *par ;
    int i , prune ;

    if( t == NULL || t->root == NULL )
        return -1;
    /* not initialized */
    if( t->magic != MC_MAGIC )
        return -1;

    /* 1. locate the leaf first, without touching any counter */
    cur = t->root ;
    for( i = 0 ; i < 32 ; i++ )
    {
        cur = ( hashvalue & MASKARRAY[i] ) ? cur->right : cur->left ;
        if( cur == NULL )
            return -1;                    /* key not in the tree */
    }
    if( cur->data == NULL )
        return -1;

    /* 2. free the value and remove the leaf from the LRU queue */
    mc_slot_free( cur->data );
    cur->data = NULL ;
    if( cur->left != NULL ) cur->left->right = cur->right ; else t->queue.head = cur->right ;
    if( cur->right != NULL ) cur->right->left = cur->left ; else t->queue.pear = cur->left ;
    t->queue.cur_num-- ;
    t->totalnum-- ;

    /* 3. walk up: every ancestor holds one key fewer; a node that no longer
          holds any key is detached from its parent and returned to the pool */
    par = cur->parent ;
    prune = 1 ;                           /* the leaf itself always goes */
    while( par != NULL )
    {
        par->used_num-- ;
        if( prune )
        {
            if( par->left == cur ) par->left = NULL ; else par->right = NULL ;
            mc_free_radix_node( t->nodes , cur );
        }
        prune = ( par->used_num == 0 ) ;
        cur = par ;
        par = par->parent ;
    }
    if( prune )                           /* the root is empty as well */
    {
        mc_free_radix_node( t->nodes , cur );
        t->root = NULL ;
    }
    return 1;
}

Getting a value: the result is returned through a void * pointer.

void *mc_radix_hash_get( mc_radix_t *t , unsigned int hashvalue )
{
    mc_radix_node_t *cur ;
    int i ;

    if( t == NULL || t->root == NULL )
    {
        fprintf(stderr,"t == NULL || t->root == NULL\n");
        return NULL;
    }
    /* not initialized */
    if( t->magic != MC_MAGIC )
    {
        fprintf(stderr,"t->magic != MC_MAGIC\n");
        return NULL;
    }
    cur = t->root ;
    for( i = 0 ; i < 32 ; i++ )
    {
        cur = ( hashvalue & MASKARRAY[i] ) ? cur->right : cur->left ;
        if( cur == NULL )
        {
            fprintf(stderr,"i = %d \n",i);     /* key not found: the path ends at bit i */
            return NULL;
        }
    }
    if( cur->data == NULL )
        return NULL;

    /* update the LRU queue: move the leaf to the head */
    if( cur != t->queue.head )
    {
        cur->left->right = cur->right ;        /* not the head, so cur->left != NULL */
        if( cur->right != NULL )
            cur->right->left = cur->left ;
        else
            t->queue.pear = cur->left ;        /* cur was the tail */
        cur->left = NULL ;
        cur->right = t->queue.head ;
        t->queue.head->left = cur ;
        t->queue.head = cur ;
    }
    return (void *)( ((mc_slot_t *)cur->data)->star ) ;
}

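int mc_free_radix( mc_radix_t *t )
{
    mc_free_all_radix_node(t->nodes);
    mc_slab_free(t->slab);
    free(t->nodes);
    t->nodes = NULL ;
    t->root = NULL ;
    t->magic = 0 ;                        /* mark the tree as uninitialized */
    return 0 ;
}

/* Evict one leaf (normally the LRU tail): free its value, remove it from the
   LRU queue, and prune every node on its path that no longer holds any key. */
static void mc_del_item( mc_radix_t *t , mc_radix_node_t *cur )
{
    mc_radix_node_t *par ;
    int prune = 1 ;                       /* the leaf itself always goes */

    if( cur == NULL || cur->data == NULL )
        return ;
    mc_slot_free( cur->data );
    cur->data = NULL ;
    if( cur->left != NULL ) cur->left->right = cur->right ; else t->queue.head = cur->right ;
    if( cur->right != NULL ) cur->right->left = cur->left ; else t->queue.pear = cur->left ;
    t->queue.cur_num-- ;
    t->totalnum-- ;

    for( par = cur->parent ; par != NULL ; cur = par , par = par->parent )
    {
        par->used_num-- ;
        if( prune )
        {
            if( par->left == cur ) par->left = NULL ; else par->right = NULL ;
            mc_free_radix_node( t->nodes , cur );
        }
        prune = ( par->used_num == 0 ) ;
    }
    if( prune )                           /* the root is empty as well */
    {
        mc_free_radix_node( t->nodes , cur );
        t->root = NULL ;
    }
}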

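Summary: the biggest benefit of a radix tree for key-value routing is that it avoids resizing a hash table and some of the collision problems. An LRU algorithm that evicts data is also easy to add on top of this structure.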

If initializing and allocating nodes one by one seems too wasteful, design the tree around a node pool.

This is an original article; please credit the source when reposting. Contact the author: Email: zhangbo1@ijinshan.com QQ: 51336447

Nginx Source Code Analysis: radix tree

Published 2013-03-03 23:05

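This analysis is based on Nginx-1.2.6. Older or newer versions may differ slightly, but the differences should be small, so it can still serve as a reference.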

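A radix tree is a kind of trie that is well suited to building associative arrays. In information retrieval it can be used to build inverted indexes of documents, and it is also particularly useful in IP routing.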

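Nginx implements a radix tree, used mainly by the geo module. This module has only one directive, geo. It defines a variable whose value depends on the client's IP address (by default $remote_addr, but another variable can be specified). With this module you can do load balancing and send requests from different ranges of users to different back-end servers. An example: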

 geo  $country  {
   default          no;
   127.0.0.0/24     us;    # before the / is the IP address, after it the mask (prefix length)
   127.0.0.1/32     ru;
   10.1.0.0/16      ru;
   192.168.1.0/24   uk;    # for client IP 192.168.1.23, $country is uk
 }

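When nginx parses this configuration, it builds a data structure. After receiving a request, it looks up the variable's value by the client IP address. That data structure is the radix tree, a binary tree in which each edge stands for one bit, 0 or 1. Its node and tree types are: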

 
typedef struct ngx_radix_node_s  ngx_radix_node_t;

struct ngx_radix_node_s {
    ngx_radix_node_t  *right;
    ngx_radix_node_t  *left;
    ngx_radix_node_t  *parent;
    uintptr_t          value;
};

typedef struct {
    ngx_radix_node_t  *root;
    ngx_pool_t        *pool;
    ngx_radix_node_t  *free;
    char              *start;
    size_t             size;
} ngx_radix_tree_t;

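Nginx avoids allocating and releasing ngx_radix_node_t structures frequently by reusing nodes. ngx_radix32tree_delete does not free a deleted node's memory. Instead, it links deleted nodes into a singly linked list headed by the free member of ngx_radix_tree_t. When ngx_radix_alloc creates a new node, it first checks whether this list (chained through the right-child pointers) is empty. If not, it takes a node from the list and returns its address. Memory for the radix tree is also allocated page by page: start points to the beginning of the still-usable memory in the current page, and size is the space remaining in that page.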

The declarations of the four operations (creating the tree, inserting a node, deleting a node and lookup) are:

 
ngx_radix_tree_t *ngx_radix_tree_create(ngx_pool_t *pool,
    ngx_int_t preallocate);
ngx_int_t ngx_radix32tree_insert(ngx_radix_tree_t *tree,
    uint32_t key, uint32_t mask, uintptr_t value);
ngx_int_t ngx_radix32tree_delete(ngx_radix_tree_t *tree,
    uint32_t key, uint32_t mask);
uintptr_t ngx_radix32tree_find(ngx_radix_tree_t *tree, uint32_t key);

Inserting a node

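A line in the geo directive such as "192.168.1.0/24 uk;" corresponds to one node in the radix tree. How is this implemented? Look at the parameters of ngx_radix32tree_insert:
- key is the IP address (an in_addr_t) converted to host byte order, four bytes;
- mask is the network mask, which for /24 is the four bytes 0xFFFFFF00;
- value is a pointer of type ngx_http_variable_value_t corresponding to uk.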

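Where is value inserted? Starting from the most significant bit of key & mask, the code goes to the left child on a 0 and to the right child on a 1. It walks down from the root until it reaches the insertion position (in the example above, the node is on level 24). If a leaf is reached before the final position, the missing nodes between the leaf and the final position are created with value = NGX_RADIX_NO_VALUE. If the position already holds a value, NGX_BUSY is returned; otherwise the value is set and NGX_OK is returned.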

Creation

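Creation allocates space for the radix tree structure and its root node. Depending on preallocate, it then inserts a number of nodes into the tree. When preallocate is -1, it is reset to a suitable value, so different platforms preallocate different numbers of nodes.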

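Concretely, preallocate means that all nodes of levels 1 through preallocate are inserted, so right after creation the tree has 2^(preallocate+1)-1 nodes. What value should preallocate get on each platform when it is -1? That is decided by num = ngx_pagesize/sizeof(ngx_radix_node_t). When num = 128, preallocate = 6. The preallocated tree is a complete binary tree, and with level 6 full it has 127 nodes, which fit within one page. Preallocating further would cost more than it gains. I can't explain it much better than that, so here is the original comment: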

 
/*
 * Preallocation of first nodes : 0, 1, 00, 01, 10, 11, 000, 001, etc.
 * increases TLB hits even if for first lookup iterations.
 * On 32-bit platforms the 7 preallocated bits takes continuous 4K,
 * 8 - 8K, 9 - 16K, etc.  On 64-bit platforms the 6 preallocated bits
 * takes continuous 4K, 7 - 8K, 8 - 16K, etc.  There is no sense to
 * to preallocate more than one page, because further preallocation
 * distributes the only bit per page.  Instead, a random insertion
 * may distribute several bits per page.
 *
 * Thus, by default we preallocate maximum
 *     6 bits on amd64 (64-bit platform and 4K pages)
 *     7 bits on i386 (32-bit platform and 4K pages)
 *     7 bits on sparc64 in 64-bit mode (8K pages)
 *     8 bits on sparc64 in 32-bit mode (8K pages)
 */

Lookup

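Given an IP address, how is the corresponding variable value found in the radix tree? First convert the address to four bytes in host byte order, then call uintptr_t ngx_radix32tree_find. Starting from the most significant bit of the 32-bit key, this function goes to the left child on a 0 and to the right child on a 1. It continues from the root down until it runs out of nodes. The value of the last node on this path whose value is not NGX_RADIX_NO_VALUE is returned. The code: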

 
uintptr_t
ngx_radix32tree_find(ngx_radix_tree_t *tree, uint32_t key)
{
    uint32_t           bit;
    uintptr_t          value;
    ngx_radix_node_t  *node;

    bit = 0x80000000;
    value = NGX_RADIX_NO_VALUE;
    node = tree->root;

    while (node) {
        if (node->value != NGX_RADIX_NO_VALUE) {
            value = node->value;
        }

        if (key & bit) {
            node = node->right;

        } else {
            node = node->left;
        }

        bit >>= 1;
    }

    return value;
}

Deleting a node

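Deletion first finds the node to delete, the same way insertion does. If the node is not found, NGX_ERROR is returned. Otherwise there are two cases:

If the node is a leaf, it is removed and pushed onto the list pointed to by free (chained through the right-child pointers) for later reuse. If its parent has then become a leaf whose value is NGX_RADIX_NO_VALUE, the parent is deleted the same way, and so on up toward the root.
If the node has at least one child and its value is not NGX_RADIX_NO_VALUE, its value is simply set to NGX_RADIX_NO_VALUE. This keeps deletion simple; such a node is only really removed from the tree later, when it falls under the first case.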

hash_map vs radix tree

August 30th, 2011, 绚丽也尘埃

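While reading code recently I came across an application of the radix tree. When the engine builds indexes for data, it needs a mapping table from field names to field numbers, and this table is used very heavily. Take 600 million documents with 100 fields each. Many fields get index, profile and detail indexes at the same time, so the table is searched three times, and at least 60 billion lookups are needed. Making these lookups faster improves the overall efficiency of the program.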

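I wrote a small program to compare the efficiency of hash_map (the Linux implementation) and a radix tree. In theory the radix tree should be considerably faster. Finding a string takes O(n), while hash_map must compute the string's hash, which already means traversing the whole string once, plus some extra arithmetic. Another factor is that the C code is compact and, declared inline, is easily inlined. hash_map, by contrast, must use a hash function object, and its own code is more complex and harder to inline. The main drawback of the radix tree is that dynamically allocating each node's pointer array makes the code more cumbersome, and destroying a radix tree is also more involved. The Linux kernel uses radix trees too; I have not read that code, but it is probably very carefully done.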

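The example program below was run on an RHEL4 server with 8 cores and 8 GB of RAM. hash_map and the radix tree are loaded with the same 16 entries, which are then looked up 100 million times. With -O2, hash_map takes about 25 s and the radix tree only about 2 s, a very clear gain. For comparison I also tried map, whose lookups are the slowest at 34 s. At 60 billion lookups, hash_map would need about 240 minutes. Spread over 20 machines with 6 threads each, every thread would still spend nearly 2 minutes. oprofile could be used to measure how much of one index build currently goes into radix tree lookups. The example code follows. Strictly speaking, its radix_node_t (256 child pointers per node, no path compression) is a plain trie rather than a compressed radix tree.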

 
           #include <iostream> 
          
           #include <map> 
          
           #include <ext/hash_map> 
          
           using 
           namespace 
            std;                                                                                  
          
           using 
           namespace 
            __gnu_cxx; 
          
           namespace 
           __gnu_cxx 
          
           { 
          
           template 
           <> 
           struct 
           hash< std::string > 
          
           { 
          
           size_t 
           operator()(  
           const 
           std::string& x )  
           const 
          
           { 
          
           return 
           hash<  
           const 
           char 
           * >()( x.c_str() ); 
          
           } 
          
           }; 
          
           } 
          
           #define RADIX_NUM 256 
          
           struct 
           radix_node_t 
          
           { 
          
           radix_node_t* p[RADIX_NUM]; 
          
           int 
           index; 
          
           bool 
           is_final; 
          
           //radix_node_t() {cout << "new a node" << endl;} 
          
           //~radix_node_t() {cout << "destroy a node" << endl;} 
          
           }; 
          
           void 
           radix_init(radix_node_t* &radix) 
          
           { 
          
           radix =  
           new 
           radix_node_t; 
          
           memset 
           (radix, 0,  
           sizeof 
           (radix_node_t)); 
          
           } 
          
           void 
           radix_insert(radix_node_t* radix,  
           char 
           * str,  
           int 
           index) 
          
           { 
          
           char 
           * ptr = str; 
          
           radix_node_t* radix_iter = radix; 
          
           while 
           (*ptr) 
          
           { 
          
           //cout << *ptr << endl; 
          
           if 
           (radix_iter->p[*ptr] == NULL) 
          
           { 
          
           // 
          
           radix_node_t* new_radix =  
           new 
           radix_node_t; 
          
           memset 
           (new_radix, 0,  
           sizeof 
           (radix_node_t)); 
          
           radix_iter->p[*ptr] = new_radix; 
          
           } 
          
           radix_iter = radix_iter->p[*ptr]; 
          
           ++ptr; 
          
           } 
          
           radix_iter->index=index; 
          
           radix_iter->is_final= 
           true 
           ;   
          
           } 
          
           inline 
           int 
            radix_find(radix_node_t* radix,  
           char 
           * str) 
          
           { 
          
           radix_node_t* radix_iter = radix; 
          
           char 
           * ptr = str; 
          
           while 
           (*ptr) 
          
           { 
          
           if 
           (radix_iter->p[*ptr] == NULL) 
          
           { 
          
           return 
           -1; 
          
           } 
          
           radix_iter = radix_iter->p[*ptr]; 
          
           ++ptr; 
          
           } 
          
           if 
           (radix_iter->is_final ==  
           true 
           ) 
          
           { 
          
           return 
           radix_iter->index; 
          
           } 
          
           } 
          
           bool 
           radix_destroy(radix_node_t* radix) 
          
           { 
          
           radix_node_t* radix_iter = radix; 
          
           for 
           ( 
           int 
           i=0; i<RADIX_NUM; ++i) 
          
           { 
          
           if 
           (radix_iter->p[i] == NULL) 
          
           { 
          
           continue 
           ; 
          
           } 
          
           //the leaf node 
          
           if 
           (radix_iter->p[i]->is_final ==  
           true 
           ) 
          
           { 
          
           delete 
           radix_iter->p[i]; 
          
           } 
          
           else 
          
           { 
          
           radix_destroy(radix_iter->p[i]); 
          
           } 
          
           } 
          
           delete 
           radix_iter;   
          
           } 
          
#define FIND_COUNT 100000000
//#define FIND_COUNT 6
#define FIELDS_NUM 16
#define FIELD_LEN 20

int main(int argc, const char* argv[])
{
    char fields[FIELDS_NUM][FIELD_LEN] = {
        {"nid"}, {"user_id"}, {"post_fee"}, {"title"},
        {"nick"}, {"price"}, {"pict_url"}, {"provcity"},
        {"auction_type"}, {"auction_flag"}, {"quantity"}, {"isprepay"},
        {"pidvid"}, {"spuid"}, {"promoted_service"}, {"counts"}};
    hash_map<string, int> fields_hash_map;
    map<string, int> fields_map;
    int begin, end;
    // head node
    radix_node_t* radix = NULL;

    for (int i = 0; i < FIELDS_NUM; ++i)
    {
        fields_hash_map[fields[i]] = i;
        fields_map[fields[i]] = i;
    }

    // Each benchmark loop performs FIND_COUNT lookups; time() measures whole seconds.
    begin = time(NULL);
    for (int i = 0; i < FIND_COUNT; ++i)
    {
        volatile int index = fields_hash_map[fields[i % FIELDS_NUM]];
    }
    end = time(NULL);
    cout << "hash_map time: " << end - begin << endl;

    //===================================================
    begin = time(NULL);
    for (int i = 0; i < FIND_COUNT; ++i)
    {
        volatile int index = fields_map[fields[i % FIELDS_NUM]];
    }
    end = time(NULL);
    cout << "map time: " << end - begin << endl;

    //===================================================
    radix_init(radix);
    // Insert every field so the radix tree is queried over the same key set
    // as hash_map and map.
    for (int i = 0; i < FIELDS_NUM; ++i)
    {
        char* ptr = fields[i];
        radix_insert(radix, ptr, i);
    }

    begin = time(NULL);
    for (int i = 0; i < FIND_COUNT; ++i)
    {
        char* str = fields[i % FIELDS_NUM];
        volatile int index = radix_find(radix, str);
        //cout << index << endl;
    }
    end = time(NULL);
    cout << "radix tree time: " << end - begin << endl;

    radix_destroy(radix);
    return 0;
}

Radix Tree Algorithm

Posted by chenyajun in Nginx, Data Structures and Algorithms

Nginx has a module called geo that assigns different variable values depending on the client IP address. Its implementation uses both a radix tree and a red-black tree.

Radix Tree
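A radix tree is essentially a variant of the trie. In a trie, each edge holds a single character. In a radix tree, an edge can hold a string of several characters. This compresses paths and effectively reduces the depth of the tree. Radix trees are used for routing-table lookup in BSD and in the Linux kernel.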
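A plain trie spends one node per character. The following is a minimal C++ sketch of the path compression described above. Each edge stores a whole string label, and inserting a key that diverges part-way through a label splits that edge in two. RadixNode, radix_insert and radix_find are hypothetical names chosen for illustration. Keys are assumed to be std::string and values int, and deletion is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Path-compressed radix tree: every edge carries a string label instead of
// a single character.
struct RadixNode {
    bool is_final = false;  // a key ends at this node
    int value = -1;         // payload of a final node
    // Outgoing edges, keyed by the first character of their label.
    std::map<char, std::pair<std::string, std::unique_ptr<RadixNode>>> edges;
};

// Length of the common prefix of key[pos..] and label.
static std::size_t common_prefix(const std::string& key, std::size_t pos,
                                 const std::string& label) {
    std::size_t n = 0;
    while (pos + n < key.size() && n < label.size() && key[pos + n] == label[n]) ++n;
    return n;
}

void radix_insert(RadixNode* node, const std::string& key, int value) {
    std::size_t pos = 0;
    while (pos < key.size()) {
        auto it = node->edges.find(key[pos]);
        if (it == node->edges.end()) {
            // No edge starts with this character: one new edge holds the rest of the key.
            auto leaf = std::make_unique<RadixNode>();
            leaf->is_final = true;
            leaf->value = value;
            node->edges.emplace(key[pos], std::make_pair(key.substr(pos), std::move(leaf)));
            return;
        }
        std::string& label = it->second.first;
        std::size_t n = common_prefix(key, pos, label);  // n >= 1 here
        if (n < label.size()) {
            // The key leaves the label part-way through: split the edge at n.
            auto mid = std::make_unique<RadixNode>();
            std::string rest = label.substr(n);
            mid->edges.emplace(rest[0], std::make_pair(rest, std::move(it->second.second)));
            label.resize(n);
            it->second.second = std::move(mid);
        }
        node = it->second.second.get();
        pos += n;
    }
    node->is_final = true;  // key ends exactly at an existing or newly split node
    node->value = value;
}

// Returns the value stored for key, or -1 if key is absent.
int radix_find(const RadixNode* node, const std::string& key) {
    std::size_t pos = 0;
    while (pos < key.size()) {
        auto it = node->edges.find(key[pos]);
        if (it == node->edges.end()) return -1;
        const std::string& label = it->second.first;
        // The whole label must match the next label.size() characters.
        if (key.compare(pos, label.size(), label) != 0) return -1;
        pos += label.size();
        node = it->second.second.get();
    }
    return node->is_final ? node->value : -1;
}
```

Suppose the keys romane, romanus, romulus and rom are inserted. The root then ends up with a single edge labelled "rom", where a plain trie would need one node for each of r, o and m.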

Implementation
The Wikipedia article gives a clear description of how a radix tree works.

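Complexity

Lookup, insertion and deletion each take O(k) time, where k is the length of the key, regardless of how many keys are stored.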

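The Linux radix tree is a mechanism that associates pointers with integer keys of type long. It stores data efficiently and supports fast lookups. It is used to map integers to pointers (for example, in the IDR mechanism), in memory management, and elsewhere.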

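The IDR (ID Radix) mechanism builds an association table between an object's integer identifier (ID) and a pointer to the object, so that IDs and pointers can be converted into each other. IDR uses a radix tree as a sparse array that is indexed by ID to retrieve pointers. It uses bitmaps to allocate new IDs quickly, which avoids storing pointers in a fixed-size array. The IDR API is implemented in lib/idr.c and is not analyzed here.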
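The ID-to-pointer idea can be sketched in a few lines of C++. IdTable is a hypothetical class, not the kernel's IDR API in lib/idr.c. A bitmap of 64-bit words marks which IDs are taken, so the lowest free ID can be found quickly. A std::map stands in for the radix-tree sparse array, purely to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical IDR-style table (not the kernel API): a bitmap tracks which
// integer IDs are in use, and a sparse map binds each used ID to a pointer.
class IdTable {
public:
    // Binds ptr to the lowest free ID and returns that ID.
    int alloc(void* ptr) {
        for (std::size_t w = 0; w < used_.size(); ++w) {
            if (used_[w] != ~std::uint64_t{0}) {  // word has at least one free bit
                return bind(static_cast<int>(w * 64 + lowest_zero_bit(used_[w])), ptr);
            }
        }
        used_.push_back(0);  // all words full: grow the ID space by 64
        return bind(static_cast<int>((used_.size() - 1) * 64), ptr);
    }

    // Returns the pointer bound to id, or nullptr if id is not in use.
    void* find(int id) const {
        auto it = slots_.find(id);
        return it == slots_.end() ? nullptr : it->second;
    }

    // Releases id so that a later alloc() may hand it out again.
    void remove(int id) {
        if (slots_.erase(id) != 0) {
            used_[id / 64] &= ~(std::uint64_t{1} << (id % 64));
        }
    }

private:
    static int lowest_zero_bit(std::uint64_t word) {
        int bit = 0;
        while (word & (std::uint64_t{1} << bit)) ++bit;  // caller guarantees a zero bit
        return bit;
    }

    int bind(int id, void* ptr) {
        used_[id / 64] |= std::uint64_t{1} << (id % 64);
        slots_[id] = ptr;
        return id;
    }

    std::vector<std::uint64_t> used_;  // bit i of word w set => ID w*64+i in use
    std::map<int, void*> slots_;       // sparse ID -> pointer table
};
```

Because alloc always takes the lowest clear bit, an ID released by remove is reused before the table grows.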

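The most widespread use of the Linux radix tree is memory management. struct address_space uses a radix tree to track the in-core pages bound to an address mapping, and this tree lets memory-management code quickly find pages tagged dirty or writeback. The Linux radix tree API is implemented in lib/radix-tree.c.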
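Tagging is what makes the dirty/writeback search fast. Each node records which of its slots lead to a tagged item, so a search can skip every untagged subtree. Below is a simplified, hypothetical C++ model of that idea with a single tag, fan-out 4 and height 3. It is not the code in lib/radix-tree.c, and clearing tags is omitted.

```cpp
#include <cassert>
#include <memory>

// Simplified model with a single tag (think "dirty"): fan-out 4 (2 bits per
// level) and height 3, so item indices run from 0 to 63.
constexpr int kBits = 2;
constexpr int kFan = 1 << kBits;
constexpr int kHeight = 3;

struct TagNode {
    std::unique_ptr<TagNode> child[kFan];  // used on interior levels
    bool present[kFan] = {};               // used on the bottom level: slot holds an item
    unsigned tags = 0;                     // bit s set => slot s leads to a tagged item
};

// Slot taken at `level` (0 = root); the high bits of the index pick the root slot.
static int slot_of(unsigned index, int level) {
    return (index >> (kBits * (kHeight - 1 - level))) & (kFan - 1);
}

void insert(TagNode* root, unsigned index) {
    TagNode* n = root;
    for (int lvl = 0; lvl < kHeight - 1; ++lvl) {
        std::unique_ptr<TagNode>& c = n->child[slot_of(index, lvl)];
        if (!c) c = std::make_unique<TagNode>();
        n = c.get();
    }
    n->present[slot_of(index, kHeight - 1)] = true;
}

// Tags an existing item: sets the tag bit for its slot on every level of the path.
bool set_tag(TagNode* root, unsigned index) {
    TagNode* path[kHeight];
    path[0] = root;
    for (int lvl = 1; lvl < kHeight; ++lvl) {
        path[lvl] = path[lvl - 1]->child[slot_of(index, lvl - 1)].get();
        if (!path[lvl]) return false;  // index was never inserted
    }
    if (!path[kHeight - 1]->present[slot_of(index, kHeight - 1)]) return false;
    for (int lvl = 0; lvl < kHeight; ++lvl) path[lvl]->tags |= 1u << slot_of(index, lvl);
    return true;
}

// Depth-first search that descends only into slots whose tag bit is set,
// so untagged subtrees are never visited.
static bool first_tagged(const TagNode* n, int lvl, unsigned prefix, unsigned* out) {
    for (int s = 0; s < kFan; ++s) {
        if (!(n->tags & (1u << s))) continue;
        unsigned idx = (prefix << kBits) | static_cast<unsigned>(s);
        if (lvl == kHeight - 1) { *out = idx; return true; }
        if (first_tagged(n->child[s].get(), lvl + 1, idx, out)) return true;
    }
    return false;
}

// Stores the lowest tagged index in *out; returns false if nothing is tagged.
bool find_first_tagged(const TagNode* root, unsigned* out) {
    return first_tagged(root, 0, 0, out);
}
```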

Radix tree overview

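A radix tree is a general-purpose dictionary data structure, also called a PAT bit tree (Patricia trie or crit-bit tree). The Linux kernel uses a version with fixed-length input of type unsigned long. Each level represents a fixed number of bits of the input space.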

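A radix tree is a multiway search tree whose leaf nodes are the actual data items. Each node has a fixed number (2^n) of pointers to child nodes, each called a slot, plus a pointer to its parent node.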
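A minimal C++ sketch of such a node has 2^n slots plus a parent pointer. In this sketch, the parent pointer lets rt_erase walk back up the tree and free nodes that became empty. The constants (n = 2, three levels, 6-bit keys) and the rt_* function names are illustrative assumptions, not taken from the kernel.

```cpp
#include <cassert>

// Illustrative parameters: n = 2 bits per level, so 2^n = 4 slots per node,
// and 3 levels, so keys are 6 bits wide (0..63).
constexpr int kShift = 2;
constexpr int kSlots = 1 << kShift;
constexpr int kLevels = 3;

struct Node {
    Node* parent = nullptr;   // nullptr for the root
    int parent_slot = 0;      // index of the parent's slot that points here
    int count = 0;            // number of non-null slots
    void* slot[kSlots] = {};  // child nodes, or items on the bottom level
};

// Slot used at `level` (0 = root): the high-order bits choose the root slot.
static int slot_index(unsigned key, int level) {
    return (key >> (kShift * (kLevels - 1 - level))) & (kSlots - 1);
}

// Stores a non-null item under key, creating missing nodes on the way down.
void rt_insert(Node* root, unsigned key, void* item) {
    Node* n = root;
    for (int lvl = 0; lvl < kLevels - 1; ++lvl) {
        int s = slot_index(key, lvl);
        if (!n->slot[s]) {
            Node* c = new Node;
            c->parent = n;
            c->parent_slot = s;
            n->slot[s] = c;
            ++n->count;
        }
        n = static_cast<Node*>(n->slot[s]);
    }
    int s = slot_index(key, kLevels - 1);
    if (!n->slot[s]) ++n->count;
    n->slot[s] = item;
}

// Bottom-level node on key's path, or nullptr if some level is missing.
static Node* bottom(Node* root, unsigned key) {
    Node* n = root;
    for (int lvl = 0; lvl < kLevels - 1 && n != nullptr; ++lvl)
        n = static_cast<Node*>(n->slot[slot_index(key, lvl)]);
    return n;
}

void* rt_lookup(Node* root, unsigned key) {
    Node* n = bottom(root, key);
    return n ? n->slot[slot_index(key, kLevels - 1)] : nullptr;
}

// Removes key, then follows parent pointers upward, freeing nodes left empty.
void rt_erase(Node* root, unsigned key) {
    Node* n = bottom(root, key);
    int s = slot_index(key, kLevels - 1);
    if (n == nullptr || n->slot[s] == nullptr) return;
    n->slot[s] = nullptr;
    --n->count;
    while (n != root && n->count == 0) {
        Node* p = n->parent;
        p->slot[n->parent_slot] = nullptr;
        --p->count;
        delete n;
        n = p;
    }
}
```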

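The Linux kernel uses a radix tree to quickly locate the page-cache page for a given offset within a file. Figure 4 shows an example radix tree with a fan-out of 4 (2^2) and a height of 4. Each leaf node is reached through an 8-bit in-file offset, so the tree can address 4 x 4 x 4 x 4 = 256 pages. For example, the paths to the two leaf nodes marked with dashed lines in the figure form the binary values 00000010 and 11111010. These values point to the cache pages for the corresponding offsets in the file.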

Figure 4: A four-way radix tree
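The path arithmetic in the Figure 4 example can be checked with a short C++ helper. It assumes, as the figure does, 2 bits per level, a height of 4, and that the most significant bits select the root slot. path_slots is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Figure 4 parameters: 2 bits per level (fan-out 4), height 4, 8-bit offsets.
constexpr int kBitsPerLevel = 2;
constexpr int kHeight = 4;

// Splits an 8-bit in-file page offset into the slot taken at each level,
// most significant bits first (index 0 = root level).
std::array<int, kHeight> path_slots(std::uint8_t offset) {
    std::array<int, kHeight> slots{};
    for (int level = 0; level < kHeight; ++level) {
        int shift = kBitsPerLevel * (kHeight - 1 - level);
        slots[level] = (offset >> shift) & ((1 << kBitsPerLevel) - 1);
    }
    return slots;
}

// Number of leaves reachable: (2^bits)^height = 4^4.
constexpr int kLeaves = 1 << (kBitsPerLevel * kHeight);
```

Offset 00000010 therefore takes slots 0, 0, 0, 2 from the root down, and offset 11111010 takes slots 3, 3, 2, 2.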

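Each node of the Linux radix tree has 64 slots, which matches the bit width of long on 64-bit platforms. Figure 1 shows a radix tree with three levels of nodes. Each data item can be indexed by a key made of three 6-bit parts; from left to right, these give the slot positions at levels 1 to 3. Nodes without children do not appear in the figure. The radix tree therefore stores sparse trees efficiently and offers fast key-to-pointer lookup in place of a fixed-size array.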
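The sparsity argument can be illustrated with the parameters from this paragraph: 64 slots per node and 3 levels, so keys are 18 bits wide. The SparseRadix class below is a hypothetical C++ sketch, not kernel code. It allocates interior nodes only when a key needs them.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Parameters from the text: 64 slots per node (6 bits per level), 3 levels,
// so keys are 18 bits wide (0 .. 262143).
constexpr int kMapShift = 6;
constexpr int kMapSize = 1 << kMapShift;
constexpr int kLevels = 3;

struct Node {
    std::unique_ptr<Node> child[kMapSize];  // interior levels
    void* item[kMapSize] = {};              // used only on the bottom level
};

class SparseRadix {
public:
    void insert(std::uint32_t key, void* p) {
        Node* n = &root_;
        for (int lvl = 0; lvl < kLevels - 1; ++lvl) {
            std::unique_ptr<Node>& c = n->child[slot(key, lvl)];
            if (!c) {  // allocate a node only when a key first needs it
                c = std::make_unique<Node>();
                ++nodes_;
            }
            n = c.get();
        }
        n->item[slot(key, kLevels - 1)] = p;
    }

    void* lookup(std::uint32_t key) const {
        const Node* n = &root_;
        for (int lvl = 0; lvl < kLevels - 1; ++lvl) {
            n = n->child[slot(key, lvl)].get();
            if (!n) return nullptr;  // missing subtree: key absent
        }
        return n->item[slot(key, kLevels - 1)];
    }

    int nodes() const { return nodes_; }  // allocated nodes, including the root

private:
    // 6-bit slot index at level lvl (0 = root, taken from the high bits).
    static int slot(std::uint32_t key, int lvl) {
        return (key >> (kMapShift * (kLevels - 1 - lvl))) & (kMapSize - 1);
    }

    Node root_;
    int nodes_ = 1;
};
```

Storing keys 0, 1 and 262143 allocates 5 nodes: the root plus two 2-node paths. A flat array covering the same key space would need 2^18 = 262144 slots.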