

In order to handle spatial data efficiently, as required in computer aided design and geo-data applications, a database system needs an index mechanism that will help it retrieve data items quickly according to their spatial locations. However, traditional indexing methods are not well suited to data objects of non-zero size located in multi-dimensional spaces. In this paper we describe a dynamic index structure called an R-tree which meets this need, and give algorithms for searching and updating it. We present the results of a series of tests which indicate that the structure performs well, and conclude that it is useful for current database systems in spatial applications.


1 Introduction

Spatial data objects often cover areas in multi-dimensional spaces and are not well represented by point locations. For example, map objects like counties, census tracts, etc. occupy regions of non-zero size in two dimensions. A common operation on spatial data is a search for all objects in an area, for example to find all counties that have land within 20 miles of a particular point. This kind of spatial search occurs frequently in computer aided design (CAD) and geo-data applications, and therefore it is important to be able to retrieve objects efficiently according to their spatial location.


An index based on objects' spatial locations is desirable, but classical one-dimensional database indexing structures are not appropriate to multi-dimensional spatial searching. Structures based on exact matching of values, such as hash tables, are not useful because a range search is required. Structures using one-dimensional ordering of key values, such as B-trees and ISAM indexes, do not work because the search space is multi-dimensional.


A number of structures have been proposed for handling multi-dimensional point data, and a survey of methods can be found in [5]. Cell methods [4, 8, 16] are not good for dynamic structures because the cell boundaries must be decided in advance. Quad trees [7] and k-d trees do not take paging of secondary memory into account. K-D-B trees [13] are designed for paged memory but are useful only for point data. The use of index intervals has been suggested in [15], but this method cannot be used in multiple dimensions. Corner stitching [12] is an example of a structure for two-dimensional spatial searching suitable for data objects of non-zero size, but it assumes homogeneous primary memory and is not efficient for random searches in very large collections of data. Grid files [10] handle non-point data by mapping each object to a point in a higher-dimensional space. In this paper we describe an alternative structure called an R-tree which represents data objects by intervals in several dimensions.

       Section 2 outlines the structure of an R-tree and Section 3 gives algorithms for searching, inserting, deleting, and updating. Results of R-tree index performance tests are presented in Section 4. Section 5 contains a summary of our conclusions.



2 R-tree Index Structure

       An R-tree is a height-balanced tree similar to a B-tree [2, 6] with index records in its leaf nodes containing pointers to data objects. Nodes correspond to disk pages if the index is disk-resident, and the structure is designed so that a spatial search requires visiting only a small number of nodes. The index is completely dynamic; inserts and deletes can be intermixed with searches and no periodic reorganization is required.


       A spatial database consists of a collection of tuples representing spatial objects, and each tuple has a unique identifier which can be used to retrieve it. Leaf nodes in an R-tree contain index record entries of the form (I, tuple-identifier), where tuple-identifier refers to a tuple in the database and I is an n-dimensional rectangle which is the bounding box of the spatial object indexed: I = (I0, I1, ..., In-1). Here n is the number of dimensions and Ii is a closed bounded interval [a, b] describing the extent of the object along dimension i. Alternatively Ii may have one or both endpoints equal to infinity, indicating that the object extends outward indefinitely. Non-leaf nodes contain entries of the form (I, child-pointer), where child-pointer is the address of a lower node in the R-tree and I covers all rectangles in the lower node's entries.
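The entry and node layout described above could be modeled roughly as follows. This is only a minimal sketch: the names `Rect`, `Entry`, `Node`, and `bounding_box`, as well as the sample tuple identifiers, are illustrative assumptions rather than identifiers from the paper. A rectangle is stored as one closed interval (a, b) per dimension.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Interval = Tuple[float, float]   # closed interval [a, b] along one dimension
Rect = Tuple[Interval, ...]      # I = (I0, I1, ..., In-1)

@dataclass
class Entry:
    I: Rect    # bounding rectangle
    p: Any     # tuple-identifier in a leaf, child Node in a non-leaf

@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)

def bounding_box(rects: List[Rect]) -> Rect:
    """Smallest rectangle that contains every rectangle in `rects`."""
    return tuple(
        (min(iv[0] for iv in dim), max(iv[1] for iv in dim))
        for dim in zip(*rects)   # regroup the intervals dimension by dimension
    )

# Hypothetical data: two objects in a leaf, one unbounded along dimension 0.
leaf = Node(is_leaf=True, entries=[
    Entry(I=((0.0, 2.0), (1.0, 3.0)), p="tuple-17"),
    Entry(I=((1.0, float("inf")), (0.0, 1.0)), p="tuple-42"),
])

# A non-leaf entry covers all rectangles in the lower node's entries.
root = Node(is_leaf=False, entries=[
    Entry(I=bounding_box([e.I for e in leaf.entries]), p=leaf),
])
print(root.entries[0].I)
```

Using `float("inf")` for an endpoint is one possible way to represent an object that extends outward indefinitely.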


       Let M be the maximum number of entries that will fit in one node and let m ≤ M/2 be a parameter specifying the minimum number of entries in a node. An R-tree satisfies the following properties:


(1)   Every leaf node contains between m and M index records unless it is the root.

(2)   For each index record (I, tuple-identifier) in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object represented by the indicated tuple.

(3)   Every non-leaf node has between m and M children unless it is the root.

(4)   For each entry (I, child-pointer) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node.

(5)   The root node has at least two children unless it is a leaf.

(6)   All leaves appear on the same level.



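Figures 2.1a and 2.1b show the structure of an R-tree and illustrate the containment and overlapping relationships that can exist between its rectangles. The height of an R-tree containing N index records is at most ⌈log_m N⌉ - 1, because the branching factor of each node is at least m. The maximum number of nodes is ⌈N/m⌉ + ⌈N/m^2⌉ + ... + 1.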
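As a worked check of these two bounds, the sketch below evaluates them with integer arithmetic. The function names and the sample values N = 1000 and m = 2 are assumptions chosen purely for illustration.

```python
def max_height(n_records: int, m: int) -> int:
    """ceil(log_m(N)) - 1, computed without floating point.

    Assumes N >= m >= 2 (an assumption of this sketch).
    """
    levels, capacity = 0, 1
    while capacity < n_records:   # smallest `levels` with m**levels >= N
        capacity *= m
        levels += 1
    return levels - 1

def max_nodes(n_records: int, m: int) -> int:
    """ceil(N/m) + ceil(N/m^2) + ... + 1."""
    total, power = 0, m
    while True:
        term = -(-n_records // power)   # ceiling division
        total += term
        if term == 1:                   # the root level has been counted
            return total
        power *= m

print(max_height(1000, 2))  # 9
print(max_nodes(1000, 2))   # 1001
```

With m = 2 the series is 500 + 250 + 125 + 63 + 32 + 16 + 8 + 4 + 2 + 1, which has ten terms, one per level, and matches a height of 9.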

3 Searching and Updating


3.1 Searching

The search algorithm descends the tree from the root in a manner similar to a B-tree. However, more than one subtree under a node visited may need to be searched; hence it is not possible to guarantee good worst-case performance. Nevertheless, with most kinds of data the update algorithms will maintain the tree in a form that allows the search algorithm to eliminate irrelevant regions of the indexed space and examine only data near the search area.

         In the following we denote the rectangle part of an index entry E by E.I, and the tuple-identifier or child-pointer part by E.p.

Algorithm Search: Given an R-tree whose root node is T, find all index records whose rectangles overlap a search rectangle S.

(1)        [Search subtrees]

If T is not a leaf, check each entry E to determine whether E.I overlaps S. For all overlapping entries, invoke Search on the tree whose root node is pointed to by E.p.

(2)       [Search leaf node]

If T is a leaf, check all entries E to determine whether E.I overlaps S. If so, E is a qualifying record.
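The two steps of Algorithm Search can be sketched as a short recursive function. This is an illustrative sketch, not the paper's code: the `Node` class, the representation of an entry as a `(rect, pointer)` pair, and the sample tree are all assumptions. A rectangle is a tuple of closed intervals, one per dimension.

```python
def overlaps(r, s):
    """True if rectangles r and s intersect in every dimension."""
    return all(a1 <= b2 and a2 <= b1 for (a1, b1), (a2, b2) in zip(r, s))

class Node:
    def __init__(self, is_leaf, entries):
        self.is_leaf = is_leaf
        self.entries = entries   # list of (E.I, E.p) pairs

def search(T, S):
    """Return the tuple-identifiers of all index records overlapping S."""
    results = []
    for rect, p in T.entries:
        if not overlaps(rect, S):
            continue                       # prune this entry
        if T.is_leaf:
            results.append(p)              # (2) qualifying record
        else:
            results.extend(search(p, S))   # (1) descend into the subtree
    return results

# Hypothetical two-level tree with two leaves.
leaf1 = Node(True, [(((0, 1), (0, 1)), "a"), (((2, 3), (2, 3)), "b")])
leaf2 = Node(True, [(((5, 6), (5, 6)), "c"), (((7, 8), (0, 1)), "d")])
root = Node(False, [(((0, 3), (0, 3)), leaf1), (((5, 8), (0, 6)), leaf2)])

print(search(root, ((2.5, 5.5), (2.5, 5.5))))  # ['b', 'c']
```

In this example both subtrees must be visited because the search rectangle overlaps both covering rectangles, which illustrates why no good worst-case bound can be guaranteed.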




3.2 Insertion

Inserting index records for new data tuples is similar to insertion in a B-tree in that new index records are added to the leaves, nodes that overflow are split, and splits propagate up the tree.


Algorithm Insert:  Insert a new index entry E into an R-tree


(1)        [Find position for new record]

Invoke ChooseLeaf to select a leaf node L in which to place E.

(2)       [Add record to leaf node]

If L has room for another entry, install E. Otherwise invoke SplitNode to obtain L and LL containing E and all the old entries of L.

(3)       [Propagate changes upward]

Invoke AdjustTree on L, also passing LL if a split was performed.

(4)       [Grow tree taller]

If node split propagation caused the root to split, create a new root whose children are the two resulting nodes.




Algorithm ChooseLeaf: Select a leaf node in which to place a new index entry E.


(1)       [Initialize] Set N to be the root node.

(2)       [Leaf check] If N is a leaf, return N.

(3)       [Choose subtree] If N is not a leaf, let F be the entry in N whose rectangle F.I needs least enlargement to include E.I. Resolve ties by choosing the entry with the rectangle of smallest area.

(4)       [Descend until a leaf is reached] Set N to be the child node pointed to by F.p and repeat from (2).
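The ChooseLeaf steps above can be sketched as a loop that picks, at each level, the entry with the least enlargement and breaks ties by area. The `Node` class, the `(rect, pointer)` entry pairs, and the helper names `area` and `union` are assumptions made for this sketch.

```python
def area(r):
    result = 1
    for lo, hi in r:
        result *= hi - lo
    return result

def union(r, s):
    """Smallest rectangle enclosing both r and s."""
    return tuple((min(a, c), max(b, d)) for (a, b), (c, d) in zip(r, s))

class Node:
    def __init__(self, is_leaf, entries):
        self.is_leaf = is_leaf
        self.entries = entries   # list of (F.I, F.p) pairs

def choose_leaf(root, e_rect):
    n = root                                     # (1) initialize
    while not n.is_leaf:                         # (2) leaf check
        def cost(entry):
            f_rect = entry[0]
            enlargement = area(union(f_rect, e_rect)) - area(f_rect)
            return (enlargement, area(f_rect))   # ties: smallest area
        _, n = min(n.entries, key=cost)          # (3) choose, (4) descend
    return n

# Hypothetical tree: a root with two leaves.
left = Node(True, [(((0, 2), (0, 2)), "a")])
right = Node(True, [(((5, 9), (5, 9)), "b")])
root = Node(False, [(((0, 2), (0, 2)), left), (((5, 9), (5, 9)), right)])

print(choose_leaf(root, ((6, 7), (6, 7))) is right)  # True: no enlargement needed
print(choose_leaf(root, ((2, 3), (2, 3))) is left)   # True: 5 versus 33 units of growth
```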


Algorithm AdjustTree: Ascend from a leaf node L to the root, adjusting covering rectangles and propagating node splits as necessary.

(1)       [Initialize] Set N = L. If L was split previously, set NN to be the resulting second node.

(2)       [Check if done] If N is the root, stop.

(3)       [Adjust covering rectangle in parent entry] Let P be the parent node of N, and let EN be N's entry in P. Adjust EN.I so that it tightly encloses all entry rectangles in N.

(4)       [Propagate node split upward] If N has a partner NN resulting from an earlier split, create a new entry ENN with ENN.p pointing to NN and ENN.I enclosing all rectangles in NN. Add ENN to P if there is room. Otherwise, invoke SplitNode to produce P and PP containing ENN and all P's old entries.

(5)       [Move up to next level] Set N = P and set NN = PP if a split occurred. Repeat from (2).
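The rectangle-tightening part of AdjustTree, steps (1) to (3) and (5), can be sketched as below. This sketch deliberately omits split propagation in step (4), and the `Node` class with a `parent` reference, the mutable `[rect, pointer]` entries, and the function names are assumptions, not the paper's code.

```python
def cover(rects):
    """Tightest rectangle enclosing all rectangles in `rects`."""
    return tuple(
        (min(iv[0] for iv in dim), max(iv[1] for iv in dim))
        for dim in zip(*rects)
    )

class Node:
    def __init__(self, is_leaf, entries, parent=None):
        self.is_leaf = is_leaf
        self.entries = entries   # list of mutable [rect, pointer] pairs
        self.parent = parent

def adjust_tree_no_split(leaf):
    """Ascend from `leaf` to the root, tightening covering rectangles."""
    n = leaf                                   # (1) initialize
    while n.parent is not None:                # (2) stop at the root
        p = n.parent
        for entry in p.entries:                # (3) find N's entry in P
            if entry[1] is n:
                entry[0] = cover([r for r, _ in n.entries])
        n = p                                  # (5) move up one level

# Hypothetical tree: one leaf under the root.
leaf = Node(True, [[((0, 1), (0, 1)), "a"]])
root = Node(False, [[((0, 1), (0, 1)), leaf]])
leaf.parent = root

leaf.entries.append([((3, 4), (2, 5)), "b"])   # a new record lands in the leaf
adjust_tree_no_split(leaf)
print(root.entries[0][0])  # ((0, 4), (0, 5))
```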




3.5 Node Splitting

         In order to add a new entry to a full node containing M entries, it is necessary to divide the collection of M+1 entries between two nodes. The division should be done in a way that makes it as unlikely as possible that both new nodes will need to be examined on subsequent searches. Since the decision whether to visit a node depends on whether its covering rectangle overlaps the search area, the total area of the two covering rectangles after a split should be minimized. Figure 3.1 illustrates this point. The area of the covering rectangles in the “bad split” case is much larger than in the “good split” case.

        The same criterion was used in procedure ChooseLeaf to decide where to insert a new index entry: at each level in the tree, the subtree chosen was the one whose covering rectangle would have to be enlarged least.
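To make the area criterion concrete, the sketch below compares the total covering area of two ways of splitting the same four rectangles. The rectangles and helper names are made up for illustration and are not the data behind Figure 3.1.

```python
def area(r):
    result = 1
    for lo, hi in r:
        result *= hi - lo
    return result

def cover(rects):
    """Tightest rectangle enclosing all rectangles in `rects`."""
    return tuple(
        (min(iv[0] for iv in dim), max(iv[1] for iv in dim))
        for dim in zip(*rects)
    )

def total_cover_area(group1, group2):
    return area(cover(group1)) + area(cover(group2))

# Hypothetical entries: A and B lie close together, as do C and D.
A = ((0, 1), (0, 1))
B = ((1, 2), (0, 1))
C = ((8, 9), (0, 1))
D = ((9, 10), (0, 1))

good = total_cover_area([A, B], [C, D])   # neighbors grouped together
bad = total_cover_area([A, C], [B, D])    # distant entries grouped together
print(good, bad)  # 4 18
```

The second grouping produces two long covering rectangles that overlap each other, so most searches in that region would have to visit both nodes.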

         We now turn to algorithms for partitioning the set of M + 1 entries into two groups, one for each new node.





3.5.2 A Quadratic-Cost Algorithm

  This algorithm attempts to find a small-area split, but is not guaranteed to find one with the smallest area possible. The cost is quadratic in M and linear in the number of dimensions. The algorithm picks two of the M+1 entries to be the first elements of the two new groups by choosing the pair that would waste the most area if both were put in the same group, i.e. the area of a rectangle covering both entries, minus the areas of the entries themselves, would be greatest. The remaining entries are then assigned to groups one at a time. At each step the area expansion required to add each remaining entry to each group is calculated, and the entry assigned is the one showing the greatest difference between the two groups.


Algorithm Quadratic Split. Divide a set of M+1 index entries into two groups.

(1)     [Pick first entry for each group] Apply Algorithm PickSeeds to choose two entries to be the first elements for the groups. Assign each to a group.

(2)     [Check if done] If all entries have been assigned, stop. If one group has so few entries that all the rest must be assigned to it in order for it to have the minimum number m, assign them and stop.

(3)     [Select entry to assign] Invoke Algorithm PickNext to choose the next entry to assign. Add it to the group whose covering rectangle will have to be enlarged least to accommodate it. Resolve ties by adding the entry to the group with smaller area, then to the one with fewer entries, then to either. Repeat from (2).




Algorithm PickSeeds: Select two entries to be the first elements of the (two) groups


(1)     [Calculate inefficiency of grouping entries together] For each pair of entries E1 and E2, compose a rectangle J including E1.I and E2.I. Calculate d = area(J) - area(E1.I) - area(E2.I).

(2)     [Choose the most wasteful pair] Choose the pair with the largest d.
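PickSeeds as described above can be sketched by enumerating all pairs and keeping the one with the largest d. In this sketch the function returns the indices of the two seed rectangles rather than the entries themselves. That choice, the helper names, and the sample rectangles are assumptions for illustration.

```python
from itertools import combinations

def area(r):
    result = 1
    for lo, hi in r:
        result *= hi - lo
    return result

def union(r, s):
    """The rectangle J that includes both r and s."""
    return tuple((min(a, c), max(b, d)) for (a, b), (c, d) in zip(r, s))

def pick_seeds(rects):
    """Return indices (i, j) of the pair wasting the most area together."""
    def waste(pair):
        i, j = pair
        # d = area(J) - area(E1.I) - area(E2.I)
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=waste)

# Hypothetical entry rectangles.
rects = [
    ((0, 1), (0, 1)),
    ((1, 2), (0, 1)),
    ((9, 10), (9, 10)),
    ((4, 5), (4, 5)),
]
print(pick_seeds(rects))  # (0, 2): J has area 100, so d = 98
```

Examining every pair is what makes this step quadratic in the number of entries.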




Algorithm PickNext: Select one remaining entry for classification in a group


(1)     [Determine cost of putting each entry in each group] For each entry E not yet in a group, calculate d1 = the area increase required in the covering rectangle of Group 1 to include E.I. Calculate d2 similarly for Group 2.

(2)     [Find entry with greatest preference for one group] Choose any entry with the maximum difference between d1 and d2.
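Putting Quadratic Split, PickSeeds, and PickNext together, one possible sketch looks like this. It works on a list of rectangles and returns two lists of indices. The function names, the index-based representation, and the explicit parameter `m` are assumptions of this sketch rather than details fixed by the paper.

```python
from itertools import combinations

def area(r):
    result = 1
    for lo, hi in r:
        result *= hi - lo
    return result

def union(r, s):
    return tuple((min(a, c), max(b, d)) for (a, b), (c, d) in zip(r, s))

def growth(cover, r):
    """Area increase needed for `cover` to include rectangle r."""
    return area(union(cover, r)) - area(cover)

def pick_seeds(rects):
    def waste(pair):
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=waste)

def pick_next(rects, remaining, covers):
    """Index of the remaining entry with the greatest |d1 - d2|."""
    return max(remaining, key=lambda i: abs(
        growth(covers[0], rects[i]) - growth(covers[1], rects[i])))

def quadratic_split(rects, m):
    seeds = pick_seeds(rects)                        # (1) first entries
    groups = [[seeds[0]], [seeds[1]]]
    covers = [rects[seeds[0]], rects[seeds[1]]]
    remaining = [i for i in range(len(rects)) if i not in seeds]
    while remaining:                                 # (2) check if done
        short = [g for g in (0, 1) if len(groups[g]) + len(remaining) <= m]
        if short:
            # One group needs every remaining entry to reach m.
            g, batch, remaining = short[0], remaining, []
        else:
            i = pick_next(rects, remaining, covers)  # (3) select entry
            remaining.remove(i)
            # Least enlargement, then smaller area, then fewer entries.
            g = min((0, 1), key=lambda k: (
                growth(covers[k], rects[i]), area(covers[k]), len(groups[k])))
            batch = [i]
        for j in batch:
            groups[g].append(j)
            covers[g] = union(covers[g], rects[j])
    return groups

# Hypothetical overflowing node with M + 1 = 5 entries and m = 2.
rects = [((0, 1), (0, 1)), ((1, 2), (0, 1)), ((9, 10), (9, 10)),
         ((8, 9), (9, 10)), ((0, 1), (1, 2))]
print(quadratic_split(rects, m=2))  # [[0, 1, 4], [2, 3]]
```

When several entries tie on the difference between d1 and d2, `max` simply takes the first one, which is consistent with the instruction to choose any such entry.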




3.5.3 A Linear-Cost Algorithm

         This algorithm is linear in M and in the number of dimensions. Linear Split is identical to Quadratic Split but uses a different version of PickSeeds. PickNext simply chooses any of the remaining entries.



         Algorithm LinearPickSeeds: Select two entries to be the first elements of the groups.


(1)     [Find extreme rectangles along all dimensions] Along each dimension, find the entry whose rectangle has the highest low side, and the one with the lowest high side. Record the separation.

(2)     [Adjust for shape of the rectangle cluster] Normalize the separations by dividing by the width of the entire set along the corresponding dimension.

(3)     [Select the most extreme pair] Choose the pair with the greatest normalized separation along any dimension.
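The three steps of LinearPickSeeds can be sketched as a single pass over the dimensions. The function name, the index-based return value, and the sample rectangles are assumptions. The fallback used when both extremes turn out to be the same entry is also an assumption of this sketch, since the steps above do not say what to do in that case.

```python
def linear_pick_seeds(rects):
    """Return indices of the two seed rectangles."""
    n = len(rects)
    best = None   # (normalized separation, lowest-high index, highest-low index)
    for d in range(len(rects[0])):
        # (1) extreme rectangles along dimension d
        highest_low = max(range(n), key=lambda i: rects[i][d][0])
        lowest_high = min(range(n), key=lambda i: rects[i][d][1])
        separation = rects[highest_low][d][0] - rects[lowest_high][d][1]
        # (2) normalize by the width of the whole set along d
        width = max(r[d][1] for r in rects) - min(r[d][0] for r in rects)
        normalized = separation / width if width > 0 else 0.0
        # (3) keep the most extreme pair over all dimensions
        if best is None or normalized > best[0]:
            best = (normalized, lowest_high, highest_low)
    _, i, j = best
    if i == j:
        # Assumption: if one entry is both extremes, pair it with another entry.
        j = (i + 1) % n
    return i, j

# Hypothetical entries, widely separated along dimension 0 only.
rects = [
    ((0, 1), (0, 1)),
    ((9, 10), (0.5, 1.5)),
    ((4, 5), (0, 2)),
]
print(linear_pick_seeds(rects))  # (0, 1): separation 8 over width 10
```

Each dimension is scanned a constant number of times, so the cost grows linearly with both the number of entries and the number of dimensions.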





