Nearest Neighbor Queries, Nick Roussopoulos
Summary of the paper:
- Introduce the NN problem
There is a wide variety of spatial access methods, but very few have been used for NN queries. Although NN algorithms have been proposed for quad-trees and k-d-trees, an NN algorithm for R-trees was still needed.
- Branch-and-bound
The main idea of this NN algorithm is branch-and-bound, which avoids unnecessary node visits by bounding the distance to the nearest neighbor. This pruning strategy is widely used in AI.
- Core of the branch-and-bound method: choosing the metrics
How to estimate the nearest distance is the key to successful pruning. The paper introduces two metrics for the heuristic search:
- MINDIST
The minimum distance from the query point P to an MBR (zero if P lies inside the MBR).
MINDIST is a lower bound on the distance from P to any object in the MBR.
- MINMAXDIST
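The minimum, over all faces of an MBR, of the maximum distance from the point P to that face.
Taking the minimum guarantees that at least one object lies within this distance, since every face of an MBR touches at least one object; a larger value would make the bound less tight, and a smaller one might not guarantee an object.
MINMAXDIST is an upper bound on the distance from P to the nearest object in the MBR.
- MINDIST <= ||(P, o)|| <= MINMAXDIST, where o is the object in the MBR nearest to P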
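The two metrics can be sketched in a few lines. This is only an illustration under my own assumptions: an MBR is given as two corner tuples `lo` and `hi`, distances are kept squared to avoid square roots, and the function names `mindist_sq` and `minmaxdist_sq` are mine, not the paper's.

```python
from typing import Sequence

Point = Sequence[float]


def mindist_sq(p: Point, lo: Point, hi: Point) -> float:
    """Squared MINDIST from p to the box [lo, hi]; 0 if p is inside the box."""
    total = 0.0
    for pi, lo_i, hi_i in zip(p, lo, hi):
        # Clamp the coordinate into the box; the gap is the per-axis distance.
        nearest = min(max(pi, lo_i), hi_i)
        total += (pi - nearest) ** 2
    return total


def minmaxdist_sq(p: Point, lo: Point, hi: Point) -> float:
    """Squared MINMAXDIST: for each axis k, take the nearer face on k and the
    farther bound on every other axis, then keep the smallest such value."""
    near = [min((pi - a) ** 2, (pi - b) ** 2) for pi, a, b in zip(p, lo, hi)]
    far = [max((pi - a) ** 2, (pi - b) ** 2) for pi, a, b in zip(p, lo, hi)]
    far_total = sum(far)
    # Swap the "far" term on axis k for the "near" term on axis k.
    return min(far_total - far[k] + near[k] for k in range(len(near)))
```

For P = (0, 0) and the box with corners (1, 1) and (3, 2), the nearest corner is (1, 1), so the squared MINDIST is 2; the nearer face on the x axis is x = 1 and its farthest point from P is (1, 2), so the squared MINMAXDIST is 5.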
- Searching Order
MINDIST is an optimistic estimate, while MINMAXDIST is a pessimistic one.
In most cases the MINDIST ordering behaves well (which is verified in the later experiments), but when the MBRs are sparsely populated, the MINMAXDIST ordering may outperform it.
- Nearest Neighbor Algorithm for R-trees
- Initialize the nearest distance to infinity.
- Traverse the tree depth-first, starting from the root. At each index node, sort all of its MBRs by an ordering metric and put them in an Active Branch List (ABL).
- Apply pruning rules 1 and 2 to the ABL.
- Visit the MBRs in the ABL in order until it is empty.
- At a leaf node, compute the actual distances, compare them with the best NN so far, and update it if necessary.
- On return from the recursion, apply pruning rule 3.
- When the ABL is empty, the NN search returns.
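The steps above can be sketched roughly as follows. This is not the paper's code, only a minimal illustration under several assumptions of mine: objects are plain points, every MBR is tight (built by the hypothetical helpers `leaf` and `index`), distances are squared, and all names (`Node`, `nn_search`, `order_by`) are made up for the example. As I read the paper, rule 1 drops an MBR whose MINDIST exceeds another MBR's MINMAXDIST, and rule 3 drops an MBR whose MINDIST exceeds the best actual distance found; rule 2 is omitted because the leaves here hold points, and rule 3 is checked just before each remaining branch is visited, which has the same effect as pruning on return from the recursion.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: tuple
    hi: tuple
    children: list = field(default_factory=list)  # non-empty for index nodes
    points: list = field(default_factory=list)    # non-empty for leaf nodes


def bbox(boxes):
    """Tight bounding box of a list of (lo, hi) corner pairs."""
    los, his = zip(*boxes)
    return tuple(map(min, zip(*los))), tuple(map(max, zip(*his)))


def leaf(points):
    lo, hi = bbox([(p, p) for p in points])
    return Node(lo, hi, points=list(points))


def index(children):
    lo, hi = bbox([(c.lo, c.hi) for c in children])
    return Node(lo, hi, children=list(children))


def mindist_sq(p, lo, hi):
    return sum((x - min(max(x, a), b)) ** 2 for x, a, b in zip(p, lo, hi))


def minmaxdist_sq(p, lo, hi):
    near = [min((x - a) ** 2, (x - b) ** 2) for x, a, b in zip(p, lo, hi)]
    far = [max((x - a) ** 2, (x - b) ** 2) for x, a, b in zip(p, lo, hi)]
    return min(sum(far) - far[k] + near[k] for k in range(len(near)))


def nn_search(node, p, best=(math.inf, None), order_by=mindist_sq):
    """Return (squared distance, point) for the nearest neighbor of p."""
    if node.points:  # leaf: compare actual distances with the best so far
        for q in node.points:
            d = sum((a - b) ** 2 for a, b in zip(p, q))
            if d < best[0]:
                best = (d, q)
        return best
    # Index node: build the Active Branch List, sorted by the ordering metric.
    abl = sorted(node.children, key=lambda c: order_by(p, c.lo, c.hi))
    # Rule 1: drop branches whose MINDIST exceeds the smallest MINMAXDIST.
    bound = min(minmaxdist_sq(p, c.lo, c.hi) for c in abl)
    abl = [c for c in abl if mindist_sq(p, c.lo, c.hi) <= bound]
    for child in abl:
        # Rule 3: skip branches that cannot beat the best actual distance.
        if mindist_sq(p, child.lo, child.hi) > best[0]:
            continue
        best = nn_search(child, p, best, order_by)
    return best
```

Passing `order_by=minmaxdist_sq` switches to the pessimistic ordering; the answer is the same, only the visiting order (and so the amount of pruning) changes.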
- Generalization: Finding the k Nearest Neighbors
Keep a sorted buffer of at most k current nearest neighbors.
- Questions:
- The third step of the algorithm is a little bit vague. It only says to apply pruning rules 1 and 2, but rule 2 can only be applied at a leaf.
- Is it possible to combine the two metric orderings to give an overall better solution?
- Are there any other metrics that could be used?
- Apart from spatial databases, k-NN could also be used for classification and clustering of different patterns or objects.
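As a footnote to the k-NN generalization, the buffer of at most k current neighbors might look like the sketch below. It is an assumption-laden illustration: the class name `KNearestBuffer` and its methods are hypothetical, k is assumed to be at least 1, and a max-heap is used in place of a literally sorted list so that the farthest of the k candidates, which serves as the pruning distance, is always at hand.

```python
import heapq


class KNearestBuffer:
    """Keeps the k smallest (distance, item) pairs offered so far (k >= 1)."""

    def __init__(self, k):
        self.k = k
        self._heap = []  # max-heap on distance, stored as (-distance, item)

    def offer(self, dist, item):
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (-dist, item))
        elif dist < -self._heap[0][0]:
            # Closer than the current farthest neighbor: replace it.
            heapq.heapreplace(self._heap, (-dist, item))

    def prune_bound(self):
        """Distance of the farthest kept neighbor; infinite until k are held."""
        if len(self._heap) < self.k:
            return float("inf")
        return -self._heap[0][0]

    def sorted_items(self):
        return sorted((-d, item) for d, item in self._heap)
```

In a k-NN search, `prune_bound()` would take the place of the single best distance when deciding whether a branch can still contribute.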