Tutorials - kd-tree

VLFeat implements the randomized kd-tree forest from FLANN. This enables fast medium and large scale nearest neighbor queries among high dimensional data points (such as those produced by SIFT).

Introduction

A kd-tree is a data structure used to quickly solve nearest-neighbor queries. Consider a set of 2D points uniformly distributed in the unit square:

  X = rand(2, 100) ;

A kd-tree is generated by using the vl_kdtreebuild function:

  kdtree = vl_kdtreebuild(X) ;

The returned kdtree indexes the set of points X. Given a query point Q, the function vl_kdtreequery returns its nearest neighbor in X:

  Q = rand(2, 1) ;
  [index, distance] = vl_kdtreequery(kdforest, X, Q) ;

Here index stores the index of the column of X that is closest to the point Qdistance is the squared euclidean distance between X(index),Q.

A kd-tree is a hierarchal structure built by partitioning the data recursively along the dimension of maximum variance. At each iteration the variance of each column is computed and the data is split into two parts on the column with maximum variance. The splitting threshold can be selected to be the mean or the median (use theThresholdMethod option of vl_kdtreebuild).

 
kd-tree partitions of a uniform set of data points, using the mean (left image) and the median (right image) thresholding options of vl_kdtreebuild. On the bottom right corner a query point is marked along with the ten closest neighbors as found byvl_kdtreequery. Figure generated by vl_demo_kdtree.

Querying

vl_kdtreequery uses a best-bin first search heuristic. This is a branch-and-bound technique that maintains an estimate of the smallest distance from the query point to any of the data points down all of the open paths.

vl_kdtreequery supports two important operations: approximate nearest-neighbor search and k-nearest neighbor search. The latter can be used to return the knearest neighbors to a given query point Q. For instance:

[index, distance] = vl_kdtreequery(kdtree, X, Q, 'NumNeighbors', 10) ;

returns the closest 10 neighbors to Q in X and their distances, stored along the columns of index and distance.

The MaxComparisons option is used to run an ANN query. The parameter specifies how many paths in the best-bin-first search of the kd-tree can be checked before giving up and returning the closest point encountered so far. For instance:

[index, distance] = vl_kdtreequery(kdtree, X, Q, 'NumNeighbors', 10, 'MaxComparisons', 15) ;

does not compare any point in Q with more than 15 points in X.

   
Finding the 10 approximated nearest neighbors for increasing values of theMaxComparisons parameter. Note that at most MaxComparisons neighbors can be returned (if more are requested, they are ignored). Figure generated byvl_demo_kdtree_ann.

Randomized kd-tree forests

VLFeat supports constructing randomized forests of kd-trees to improve the effectiveness of the representation in high dimensions. The parameter NumTrees ofvl_kdtreebuild specifies how many trees to use in constructing the forest. Each tree is constructed independently. Instead of always splitting on the maximally variant dimension, each tree chooses randomly among the top five most variant dimensions at each level. When querying, vl_kdtreequery runs best-bin-first across all the trees in parallel. For instance

  kdtree = vl_kdtreebuild(X, 'NumTrees', 4) ;
  [index, distance] = vl_kdtreequery(kdtree, X, Q) ;

constructs four trees and queries them.

   
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值