Range Searching and Multi-Dimensional Data|Quadtrees|K-d Trees|CS 61B Data Structures, Spring 2019

Multi-Dimensional Data

Motivation: 2D Range Finding and Nearest Neighbors

Suppose we want to perform operations on a set of Body objects in space?

  • 2D Range Searching: How many objects are in the highlighted rectangle?
  • Nearest Neighbors: What is the closest object to the space horse?

在这里插入图片描述

A silly way to find the Answer:

Suppose the set is stored as a hash table, where the hash code of each Body is given below.

How would nearest( ) be implemented?

  • The bucket that each object is effectively random. Have to iterate over all N items.

What would be runtime of nearest?

  • Θ(N)

The problem with hash tables is that the bucket # of an item is effectively random.

How to fix it?

Uniform Partitioning

The problem with hash tables is that the bucket # of an item is effectively random

  • One fix is to ensure that the bucket #s depend only on position!
Simplest idea: Partition space into uniform rectangular buckets (also called “bins”).
  • Example on right for 4x4 grid of such buckets.

在这里插入图片描述

Question:

What is the runtime for nearest assuming points are evenly spread out?
Still Θ(N).
On average, runtime will be 16 times faster than without the spatial partitioning, but N/16 is still Θ(N).
Note: Often works well in practice. But let’s see a better way

QuadTrees

Trees vs. Hash Tables

One key advantage of Search Trees over Hash Tables is that trees explicitly track the order of items.

Examples:

  • Finding the minimum item in a BST is Θ(log N) time, but Θ(N) in a hash table.
  • Can relatively easy modify BSTs to support operations like rank(5), which returns the 5th largest item in Θ(log N) time (see optional textbook).
    • Impossible to do efficiently in a hash table.
Basic idea:

Build a BST where every node has 4 neighbors, corresponding to northwest, northeast, southwest, and southeast.

Quadtree Insertion Demo

Below: Quadtree Representation of 5 objects in 2D space.

  • Insertion Demo: Link

在这里插入图片描述

Quadtrees are also spatial partitioning in disguise.

  • Earlier approach: Uniform partitioning (perfect grid of rectangles).
  • Quadtrees: Hierarchical partitioning. Each node “owns” 4 subspaces.
    • Space is more finely divided in regions where there are more points.(空间划分的更为精细,可以准确的找到某个点所在的位置)
    • Results in better runtime in many circumstances
      在这里插入图片描述
Quadtree Range Finding Demo

Quadtrees allow us to better prune when performing a rectangle search.

  • Same optimization as before: Prune (i.e. don’t explore) subspaces that don’t intersect the query rectangle.(不探索不与感兴趣区域相交的部分)
  • Range Search Demo: Link

Higher Dimensional Data and K-d Trees

3D Data and Even Higher Dimensional Space
  1. Suppose we want to store objects in 3D space.
  • Quadtrees only have four directions, but in 3D, there are 8.

One approach: Use an Oct-tree or Octree.

  • Very widely used in practice.
    在这里插入图片描述
  1. Even Higher Dimensional Space
    you may want to organize data on a large number of dimensions.
    Example: Want to find songs with the following features:
  • Length between 3 minutes and 6 minutes.
  • Between 1000 and 20,000 listens.(有1000到两万人听过)
  • Between 120 to 150 BPM.(歌曲的速度又称“”曲速“”)
  • Were recorded after 2004.

In these cases, one somewhat common solution is a k-d tree.

  • Fascinating data structure that handles arbitrary numbers of dimensions.
    • k-d means “k dimensional”.
  • For the sake of simplicity in lecture, we’ll use 2D data, but the idea generalizes naturally.
K-d Trees

k-d tree example for 2-d:

  • Basic idea, root node partitions entire space into left and right (by x).
  • All depth 1 nodes partition subspace into up and down (by y).
  • All depth 2 nodes partition subspace into left and right (by x).

若根节点按照x排序,则其第一级子节点按y排序,第二级子节点按x排序,以此类推

Each point owns 2 subspaces.

  • Similar to a quadtree.
    • Example: D owns the two subspaces shown.
  • The top subspace is infinitely large.

红线代表在该节点,其所在空间被按照该节点x周划分,蓝线则代表空间被所在节点的y轴划分。

K-d Trees and Nearest Neighbor

Just like spatial partitioning and quadtrees, k-d trees support an efficient nearest method.

  • Optimization: Do not explore subspaces that can’t possible have a better answer than your current best.

Example: Find the nearest point to (0, 7).
Answer is (1, 5).

Nearest Pseudocode

在这里插入图片描述

在这里插入图片描述

Summary and Applications

Multi-Dimensional Data Summary

Multidimensional data has interesting operations:

  • Range Finding: What are all the objects inside this (rectangular) subspace?
  • Nearest: What is the closest object to a specific point?
    • Can be generalized to k-nearest.

The most common approach is spatial partitioning:

  • Uniform Partitioning: Analogous to hashing.
  • Quadtree: Generalized 2D BST where each node “owns” 4 subspaces.
  • K-d Tree: Generalized k-d BST where each node “owns” 2 subspaces.
    • Dimension of ownership cycles with each level of depth in tree.

Spatial partitioning allows for pruning of the search space.

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值