The Closest M Points
The course of Software Design and Development Practice is objectionable. ZLC is facing a serious problem. There are many points in K-dimensional space. Given a point, ZLC needs to find the m points closest to it. Euclidean distance is used as the distance metric between two points. The Euclidean distance between points p and q is the length of the line segment connecting them. In Cartesian coordinates, if p = (p_1, p_2, …, p_n) and q = (q_1, q_2, …, q_n) are two points in Euclidean n-space, then the distance from p to q, or from q to p, is given by:
$$d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$$
Can you help him solve this problem?
The first line of the input contains two non-negative integers n and K: the number of points, 1 <= n <= 50000, and the number of dimensions, 1 <= K <= 5. Each of the following n lines contains K integers, the coordinates of a point. Then comes a line with one positive integer t, the number of queries, 1 <= t <= 10000. Each query consists of two lines. The K integers on the first line are the coordinates of the query point. The second line holds one integer m, the number of closest points to find, 1 <= m <= 10. The absolute value of every coordinate is at most 10000.
For each query, output m+1 lines:
The first line reads "the closest m points are:", where m is the number of points requested.
The following m lines give the m points, ordered from nearest to farthest.
It is guaranteed that the answer is unique: the distances from the query point to its m+1 nearest points are all different. That means input like this:
2 2
1 1
3 3
2 2
will not exist.
Sample Input
3 2
1 1
1 3
3 4
2
2 3
2
2 3
1
Sample Output
the closest 2 points are:
1 3
3 4
the closest 1 points are:
1 3
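Before turning to the kd-tree, a brute-force baseline is handy for checking answers on small inputs. The sketch below is not part of the original solution; the names `Point`, `squaredDistance` and `closestBruteForce` are made up for illustration. It compares squared distances, which gives the same order as the true Euclidean distance.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Point = std::vector<int>;  // hypothetical alias: one coordinate per dimension

// Squared Euclidean distance; sorting by it matches sorting by the true distance.
long long squaredDistance(const Point& a, const Point& b) {
    long long sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        long long diff = static_cast<long long>(a[i]) - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Return the m points nearest to `query`, ordered from near to far.
// Each call scans and partially sorts all points, so it only suits small inputs.
std::vector<Point> closestBruteForce(std::vector<Point> points,
                                     const Point& query, std::size_t m) {
    m = std::min(m, points.size());
    std::partial_sort(points.begin(),
                      points.begin() + static_cast<std::ptrdiff_t>(m),
                      points.end(),
                      [&query](const Point& a, const Point& b) {
                          return squaredDistance(a, query) < squaredDistance(b, query);
                      });
    points.resize(m);
    return points;
}
```

On the sample, calling `closestBruteForce` with the points (1, 1), (1, 3), (3, 4), the query (2, 3) and m = 2 yields (1, 3) followed by (3, 4).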
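Suppose that at some node we split on the k-th attribute with split value k_t. A record whose k-th attribute is smaller than k_t goes into the node's left subtree for further classification; otherwise it goes into the right subtree. Here is an example, adapted from Christopher G. Healey's Advanced data structure notes: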
Name | Height (ht) | Weight (wt) |
Sleepy | 36 | 48 |
Happy | 34 | 52 |
Doc | 38 | 51 |
Dopey | 37 | 54 |
Grumpy | 32 | 55 |
Sneezy | 35 | 46 |
Bashful | 33 | 50 |
Ms. White | 65 | 98 |
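We classify the data by alternating between the two attributes, height and weight, starting with height. Usually the median of the height values would serve as the split value, but for simplicity we insert the records in the order given to build the kd-tree. The first record is Sleepy and the attribute in use is height, so the root of the kd-tree is ht:36. Next we add the second record, Happy. Happy's height is 34, so it goes into the root's left subtree. This is the second level, so the weight attribute applies and the root of the left subtree is wt:52. Likewise for the third record, Doc: Doc's height is greater than 36, so it goes into the right subtree, where the weight attribute applies, and the root of the right subtree is wt:51. As shown in the figure:
A typical kd-tree stores the data in its leaf nodes, but when I implemented it that way for this problem, the submission exceeded the time limit. I then consulted someone else's solution, which stores the data directly in every node: the splitting nodes (the root and the other internal nodes) hold a point as well. As shown in the figure: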
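The insertion order described above can be sketched in a few lines. This is an illustrative sketch, not the code used for the submission: `Node`, `insert` and `buildDwarfTree` are hypothetical names, and sending equal values to the right subtree is an assumption.

```cpp
#include <array>
#include <memory>
#include <string>

struct Node {
    std::string name;
    std::array<int, 2> attr{};            // attr[0] = height (ht), attr[1] = weight (wt)
    std::unique_ptr<Node> left, right;
};

// Insert one record. Even levels compare height, odd levels compare weight.
// Assumption: a value equal to the node's value goes to the right subtree.
void insert(std::unique_ptr<Node>& node, const std::string& name,
            const std::array<int, 2>& attr, int depth = 0) {
    if (!node) {
        node = std::make_unique<Node>();
        node->name = name;
        node->attr = attr;
        return;
    }
    int d = depth % 2;
    if (attr[d] < node->attr[d])
        insert(node->left, name, attr, depth + 1);
    else
        insert(node->right, name, attr, depth + 1);
}

// Build the tree by inserting the table rows in the order listed.
std::unique_ptr<Node> buildDwarfTree() {
    std::unique_ptr<Node> root;
    insert(root, "Sleepy",    {36, 48});
    insert(root, "Happy",     {34, 52});
    insert(root, "Doc",       {38, 51});
    insert(root, "Dopey",     {37, 54});
    insert(root, "Grumpy",    {32, 55});
    insert(root, "Sneezy",    {35, 46});
    insert(root, "Bashful",   {33, 50});
    insert(root, "Ms. White", {65, 98});
    return root;
}
```

After `buildDwarfTree()`, the root is Sleepy (ht:36), its left child is Happy (wt:52) and its right child is Doc (wt:51), matching the walkthrough.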
#include <cstdio>
#include <cstring>
#include <algorithm>
#include <queue>
#include <utility>
using namespace std;
const int N = 50000;
short currentDim; //ys: which dimension we are currently working on
#define square(x) ((x)*(x))
int n;
short K, t;
struct point
{
    short pos[5];
    bool flag; //ys: whether this is a valid point
    point()
    {
        flag = false;
    }
    bool operator < (const point &a) const
    {
        return pos[currentDim] < a.pos[currentDim];
    }
};
point input_v[N];
priority_queue<pair<double, point>> pq; // max-heap: top() is the farthest candidate (squared distance)
point kd_tree[4*N];
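Suppose there are four points. At each step we pick the element in the middle of the current dimension as the splitting node. When the number of points is even, two points sit in the middle, and we take the smaller one. As the figure shows, the root therefore splits the data into two groups: one point smaller than it and two points larger. The number in each node of the figure is that node's position in the array. Since the root is at position 1, its left child is at position 2 and its right child at position 3. The right side holds two points, so we split again and look for the middle element in the current dimension. With only two points, both count as the middle, so we take the smaller one as the splitting node and store it at position 3. The other point is larger than this splitting node, so it becomes its right child, stored at position 7. This is the last point, but the program stops descending only when the current node is invalid (flag = false). Position 7 holds a valid node, so the program still inspects its left and right subtrees, which means positions 14 and 15 get inspected (this could be optimized to avoid allocating space for positions 14 and 15). So with 4 points the worst case reaches position 15. The array is indexed from 0 (although the root sits at position 1), so it needs 16 slots, 4 times the number of points. That is why the kd_tree array has size 4N.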
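The position count in this argument can be checked mechanically. The sketch below uses a hypothetical helper, `maxProbedIndex`, that follows the same median rule and the same 2*index / 2*index+1 layout, and reports the largest array position that gets inspected, including the empty slots below the last valid nodes.

```cpp
#include <algorithm>

// Largest array position inspected for the range [p, r] whose root sits at `index`.
// An empty range still counts: the lookup reads the flag stored at that position.
long long maxProbedIndex(int p, int r, long long index) {
    if (p > r)
        return index;                 // empty slot: inspected, but holds no point
    int mid = (p + r) / 2;            // lower middle element, as in the text
    return std::max(maxProbedIndex(p, mid - 1, 2 * index),
                    maxProbedIndex(mid + 1, r, 2 * index + 1));
}
```

For four points, `maxProbedIndex(0, 3, 1)` gives 15, the position named above. For n = 50000 the result stays below 4N = 200000, so the array bound is sufficient under this layout.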
// Build the subtree for input_v[p..r] and store its root at kd_tree[index].
inline void construct_kdTree(int n, short K, int p, int r, int index, short depth)
{
    if(p <= r)
    {
        currentDim = depth % K;
        int mid = (p + r) / 2;
        // move the median along currentDim to position mid
        nth_element(input_v+p, input_v+mid, input_v+r+1);
        kd_tree[index] = input_v[mid];
        kd_tree[index].flag = true;
        construct_kdTree(n, K, p, mid-1, 2*index, depth+1);
        construct_kdTree(n, K, mid+1, r, 2*index+1, depth+1);
    }
}
inline void query_kdTree(point p, short closestM, short K, int index, short depth)
{
    //if(index >= 2*N || !kd_tree[index].flag)
    if(!kd_tree[index].flag)
        return;
    pair<double, point> current_node(0, kd_tree[index]);
    for(int i = 0; i < K; i++)
        current_node.first += square(p.pos[i]-current_node.second.pos[i]);
    short idm = depth % K;
    short flag = 0; //ys: whether we need to explore other side of the split
    int lchild = 2*index;
    int rchild = 2*index+1;
    if(kd_tree[index].pos[idm] <= p.pos[idm]) // first descend into the side containing p
        query_kdTree(p, closestM, K, rchild, depth+1);
    else
        query_kdTree(p, closestM, K, lchild, depth+1);
    if(pq.size() < closestM)
    {
        pq.push(current_node);
        flag = 1;
    }
    else
    {
        if(current_node.first < pq.top().first)
        {
            pq.pop();
            pq.push(current_node);
        }
        if(square(p.pos[idm] - kd_tree[index].pos[idm]) < pq.top().first)
            flag = 1;
    }
    if(flag)
    {
        if(kd_tree[index].pos[idm] <= p.pos[idm])
            query_kdTree(p, closestM, K, lchild, depth+1);
        else
            query_kdTree(p, closestM, K, rchild, depth+1);
    }
}
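Although the query point A falls on the same side as point B and point C, the point actually closest to it is point D, which lies on the other side. How can we tell whether this can happen? It is simple: take the difference between the query point and the current node's point along the current dimension, that is, the distance from point A to the splitting line in the figure. If this distance is greater than the largest distance in pq, we need not examine the points on the other side of the splitting line, because every point there is at least as far from point A as the splitting line is. Otherwise we must examine the other side, that is, the other branch of the current node. The code compares squared values on both sides, since pq stores squared distances. So in the code, when flag is set, we search the other branch of the current node.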
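The pruning test can be isolated into a small predicate. This is a sketch with hypothetical names (`planeGapSq`, `mustSearchOtherSide`), not a piece of the submitted program, and it assumes the candidate heap stores squared distances.

```cpp
#include <cstddef>

// Squared distance from the query to the splitting line along the current dimension.
long long planeGapSq(int queryCoord, int splitCoord) {
    long long diff = static_cast<long long>(queryCoord) - splitCoord;
    return diff * diff;
}

// Decide whether the far side of the split may still hold a closer point.
// found   : number of candidates currently kept
// wanted  : number of neighbours requested (m)
// worstSq : largest squared distance among the kept candidates
bool mustSearchOtherSide(int queryCoord, int splitCoord,
                         std::size_t found, std::size_t wanted, long long worstSq) {
    if (found < wanted)
        return true;                  // not enough candidates yet, so nothing can be pruned
    return planeGapSq(queryCoord, splitCoord) < worstSq;
}
```

For example, with one neighbour wanted and a current best squared distance of 4, a splitting line 3 units away (squared gap 9) lets the far side be skipped, while a line 1 unit away (squared gap 1) does not.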
int main()
{
    // the loop also handles input files that contain several test cases
    while(scanf("%d%hd", &n, &K) == 2)
    {
        for(int i = 0; i < n; i++)
            for(short j = 0; j < K; j++)
                scanf("%hd", &input_v[i].pos[j]);
        memset(kd_tree, 0, sizeof(point)*4*N);
        construct_kdTree(n, K, 0, n-1, 1, 0);
        scanf("%hd", &t);
        for(short c = 0; c < t; c++)
        {
            short closestM;
            point tmpv;
            for(short j = 0; j < K; j++)
                scanf("%hd", &tmpv.pos[j]);
            scanf("%hd", &closestM);
            query_kdTree(tmpv, closestM, K, 1, 0);
            printf("the closest %hd points are:\n", closestM);
            point pt[10];
            // pq pops the farthest point first, so pt[] ends up ordered far to near
            for(short i = 0; !pq.empty(); i++)
            {
                pt[i] = pq.top().second;
                pq.pop();
            }
            for(short i = closestM-1; i >= 0; i--)
            {
                for(short j = 0; j < K; j++)
                {
                    printf("%hd", pt[i].pos[j]);
                    if(j == K-1)
                        printf("\n");
                    else
                        printf(" ");
                }
            }
        }
    }
    return 0;
}
