Closest Point Problem

2 篇文章 0 订阅
1 篇文章 0 订阅

Text and figures are summrized from the book - Data Structure and Algorithm Analysis in C++ Ch.10.2.2.
Code are referred from Rosetta Code.


Given N points in a plane, the closest pair of points can be found in O ( N l o g N ) O(N log N) O(NlogN) time.
exhaustive search: N ( N − 1 ) N(N-1) N(N1)-> O ( N 2 ) O(N^2) O(N2)
It has a close relationship with Merge Sort

Assume a small point set P P P.
在这里插入图片描述

std::default_random_engine re(std::chrono::system_clock::to_time_t(
		std::chrono::system_clock::now() ) );
std::uniform_real_distribution<double> urd(-100.0, 100.0);

std::vector<point_T> P(100);
std::generate(P.begin(), P.end(), [&urd, &re]() {
	return point_T { urd(re), urd(re)};
});
  1. Assume that the points have been sorted by x coordinate. e.g., using merge sort will add O ( N l o g N ) O(N log N) O(NlogN) to the final time bound.
std::sort( P.begin(), P.end(), [](const point_T& a, const point_T& b) {
	return a.first < b.first;
});

在这里插入图片描述

  1. Since the points are sorted by x coordinate, we can draw an imaginary vertical line that partitions the point set into two halves, P L P_L PL and P R P_R PR. Then the closest points are:
    • both in P L P_L PL -> distances is d L d_L dL
    • both in P R P_R PR-> d R d_R dR
    • one is in P L P_L PL, the other is in P R P_R PR-> d C d_C dC
    auto N = P.size();
    auto PL = std::vector<point_T>();
    auto PR = std::vector<point_T>();
    std::copy(P.begin(), P.begin() + N/2, std::back_inserter(PL));
    std::copy(P.begin() + N/2, P.end(), std::back_inserter(PR));

    double xM = P[(N-1)/2].first; // the middle point x coordinate
  1. compute d L d_L dL and d R d_R dR recursively. how? The problem, then, is to compute d C d_C dC. Since we would like an O ( N l o g N ) O(N log N) O(NlogN) solution, we must be able to compute d C d_C dC with only O ( N ) O(N) O(N) additional work. why can’t O ( N l o g N ) O(NlogN) O(NlogN)?
// compute left and right side recursively
std::pair<double, pointPair_T> ret1 = optimizedSearch(PL, QL);
std::pair<double, pointPair_T> ret2 = optimizedSearch(PR, QR);

there are l o g N logN logN recursions, so for each recursion, computing d C d_C dC can’t exceed O ( N ) O(N) O(N)
在这里插入图片描述

  1. Let δ = m i n ( d L , d R ) \delta = min(d_L, d_R) δ=min(dL,dR). The first observation is that we only need to compute d C d_C dC if d C d_C dC improves on δ \delta δ. If d C d_C dC is such a distance, then the two points that define d C d_C dC must be within δ \delta δ of the dividing line; we will refer to this area as a strip.

  2. Two strategies to compute d C d_C dC:

    • For large point sets that are uniformly distributed, the number of points that are expected to be in the strip is very small. Indeed, it is easy to argue that only O ( N ) O(\sqrt{N}) O(N ) ==why?==points are in the strip on average. Then perform a brute-force calculation on these points in O ( N ) O(N) O(N) time. Pseudocode of Brute-force calculation of m i n ( δ , d C ) min(δ, d_C) min(δ,dC):
    // Points are all in the strip
    for( i = 0; i < numPointsInStrip; i++ )
    	for( j = i + 1; j < numPointsInStrip; j++ )
    		if( dist(pi, pj) < δ )
    			δ = dist(pi, pj);
    

    However, in the worst case, all the points could be in the strip, so this strategy does not always work in linear time.

    • Second observation: The y coordinates of the two points that define d C d_C dC can differ by at most δ. Suppose that the points in the strip are sorted by their y coordinates. Therefore, if p i p_i pi and p j p_j pj’s y coordinates differ by more than δ, then we can proceed to p i + 1 p_{i+1} pi+1. Pseudocode:
    // Points are all in the strip and sorted by y-coordinate
    for( i = 0; i < numPointsInStrip; i++ )
    	for( j = i + 1; j < numPointsInStrip; j++ )
    		if( pi and pj’s y-coordinates differ by more than δ )
    			break;
    		// Go to next pi.
    		else
    			if( dist(pi, pj) < δ )
    				δ = dist(pi, pj);
    

    在这里插入图片描述

    This extra test has a significant effect on the running time, because for each p i p_i pi only a few points p j p_j pj are examined before p i p_i pi’s and p j p_j pj’s y coordinates differ by more than δ and force an exit from the inner for loop. e.g., point p 3 p_3 p3.



for ( auto i = strip.begin(); i != strip.end(); i++ ){
	for ( auto k = i + 1; k != strip.end(); k++) {
		// since strip is sorted by y coordinate, std::abs isn't needed.
		if((k->second - i->second) >= delta) break;

		auto new_dist = std::abs(dist(*k, *i));
		if ( new_dist < result.first ) {
			result = { new_dist, { *i, *k } };
		}
	}
}
  1. The worst case: all points in the strip lie in one of the δ-by-δ squares. Hence, for any point p i p_i pi, at most 7 points p j p_j pj are considered. (note, we’re not talking about the above point example)
    在这里插入图片描述

    On the other hand, by the definition of δ \delta δ, all the points in each δ-by-δ square are separated by at least δ. In the worst case, each square contains four points, one at each corner. Notice that even though p L 2 p_{L2} pL2 and p R 1 p_{R1} pR1 have the same coordinates, they could be different points. For the actual analysis, it is only important that the number of points in the δ-by-2δ rectangle be O ( 1 ) O(1) O(1), and this much is certainly clear.
    Because at most 7 points are considered for each p i p_i pi, the time to compute a d C d_C dC that is better than δ is O ( N ) O(N) O(N).(must ≤ 7 N \leq 7N 7N)

  2. We recursively split according to x coordinate. So each recursion, we have to sort y coordinate, which causes O ( n l o g n ) O(n log n) O(nlogn) extra work( n = N / 2 i , i = 0 , 1 , . . , l o g N n=N/2^i, i=0,1,..,logN n=N/2i,i=0,1,..,logN). There are l o g N logN logN recursion, so the total time complexity is O ( N l o g 2 N ) O(N log^2 N) O(Nlog2N)

  3. Solution: maintain two lists:

    • the point list sorted by x coordinate P P P
    • the point list sorted by y coordinate Q Q Q
    • Both can be obtained by a pre-processing sorting step at cost O ( N l o g N ) O(N log N) O(NlogN) and thus does not affect the time bound.
std::sort( P.begin(), P.end(), [](const point_T& a, const point_T& b) {
	return a.first < b.first;
});
auto Q = P;
std::sort( Q.begin(), Q.end(), [](const point_T& a, const point_T& b) {
	return a.second < b.second;
});

Then we can split them:

  • P L P_L PL and Q L Q_L QL are the lists passed to the left-half recursive call.
  • P R P_R PR and Q R Q_R QR are the lists passed to the right-half recursive call.
  • How to get them? As before, P P P is easily split in the middle, we get the dividing line. we step through Q Q Q sequentially, placing each element in Q L Q_L QL or Q R Q_R QR as appropriate. It is easy to see that Q L Q_L QL and Q R Q_R QR will be automatically sorted by y coordinate.
// divide the point set into left and right parts.
auto N = P.size();
auto PL = std::vector<point_T>();
auto PR = std::vector<point_T>();
std::copy(P.begin(), P.begin() + N/2, std::back_inserter(PL));
std::copy(P.begin() + N/2, P.end(), std::back_inserter(PR));

double xM = P[(N-1)/2].first; // the middle point x coordinate

auto QL = std::vector<point_T>();
auto QR = std::vector<point_T>();
std::copy_if(Q.begin(), Q.end(), std::back_inserter(QL), [&xM](const point_T& point){
	return point.first <= xM;
});
std::copy_if(Q.begin(), Q.end(), std::back_inserter(QR), [&xM](const point_T& point){
	return point.first > xM;
});

When the recursive calls return, we scan through the Q list and discard all the points whose x coordinates are not within the strip. Then a new list contains only points in the strip, and these points are guaranteed to be sorted by their y coordinates.
In this case, for each recursion, only O ( n ) O(n) O(n) extra work happens ( n = N / 2 i , i = 0 , 1 , . . , l o g N n=N/2^i, i=0,1,..,logN n=N/2i,i=0,1,..,logN).

// compute left and right side recursively
std::pair<double, pointPair_T> ret1 = optimizedSearch(PL, QL);
std::pair<double, pointPair_T> ret2 = optimizedSearch(PR, QR);
// pick up the better one
std::pair<double, pointPair_T> result = ( ret1.first <= ret2.first )? ret1 : ret2;

double delta = result.first;
auto strip = std::vector<point_T>();
std::copy_if(Q.begin(), Q.end(), std::back_inserter(strip), [&delta, &xM](const point_T& point){
	return std::abs(xM - point.first) < delta;
});

So the total time complexity consists:

  • pre-processing sort x, y: O ( N l o g N ) O(N log N) O(NlogN). happens only once
  • traverse Q Q Q after each dividing of P P P: O ( N ) O(N) O(N). happens every recursion
  • compute d C d_C dC: O ( N ) O(N) O(N). happens every recursion
  • total recursion times: O ( l o g N ) O(logN) O(logN)

Complete code: GitHub

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值