I have data for latitude and longitude, and I need to calculate distance matrix between two arrays containing locations. I used this This to get distance between two locations given latitude and longitude.
Here is an example of my code:
import numpy as np
import math
def get_distances(locs_1, locs_2):
n_rows_1 = locs_1.shape[0]
n_rows_2 = locs_2.shape[0]
dists = np.empty((n_rows_1, n_rows_2))
# The loops here are inefficient
for i in xrange(n_rows_1):
for j in xrange(n_rows_2):
dists[i, j] = get_distance_from_lat_long(locs_1[i], locs_2[j])
return dists
def get_distance_from_lat_long(loc_1, loc_2):
earth_radius = 3958.75
lat_dif = math.radians(loc_1[0] - loc_2[0])
long_dif = math.radians(loc_1[1] - loc_2[1])
sin_d_lat = math.sin(lat_dif / 2)
sin_d_long = math.sin(long_dif / 2)
step_1 = (sin_d_lat ** 2) + (sin_d_long ** 2) * math.cos(math.radians(loc_1[0])) * math.cos(math.radians(loc_2[0]))
step_2 = 2 * math.atan2(math.sqrt(step_1), math.sqrt(1-step_1))
dist = step_2 * earth_radius
return dist
My expected output is this:
>>> locations_1 = np.array([[34, -81], [32, -87], [35, -83]])
>>> locations_2 = np.array([[33, -84], [39, -81], [40, -88], [30, -80]])
>>> get_distances(locations_1, locations_2)
array([[ 186.13522573, 345.46610882, 566.23466349, 282.51056676],
[ 187.96657622, 589.43369894, 555.55312473, 436.88855214],
[ 149.5853537 , 297.56950329, 440.81203371, 387.12153747]])
Performance is important for me, and one thing I could do is use Cython to speed up the loops, but it would be nice if I don't have to go there.
Is there a module that can do something like this? Or any other solution?
解决方案
There's a lot of suboptimal things in the Haversine equations you are using. You can trim some of that and minimize the number of sines, cosines and square roots you need to calculate. The following is the best I have been able to come up with, and on my system runs about 5x faster than Ophion's code (which does mostly the same as far as vectorization goes) on two random arrays of 1000 and 2000 elements:
def spherical_dist(pos1, pos2, r=3958.75):
pos1 = pos1 * np.pi / 180
pos2 = pos2 * np.pi / 180
cos_lat1 = np.cos(pos1[..., 0])
cos_lat2 = np.cos(pos2[..., 0])
cos_lat_d = np.cos(pos1[..., 0] - pos2[..., 0])
cos_lon_d = np.cos(pos1[..., 1] - pos2[..., 1])
return r * np.arccos(cos_lat_d - cos_lat1 * cos_lat2 * (1 - cos_lon_d))
If you feed it your two arrays "as is" it will complain, but that's not a bug, it's a feature. Basically, this function computes the distance on a sphere over the last dimension, and broadcasts on the rest. So you can get what you are after as:
>>> spherical_dist(locations_1[:, None], locations_2)
array([[ 186.13522573, 345.46610882, 566.23466349, 282.51056676],
[ 187.96657622, 589.43369894, 555.55312473, 436.88855214],
[ 149.5853537 , 297.56950329, 440.81203371, 387.12153747]])
But it could also be used to calculate the distances between two lists of points, i.e.:
>>> spherical_dist(locations_1, locations_2[:-1])
array([ 186.13522573, 589.43369894, 440.81203371])
Or between two single points:
>>> spherical_dist(locations_1[0], locations_2[0])
186.1352257300577
This is inspired on how gufuncs work, and once you get used to it, I have found it to be a wonderful "swiss army knife" coding style, that lets you reuse a single function in lots of different settings.
                  
                  
                  
                  
本文介绍了一种使用Python和NumPy高效计算大量经纬度坐标间距离的方法。通过优化Haversine公式并利用NumPy的矢量化操作,实现了比原始循环快5倍的距离计算功能。
          
      
          
                
                
                
                
              
                
                
                
                
                
              
                
                
              
            
                  
					2008
					
被折叠的  条评论
		 为什么被折叠?
		 
		 
		
    
  
    
  
            


            