python计算两个数区间的和,有效地计算两个数据集之间的成对的Haversine距离-NumPy / Python...

I want to calculate the geo-distance between latitude-longitude.

I had checked this thread Vectorizing Haversine distance calculation in Python

but when I am using it for two different set of coordinates, I m getting an error.

df1 size can be in millions and if there is any other way to calculate accurate geo distance in less time then it would be really helpful.

length1 = 1000

d1 = np.random.uniform(-90, 90, length1)

d2 = np.random.uniform(-180, 180, length1)

length2 = 100

d3 = np.random.uniform(-90, 90, length2)

d4 = np.random.uniform(-180, 180, length2)

coords = tuple(zip(d1, d2))

df1 = pd.DataFrame({'coordinates':coords})

coords = tuple(zip(d3, d4))

df2 = pd.DataFrame({'coordinates':coords})

def get_diff(df1, df2):

data1 = np.array(df1['coordinates'].tolist())

data2 = np.array(df2['coordinates'].tolist())

lat1 = data1[:,0]

lng1 = data1[:,1]

lat2 = data2[:,0]

lng2 = data2[:,1]

#print(lat1.shape)

#print(lng1.shape)

#print(lat2.shape)

#print(lng2.shape)

diff_lat = lat1[:,None] - lat2

diff_lng = lng1[:,None] - lng2

#print(diff_lat.shape)

#print(diff_lng.shape)

d = np.sin(diff_lat/2)**2 + np.cos(lat1[:,None])*np.cos(lat1) * np.sin(diff_lng/2)**2

return 2 * 6371 * np.arcsin(np.sqrt(d))

get_diff(df1, df2)

ValueError Traceback (most recent call last)

in

----> 1 get_diff(df1, df2)

in get_diff(df1, df2)

26 print(diff_lat.shape)

27 print(diff_lng.shape)

---> 28 d = np.sin(diff_lat/2)**2 + np.cos(lat1[:,None])*np.cos(lat1) * np.sin(diff_lng/2)**2

29 return 2 * 6371 * np.arcsin(np.sqrt(d))

ValueError: operands could not be broadcast together with shapes (1000,1000) (1000,100)

解决方案

Pairwise haversine distances

Here's a vectorized way with broadcasting based on this post -

def convert_to_arrays(df1, df2):

d1 = np.array(df1['coordinates'].tolist())

d2 = np.array(df2['coordinates'].tolist())

return d1,d2

def broadcasting_based_lng_lat(data1, data2):

# data1, data2 are the data arrays with 2 cols and they hold

# lat., lng. values in those cols respectively

data1 = np.deg2rad(data1)

data2 = np.deg2rad(data2)

lat1 = data1[:,0]

lng1 = data1[:,1]

lat2 = data2[:,0]

lng2 = data2[:,1]

diff_lat = lat1[:,None] - lat2

diff_lng = lng1[:,None] - lng2

d = np.sin(diff_lat/2)**2 + np.cos(lat1[:,None])*np.cos(lat2) * np.sin(diff_lng/2)**2

return 2 * 6371 * np.arcsin(np.sqrt(d))

Hence, to solve your case to get all pairwise haversine distances, it would be -

broadcasting_based_lng_lat(*convert_to_arrays(df1,df2))

Elementwise haversine distances

For element-wise haversine distance computations between two data, such that each data holds latitude and longitude in two columns each or lists of two elements each, we would skip some of the extensions to 2D and end up with something like this -

def broadcasting_based_lng_lat_elementwise(data1, data2):

# data1, data2 are the data arrays with 2 cols and they hold

# lat., lng. values in those cols respectively

data1 = np.deg2rad(data1)

data2 = np.deg2rad(data2)

lat1 = data1[:,0]

lng1 = data1[:,1]

lat2 = data2[:,0]

lng2 = data2[:,1]

diff_lat = lat1 - lat2

diff_lng = lng1 - lng2

d = np.sin(diff_lat/2)**2 + np.cos(lat1)*np.cos(lat2) * np.sin(diff_lng/2)**2

return 2 * 6371 * np.arcsin(np.sqrt(d))

Sample run with a dataframe holding the two data in two columns -

In [42]: np.random.seed(0)

...: a = np.random.randint(10,100,(5,2)).tolist()

...: b = np.random.randint(10,100,(5,2)).tolist()

...: df = pd.DataFrame({'A':a,'B':b})

In [43]: df

Out[43]:

A B

0 [54, 57] [80, 98]

1 [74, 77] [98, 22]

2 [77, 19] [68, 75]

3 [93, 31] [49, 97]

4 [46, 97] [56, 98]

In [44]: from haversine import haversine

In [45]: [haversine(i,j) for (i,j) in zip(df.A,df.B)]

Out[45]:

[3235.9659882513424,

2399.6124657290075,

2012.0851666001824,

4702.8069773315865,

1114.1193334220534]

In [46]: broadcasting_based_lng_lat_elementwise(np.vstack(df.A), np.vstack(df.B))

Out[46]:

array([3235.96151855, 2399.60915125, 2012.08238739, 4702.80048155,

1114.11779454])

Those slight differences are largely because haversine library assumes 6371.0088 as the earth radius, while we are taking it as 6371 here.

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值