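Goal: for each point y in dataset Y, find the k points in dataset X that are closest to it.
- The number of points, k, is set by the n_neighbors parameter.
- Distance here means Euclidean distance, which is what NearestNeighbors uses by default.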
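For reference, the Euclidean distance mentioned above can be written out as follows. The symbols x, y and the dimension n are notation introduced here for illustration. The numeric line uses the points (3, 2) and (1, 5), which appear as X[5] and Y[0] in the worked example.

```latex
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
% e.g. x = (3, 2), y = (1, 5):
d(x, y) = \sqrt{(3 - 1)^2 + (2 - 5)^2} = \sqrt{13} \approx 3.6056
```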
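Function output:
- indices : the indices (into X) of the k nearest points, one row per point in Y
- distances : the distances from each point in Y to its k nearest points in X, one row per point in Y
Function call: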
nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
distances, indices = nbrs.kneighbors(Y)
Worked example (compute with NearestNeighbors first, then verify by hand):
from sklearn.neighbors import NearestNeighbors
import numpy as np
X = np.array([[-1, -1],
              [-2, -1],
              [-3, -2],
              [1, 1],
              [2, 1],
              [3, 2]])
Y = np.array([[1, 5],
              [3, 3]])
nbrs = NearestNeighbors(n_neighbors=1, algorithm='ball_tree').fit(X)
distances, indices = nbrs.kneighbors(Y)
print(indices)
print(distances)
# [[5]
#  [5]]
# [[3.60555128]
#  [1.        ]]
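# ---------- check ----------
def distEuclid(x, y):
    # Euclidean distance between two points
    distance = np.sqrt(np.sum(np.square(x - y)))
    return distance

# d[i, j] is the distance from Y[i] to X[j]
d = np.zeros((len(Y), len(X)), dtype=float)
for i in range(len(Y)):
    for j in range(len(X)):
        d[i, j] = distEuclid(X[j], Y[i])
print(d)
# [[6.32455532 6.70820393 8.06225775 4.         4.12310563 3.60555128]
#  [5.65685425 6.40312424 7.81024968 2.82842712 2.23606798 1.        ]]
# In both rows the minimum is in column 5, matching the indices and distances printed above.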
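The manual check above can also be written without explicit loops and extended to any k. The sketch below is a brute-force version in plain NumPy, not how NearestNeighbors works internally. The helper name brute_force_kneighbors is made up for this illustration.

```python
import numpy as np

def brute_force_kneighbors(X, Y, k):
    """Hypothetical helper: distances and indices of the k rows of X nearest to each row of Y."""
    # diff[i, j, :] = Y[i] - X[j], built by broadcasting
    diff = Y[:, None, :] - X[None, :, :]
    # d[i, j] = Euclidean distance from Y[i] to X[j]
    d = np.sqrt((diff ** 2).sum(axis=2))
    # Sort each row by distance and keep the k smallest
    idx = np.argsort(d, axis=1, kind="stable")[:, :k]
    dist = np.take_along_axis(d, idx, axis=1)
    return dist, idx

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([[1, 5], [3, 3]])

dist, idx = brute_force_kneighbors(X, Y, k=2)
print(idx)
print(dist)
```

With k=2 the indices come out as [[5, 3], [5, 4]] and the distances as roughly [[3.606, 4.0], [1.0, 2.236]], which can be read directly off the rows of d printed above. With k=1 the result reduces to the single nearest neighbour found earlier.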