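Adapted from a PyCon 2015 talk on YouTube.
Python is an elegant language, so try to use as few explicit loops and branches as possible.
KNN is a very simple algorithm, so I won't introduce it here.
In particular, nearest neighbor is the special case k=1.
How can we implement it elegantly with NumPy's broadcasting mechanism?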
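Official NumPy broadcasting documentation
Losing your Loops: Fast Numerical Computing with NumPy (PyCon video; may require a VPN to access)
The code in the talk is excellent. If my explanation is unclear, I recommend watching the video.
Here is the core broadcasting example, copied directly from the documentation: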
In the following example, both the A and B arrays have axes with length one that are expanded to a larger size during the broadcast operation:
A (4d array): 8 x 1 x 6 x 1
B (3d array): 7 x 1 x 5
Result (4d array): 8 x 7 x 6 x 5
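Here is how the shapes line up. They are aligned from the right, and B's missing leading dimension is treated as 1:
A:      8 x 1 x 6 x 1
B:          7 x 1 x 5
Result: 8 x 7 x 6 x 5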
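The alignment above is easy to check directly. A minimal sketch, assuming only that NumPy is installed; the arrays are filled with ones purely for illustration, since only their shapes matter here:

```python
import numpy as np

# Shapes taken from the documentation example; the contents are arbitrary.
A = np.ones((8, 1, 6, 1))
B = np.ones((7, 1, 5))

# Shapes are aligned from the trailing axis, and length-1 axes are stretched.
result = A + B
print(result.shape)  # (8, 7, 6, 5)
```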
Straight to the code!
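import numpy as np
# 1000 random points in 3-D
x = np.random.random((1000, 3))
# (1000, 1, 3) - (1000, 3) broadcasts to (1000, 1000, 3): all pairwise differences
diff = x.reshape(1000, 1, 3) - x
# squared Euclidean distance between every pair, shape (1000, 1000)
D = (diff ** 2).sum(2)
# every point is at distance 0 from itself, so set the diagonal to infinity
i = np.arange(1000)
D[i, i] = np.inf
# for each row, the index of the nearest other point
i = np.argmin(D, 1)
print(i[:10])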
Output
[555 362 353 255 812 849 665 396 744 266]
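To see that the broadcast version really does what a double loop would do, here is a rough sketch that compares the two on a small random input. The function names `nearest_broadcast` and `nearest_loop`, the seed, and the 50-point test array are my own choices for illustration, not from the talk:

```python
import numpy as np

def nearest_broadcast(x):
    """Index of each point's nearest other point, using broadcasting."""
    n = x.shape[0]
    # (n, 1, d) - (n, d) broadcasts to (n, n, d): every pairwise difference.
    diff = x.reshape(n, 1, -1) - x
    # Sum squared differences over the coordinate axis: (n, n) squared distances.
    D = (diff ** 2).sum(axis=2)
    # Mask the diagonal so a point is never its own nearest neighbor.
    D[np.arange(n), np.arange(n)] = np.inf
    return np.argmin(D, axis=1)

def nearest_loop(x):
    """Same computation with explicit loops, for comparison only."""
    n = x.shape[0]
    out = np.empty(n, dtype=int)
    for a in range(n):
        best, best_d = -1, np.inf
        for b in range(n):
            if a == b:
                continue
            d = ((x[a] - x[b]) ** 2).sum()
            if d < best_d:
                best, best_d = b, d
        out[a] = best
    return out

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
pts = rng.random((50, 3))
print(np.array_equal(nearest_broadcast(pts), nearest_loop(pts)))
```

The price of dropping the loops is memory: the intermediate `diff` array holds n x n x d values, so this approach suits moderate n.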
Verification
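from sklearn.neighbors import NearestNeighbors
# ask for 2 neighbors: column 0 is the query point itself (distance 0), column 1 is its nearest other point
d, i = NearestNeighbors().fit(x).kneighbors(x, 2)
print(i[:10, 1])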
Output
[555 362 353 255 812 849 665 396 744 266]
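Since nearest neighbor is just the k=1 case, the same distance matrix can give the k nearest neighbors as well. This is a sketch of one possible extension, not something shown in the talk; the function name `k_nearest` and the test data are hypothetical, and `np.argsort` stands in for `np.argmin`:

```python
import numpy as np

def k_nearest(x, k):
    """Indices of the k nearest other points for each row of x."""
    n = x.shape[0]
    # All pairwise squared distances via broadcasting, shape (n, n).
    diff = x.reshape(n, 1, -1) - x
    D = (diff ** 2).sum(axis=2)
    # Exclude each point from its own neighbor list.
    D[np.arange(n), np.arange(n)] = np.inf
    # Sort every row by distance and keep the first k column indices.
    return np.argsort(D, axis=1)[:, :k]

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
pts = rng.random((100, 3))
idx = k_nearest(pts, 3)
print(idx.shape)  # (100, 3)
```

With k=1 the first column is the same nearest-neighbor index that `np.argmin` produces in the code above.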