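The key part of the kNN implementation is computing the distances. The assignment asks for three ways of doing so: with two loops, with one loop, and with no loops (using a matrix dot product). The snippets below are methods of the classifier class, which stores the training data in self.X_train, and they assume numpy has been imported as np.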
Two-loop version:
def compute_distances_two_loops(self, X):
    num_test = X.shape[0]  # 500
    num_train = self.X_train.shape[0]  # 5000
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        for j in range(num_train):
            # Euclidean (L2) distance between test point i and training point j
            dist = np.sqrt(np.sum(np.square(X[i] - self.X_train[j])))
            dists[i][j] = dist
    return dists
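One-loop version:
def compute_distances_one_loop(self, X):
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        # X[i] is broadcast against every row of X_train; summing over axis=1
        # gives the distance from test point i to all training points at once
        dist = np.sqrt(np.sum(np.square(X[i] - self.X_train), axis=1))
        dists[i, :] = dist
    return dists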
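To make the broadcasting step in the one-loop version concrete, here is a minimal sketch on made-up toy data. The arrays `X_train` and `x` below are illustrative values chosen for this example, not the assignment's data.

```python
import numpy as np

# Hypothetical toy data: 3 training points and 1 test point in 2-D.
X_train = np.array([[0.0, 0.0],
                    [3.0, 4.0],
                    [6.0, 8.0]])
x = np.array([0.0, 0.0])

# x has shape (2,) and X_train has shape (3, 2), so NumPy broadcasts x
# across the rows of X_train and subtracts it from each one.
diff = x - X_train                                     # shape (3, 2)

# Summing the squared differences over axis=1 collapses the feature axis,
# leaving one distance per training point.
row_dists = np.sqrt(np.sum(np.square(diff), axis=1))   # shape (3,)
print(row_dists)  # the distances are 0, 5 and 10
```

This is the body of the loop for a single test point; the one-loop version repeats it once per row of X.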
No-loop version:
def compute_distances_no_loops(self, X):
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    # (x1 - y1)^2 + (x2 - y2)^2 = (x1^2 + x2^2) + (y1^2 + y2^2) - 2*(x1*y1 + x2*y2)
    # Shapes below use n = num_test and m = num_train.
    # keepdims=True keeps the summed axis as a length-1 dimension, giving (m, 1)
    train_sq = np.sum(self.X_train ** 2, axis=1, keepdims=True)  # (m, 1)
    train_sq = np.broadcast_to(train_sq, shape=(num_train, num_test)).T  # (n, m) after the transpose
    test_sq = np.sum(X ** 2, axis=1, keepdims=True)  # (n, 1)
    test_sq = np.broadcast_to(test_sq, shape=(num_test, num_train))  # (n, m)
    cross = np.dot(X, self.X_train.T)  # (n, m)
    dists = np.sqrt(train_sq + test_sq - 2 * cross)  # take the square root
    return dists
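One way to gain confidence in the vectorized version is to compare it against the two-loop reference on small random data. The sketch below is not the assignment's code: the standalone functions `dists_two_loops` and `dists_no_loops`, the array sizes, and the seed are all assumptions made for illustration. It also relies on implicit broadcasting instead of `np.broadcast_to`, and clips the squared distances at zero, since floating-point rounding can in principle leave tiny negative values before the square root.

```python
import numpy as np

def dists_two_loops(X, X_train):
    """Reference version: explicit double loop over test and training points."""
    dists = np.zeros((X.shape[0], X_train.shape[0]))
    for i in range(X.shape[0]):
        for j in range(X_train.shape[0]):
            dists[i, j] = np.sqrt(np.sum(np.square(X[i] - X_train[j])))
    return dists

def dists_no_loops(X, X_train):
    """Vectorized version: ||x||^2 + ||y||^2 - 2 x.y via implicit broadcasting."""
    test_sq = np.sum(X ** 2, axis=1, keepdims=True)   # (num_test, 1)
    train_sq = np.sum(X_train ** 2, axis=1)           # (num_train,)
    cross = X @ X_train.T                             # (num_test, num_train)
    # (num_test, 1) + (num_train,) broadcasts to (num_test, num_train).
    sq = test_sq + train_sq - 2 * cross
    # Rounding can leave tiny negative entries; clip them before the sqrt.
    return np.sqrt(np.maximum(sq, 0.0))

# Hypothetical small problem: 5 test points, 20 training points, 8 features.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((20, 8))
X = rng.standard_normal((5, 8))

ref = dists_two_loops(X, X_train)
fast = dists_no_loops(X, X_train)
print(fast.shape)               # (5, 20)
print(np.allclose(ref, fast))   # True when the two versions agree
```

Comparing with `np.allclose` rather than exact equality is deliberate: the two versions accumulate rounding error differently, so their results match only up to floating-point tolerance.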
Derivation of the no-loop distance computation:
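The expansion in the code comment can be written out for general dimension as follows. The notation is an assumption introduced here: x_i is the i-th test row of X, y_j is the j-th training row of X_train, and D is the number of features.

```latex
\begin{aligned}
d_{ij}^2 &= \lVert x_i - y_j \rVert^2
          = \sum_{k=1}^{D} (x_{ik} - y_{jk})^2 \\
         &= \sum_{k=1}^{D} x_{ik}^2 + \sum_{k=1}^{D} y_{jk}^2
            - 2 \sum_{k=1}^{D} x_{ik}\, y_{jk} \\
         &= \lVert x_i \rVert^2 + \lVert y_j \rVert^2 - 2\, x_i \cdot y_j ,
\qquad
d_{ij} = \sqrt{\lVert x_i \rVert^2 + \lVert y_j \rVert^2 - 2\, x_i \cdot y_j}
\end{aligned}
```

The three terms correspond to test_sq, train_sq, and cross in the code: the cross term for all pairs (i, j) at once is the matrix product of X with the transpose of X_train, and the two squared-norm terms are broadcast along the columns and rows of that matrix respectively.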