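Vectorized operations:
A vectorized operation computes with whole-array operations instead of explicit loops. This lets the computation take advantage of low-level optimizations and parallel processing, which improves efficiency.
For example, consider this code: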
num_test = X.shape[0]
num_train = self.X_train.shape[0]
dists = np.zeros((num_test, num_train))
for i in range(num_test):
    #######################################################################
    # TODO:                                                               #
    # Compute the l2 distance between the ith test point and all training #
    # points, and store the result in dists[i, :].                        #
    # Do not use np.linalg.norm().                                        #
    #######################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    dists[i] = np.sum((X[i] - self.X_train)**2, axis=1)**0.5
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
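To make concrete what the single vectorized line replaces, here is a minimal sketch that computes one row of distances twice: once with explicit Python loops and once with the array expression used above. The function names l2_row_loop and l2_row_vectorized, and the random test data, are hypothetical and only serve the illustration; they are not part of the assignment code.

```python
import numpy as np

def l2_row_loop(x, X_train):
    """Distances from one test point x to every training row, with explicit loops."""
    out = np.zeros(X_train.shape[0])
    for j in range(X_train.shape[0]):          # loop over training points
        total = 0.0
        for k in range(X_train.shape[1]):      # loop over features
            diff = x[k] - X_train[j, k]
            total += diff * diff
        out[j] = total ** 0.5
    return out

def l2_row_vectorized(x, X_train):
    """Same computation; broadcasting subtracts x from every row at once."""
    return np.sum((x - X_train) ** 2, axis=1) ** 0.5

# Illustrative data only (assumed shapes: 5 training points, 3 features).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(5, 3))
x = rng.normal(size=3)

# Both versions should agree up to floating-point rounding.
assert np.allclose(l2_row_loop(x, X_train), l2_row_vectorized(x, X_train))
```

The two inner loops disappear in the vectorized version: the subtraction, the squaring, and the per-row sum each run as a single array operation.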
Suppose we have the following example:
# Test set (2 samples, 2 features each)
X = np.array([[1, 2],
              [3, 4]])
# Training set (3 samples, 2 features each)
self.X_train = np.array([[1, 0],
                         [0, 1],
                         [1, 1]])
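When i = 0, the test sample is X[0] = [1, 2].
Compute the difference between X[0] and every training sample. NumPy broadcasts X[0] across the rows of self.X_train, so one subtraction handles all three rows:
[[1, 2] - [1, 0],
 [1, 2] - [0, 1],
 [1, 2] - [1, 1]]
Result:
[[0, 2],
 [1, 1],
 [0, 1]]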
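np.sum((X[0] - self.X_train)**2, axis=1)**0.5
is equivalent to (here [0, 2]**2 is shorthand for squaring each element, not literal Python):
np.sum([[0, 2]**2,
        [1, 1]**2,
        [0, 1]**2], axis=1)**0.5
Result:
np.sum([[0, 4],
        [1, 1],
        [0, 1]], axis=1)**0.5
Result (axis=1 sums within each row):
[4, 2, 1]**0.5
Result:
[2, √2, 1]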
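The hand calculation above can be checked directly in NumPy. The sketch below is a standalone version of the example, so it uses a plain variable X_train in place of self.X_train (an assumption made only so the snippet runs outside the class), and it keeps each intermediate step in its own variable.

```python
import numpy as np

# Same data as the worked example; X_train stands in for self.X_train.
X = np.array([[1, 2],
              [3, 4]])
X_train = np.array([[1, 0],
                    [0, 1],
                    [1, 1]])

# Step 1: broadcasting subtracts X[0] from every training row.
diff = X[0] - X_train                 # [[0, 2], [1, 1], [0, 1]]

# Step 2: square element-wise, then sum within each row (axis=1).
sq_sum = np.sum(diff ** 2, axis=1)    # [4, 2, 1]

# Step 3: square root of each sum gives the distances for test point 0.
row0 = sq_sum ** 0.5

# The full one-loop version fills one row of dists per test point.
dists = np.zeros((X.shape[0], X_train.shape[0]))
for i in range(X.shape[0]):
    dists[i] = np.sum((X[i] - X_train) ** 2, axis=1) ** 0.5

print(dists)
```

The first row of dists should match the result derived by hand, and the second row is the same computation applied to X[1] = [3, 4].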