Nearest Neighbor Algorithm
import numpy as np

class NearestNeighbor(object):
    def __init__(self):
        pass

    def train(self, X, y):
        """ X is N x D where each row is an example. y is 1-dimensional, of size N """
        # the nearest neighbor classifier simply remembers all the training data
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        """ X is N x D where each row is an example we wish to predict a label for """
        num_test = X.shape[0]
        # make sure that the output type matches the input type
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        # loop over all test rows (range replaces Python 2's xrange)
        for i in range(num_test):
            # find the nearest training image to the i'th test image
            # using the L1 distance (sum of absolute value differences)
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            min_index = np.argmin(distances)  # get the index with the smallest distance
            Ypred[i] = self.ytr[min_index]  # predict the label of the nearest example
        return Ypred
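The distance line in predict relies on NumPy broadcasting: subtracting a single test row from the whole training matrix yields one row of differences per training example. A minimal sketch on toy data (the arrays and the names x_test, diffs, nearest and predicted_label are made up for illustration):

```python
import numpy as np

# Hypothetical toy data: 3 training rows and 1 test row, 4 features each.
Xtr = np.array([[1, 2, 3, 4],
                [4, 3, 2, 1],
                [0, 0, 0, 0]])
ytr = np.array([0, 1, 2])
x_test = np.array([1, 2, 3, 5])

# Broadcasting subtracts x_test from every row of Xtr at once.
diffs = np.abs(Xtr - x_test)        # shape (3, 4)
distances = np.sum(diffs, axis=1)   # one L1 distance per training row: 1, 9, 11
nearest = np.argmin(distances)      # index of the closest training row
predicted_label = ytr[nearest]      # label of that row

print(distances)
print(predicted_label)
```

Here the first training row differs from the test row in only one feature, so it has the smallest L1 distance and its label is the prediction.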
K-Nearest Neighbor Classifier (k-NN)
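- L2 distance (Euclidean distance): the square root of the sum of squared element-wise differences.
- L1: decision boundaries tend to align with the coordinate axes.
- L2: decision boundaries fall where they are most natural and are not affected by the orientation of the coordinate axes.
- Overfitting: never use the test set for tuning. Treat the test set as a very precious resource that is not touched until the very last step. If you tune on the test set and the algorithm appears to work well, the real danger is that its performance after deployment may be far below what you expected.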
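The points above can be sketched together: a k-NN predictor that supports both L1 and L2 distance, with k and the metric chosen on a held-out validation split so the test set is never used for tuning. This is only an illustrative sketch. The function knn_predict, the synthetic two-blob data, and the 90/30 split are assumptions, not code from the assignment.

```python
import numpy as np

def knn_predict(Xtr, ytr, X, k=1, metric="l2"):
    """Label each row of X by majority vote among its k nearest training rows."""
    preds = np.zeros(X.shape[0], dtype=ytr.dtype)
    for i in range(X.shape[0]):
        diff = Xtr - X[i]
        if metric == "l1":
            dists = np.sum(np.abs(diff), axis=1)
        else:
            dists = np.sqrt(np.sum(diff ** 2, axis=1))
        nearest = np.argsort(dists)[:k]      # indices of the k closest rows
        votes = np.bincount(ytr[nearest])    # label counts among the neighbors
        preds[i] = np.argmax(votes)          # ties go to the smaller label
    return preds

# Synthetic data, for illustration only: two Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (60, 2)), rng.normal(3.0, 1.0, (60, 2))])
y = np.array([0] * 60 + [1] * 60)
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]

# Hold out part of the training data for validation; no test set is touched here.
X_train, y_train = X[:90], y[:90]
X_val, y_val = X[90:], y[90:]

val_acc = {}
for k in [1, 3, 5, 7]:
    for metric in ["l1", "l2"]:
        preds = knn_predict(X_train, y_train, X_val, k, metric)
        val_acc[(k, metric)] = np.mean(preds == y_val)

best_k, best_metric = max(val_acc, key=val_acc.get)
print(best_k, best_metric, val_acc[(best_k, best_metric)])
```

Only after k and the metric are fixed this way would the test set be evaluated, once, to report the final accuracy.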
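Parts of the code that were unclear:
Usage of reshape():
arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]) # create a one-dimensional array
- Method 1:
arr1 = arr1.reshape(3, 3)
# This method gives the target shape directly; the arguments are the number of rows and the number of columns.
# The one-dimensional array becomes a two-dimensional 3 x 3 array.
- Method 2:
arr1 = arr1.reshape(-1, 3)
# Only the number of columns is given; NumPy infers the number of rows (the -1) from the total number of elements. This is a quick way to reshape without computing the other dimension by hand.
- Code from the assignment:
X_train = np.reshape(X_train, (X_train.shape[0], -1))
# Reshapes X_train into a matrix with 5000 rows (X_train.shape[0]); NumPy infers the number of columns.
- Code from the lecture notes:
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3)
# Xtr_rows becomes 50000 x 3072; here the number of columns is given explicitly as 32*32*3.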
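The two reshape styles give the same result when the explicit column count matches what NumPy would infer. A small sketch, using a hypothetical batch of 5 images instead of the real dataset:

```python
import numpy as np

# Hypothetical stand-in for an image batch: 5 images, 32 x 32 pixels, 3 channels.
X = np.zeros((5, 32, 32, 3))

rows_inferred = np.reshape(X, (X.shape[0], -1))      # NumPy infers 3072 columns
rows_explicit = X.reshape(X.shape[0], 32 * 32 * 3)   # columns given explicitly

print(rows_inferred.shape)  # (5, 3072)
print(rows_explicit.shape)  # (5, 3072)

# -1 only works when the total size divides evenly by the known dimensions.
arr = np.arange(1, 10)      # 9 elements
print(arr.reshape(-1, 3))   # 3 rows of 3
try:
    arr.reshape(-1, 4)      # 9 is not divisible by 4
except ValueError as err:
    print("reshape failed:", err)
```

Using -1 keeps the code independent of the image size, while the explicit form documents the expected size and fails loudly if the data has a different shape.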
Usage of numpy.zeros():
numpy.zeros(shape, dtype=float, order='C')
Example:
import numpy as np
print(np.zeros((2,5)))
# The result is a matrix with 2 rows and 5 columns
# [[0. 0. 0. 0. 0.]
# [0. 0. 0. 0. 0.]]
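The dtype argument matters in the classifier's predict method, where the prediction array is created with the same dtype as the labels. A short sketch of the difference (the labels array here is a made-up example):

```python
import numpy as np

labels = np.array([3, 1, 2], dtype=np.int64)    # hypothetical label array

default_zeros = np.zeros(4)                     # dtype defaults to float64
typed_zeros = np.zeros(4, dtype=labels.dtype)   # matches the label dtype

print(default_zeros.dtype)  # float64
print(typed_zeros.dtype)    # int64
```

Without the explicit dtype, integer labels copied into the array would be stored as floats.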
The concept of self: equivalent to this in Java; it refers to the instance of the class that the method was called on.
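Unlike this in Java, self is an explicit first parameter in Python, and the interpreter fills it in when a method is called on an instance. A minimal sketch with a hypothetical Counter class:

```python
class Counter:
    def __init__(self):
        self.count = 0      # attribute stored on this particular instance

    def increment(self):
        self.count += 1     # self is the object the method was called on
        return self

a = Counter()
b = Counter()
a.increment().increment()
b.increment()
print(a.count, b.count)     # 2 1

# Calling through the class makes the normally implicit argument explicit.
Counter.increment(b)
print(b.count)              # 2
```

Each instance keeps its own count, which is why train can store self.Xtr and self.ytr and predict can read them back later on the same object.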