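svm_loss_naive(W, X, y, reg) is the function that computes the loss and the gradient of the weight matrix W from the training data.
To understand the loss function L and the score vector S over the different class models, I relied on the translation of the official course notes on Zhihu (the Zhihu translation of the official cs231n lecture notes).
The completed Python code: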
import numpy as np

def svm_loss_naive(W, X, y, reg):
    """
    Structured SVM loss function, naive implementation (with loops).
    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.
    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength
    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape)  # initialize the gradient as zero
    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)  # vector-matrix product: scores of example i on every class
        correct_class_score = scores[y[i]]  # S_yi
        # Debug prints used while exploring the shapes:
        # print("x.shape:", X.shape)               # (500, 3073) matrix
        # print("X[i].shape:", X[i].shape)         # (3073,) vector
        # print("scores's shape:", scores.shape)   # (10,) vector
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1, a hyperparameter
            # A loss is incurred when the true-class score does not beat
            # the score of class j by at least delta.
            if margin > 0:
                loss += margin
                # Compute gradients (one inner and one outer sum)
                # Wonderfully compact and hard to read
                dW[:, y[i]] -= X[i, :]  # this is really a sum over j != y_i
                dW[:, j] += X[i, :]  # sums each contribution of the x_i's
    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train
    dW /= num_train
    # Debug prints used while exploring the shapes:
    # print("w*w:", (W * W).shape)                 # (3073, 10)
    # print("np.sum(w*w)", np.sum(W * W).shape)    # scalar: sum of all elements of W*W
    # Add regularization to the loss.
    loss += 0.5 * reg * np.sum(W * W)
    # Gradient regularization that carries through per https://piazza.com/class/i37qi08h43qfv?cid=118
    dW += reg * W
    #############################################################################
    # TODO:                                                                     #
    # Compute the gradient of the loss function and store it dW.                #
    # Rather that first computing the loss and then computing the derivative,   #
    # it may be simpler to compute the derivative at the same time that the     #
    # loss is being computed. As a result you may need to modify some of the    #
    # code above to compute the gradient.                                       #
    #############################################################################
    return loss, dW
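
One way to gain confidence in a gradient like the one returned by svm_loss_naive is to compare it with a numerical estimate. The sketch below is only an illustration and not part of the assignment code: svm_loss_and_grad is a compact restatement of the loop-based loss, numerical_gradient is a hypothetical helper based on central differences, and the data is random toy data.

```python
import numpy as np

def svm_loss_and_grad(W, X, y, reg):
    """Loop-based SVM loss and gradient (delta = 1), restated compactly."""
    num_train, num_classes = X.shape[0], W.shape[1]
    loss, dW = 0.0, np.zeros_like(W)
    for i in range(num_train):
        scores = X[i].dot(W)
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - scores[y[i]] + 1
            if margin > 0:
                loss += margin
                dW[:, j] += X[i]
                dW[:, y[i]] -= X[i]
    loss = loss / num_train + 0.5 * reg * np.sum(W * W)
    dW = dW / num_train + reg * W
    return loss, dW

def numerical_gradient(f, W, h=1e-5):
    """Hypothetical helper: central-difference estimate of df/dW."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + h
        f_plus = f(W)
        W[idx] = old - h
        f_minus = f(W)
        W[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

# Random toy problem: 8 examples, 5 features, 3 classes (all values are made up)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(5, 3))
X = rng.normal(size=(8, 5))
y = rng.integers(0, 3, size=8)

loss, dW = svm_loss_and_grad(W, X, y, reg=0.1)
dW_num = numerical_gradient(lambda w: svm_loss_and_grad(w, X, y, 0.1)[0], W)
max_abs_diff = np.max(np.abs(dW - dW_num))
print("max abs difference:", max_abs_diff)
```

If the analytic gradient is right, the difference should be tiny. Small mismatches can still appear when a margin sits very close to 0, because the hinge is not differentiable there.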
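When I first read the sample code, three things confused me:
- How is the size of the weight matrix W, 3073*10, determined?
10 is the number of classes, and 3073 is the dataset's 3072 features plus 1. Why plus 1?
The answer is in the translated notes on Zhihu (the Zhihu translation of the official CS231n notes, Linear Classification, Part 1): the extra entries of W hold the bias b. In the notes W is laid out as 10*3073, so the bias is an extra column of W and each input vector gains one extra element fixed at 1. With the (D, C) layout used in this code, the bias is instead the extra row of W, and the input matrix X gets an extra column that is all ones.
- What does np.sum(W*W) compute?
It computes the regularization loss before it is multiplied by reg: every element of W is squared and the results are summed into a single scalar.
- Why compute the gradient, and how is it used to update the weights W?
In the train function, every iteration multiplies grad by learning_rate and subtracts the result from W to update the weights.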
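
The three answers above can be sketched in a few lines. This is a minimal illustration on random data, not the assignment's code: X_raw, learning_rate and the stand-in grad are hypothetical names and values chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Bias trick: append a column of ones so the bias becomes the last row of W.
X_raw = rng.normal(size=(4, 3072))                     # 4 made-up images, 3072 features each
X = np.hstack([X_raw, np.ones((X_raw.shape[0], 1))])   # shape (4, 3073)
W = rng.normal(scale=1e-3, size=(3073, 10))            # shape (D + 1, C)

scores = X.dot(W)                                      # bias folded into W
scores_explicit = X_raw.dot(W[:-1]) + W[-1]            # same thing with an explicit bias
print(np.allclose(scores, scores_explicit))

# 2) np.sum(W * W): sum of the squared elements of W, a single scalar.
reg_term = np.sum(W * W)
print(np.ndim(reg_term))

# 3) Gradient descent step: move W against the gradient.
learning_rate = 1e-3                                   # hypothetical value
grad = rng.normal(size=W.shape)                        # stand-in for the dW returned by the loss
W_new = W - learning_rate * grad
```

The first print shows that folding the bias into W yields the same scores as keeping a separate bias vector, which is why the extra dimension is harmless.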
Vectorized implementation:
For the vectorized implementation, the first thing is to understand the formula.
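
For reference, the multiclass SVM loss can be written as below. The notation is my own choice to match the code in this post: rows of X are examples, so the score vector is s = x_i W, Delta is 1, lambda stands for reg, and the regularization term carries the factor 0.5 used in the code.

```latex
L_i = \sum_{j \neq y_i} \max\left(0,\; s_j - s_{y_i} + \Delta\right),
\qquad s = x_i W

L = \frac{1}{N} \sum_{i=1}^{N} L_i + \frac{1}{2} \lambda \sum_{k} \sum_{l} W_{k,l}^{2}
```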
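1. First compute scores, the scores of every image on every class under the current W. Following the formula S=Wx (written as X.dot(W) in this code, because each row of X is one example), multiply the input data by the weight matrix to get a 500*10 scores matrix. Each row of scores holds one input image's scores on the 10 class models.
2. Compute scores_correct, the Syi in the formula: for each row of scores, take the column given by the true label y. The result has 500 elements and is reshaped into a 500*1 column.
3. Compute margins, the gap between scores and scores_correct. The maximum operation filters the gaps. If the score on the true class exceeds the score on an incorrect class by at least Delta, the model separates this image well and no loss is counted. If the gap is smaller than Delta, that is, if the incorrect-class score minus the true-class score plus Delta (the threshold) is still greater than 0, the loss has to be counted.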
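
The steps above can be traced on a tiny example. The sketch below uses made-up scores for 2 images and 3 classes, with delta = 1, and is only meant to show the indexing and broadcasting involved.

```python
import numpy as np

# Made-up scores: 2 images (rows) by 3 classes (columns)
scores = np.array([[3.2, 5.1, -1.7],
                   [1.3, 4.9,  2.0]])
y = np.array([0, 1])            # true class of each image
num_train = scores.shape[0]

# Step 2: pick the true-class score of each row, then make it a column
scores_correct = scores[np.arange(num_train), y].reshape(num_train, 1)  # [[3.2], [4.9]]

# Step 3: broadcast the column against every class score and clip at 0
margins = np.maximum(0, scores - scores_correct + 1)
margins[np.arange(num_train), y] = 0    # the true class does not count (j != y_i)

loss = np.sum(margins) / num_train
print(margins)   # approximately [[0, 2.9, 0], [0, 0, 0]]
print(loss)      # approximately 1.45
```

Only the first image contributes: class 1 beats its true class 0 by 1.9, so the margin is 1.9 + 1 = 2.9. In the second image the true class wins by more than delta everywhere, so its row is all zeros.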
The Python code is as follows:
def svm_loss_vectorized(W, X, y, reg):
    """
    Structured SVM loss function, vectorized implementation.
    Inputs and outputs are the same as svm_loss_naive.
    """
    loss = 0.0
    dW = np.zeros(W.shape)  # initialize the gradient as zero
    #############################################################################
    # TODO:                                                                     #
    # Implement a vectorized version of the structured SVM loss, storing the    #
    # result in loss.                                                           #
    #############################################################################
    scores = X.dot(W)  # (500, 10)
    num_classes = W.shape[1]
    num_train = X.shape[0]
    # Indexing gives shape (500,); it has to be reshaped to (500, 1),
    # otherwise the subtraction below raises a ValueError.
    scores_correct = scores[np.arange(num_train), y]
    scores_correct = np.reshape(scores_correct, (num_train, -1))
    margins = scores - scores_correct + 1  # delta = 1
    margins = np.maximum(0, margins)
    # The true-class entry would equal delta; zero it so it is left out
    # of the loss (j != y_i in the formula).
    margins[np.arange(num_train), y] = 0
    loss = np.sum(margins) / num_train
    loss += 0.5 * reg * np.sum(W * W)
    # compute the gradient
    margins[margins > 0] = 1  # 1 wherever a class violates the margin
    row_sum = np.sum(margins, axis=1)  # shape (N,): violations per example
    margins[np.arange(num_train), y] = -row_sum
    dW += np.dot(X.T, margins) / num_train + reg * W  # D by C
##################