cs231n assignment1 svm

慕森

Category: Machine Learning. Tags: python

Article link: https://blog.csdn.net/bury_/article/details/76081639


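This post walks through the svm_loss_naive function from cs231n assignment 1 and explains how the size of the weight matrix W is determined, in particular why an extra dimension is added to hold the bias b. It discusses the loss function L and the score vector S, citing the course notes translated on Zhihu. Python code shows how to compute the loss and the gradient and how to update the weights, and the vectorized implementation is explained in detail, including the use of np.sum, np.reshape and np.maximum. The post also covers training and prediction with stochastic gradient descent and choosing hyperparameters by cross-validation, and ends with some weight visualization results.

(Summary generated automatically by CSDN.)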

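The function svm_loss_naive(W, X, y, reg) computes the loss and the gradient with respect to the weight matrix W from the training data.
cs231n course slides
To understand the loss function L and the score vector S over the different class models, I relied on the official course notes translated on Zhihu (Zhihu translation of the official cs231n notes).
The filled-in Python code: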

import numpy as np

def svm_loss_naive(W, X, y, reg):
  """
  Structured SVM loss function, naive implementation (with loops).

  Inputs have dimension D, there are C classes, and we operate on minibatches
  of N examples.

  Inputs:
  - W: A numpy array of shape (D, C) containing weights.
  - X: A numpy array of shape (N, D) containing a minibatch of data.
  - y: A numpy array of shape (N,) containing training labels; y[i] = c means
    that X[i] has label c, where 0 <= c < C.
  - reg: (float) regularization strength

  Returns a tuple of:
  - loss as single float
  - gradient with respect to weights W; an array of same shape as W
  """
  dW = np.zeros(W.shape) # initialize the gradient as zero

  # compute the loss and the gradient
  num_classes = W.shape[1]
  num_train = X.shape[0]
  loss = 0.0
  for i in range(num_train):
    scores = X[i].dot(W) # vector-matrix product: (D,) times (D, C) gives (C,)
    correct_class_score = scores[y[i]] # s_{y_i}, the score of the true class
    # Shapes seen while debugging with a 500-image batch:
    #   X.shape      -> (500, 3073)  matrix
    #   X[i].shape   -> (3073,)      vector
    #   scores.shape -> (10,)        vector

    for j in range(num_classes):
      if j == y[i]:
        continue
      margin = scores[j] - correct_class_score + 1 # note delta = 1, a hyperparameter
      if margin > 0: # loss is incurred when the true-class score does not beat class j's score by at least delta
        loss += margin
        # Compute gradients (one inner and one outer sum)
        # Wonderfully compact and hard to read
        dW[:, y[i]] -= X[i, :].T # this is really a sum over j != y_i
        dW[:, j] += X[i, :].T # sums each contribution of the x_i's
  # Right now the loss is a sum over all training examples, but we want it
  # to be an average instead so we divide by num_train.
  loss /= num_train
  dW /= num_train
  # Shapes seen while debugging:
  #   (W * W).shape       -> (3073, 10)
  #   np.sum(W * W).shape -> ()  a scalar: the sum of every element of W * W
  # Add regularization to the loss.
  loss += 0.5 * reg * np.sum(W * W)
  # Gradient regularization that carries through per https://piazza.com/class/i37qi08h43qfv?cid=118
  dW += reg*W
  #############################################################################
  # TODO:                                                                     #
  # Compute the gradient of the loss function and store it dW.                #
  # Rather than first computing the loss and then computing the derivative,   #
  # it may be simpler to compute the derivative at the same time that the     #
  # loss is being computed. As a result you may need to modify some of the    #
  # code above to compute the gradient.                                       #
  #############################################################################


  return loss, dW
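A common way to gain confidence in a hand-written gradient like the one above is to compare it with centered finite differences. The following is a minimal sketch under stated assumptions: svm_loss_and_grad is a compact restatement of svm_loss_naive written for this example, numerical_grad is a hypothetical helper (not the assignment's own checker), and the data is small random data rather than CIFAR-10. The weights are kept small so that every margin stays well away from the kink of the hinge at 0, where finite differences would be unreliable.

```python
import numpy as np

def svm_loss_and_grad(W, X, y, reg):
    """Compact restatement of svm_loss_naive (same loss, same gradient)."""
    num_train, num_classes = X.shape[0], W.shape[1]
    loss, dW = 0.0, np.zeros_like(W)
    for i in range(num_train):
        scores = X[i].dot(W)
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - scores[y[i]] + 1.0  # delta = 1
            if margin > 0:
                loss += margin
                dW[:, y[i]] -= X[i]
                dW[:, j] += X[i]
    loss = loss / num_train + 0.5 * reg * np.sum(W * W)
    dW = dW / num_train + reg * W
    return loss, dW

def numerical_grad(f, W, h=1e-5):
    """Centered finite differences of the scalar function f at W (hypothetical helper)."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + h
        f_plus = f(W)
        W[idx] = old - h
        f_minus = f(W)
        W[idx] = old  # restore the entry before moving on
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))          # 5 examples, 4 features
y = rng.integers(0, 3, size=5)           # 3 classes
W = rng.standard_normal((4, 3)) * 0.01   # small weights keep margins near 1
reg = 0.1

_, dW_analytic = svm_loss_and_grad(W, X, y, reg)
dW_numeric = numerical_grad(lambda w: svm_loss_and_grad(w, X, y, reg)[0], W)
max_abs_diff = np.max(np.abs(dW_analytic - dW_numeric))
print("max abs difference:", max_abs_diff)
```

If the analytic gradient is right, the printed difference should be tiny compared with the size of the gradient entries.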

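When I first read the sample code, three things confused me:

1. How is the size of the weight matrix W, 3073*10, determined?
10 is the number of classes, and 3073 is the 3072 features of each image plus 1. Why the extra 1?
The following is from the notes translated on Zhihu (Zhihu translation of the official CS231n notes, Linear Classification, part 1).

So the extra entries in W hold the bias b, and the input data gets a matching constant 1. In the notes' layout (W of shape 10*3073 multiplied by a column vector x) that is an extra column of W and an extra row in x; in the assignment code (X of shape N*3073 multiplied by W of shape 3073*10) it is an extra row of W and an extra column of ones appended to X.
2. What does np.sum(W*W) compute?
It is the regularization loss before the coefficient is applied: a scalar equal to the sum of the squares of all elements of W. The code then scales it by 0.5 * reg.
3. Why compute the gradient, and how is it used to update the weights W?
In the train function, every iteration subtracts grad multiplied by learning_rate from W.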
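The three points above can be sketched on a tiny example. This is an illustration under assumptions, not the assignment's code: the sizes N, D, C are small stand-ins for 500, 3072 and 10, the names X_aug and W_aug are made up here, and the gradient used in the update step is only the regularization part reg * W so that the block stays short.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, C = 4, 6, 3                      # tiny stand-ins for 500, 3072, 10
X = rng.standard_normal((N, D))
W = rng.standard_normal((D, C))
b = rng.standard_normal(C)

# 1. Bias trick in the code's layout: a column of ones on X, a bias row on W.
X_aug = np.hstack([X, np.ones((N, 1))])   # shape (N, D + 1)
W_aug = np.vstack([W, b])                 # shape (D + 1, C)
same_scores = np.allclose(X_aug.dot(W_aug), X.dot(W) + b)
print("scores identical:", same_scores)

# 2. np.sum(W * W) is a scalar: the sum of the squared entries of W.
sq_sum = np.sum(W_aug * W_aug)
print("sum of squares:", sq_sum)

# 3. One gradient-descent step: move W against the gradient.
#    Assumption: grad is only the regularization gradient reg * W here.
learning_rate, reg = 1e-2, 0.5
grad = reg * W_aug
W_new = W_aug - learning_rate * grad
```

With the ones column in place, a single matrix product yields the same scores as X.dot(W) + b, which is why the assignment only ever handles one matrix W.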

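Vectorized implementation:
To vectorize the computation, first understand the formula. It is the multiclass SVM loss that the loop above computes for each example i, with Delta = 1:
L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta)
1. Compute scores, the scores of every image on every class under the current W. Following the formula S = Wx (written as X.dot(W) in the code, because each image is a row of X), multiplying the input data by the weight matrix gives a 500*10 scores matrix in which each row holds one image's scores for the 10 classes.

2. Compute scores_correct, the S_{y_i} of the formula: from each row of scores take the column given by the true label y. This yields 500 values, which are reshaped into a 500*1 column vector.
3. Compute margins, the gap between scores and scores_correct plus Delta. The maximum operation filters these gaps: if the true-class score exceeds the score of a wrong class by at least Delta, the model separates that class well on this image and no loss is incurred; if the gap is smaller than Delta, so that s_j - s_{y_i} + Delta is still greater than 0, that amount is added to the loss.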
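The three steps can be traced by hand on a made-up 2*3 scores matrix. This is a sketch for illustration only: the numbers and labels are invented, and delta is written out as a variable even though the assignment fixes it at 1. It also shows why the reshape matters: subtracting a (2,) vector from a (2, 3) matrix does not broadcast.

```python
import numpy as np

# Two images, three classes; true classes are 0 and 2 (made-up numbers).
scores = np.array([[3.0, 1.0, 2.5],
                   [1.0, 2.0, 4.0]])
y = np.array([0, 2])
num_train = scores.shape[0]
delta = 1.0

# Step 2: pick the true-class score of every row with integer-array indexing.
scores_correct = scores[np.arange(num_train), y]   # shape (2,), values [3.0, 4.0]

# Without a reshape, (2, 3) minus (2,) cannot be broadcast.
try:
    scores - scores_correct
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Step 3: reshape to a column so that each row subtracts its own true-class score.
scores_correct = scores_correct.reshape(num_train, 1)
margins = np.maximum(0, scores - scores_correct + delta)
margins[np.arange(num_train), y] = 0               # enforce j != y_i
loss = np.sum(margins) / num_train

print(margins)   # [[0.  0.  0.5]
                 #  [0.  0.  0. ]]
print(loss)      # 0.25
```

Only class 2 of the first image violates the margin (2.5 - 3.0 + 1 = 0.5), so the data loss is 0.5 / 2 = 0.25.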

The Python code is as follows:

def svm_loss_vectorized(W, X, y, reg):
  """
  Structured SVM loss function, vectorized implementation.

  Inputs and outputs are the same as svm_loss_naive.
  """
  loss = 0.0
  dW = np.zeros(W.shape) # initialize the gradient as zero

  #############################################################################
  # TODO:                                                                     #
  # Implement a vectorized version of the structured SVM loss, storing the    #
  # result in loss.                                                           #
  #############################################################################
  scores=X.dot(W) #(500,10)
  num_classes=W.shape[1]
  num_train=X.shape[0]
  scores_correct = scores[np.arange(num_train), y]  #(500,) has to reshape, or will be ValueError
  scores_correct=np.reshape(scores_correct,(num_train,-1))
  margins=scores-scores_correct+1#delta=1
  margins=np.maximum(0,margins)
  margins[np.arange(num_train),y]=0 # exclude the true class from the loss (its margin would equal delta), i.e. j != y_i in the formula
  loss=np.sum(margins)/num_train
  loss += 0.5 * reg * np.sum(W * W)

  # compute the gradient
  margins[margins > 0] = 1
  row_sum = np.sum(margins, axis=1)                  # shape (N,): number of violating classes per example
  margins[np.arange(num_train), y] = -row_sum        
  dW += np.dot(X.T, margins)/num_train + reg * W     # D by C
  ##################
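The gradient step in the code above replaces the two nested loops with a single matrix product: each positive margin contributes +x_i to column j and -x_i to column y_i, so a coefficient matrix with 1 at violating classes and minus the row count at the true class reproduces the loop. The sketch below checks that on small random data. It is an illustration under assumptions: the names coeff, dW_vec and dW_loop are made up here, and the regularization term is left out so that only the data gradient is compared.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, C = 6, 5, 4
X = rng.standard_normal((N, D))
y = rng.integers(0, C, size=N)
W = rng.standard_normal((D, C))

# Margins as in the loss computation (delta = 1, true class excluded).
scores = X.dot(W)
margins = scores - scores[np.arange(N), y].reshape(N, 1) + 1.0
margins[np.arange(N), y] = 0

# Vectorized: 1 where the margin is violated, -count at the true class.
coeff = (margins > 0).astype(float)
coeff[np.arange(N), y] = -coeff.sum(axis=1)
dW_vec = X.T.dot(coeff) / N                 # shape (D, C)

# Loop version of the same data gradient, for comparison.
dW_loop = np.zeros_like(W)
for i in range(N):
    for j in range(C):
        if j != y[i] and margins[i, j] > 0:
            dW_loop[:, j] += X[i]
            dW_loop[:, y[i]] -= X[i]
dW_loop /= N

grads_match = np.allclose(dW_vec, dW_loop)
print("vectorized gradient matches loop:", grads_match)
```

Every row of coeff sums to zero, which mirrors the loop: whatever is added to the wrong-class columns is subtracted from the true-class column.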
