一、A few formulas & a few sentences describing the SVM
(一) Scores and the Loss Function
The score of the i-th sample on the j-th class:
$$s_j = f(x_i, W)_j$$
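In code this is just a dot product. A minimal sketch (the shapes here are made up, but match the (N, D) data and (D, C) weights used below):

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(5, 4)   # N=5 samples, D=4 features
W = np.random.randn(4, 3)   # C=3 classes
i = 2
scores_i = X[i].dot(W)      # shape (C,); scores_i[j] is s_j = f(x_i, W)_j
```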
The loss for the i-th sample:
$$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta)$$
In summary, the SVM loss function wants the score of the correct class $y_i$ to be larger than the incorrect class scores by at least $\Delta$ (delta). If this is not the case, we will accumulate loss.
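A tiny worked example (the numbers are made up; $\Delta = 1$ and the correct class is 0):

```python
s = [3.2, 5.1, -1.7]    # scores; s[0] is the correct class score
L_i = max(0, s[1] - s[0] + 1) + max(0, s[2] - s[0] + 1)
# = max(0, 2.9) + max(0, -3.9) ≈ 2.9: only the second class, which beats
# the correct score by more than the margin, contributes to the loss.
print(L_i)   # ≈ 2.9
```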
(二) The Gradient Formula
How do we compute the gradient? With the chain rule.
1. Without regularization
- First differentiate $L_i$ with respect to $t = s_j - s_{y_i} + \Delta$, i.e. take the derivative of the $\max(0, \cdot)$ function: $dL_i/dt = 0$ (when $t < 0$), or $= 1$ (when $t > 0$).
- Then differentiate $t = s_j - s_{y_i} + \Delta$ with respect to $W$. Since $t = X_i W_j - X_i W_{y_i} + \Delta$, we get $dt/dW_j = X_i$ and $dt/dW_{y_i} = -X_i$.
- Combine the two steps. How, exactly? For a column of the gradient matrix, are both gradient vectors used, or does column $j$ take only the gradient with respect to $W_j$, and column $y_i$ only the gradient with respect to $W_{y_i}$?
Recall your calculus: the chain rule sums the contribution of every path, so $dL_i/dW$ picks up $dt/dW_j$ in column $j$ *and* $dt/dW_{y_i}$ in column $y_i$. Both gradient vectors are used (see the sketch after this list).
I got this wrong here at first too; posting a link by way of thanks:
https://blog.csdn.net/silent_crown/article/details/78109461
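To convince yourself that both columns really do receive updates, here is a minimal finite-difference check for a single example (all names and data here are made up for illustration):

```python
import numpy as np

def loss_i(W, x, yi, delta=1.0):
    scores = x.dot(W)
    margins = np.maximum(0, scores - scores[yi] + delta)
    margins[yi] = 0          # the sum runs over j != y_i
    return margins.sum()

def grad_i(W, x, yi, delta=1.0):
    scores = x.dot(W)
    dW = np.zeros_like(W)
    for j in range(W.shape[1]):
        if j == yi:
            continue
        if scores[j] - scores[yi] + delta > 0:   # dL_i/dt = 1 on this term
            dW[:, j] += x    # dt/dW_j = X_i
            dW[:, yi] -= x   # dt/dW_{y_i} = -X_i: both columns are updated
    return dW

np.random.seed(0)
W, x, yi = np.random.randn(5, 3), np.random.randn(5), 1
num, h = np.zeros_like(W), 1e-5
for k in range(W.size):
    Wp, Wm = W.copy(), W.copy()
    Wp.flat[k] += h
    Wm.flat[k] -= h
    num.flat[k] = (loss_i(Wp, x, yi) - loss_i(Wm, x, yi)) / (2 * h)
print(np.abs(num - grad_i(W, x, yi)).max())   # should be tiny, ~1e-10
```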
2. With regularization
With regularization, the loss computation becomes:
$$L = \frac{1}{N} \sum_i \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta) + \lambda \sum_k \sum_l W_{k,l}^2$$
(Note that the regularization term is added once to the full loss $L$, not to each per-example $L_i$; this matches the code below.)
Correspondingly, the gradient matrix also gains the term $2\lambda W$ (the derivative of $\lambda \sum W^2$; mind the factor of 2).
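A short sketch of that term and its gradient (reg plays the role of $\lambda$; the shape is arbitrary):

```python
import numpy as np

reg = 1e-3                      # lambda, the regularization strength
W = np.random.randn(4, 3)
reg_loss = reg * np.sum(W * W)  # lambda * sum(W^2)
dW_reg = 2 * reg * W            # its gradient: note the factor of 2
```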
二、Hand-writing the gradient code
```python
import numpy as np

def svm_loss_naive(W, X, y, reg):
    """
    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.
    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength
    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape)      # initialize the gradient as zero
    num_classes = W.shape[1]    # number of columns (C)
    num_train = X.shape[0]      # number of rows (N)
    loss = 0.0
    for i in range(num_train):
        scores = np.dot(X[i, :].reshape(1, -1), W)   # shape (1, C)
        correct_class_score = scores[0, y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue                              # skip the correct class
            margin = scores[0, j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                dW[:, y[i]] -= X[i]   # column y_i accumulates -X_i
                dW[:, j] += X[i]      # column j accumulates +X_i
    dW /= num_train
    dW += 2 * reg * W  # note: average first, THEN add the gradient of the
                       # regularization term (factor 2: loss uses reg*sum(W*W))
    loss /= num_train
    loss += reg * np.sum(W * W)
    return loss, dW
```
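An example call (the shapes follow the docstring; the data is random, just to exercise the function):

```python
np.random.seed(0)
W = np.random.randn(3073, 10) * 0.0001   # near-zero weights
X = np.random.randn(50, 3073)
y = np.random.randint(10, size=50)
loss, grad = svm_loss_naive(W, X, y, reg=1e-5)
print(loss, grad.shape)   # grad has the same shape as W: (3073, 10)
```

With near-zero weights every margin is about $\Delta = 1$, so the loss should come out close to $C - 1 = 9$, which makes a handy sanity check.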
三、Two small numpy tricks
Taking one specified column element from each row of a numpy array, forming a column vector:
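A minimal sketch of the trick (scores_yi echoes the name used in the snippets below; the data is made up):

```python
import numpy as np

scores = np.arange(12).reshape(4, 3)   # pretend (N=4, C=3) score matrix
y = np.array([0, 2, 1, 1])             # correct class index per row
scores_yi = scores[np.arange(4), y].reshape(-1, 1)   # one element per row
print(scores_yi.ravel())               # [ 0  5  7 10]
```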
Initializing a whole numpy matrix to one and the same value, two ways:
```python
npdelta = delta * np.ones(scores_yi.shape)   # marginally faster, by very, very little
```
or
```python
npdelta = np.zeros(scores_yi.shape)
npdelta[:] = delta
```
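For completeness, numpy also has a single call for this; np.full should give the same result:

```python
npdelta = np.full(scores_yi.shape, delta)
```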