CS231n_2020 (2): Linear Classification

Intro to Linear classification

As discussed in the first section, kNN image classification is very expensive at test time. We therefore move to a more powerful approach, one that will later extend naturally to neural networks and convolutional networks. It has two main components: first, a score function that maps the raw data to class scores, together with a loss function that measures how well the scores agree with the ground-truth labels; second, casting this as an optimization problem, i.e., finding the parameters that minimize the loss function.

Linear score function

Assume a training set of images $x_i \in R^D$, each with a corresponding label $y_i$, where $i = 1 \dots N$ and $y_i \in \{ 1 \dots K \}$. That is, there are N examples (each of dimensionality D) and K distinct classes. For example, in CIFAR-10 we have N = 50,000 images, D = 32 x 32 x 3 = 3072 pixels, and K = 10.
We then define a score function $f: R^D \mapsto R^K$ that maps the raw image pixels to class scores.

Linear classifier

$$f(x_i, W, b) = W x_i + b$$
In CIFAR-10, $x_i$ contains all the pixels of the i-th image flattened into a [3072 x 1] column vector, W is a [10 x 3072] matrix, and b is a [10 x 1] vector.
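A minimal sketch of this score computation in numpy (the shapes follow the CIFAR-10 setup above; the variable names and random data are illustrative, not from the notes). It also shows the "bias trick" used in the code further down, where b is folded into W as an extra column:

import numpy as np

# Hypothetical example: one CIFAR-10 image flattened into a column vector.
x_i = np.random.randn(3072, 1)   # [3072 x 1] raw pixel values
W = np.random.randn(10, 3072)    # [10 x 3072] weights, one row per class
b = np.random.randn(10, 1)       # [10 x 1] biases

scores = W.dot(x_i) + b          # [10 x 1] class scores f(x_i, W, b)

# Bias trick: append a constant 1 to x_i and fold b into W as an extra column,
# so the score function becomes a single matrix multiply.
x_ext = np.vstack([x_i, [[1.0]]])   # [3073 x 1]
W_ext = np.hstack([W, b])           # [10 x 3073]
assert np.allclose(scores, W_ext.dot(x_ext))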

Interpreting a linear classifier

(Figure omitted.) Intuitively, each row of W acts as a template for one class: the class score is an inner product between that template and the image. Geometrically, each class score is a linear function of the pixels, so the decision boundaries between classes are hyperplanes in pixel space.

Loss function

Multiclass SVM

Writing the score for the j-th class as $s_j = f(x_i, W)_j$, the Multiclass SVM loss for the i-th example is:
$$L_i = \sum_{j\neq y_i} \max(0, s_j - s_{y_i} + \Delta)$$
Adding an L2 regularization penalty $R(W) = \sum_k\sum_l W_{k,l}^2$ gives the complete Multiclass SVM loss:
$$L = \underbrace{ \frac{1}{N} \sum_i L_i }_\text{data loss} + \underbrace{ \lambda R(W) }_\text{regularization loss}$$

Written out in full:
$$L = \frac{1}{N} \sum_i \sum_{j\neq y_i} \left[ \max(0, f(x_i; W)_j - f(x_i; W)_{y_i} + \Delta) \right] + \lambda \sum_k\sum_l W_{k,l}^2$$
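As a quick worked example with made-up numbers: suppose the scores for one example are $s = [13, -7, 11]$, the first class is the correct one, and $\Delta = 10$. Then
$$L_i = \max(0, -7 - 13 + 10) + \max(0, 11 - 13 + 10) = 0 + 8 = 8,$$
so only a class whose score comes within the margin $\Delta$ of the correct class contributes to the loss.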
Code:

def L_i(x, y, W):
  """
  unvectorized version. Compute the multiclass svm loss for a single example (x,y)
  - x is a column vector representing an image (e.g. 3073 x 1 in CIFAR-10)
    with an appended bias dimension in the 3073-rd position (i.e. bias trick)
  - y is an integer giving index of correct class (e.g. between 0 and 9 in CIFAR-10)
  - W is the weight matrix (e.g. 10 x 3073 in CIFAR-10)
  """
  delta = 1.0 # see notes about delta later in this section
  scores = W.dot(x) # scores becomes of size 10 x 1, the scores for each class
  correct_class_score = scores[y]
  D = W.shape[0] # number of classes, e.g. 10
  loss_i = 0.0
  for j in range(D): # iterate over all wrong classes
    if j == y:
      # skip for the true class to only loop over incorrect classes
      continue
    # accumulate loss for the i-th example
    loss_i += max(0, scores[j] - correct_class_score + delta)
  return loss_i

def L_i_vectorized(x, y, W):
  """
  A faster half-vectorized implementation. half-vectorized
  refers to the fact that for a single example the implementation contains
  no for loops, but there is still one loop over the examples (outside this function)
  """
  delta = 1.0
  scores = W.dot(x)
  # compute the margins for all classes in one vector operation
  margins = np.maximum(0, scores - scores[y] + delta)
  # on y-th position scores[y] - scores[y] canceled and gave delta. We want
  # to ignore the y-th position and only consider margin on max wrong class
  margins[y] = 0
  loss_i = np.sum(margins)
  return loss_i

def L(X, y, W):
  """
  fully-vectorized implementation :
  - X holds all the training examples as columns (e.g. 3073 x 50,000 in CIFAR-10)
  - y is array of integers specifying correct class (e.g. 50,000-D array)
  - W are weights (e.g. 10 x 3073)
  """
  # evaluate loss over all examples in X without using any for loops
  # left as exercise to reader in the assignment
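
The assignment leaves the fully-vectorized version as an exercise; one possible sketch is given below (the argument shapes and the reg parameter are assumptions, not the official solution):

import numpy as np

def L_vectorized(X, y, W, delta=1.0, reg=0.0):
  """
  Hypothetical fully-vectorized Multiclass SVM loss.
  - X: [3073 x N] training examples as columns (bias trick applied)
  - y: [N] integer array of correct class indices
  - W: [10 x 3073] weight matrix
  """
  num_train = X.shape[1]
  scores = W.dot(X)                                   # [10 x N] class scores
  correct = scores[y, np.arange(num_train)]           # [N] scores of the correct classes
  margins = np.maximum(0, scores - correct + delta)   # broadcast the margin over classes
  margins[y, np.arange(num_train)] = 0                # the correct class contributes no loss
  data_loss = np.sum(margins) / num_train
  reg_loss = reg * np.sum(W * W)                      # L2 regularization penalty
  return data_loss + reg_loss

The fancy indexing with np.arange picks out one score per column, which is what lets the margin computation run without any Python loops.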

Softmax classifier

The softmax function is $f_j(z) = \frac{e^{z_j}}{\sum_k e^{z_k}}$, and the resulting loss for the i-th example is:
$$L_i = -\log\left(\frac{e^{f_{y_i}}}{ \sum_j e^{f_j} }\right) \hspace{0.5in} \text{or equivalently} \hspace{0.5in} L_i = -f_{y_i} + \log\sum_j e^{f_j}$$
This is the cross-entropy between the estimated class probabilities q and the "true" distribution p (which puts all of its mass on the correct class):
$$H(p,q) = - \sum_x p(x) \log q(x)$$
Code

f = np.array([123, 456, 789]) # example with 3 classes and each having large scores
p = np.exp(f) / np.sum(np.exp(f)) # Bad: Numeric problem, potential blowup

# instead: first shift the values of f so that the highest number is 0:
f -= np.max(f) # f becomes [-666, -333, 0]
p = np.exp(f) / np.sum(np.exp(f)) # safe to do, gives the correct answer
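
Putting the shift trick together with the loss definition, here is a minimal sketch of the per-example Softmax loss (unvectorized; the shapes are assumed to match the SVM code above and are not from the notes):

import numpy as np

def softmax_loss_i(x, y, W):
  """
  Numerically stable Softmax (cross-entropy) loss for a single example.
  - x: [3073 x 1] column vector with the bias trick applied
  - y: integer index of the correct class
  - W: [10 x 3073] weight matrix
  """
  f = W.dot(x).ravel()        # [10] unnormalized class scores
  f -= np.max(f)              # shift so the largest score is 0 (avoids overflow in exp)
  log_sum_exp = np.log(np.sum(np.exp(f)))
  return -f[y] + log_sum_exp  # L_i = -f_{y_i} + log sum_j e^{f_j}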

SVM vs Softmax

(Figure omitted.) Both classifiers compute the same linear scores; they differ only in how the scores are interpreted. The SVM treats them as margins and is indifferent once the correct class scores higher than the others by at least $\Delta$, while the Softmax classifier treats them as unnormalized log-probabilities and always prefers the probability of the correct class to be higher. In practice the two usually give comparable results.

Interactive Web Demo of Linear Classification

(Screenshot omitted; the original notes link to an interactive browser demo for experimenting with a linear classifier and its loss functions.)

Summary

  • Defined a score function that maps image pixels to class scores (in this section, a linear function).
  • Unlike the kNN classifier, once the parameters are learned the training set can be discarded, and new images can be classified quickly.
  • Introduced two loss functions commonly used with linear classifiers: the SVM loss and the Softmax loss. The lower the loss, the better the predictions on the training set.