CS231n 02 Loss Functions and Optimization

Loss Functions and Optimization


Goals of this lecture

  1. Define a loss function
  2. Come up with a way of finding the parameters that minimize the loss function
    (optimization)

The remaining problem from last lecture

  • How do we choose the weight matrix W?


Loss function

A loss function tells how good our current classifier is.

Given a dataset of examples $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is an image and $y_i$ is its (integer) label, the total loss over the dataset is defined as

$L = \frac{1}{N}\sum\limits_i L_i(f(x_i, W), y_i)$

which is the average of the per-example losses.
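As a quick sketch, this is just an average of a per-example loss over the whole dataset (the helper name, data layout, and per-example loss argument below are assumptions for illustration):

import numpy as np

def total_loss(X, Y, W, L_i):
    # X: iterable of inputs, Y: their labels, L_i: any per-example loss (SVM or softmax below)
    # implements L = (1/N) * sum_i L_i(f(x_i, W), y_i)
    return np.mean([L_i(x, y, W) for x, y in zip(X, Y)])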


Multiclass SVM loss

Given an example $(x_i, y_i)$, where $x_i$ is the image and $y_i$ is the (integer) label, and using the shorthand $s = f(x_i, W)$ for the vector of class scores,

the SVM loss has the form:

$L_i = \sum\limits_{j \neq y_i}\max(0, s_j - s_{y_i} + 1)$

If an incorrect class score is lower than the correct class score by at least the margin, that term contributes zero loss. In this case the safety margin is set to one; the margin choice depends on our needs.

  • We then loop over all the incorrect classes and sum up their hinge terms.


  • What if we use

$L = \frac{1}{N}\sum\limits_i L_i(f(x_i, W), y_i)^2$

This is not a linear function of the per-example losses and gives a genuinely different classifier; it may sometimes be useful, depending on how much you care about large errors compared to small ones.

Example Code

import numpy as np

def L_i_vectorized(x, y, W):
    # class scores: s = f(x, W)
    scores = W.dot(x)
    # hinge terms with a safety margin of 1
    margins = np.maximum(0, scores - scores[y] + 1)
    # the correct class should not contribute to the loss
    margins[y] = 0
    loss_i = np.sum(margins)
    return loss_i
    # pretty easy
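A quick usage sketch (the shapes, seed, and label below are made up, just to show how the function is called):

np.random.seed(0)
W = 0.001 * np.random.randn(10, 3073)   # e.g. 10 classes, flattened 32x32x3 image plus a bias dimension
x = np.random.randn(3073)
y = 3                                   # hypothetical correct class index
print(L_i_vectorized(x, y, W))          # per-example multiclass SVM loss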


It just changes the gap between the scores.


We often use L2 regularization, which is just the squared Euclidean norm of the weights.
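Written out, the full objective adds a regularization penalty $R(W)$ weighted by a hyperparameter $\lambda$:

$L = \frac{1}{N}\sum\limits_i L_i(f(x_i, W), y_i) + \lambda R(W)$

where $R(W) = \sum_k\sum_l W_{k,l}^2$ for L2 regularization and $R(W) = \sum_k\sum_l |W_{k,l}|$ for L1.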


In this example both weight vectors produce the same score, but L1 regularization prefers $w_1$ because it contains more zeros (it is sparser), while L2 regularization prefers $w_2$ because its weight is spread evenly across the input dimensions.
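A concrete example of this kind (the numbers here are chosen just for illustration): with $x = [1, 1, 1, 1]$, the vectors $w_1 = [1, 0, 0, 0]$ and $w_2 = [0.25, 0.25, 0.25, 0.25]$ give the same score $w^T x = 1$ and the same L1 penalty of $1$, but their L2 penalties are $1$ and $0.25$ respectively, so L2 regularization favors the spread-out $w_2$.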

The multiclass SVM loss only cares about the gap between the correct class score and the incorrect ones.

Softmax Classifier


We want the predicted probability of the true class to be as close to 1 as possible (the closer the better, and exactly 1 is best), so we choose the loss to be $-\log$ of that probability:

$L_i = -\log\left(\dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$


To reach exactly zero loss, the correct-class score would have to go to infinity, and computers don't like that.

  • Debugging tip
    At initialization, when all scores are small and roughly equal, the loss should come out to about $\log C$, where $C$ is the number of classes (see the sketch below).
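A minimal numpy sketch of the softmax (cross-entropy) loss, using the usual max-subtraction trick for numerical stability (the function name and the shapes are illustrative assumptions):

import numpy as np

def softmax_loss_i(x, y, W):
    scores = W.dot(x)
    scores -= np.max(scores)   # shift scores for numerical stability; probabilities are unchanged
    probs = np.exp(scores) / np.sum(np.exp(scores))
    return -np.log(probs[y])   # -log of the true class probability

# sanity check: with tiny random weights all scores are nearly equal,
# so the loss should be close to log(C) for C classes
C, D = 10, 3073
W = 1e-4 * np.random.randn(C, D)
x = np.random.randn(D)
print(softmax_loss_i(x, 0, W), np.log(C))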



Optimization

Random Search - the naive but simplest way: try many random weight matrices and keep the one with the lowest loss (a sketch follows).

Really Slow !!!
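A minimal sketch of what random search means here (the data arrays and the total-loss function L are assumed, not defined in these notes):

import numpy as np

# assume X_train, Y_train and a function L(X, Y, W) returning the total loss
bestloss = float('inf')
bestW = None
for num in range(1000):
    W = 1e-4 * np.random.randn(10, 3073)   # a random parameter guess
    loss = L(X_train, Y_train, W)
    if loss < bestloss:
        bestloss, bestW = loss, W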

Gradient Descent

We compute the gradient of the loss with respect to W and follow it downhill to the bottom (which may only be a local minimum).

Code

# Vanilla Gradient Descent

while True:
    # gradient of the loss over the full training set
    weight_grad = evaluate_gradient(loss_fun, data, weights)
    # step in the direction of steepest descent
    weights += -step_size * weight_grad

The step size is also called the learning rate; it is one of the most important hyperparameters.


Since N might be very large, we sample a small subset of examples, called a minibatch, and use it to estimate the true gradient (a sketch follows).
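A sketch of the minibatch version, following the same conventions as the vanilla loop above (sample_training_data and the batch size of 256 are assumptions):

# Vanilla Minibatch Gradient Descent

while True:
    data_batch = sample_training_data(data, 256)   # sample 256 examples
    weight_grad = evaluate_gradient(loss_fun, data_batch, weights)
    weights += -step_size * weight_grad            # parameter update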


Image Features

Color histogram: count how many pixels fall into each color bin.

Gradient-based features (for example, a histogram of oriented gradients) extract the edge information.

Bag of (visual) words, an idea borrowed from NLP:

Cluster different image patches taken from the images to build a visual vocabulary, then represent each image by how often each visual word appears in it.


  • Differences
  1. The classical pipeline extracts features first and feeds them into a linear classifier (a toy sketch of this pipeline follows below).
  2. A convolutional neural network learns the features automatically during training.
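A toy sketch of the hand-crafted-feature pipeline, using a simple color histogram fed into a linear classifier (the shapes, bin count, and training details are illustrative assumptions, not from the lecture):

import numpy as np

def color_histogram_feature(img, bins=8):
    # img: H x W x 3 array with values in [0, 255]
    # one histogram per color channel, concatenated into a 3 * bins feature vector
    feats = [np.histogram(img[..., c], bins=bins, range=(0, 255))[0] for c in range(3)]
    return np.concatenate(feats).astype(np.float64)

def predict(img, W, b):
    # linear classifier on top of the fixed, hand-crafted feature
    f = color_histogram_feature(img)
    return np.argmax(W.dot(f) + b)

# W and b would be trained with one of the losses above (SVM or softmax);
# only the classifier is learned -- the feature extractor stays fixed
img = np.random.randint(0, 256, size=(32, 32, 3))
W, b = np.random.randn(10, 24), np.zeros(10)
print(predict(img, W, b))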