Loss Functions and Optimization
Goals of this lecture
- Define a loss function
- Come up with a way of finding the parameters that minimize the loss function (optimization)
The remaining problem from the last lecture
- How to choose the weight matrix $W$?
![slide](https://s1.ax1x.com/2020/11/08/BTZxgK.png)
Loss function
A loss function tells how good our current classifier is.
We have a dataset $\{(x_i, y_i)\}_{i=1}^N$, where $x_i$ is an image and $y_i$ is its (integer) label.
The total loss is defined as follows:

$$L = \frac{1}{N}\sum_i L_i(f(x_i, W), y_i)$$

which is the average of the per-example losses.
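As a minimal sketch of this formula (the per-example loss values here are hypothetical), the total loss is just the mean of the per-example losses:

```python
import numpy as np

def total_loss(per_example_losses):
    # L = (1/N) * sum_i L_i : the mean of the per-example losses.
    return np.mean(per_example_losses)

# hypothetical per-example losses for N = 3 training examples
print(total_loss([2.9, 0.0, 12.9]))  # ≈ 5.27
```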
Multiclass SVM Loss
Given an example $(x_i, y_i)$, where $x_i$ is the image and $y_i$ is the (integer) label, and using the shorthand $s = f(x_i, W)$ for the score vector:
The SVM loss has the form:

$$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$$

If an incorrect class score is smaller than the correct class score by at least the margin, that term of the loss is 0. In this case the safety margin is set to one; the choice of margin depends on our needs.
- We loop over all the incorrect classes $j \neq y_i$ and sum the terms.
![slide](https://s1.ax1x.com/2020/11/08/BTZLNR.png)
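A worked example of the hinge loss, using hypothetical scores for a 3-class problem (only incorrect classes that violate the margin contribute):

```python
import numpy as np

def svm_loss_i(scores, y, margin=1.0):
    # Hinge loss: sum over incorrect classes of max(0, s_j - s_{y_i} + margin).
    scores = np.asarray(scores, dtype=float)
    terms = np.maximum(0, scores - scores[y] + margin)
    terms[y] = 0  # the correct class contributes nothing
    return terms.sum()

# hypothetical scores for a 3-class problem; the correct class is index 0
print(svm_loss_i([3.2, 5.1, -1.7], y=0))  # ≈ 2.9
```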
- What if we use $L = \frac{1}{N}\sum_i L_i(f(x_i, W), y_i)^2$ instead?
This squared loss is not linear in the per-example losses and behaves quite differently; it can be useful sometimes, depending on how much you care about large errors versus small ones.
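A quick sketch of the difference (the scores are made up): squaring turns a loss that grows linearly with margin violations into one that punishes large violations disproportionately:

```python
import numpy as np

def hinge(scores, y, margin=1.0):
    # Standard multiclass hinge loss for a single example.
    terms = np.maximum(0, np.asarray(scores, dtype=float) - scores[y] + margin)
    terms[y] = 0
    return terms.sum()

scores = [1.0, 4.0, -2.0]     # hypothetical scores, correct class 0
print(hinge(scores, 0))       # 4.0 : grows linearly with the violation
print(hinge(scores, 0) ** 2)  # 16.0: the squared version punishes it harder
```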
Example Code
```python
import numpy as np

def L_i_vectorized(x, y, W, margin=1.0):
    # Compute all class scores at once, then every hinge term in one shot.
    scores = W.dot(x)
    margins = np.maximum(0, scores - scores[y] + margin)
    margins[y] = 0  # the correct class contributes no loss
    loss_i = np.sum(margins)
    return loss_i
# pretty easy
```
![slide](https://s1.ax1x.com/2020/11/08/BTZO41.png)
Scaling $W$ just changes the gaps between the scores.
![slide](https://s1.ax1x.com/2020/11/08/BTZzjO.png)
![slide](https://s1.ax1x.com/2020/11/08/BTe9De.png)
In practice, L2 regularization (the squared Euclidean norm of the weights) is used most often.
![slide](https://s1.ax1x.com/2020/11/08/BTepuD.png)
In this example $w_1$ and $w_2$ produce the same score, but L1 regularization generally prefers $w_1$ because it contains more zeros (sparsity), while L2 regularization prefers $w_2$ because its weight is spread evenly across the inputs.
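A tiny numeric sketch of that preference, using hypothetical weight vectors that give the same score on $x = [1, 1, 1, 1]$:

```python
import numpy as np

# hypothetical weights: both produce the same score on x = [1, 1, 1, 1]
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.25, 0.25, 0.25, 0.25])
x = np.ones(4)

print(w1 @ x, w2 @ x)                      # 1.0 1.0  (same score)
print(np.abs(w1).sum(), np.abs(w2).sum())  # 1.0 1.0  (same L1 penalty here)
print((w1 ** 2).sum(), (w2 ** 2).sum())    # 1.0 0.25 (L2 prefers the spread w2)
```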
The Multiclass SVM loss only cares about the gap between the correct class score and the incorrect ones.
Softmax Classifier
![slide](https://s1.ax1x.com/2020/11/08/BTeiEd.png)
We just want to make the probability of the true class as close to 1 as possible (the closer the better, equal is best), so the loss can be chosen as $-\log$ of that probability:

$$L_i = -\log \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}$$
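A minimal softmax-loss sketch (the scores are hypothetical); subtracting the max score first avoids overflow in `exp` and cancels out in the ratio, so the probabilities are unchanged:

```python
import numpy as np

def softmax_loss_i(scores, y):
    # Shift by the max score for numerical stability; the shift cancels
    # in the ratio, so the probabilities are unchanged.
    s = np.asarray(scores, dtype=float)
    s = s - s.max()
    p = np.exp(s) / np.exp(s).sum()
    return -np.log(p[y])

print(softmax_loss_i([3.2, 5.1, -1.7], y=0))  # ≈ 2.04
```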
![slide](https://s1.ax1x.com/2020/11/08/BTeCHH.png)
To reach exactly zero loss, the correct class score would have to go to infinity, and computers don't like that.
- Debugging tip: with a small random initialization, all scores are roughly equal, so the first loss should come out to about $\log C$ for $C$ classes.
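This sanity check can be sketched as follows (class count, input size, and scale are made up for illustration):

```python
import numpy as np

# Sanity check: with a small random W, all scores are near zero, so every
# class gets probability ~1/C and the softmax loss should be about log(C).
C, D = 10, 3072                      # e.g. 10 classes, 32*32*3 pixel inputs
rng = np.random.default_rng(0)
W = rng.normal(scale=1e-4, size=(C, D))
x = rng.normal(size=D)

s = W @ x
p = np.exp(s - s.max())
p /= p.sum()
loss = -np.log(p[3])                 # any label works; all p_j are ~1/C
print(loss, np.log(C))               # both ≈ 2.30
```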
![slide](https://s1.ax1x.com/2020/11/08/BTek4I.png)
![slide](https://s1.ax1x.com/2020/11/08/BTeECt.png)
Optimization
Random Search - The Naive but Simplest way
Really Slow !!!
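A sketch of random search on a toy loss (the target and trial count are made up); it only ever keeps the best random guess, which is why it scales so badly:

```python
import numpy as np

def random_search(loss_fn, shape, trials=1000):
    # Naive optimization: try random weight matrices, keep the best seen.
    best_loss, best_W = float("inf"), None
    for _ in range(trials):
        W = np.random.randn(*shape)
        loss = loss_fn(W)
        if loss < best_loss:
            best_loss, best_W = loss, W
    return best_W, best_loss

# toy quadratic loss (hypothetical): distance of W from an all-ones target
best_W, best_loss = random_search(lambda W: np.sum((W - 1.0) ** 2), (2, 2))
print(best_loss)  # small-ish, but needs many trials even for 4 parameters
```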
Gradient Descent
We compute the gradient of the loss with respect to $W$ and step downhill toward the bottom (possibly only a local minimum).
Code
```python
# Vanilla Gradient Descent
while True:
    weight_grad = evaluate_gradient(loss_fun, data, weights)
    weights += -step_size * weight_grad  # perform parameter update
```
The step size, also called the learning rate, is an important hyperparameter.
![slide](https://s1.ax1x.com/2020/11/08/BTeV8P.png)
Since $N$ might be very large, we sample a small subset called a minibatch and use it to estimate the true gradient.
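Minibatch gradient descent can be sketched on a toy least-squares problem (the data, sizes, and hyperparameters are made up); each step estimates the gradient from a random sample instead of the full dataset:

```python
import numpy as np

# Toy least-squares problem trained with minibatch SGD.
rng = np.random.default_rng(0)
N, D = 1000, 5
X = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
y = X @ w_true

w = np.zeros(D)
step_size, batch_size = 0.05, 64
for _ in range(500):
    idx = rng.choice(N, batch_size, replace=False)  # sample a minibatch
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size    # gradient of the MSE
    w -= step_size * grad                           # vanilla update step

print(np.allclose(w, w_true, atol=1e-2))  # True
```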
![slide](https://s1.ax1x.com/2020/11/08/BTeZgf.png)
![slide](https://s1.ax1x.com/2020/11/08/BTenKS.png)
![slide](https://s1.ax1x.com/2020/11/08/BTeuDg.png)
Color Feature
![slide](https://s1.ax1x.com/2020/11/08/BTeQEj.png)
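One way such a color feature can be sketched (the bin count and image are made up) is a per-channel histogram that throws away all spatial information:

```python
import numpy as np

def color_histogram(img, bins=8):
    # Count how many pixels fall into each intensity bucket per channel,
    # ignoring where the pixels are -- a simple global color feature.
    hists = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
             for c in range(img.shape[-1])]
    return np.concatenate(hists)

img = np.random.randint(0, 256, size=(32, 32, 3))  # hypothetical RGB image
feat = color_histogram(img)
print(feat.shape, feat.sum())  # (24,) and 32*32*3 = 3072 pixels counted
```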
Gradient-based features (e.g. histograms of oriented gradients) extract the edge information.
![slide](https://s1.ax1x.com/2020/11/08/BTelUs.png)
Bag of Words (an idea borrowed from NLP)
![slide](https://s1.ax1x.com/2020/11/08/BTeG80.png)
Cluster different image patches sampled from the images to build a vocabulary of visual words.
![slide](https://s1.ax1x.com/2020/11/08/BTe15n.png)
- Differences
  - The classical pipeline extracts the features first and feeds them into a linear classifier.
  - A Convolutional Neural Network learns the features automatically during the training process.