02/17/2020 Stanford CS231n notes: Loss Functions and Optimization

  • a loss function tells us how good our current classifier is
    you tell your algorithm what kinds of errors you care about and which kinds you are willing to trade off against

  • Multi-class SVM loss
    -the loss for one example is L_i = Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + 1), where j runs over all classes our dataset has except the true class y_i
    -s_{y_i} is the score of the true class; the s_j are the predicted scores that come out of the classifier
    -if the true class's score is not at least a margin of 1 greater than every other score, we incur some loss
    -why 1? we only care about the relative differences between the scores; if you rescale W, the choice of 1 washes out and gets absorbed into the overall scale of W
    -this is called the hinge loss (after the shape of its graph: zero past the margin, then increasing linearly), as in the sketch below
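A minimal NumPy sketch of that loss for a single example, as I understand it from the formula above (the names `scores` and `y_i` and the default margin of 1 are my own; this is an illustration, not code from the lecture):

```python
import numpy as np

def svm_loss_single(scores, y_i, margin=1.0):
    """Multi-class SVM (hinge) loss for one example.

    scores: 1-D array of class scores s = f(x_i, W)
    y_i:    index of the true class
    """
    margins = np.maximum(0, scores - scores[y_i] + margin)
    margins[y_i] = 0  # the sum skips the true class j == y_i
    return margins.sum()

# sanity check from the note below: with W ~ 0 all scores are ~0,
# so the loss is (number of classes - 1)
print(svm_loss_single(np.zeros(10), y_i=3))  # 9.0
```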

  • example: the per-example loss sums over all the incorrect classes, and the total loss averages over the training examples
    (figures: worked example computing the SVM loss on a few training images)

  • Q: at initialization W is small, so all scores s ≈ 0, what is the loss?
    A: number of classes minus one (a useful sanity check when debugging)
    what if the sum were over all classes, including j = y_i? the loss just increases by a constant 1, so nothing about which W is best changes

  • 🌟🌟🌟 full loss with regularization
    L(W) = (1/N) Σ_i L_i(f(x_i, W), y_i) + λ R(W)
    lambda: the regularization hyper-parameter, one of the things we need to tune when training
    the regularizer penalizes the complexity of the model; how "complexity" is counted depends on the choice of R (e.g. L1 cares about how many entries are 0)
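A small sketch of that full objective: the data loss averaged over N examples plus λ times an L2 penalty. The per-example loss is passed in (e.g. the SVM loss sketched earlier); all names here are my own illustration.

```python
import numpy as np

def total_loss(W, X, y, data_loss_fn, lam=0.1):
    """L(W) = (1/N) * sum_i L_i(f(x_i, W), y_i) + lam * R(W), with R(W) = sum(W**2)."""
    per_example = [data_loss_fn(X[i].dot(W), y[i]) for i in range(len(y))]
    data_loss = np.mean(per_example)      # average data loss over the N examples
    reg_loss = lam * np.sum(W * W)        # L2 regularization penalty
    return data_loss + reg_loss
```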

  • Regularization
    common choices: L2 regularization R(W) = Σ W², L1 regularization R(W) = Σ |W| (also elastic net, dropout, etc.)
    example: x = [1,1,1,1], w1 = [1,0,0,0], w2 = [0.25,0.25,0.25,0.25]; both give the same score wᵀx = 1
    L2 will prefer w2 because it has the smaller L2 norm: it likes the weights spread across all the values
    under L1, w1 and w2 happen to have the same norm, but in general L1 prefers sparse solutions, driving many elements to 0 (quick check below)
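A quick numeric check of that example (the vectors x, w1, w2 are the lecture's example values as I recall them; the code itself is my own illustration):

```python
import numpy as np

x  = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.25, 0.25, 0.25, 0.25])

print(w1 @ x, w2 @ x)                           # same score: 1.0, 1.0
print(np.sum(w1**2), np.sum(w2**2))             # L2 penalty: 1.0 vs 0.25 -> L2 prefers w2
print(np.sum(np.abs(w1)), np.sum(np.abs(w2)))   # L1 penalty: 1.0 vs 1.0 -> a tie in this case
```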

  • softmax classifier (multinomial logistic regression)
    -the scores are turned into probabilities: P(Y = k | X = x_i) = e^{s_k} / Σ_j e^{s_j}
    -our loss is the minus log of the probability of the true class: L_i = −log P(Y = y_i | X = x_i)
    -why the log? we want the probability of the true class to go to 1, and minimizing −log of that probability pushes it toward 1
    (figure: worked example — exponentiate the scores, normalize them into probabilities, take −log of the true class's probability)

  • Q: at initialization W is small so all s ≈ 0, what is the loss?
    A: log(C), the log of the number of classes (another useful sanity check)
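A minimal sketch of the softmax (cross-entropy) loss for one example, with the usual max-subtraction trick for numerical stability (my own illustration, not the lecture's code):

```python
import numpy as np

def softmax_loss_single(scores, y_i):
    """L_i = -log( e^{s_{y_i}} / sum_j e^{s_j} )."""
    shifted = scores - np.max(scores)                    # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[y_i]

# sanity check from the note above: with all scores ~0 the loss is log(C)
print(softmax_loss_single(np.zeros(10), 3), np.log(10))  # both ~2.302
```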

  • Optimization
    how to find the bottom of the valley, i.e. the W with the lowest loss

  • bad idea: random search — trying many random W's and keeping the best only reaches about 15% accuracy

  • follow the slope: for a scalar input this is just the derivative of the function
    -in multiple dimensions, the gradient is the vector of partial derivatives
    -the slope in any direction is the dot product of that direction with the gradient (so the direction of steepest descent is the negative gradient)
    -numerical gradient: slow, easy to write, approximate; nudge each dimension by a small h and use (f(W + h) − f(W)) / h
    -analytic gradient: exact and fast, but error-prone; derive an expression for dW with calculus
    -in practice: always use the analytic gradient, but compare it against the numerical one as a debugging tool (gradient check), as sketched below
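A sketch of the numerical gradient using centered differences, usable as a gradient check against an analytic gradient (a generic illustration I wrote, not the assignment's helper):

```python
import numpy as np

def numerical_gradient(f, W, h=1e-5):
    """Approximate dL/dW by nudging each entry of W by +/- h."""
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = W[idx]
        W[idx] = old + h; f_plus = f(W)    # f evaluated at W + h in this coordinate
        W[idx] = old - h; f_minus = f(W)   # f evaluated at W - h
        W[idx] = old                       # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
        it.iternext()
    return grad

# gradient check: compare against an analytic gradient, e.g.
# rel_error = np.abs(num_grad - ana_grad) / np.maximum(1e-8, np.abs(num_grad) + np.abs(ana_grad))
```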

  • step_size, also called the learning rate: one of the first hyper-parameters to set
    vanilla gradient descent: repeatedly evaluate the gradient of the loss and take a small step in the negative gradient direction
    minibatch SGD: instead of the full dataset, sample a small random minibatch at each step, compute the loss and gradient on it, and update W
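A sketch of vanilla minibatch SGD in the spirit of the bullet above (the `loss_and_gradient` function and the batch size are placeholders I made up):

```python
import numpy as np

def sgd(W, X, y, loss_and_gradient, step_size=1e-3, batch_size=256, num_steps=1000):
    """Vanilla minibatch SGD: sample a minibatch, step along the negative gradient."""
    N = X.shape[0]
    for _ in range(num_steps):
        idx = np.random.choice(N, batch_size, replace=False)   # random minibatch of examples
        loss, grad = loss_and_gradient(W, X[idx], y[idx])       # loss and dL/dW on the minibatch
        W -= step_size * grad                                   # update W in the downhill direction
    return W
```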

  • image features
    -take your image, compute various feature representations, concatenate these feature vectors into one feature representation of the image, and feed that into a linear classifier (toy sketch below)
    -motivation: raw pixels are often not linearly separable, but a good feature transform can make them so
    (figures: motivating feature-transform example and example features such as color histograms and histograms of oriented gradients)
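A toy sketch of that pipeline: compute two simple feature vectors (a color histogram and a very rough histogram of gradient orientations, loosely in the spirit of HoG) and concatenate them; the resulting vector is what a linear classifier would consume. All function names here are my own, not from the lecture.

```python
import numpy as np

def color_histogram(img, bins=16):
    """Histogram of pixel intensities per channel, concatenated."""
    return np.concatenate([np.histogram(img[..., c], bins=bins, range=(0, 255))[0]
                           for c in range(img.shape[-1])])

def gradient_orientation_histogram(gray, bins=9):
    """Rough HoG-like feature: histogram of gradient directions over the whole image."""
    gy, gx = np.gradient(gray.astype(float))
    angles = np.arctan2(gy, gx)
    return np.histogram(angles, bins=bins, range=(-np.pi, np.pi))[0]

img = np.random.randint(0, 256, size=(32, 32, 3))   # stand-in for a small RGB image
feat = np.concatenate([color_histogram(img),
                       gradient_orientation_histogram(img.mean(axis=-1))])
# `feat` is the concatenated feature vector that would be fed to a linear classifier
```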
