Deep Learning and PyTorch Notes 17

Latest recommended article published on 2024-09-14 19:55:48

niuniu990

Tags: python pytorch

Article link: https://blog.csdn.net/niuniu990/article/details/88529384

Logistic Regression

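Regression usually refers to predicting continuous values.
Interpret the network as $f: x \to p(y|x;\theta)$, where the output is read as the probability that $y = 1$ and $\theta = [w, b]$.
Use the sigmoid function so that
output $\in [0,1]$,
which is exactly where the logistic function comes in!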
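
To make the mapping above concrete, here is a minimal sketch in plain Python (kept dependency-free; in PyTorch the corresponding call would be `torch.sigmoid`). The parameter values `w`, `b` and the input `x` are made up for illustration and do not come from the notes.

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Read the output as p(y=1 | x; theta) with theta = [w, b]."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear part: w.x + b
    return sigmoid(z)

# Hypothetical parameters and input, chosen only for this example.
w, b = [2.0, -1.0], 0.5
x = [1.0, 3.0]

p = predict_proba(x, w, b)  # z = 2 - 3 + 0.5 = -0.5
print(f"p(y=1|x) = {p:.4f}")
```

Whatever the value of `w.x + b`, the result stays strictly between 0 and 1, which is what lets it be read as a probability.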

For regression:
Goal: $pred = y$
Approach: minimize $dist(pred, y)$, the distance between the two, measured with the L1 norm or the squared L2 norm.
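
As a small worked example of the two distances mentioned for regression, the sketch below computes the L1 norm and the squared L2 norm of `pred - y`. The numbers are invented for illustration.

```python
def l1_dist(pred, y):
    """L1 norm of (pred - y): sum of absolute differences."""
    return sum(abs(p - t) for p, t in zip(pred, y))

def l2_sq_dist(pred, y):
    """Squared L2 norm of (pred - y): sum of squared differences."""
    return sum((p - t) ** 2 for p, t in zip(pred, y))

# Hypothetical predictions and targets.
pred = [2.5, 0.0, 2.0]
y = [3.0, -0.5, 2.0]

l1 = l1_dist(pred, y)        # 0.5 + 0.5 + 0.0 = 1.0
l2_sq = l2_sq_dist(pred, y)  # 0.25 + 0.25 + 0.0 = 0.5
print(l1, l2_sq)
```

Both quantities reach 0 exactly when `pred` equals `y`, so minimizing either one pushes the prediction toward the target.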

For classification:
Goal: maximize a benchmark, e.g. accuracy
$p_{\theta}(y|x)$ is the distribution given by the parameters $\theta$, and $p_{r}(y|x)$ is the true distribution; we want the two to be as close as possible.
Approach 1: minimize $dist(p_{\theta}(y|x), p_{r}(y|x))$
Approach 2: minimize $divergence(p_{\theta}(y|x), p_{r}(y|x))$
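
The notes do not say which divergence is meant. As one possible instance, the sketch below uses the KL divergence, a common choice, to compare a one-hot true distribution with two candidate predicted distributions. The distributions `far` and `near` are hypothetical.

```python
import math

def kl_divergence(p_r, p_theta):
    """KL(p_r || p_theta) for discrete distributions given as lists.

    Terms with p_r == 0 contribute nothing, by the usual convention.
    """
    return sum(r * math.log(r / q) for r, q in zip(p_r, p_theta) if r > 0)

# Assumed setup: two classes, the true label is class 1 (one-hot).
p_r = [0.0, 1.0]
far = [0.6, 0.4]   # hypothetical prediction that favors the wrong class
near = [0.1, 0.9]  # hypothetical prediction close to the truth

kl_far = kl_divergence(p_r, far)
kl_near = kl_divergence(p_r, near)
print(f"far: {kl_far:.4f}, near: {kl_near:.4f}")
```

The closer $p_{\theta}(y|x)$ gets to $p_{r}(y|x)$, the smaller the divergence, and it is 0 when the two distributions coincide.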

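Q1. Why not maximize accuracy?

Why is the training objective in classification different from the final test objective? In other words, why can't we maximize accuracy directly?
$acc = \frac{\sum_{i} I(pred_{i} == y_{i})}{len(Y)}$
Maximizing accuracy directly runs into two issues: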
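Issue 1. gradient = 0 if accuracy is unchanged while the weights change
The prediction is either 0 or 1: a probability above 0.5 gives 1 and a probability below 0.5 gives 0. Suppose a sample that should be classified as 1 currently gets a predicted probability of 0.4. Updating w may move that probability from 0.4 to 0.45 without changing the final result: the prediction is still 0 and still wrong, so accuracy does not move and provides no gradient.
Issue 2. the gradient is not continuous, since the number of correct predictions is not continuous
The probability may also move from 0.499 to 0.501. That is a change of only 0.002, yet the prediction flips and accuracy changes sharply, so accuracy is discontinuous in the weights.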
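
Both issues can be reproduced with the numbers used above. The sketch below computes accuracy with a 0.5 threshold for a single sample whose true label is 1. For contrast it also evaluates the negative log-likelihood $-\log p$; this smooth loss is an assumption added here for comparison and is not introduced in this part of the notes.

```python
import math

def accuracy(probs, labels, threshold=0.5):
    """Fraction of samples whose thresholded prediction equals the label."""
    preds = [1 if p > threshold else 0 for p in probs]
    correct = sum(int(pred == y) for pred, y in zip(preds, labels))
    return correct / len(labels)

def nll(p):
    """Negative log-likelihood for a sample whose true label is 1."""
    return -math.log(p)

labels = [1]  # one sample that should be classified as 1

# Issue 1: the probability improves from 0.40 to 0.45, accuracy stays at 0.
acc_040 = accuracy([0.40], labels)  # 0.0
acc_045 = accuracy([0.45], labels)  # 0.0

# Issue 2: a change of 0.002 across the threshold makes accuracy jump.
acc_0499 = accuracy([0.499], labels)  # 0.0
acc_0501 = accuracy([0.501], labels)  # 1.0

# The smooth loss responds to the same changes gradually.
loss_040, loss_045 = nll(0.40), nll(0.45)
loss_0499, loss_0501 = nll(0.499), nll(0.501)
print(acc_040, acc_045, acc_0499, acc_0501)
print(f"{loss_040:.4f} {loss_045:.4f} {loss_0499:.4f} {loss_0501:.4f}")
```

Accuracy is flat between 0.40 and 0.45 and jumps between 0.499 and 0.501, while the smooth loss decreases a little in both cases, which is the kind of signal gradient-based training needs.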