Regression

Regression: the model outputs a scalar.

Examples:

Stock Market Forecast: output = tomorrow's stock price

Self-driving Car: input = readings from the various sensors; output = steering wheel angle

Recommendation: output = probability of purchase

Estimating the Combat Power (CP) of a Pokémon after evolution:

Step 1: Model

Choose a model: y = w \cdot x_{cp} + b

Linear model: y = b + \sum_{i} w_{i}x_{i}, where the x_{i} are features (x_{cp}, x_{hp}, x_{w}, x_{h}, ...), the w_{i} are weights, and b is the bias.
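A minimal sketch of this linear model in Python; the feature values, weights, and bias below are made-up numbers for illustration only:

```python
# Sketch of the linear model y = b + sum_i w_i * x_i.
# All numbers here are invented for illustration.
def linear_model(x, w, b):
    """x and w are lists of equal length; b is the bias."""
    return b + sum(w_i * x_i for w_i, x_i in zip(w, x))

x = [100.0, 35.0, 5.2, 0.4]   # e.g. x_cp, x_hp, x_w, x_h
w = [0.9, 0.1, 0.05, 0.02]    # one weight per feature
b = 10.0                      # bias
print(linear_model(x, w, b))
```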

Step 2: Goodness of Function

Loss function L (input: a function; output: how bad it is)

L(f) = \sum_{n=1}^{m}(\hat{y}^{n} - f(x^{n}_{cp}))^{2}, where f(x) is the predicted value

L(w,b) = \sum_{n=1}^{m}(\hat{y}^{n} - (b + w \cdot x^{n}_{cp}))^{2}
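A sketch of this loss in Python over a toy dataset; the (x_cp, ŷ) pairs are invented for illustration:

```python
# L(w, b) = sum_n (y_hat^n - (b + w * x_cp^n))^2 over toy data.
data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]  # (x_cp^n, y_hat^n), invented

def loss(w, b, data):
    return sum((y_hat - (b + w * x)) ** 2 for x, y_hat in data)

print(loss(2.0, 1.0, data))  # evaluate how bad one candidate function is
```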

Step 3: Best Function

f^{*} = \arg \, \underset{f}{\min} \, L(f)

w^{*},b^{*} = \arg \, \underset{w,b}{\min} \, L(w,b) = \arg \, \underset{w,b}{\min}\sum_{n=1}^{m}(\hat{y}^{n} - (b + w \cdot x^{n}_{cp}))^{2}
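Before turning to gradient descent, the argmin can be read literally: try candidate (w, b) pairs and keep the one with the lowest loss. A brute-force sketch over an assumed grid (data and grid ranges are illustrative):

```python
# Brute-force search for (w*, b*) = argmin L(w, b) over a coarse grid.
# The toy data and the grid ranges are assumptions for illustration.
data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]

def loss(w, b):
    return sum((y_hat - (b + w * x)) ** 2 for x, y_hat in data)

# Grid of (w, b) candidates in [-5, 5) with step 0.05.
candidates = [(i * 0.05, j * 0.05)
              for i in range(-100, 100) for j in range(-100, 100)]
w_star, b_star = min(candidates, key=lambda p: loss(*p))
print(w_star, b_star, loss(w_star, b_star))
```

This works here only because there are two parameters; with many weights the grid explodes, which is why gradient descent is used instead.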

Gradient descent (used to solve this optimization problem):

1. Randomly pick an initial value w^{0}.

2. Compute the derivative of the loss at w^{0} (the slope of the tangent there): \frac{dL}{dw}\Big|_{w = w^{0}}

3. Move w in the direction that reduces the loss: w^{1} \leftarrow w^{0} - \eta \frac{dL}{dw}\Big|_{w = w^{0}}. The learning rate \eta sets the step size: a larger \eta learns faster, a smaller one more slowly.

        If the derivative is negative -> increase w

        If positive -> decrease w

        If zero -> the update stops (stuck, e.g. at a local minimum)

Gradient descent can get stuck at a saddle point or at a local minimum.

For linear regression the loss function is convex, so there are no saddle points and no local minima other than the global minimum.

Formulation of \partial L / \partial w and \partial L / \partial b:

L(w,b) = \sum_{n=1}^{m}(\hat{y}^{n} - (b + w \cdot x^{n}_{cp}))^{2}

\frac{\partial L}{\partial w} = \sum_{n=1}^{m} 2(\hat{y}^{n} - (b + w \cdot x_{cp}^{n}))(-x_{cp}^{n})

\frac{\partial L}{\partial b} = \sum_{n=1}^{m} 2(\hat{y}^{n} - (b + w \cdot x_{cp}^{n}))(-1)
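Combining steps 1-3 with these two gradients gives a minimal gradient-descent sketch; the toy data, learning rate, and iteration count are illustrative assumptions:

```python
import random

# Toy (x_cp, y_hat) pairs; all numbers here are invented for illustration.
data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]

w = random.uniform(-1.0, 1.0)  # step 1: randomly pick initial w (and b)
b = random.uniform(-1.0, 1.0)
eta = 0.01                     # learning rate (an assumed, hand-tuned value)

for _ in range(10000):
    # step 2: evaluate the two partial derivatives from the formulas above
    dw = sum(2 * (y - (b + w * x)) * (-x) for x, y in data)
    db = sum(2 * (y - (b + w * x)) * (-1) for x, y in data)
    # step 3: move (w, b) against the gradient to reduce the loss
    w -= eta * dw
    b -= eta * db

print(w, b)  # approaches the least-squares fit (about 2.05 and 0.97 here)
```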

Model selection

A more complex model is not automatically better: although a more complex model drives the training loss lower, it can overfit, in which case the test loss becomes large and the model is useless.

Redesign the model

Add the species as a feature; the model is still linear:

y = b_{1} \cdot \delta(x_{s} = Pidgey) + w_{1} \cdot \delta(x_{s} = Pidgey)\, x_{cp} + b_{2} \cdot \delta(x_{s} = Weedle) + w_{2} \cdot \delta(x_{s} = Weedle)\, x_{cp}

\delta(x_{s} = Pidgey) = \begin{cases} 1 & \text{if } x_{s} = Pidgey \\ 0 & \text{otherwise} \end{cases}

\text{if } x_{s} = Pidgey \text{, then } y = b_{1} + w_{1} \cdot x_{cp}
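A sketch of this species-conditioned model; the species names come from the notes, but the per-species weights and biases are invented for illustration:

```python
# y = sum over species of δ(x_s = species) * (b_species + w_species * x_cp).
# The (w, b) values per species are invented for illustration.
params = {"Pidgey": (2.0, 10.0), "Weedle": (1.5, 5.0)}  # species -> (w, b)

def predict(x_s, x_cp):
    # The indicator δ selects exactly one species' linear function.
    y = 0.0
    for species, (w, b) in params.items():
        delta = 1.0 if x_s == species else 0.0
        y += delta * (b + w * x_cp)
    return y

print(predict("Pidgey", 30.0))  # = b1 + w1 * 30 = 70.0
```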

Other features?

The other features do not appear to have much effect on the CP after evolution.

Fitting with polynomial models of increasing degree shows that the more complex the model, the smaller the training loss, but the test loss may actually increase; see the sketch below.
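A sketch of this effect with numpy on synthetic data (the data, noise level, and train/test split are assumptions, not the course's Pokémon data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: a noisy linear relationship, split into train and test.
x = rng.uniform(0.0, 1.0, 30)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, 30)
x_tr, x_te = x[:15], x[15:]
y_tr, y_te = y[:15], y[15:]

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_tr, y_tr, degree)  # fit a degree-d polynomial
    train_err = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(degree, train_err, test_err)  # train error falls; test error can rise
```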

Regularization

y = b + \sum_{i} w_{i}x_{i}

L = \sum_{n}\left(\hat{y}^{n} - (b + \sum_{i} w_{i}x_{i}^{n})\right)^{2} + \lambda \sum_{i}(w_{i})^{2}

Smaller w_{i} means a smoother function (smoother: a small change in the input produces a small change in the output), so functions with smaller w_{i} are preferred; we believe a smoother function is more likely to be correct.

There is no need to apply regularization to the bias: the bias does not affect how smooth the function is.

λ has to be tuned by hand: the larger λ, the smoother the function, but the error may grow; see the sketch below.
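A sketch of gradient descent on the regularized loss; λ, the data, the learning rate, and the iteration count are illustrative. Note that the penalty term only shows up in the gradient for w, not for b:

```python
# Gradient descent on the regularized loss (single feature for simplicity).
# The data, lambda, learning rate, and iteration count are illustrative.
data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]
lam = 1.0    # regularization weight λ (to be tuned by hand)
eta = 0.01   # learning rate
w, b = 0.0, 0.0

for _ in range(10000):
    # the λ * w^2 term adds 2 * λ * w to dL/dw; b is deliberately not penalized
    dw = sum(2 * (y - (b + w * x)) * (-x) for x, y in data) + 2 * lam * w
    db = sum(2 * (y - (b + w * x)) * (-1) for x, y in data)
    w -= eta * dw
    b -= eta * db

print(w, b)  # w is pulled toward 0 compared with the unregularized fit
```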
