Regression

Regression: the model outputs a scalar.

Examples:

Stock Market Forecast: output = tomorrow's stock price

Self-driving Car: input = readings from the various sensors; output = steering wheel angle

Recommendation: output = probability of purchase

Estimating the Combat Power (CP) of a Pokémon after evolution:

Step 1: Model

Choose a model: y = w \cdot x_{cp} + b

Linear model: y = b + \sum_{i} w_{i}x_{i}, where the x_{i} are features (x_{cp}, x_{hp}, x_{w}, x_{h}, ...), the w_{i} are weights, and b is the bias.
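A minimal sketch of this linear model in Python; the feature values, weights, and bias below are made-up numbers for illustration only:

```python
# Sketch of the linear model y = b + sum_i w_i * x_i.
# All numbers here are invented for illustration.
def linear_model(x, w, b):
    """x and w are lists of equal length; b is the bias."""
    return b + sum(w_i * x_i for w_i, x_i in zip(w, x))

x = [100.0, 35.0, 5.2, 0.4]   # e.g. x_cp, x_hp, x_w, x_h
w = [0.9, 0.1, 0.05, 0.02]    # one weight per feature
b = 10.0                      # bias
print(linear_model(x, w, b))
```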

Step 2: Goodness of Function

Loss function L (input: a function; output: how bad it is)

L(f) = \sum_{n=1}^{m}(\hat{y}^{n} - f(x^{n}_{cp}))^{2}, where f(x) is the predicted value

L(w,b) = \sum_{n=1}^{m}(\hat{y}^{n} - (b + w \cdot x^{n}_{cp}))^{2}
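A sketch of this loss in Python over a toy dataset; the (x_cp, ŷ) pairs are invented for illustration:

```python
# L(w, b) = sum_n (y_hat^n - (b + w * x_cp^n))^2 over toy data.
data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]  # (x_cp^n, y_hat^n), invented

def loss(w, b, data):
    return sum((y_hat - (b + w * x)) ** 2 for x, y_hat in data)

print(loss(2.0, 1.0, data))  # evaluate how bad one candidate function is
```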

Step 3: Best Function

f^{*} = \arg \, \underset{f}{\min} \, L(f)

w^{*},b^{*} = \arg \, \underset{w,b}{\min} \, L(w,b) = \arg \, \underset{w,b}{\min}\sum_{n=1}^{m}(\hat{y}^{n} - (b + w \cdot x^{n}_{cp}))^{2}
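Before turning to gradient descent, the argmin can be read literally: try candidate (w, b) pairs and keep the one with the lowest loss. A brute-force sketch over an assumed grid (data and grid ranges are illustrative):

```python
# Brute-force search for (w*, b*) = argmin L(w, b) over a coarse grid.
# The toy data and the grid ranges are assumptions for illustration.
data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]

def loss(w, b):
    return sum((y_hat - (b + w * x)) ** 2 for x, y_hat in data)

# Grid of (w, b) candidates in [-5, 5) with step 0.05.
candidates = [(i * 0.05, j * 0.05)
              for i in range(-100, 100) for j in range(-100, 100)]
w_star, b_star = min(candidates, key=lambda p: loss(*p))
print(w_star, b_star, loss(w_star, b_star))
```

This works here only because there are two parameters; with many weights the grid explodes, which is why gradient descent is used instead.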

Gradient descent (used to solve this optimization problem):

1. Randomly pick an initial value w^{0}.

2. Compute the derivative of the loss at w^{0} (the slope of the tangent there): \frac{dL}{dw}\Big|_{w = w^{0}}

3. Move w in the direction that reduces the loss: w^{1} \leftarrow w^{0} - \eta \frac{dL}{dw}\Big|_{w = w^{0}}. The learning rate \eta sets the step size: a larger \eta learns faster, a smaller one more slowly.

        If the derivative is negative -> increase w

        If positive -> decrease w

        If zero -> the update stops (stuck, e.g. at a local minimum)

Gradient descent can get stuck at a saddle point or at a local minimum.

For linear regression the loss function is convex, so there are no saddle points and no local minima other than the global minimum.

Formulation of \partial L / \partial w and \partial L / \partial b:

L(w,b) = \sum_{n=1}^{m}(\hat{y}^{n} - (b + w \cdot x^{n}_{cp}))^{2}

\frac{\partial L}{\partial w} = \sum_{n=1}^{m} 2(\hat{y}^{n} - (b + w \cdot x_{cp}^{n}))(-x_{cp}^{n})

\frac{\partial L}{\partial b} = \sum_{n=1}^{m} 2(\hat{y}^{n} - (b + w \cdot x_{cp}^{n}))(-1)
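Combining steps 1-3 with these two gradients gives a minimal gradient-descent sketch; the toy data, learning rate, and iteration count are illustrative assumptions:

```python
import random

# Toy (x_cp, y_hat) pairs; all numbers here are invented for illustration.
data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]

w = random.uniform(-1.0, 1.0)  # step 1: randomly pick initial w (and b)
b = random.uniform(-1.0, 1.0)
eta = 0.01                     # learning rate (an assumed, hand-tuned value)

for _ in range(10000):
    # step 2: evaluate the two partial derivatives from the formulas above
    dw = sum(2 * (y - (b + w * x)) * (-x) for x, y in data)
    db = sum(2 * (y - (b + w * x)) * (-1) for x, y in data)
    # step 3: move (w, b) against the gradient to reduce the loss
    w -= eta * dw
    b -= eta * db

print(w, b)  # approaches the least-squares fit (about 2.05 and 0.97 here)
```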

Model selection

A more complex model is not automatically better: although a more complex model drives the training loss lower, it can overfit, in which case the test loss becomes large and the model is useless.

Redesign the model

Add the species as a feature; the model is still linear:

y = b_{1} \cdot \delta(x_{s} = Pidgey) + w_{1} \cdot \delta(x_{s} = Pidgey)\, x_{cp} + b_{2} \cdot \delta(x_{s} = Weedle) + w_{2} \cdot \delta(x_{s} = Weedle)\, x_{cp}

\delta(x_{s} = Pidgey) = \begin{cases} 1 & \text{if } x_{s} = Pidgey \\ 0 & \text{otherwise} \end{cases}

\text{if } x_{s} = Pidgey \text{, then } y = b_{1} + w_{1} \cdot x_{cp}
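A sketch of this species-conditioned model; the species names come from the notes, but the per-species weights and biases are invented for illustration:

```python
# y = sum over species of δ(x_s = species) * (b_species + w_species * x_cp).
# The (w, b) values per species are invented for illustration.
params = {"Pidgey": (2.0, 10.0), "Weedle": (1.5, 5.0)}  # species -> (w, b)

def predict(x_s, x_cp):
    # The indicator δ selects exactly one species' linear function.
    y = 0.0
    for species, (w, b) in params.items():
        delta = 1.0 if x_s == species else 0.0
        y += delta * (b + w * x_cp)
    return y

print(predict("Pidgey", 30.0))  # = b1 + w1 * 30 = 70.0
```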

Other features?

The other features do not appear to have much effect on the CP after evolution.

Fitting with polynomial models of increasing degree shows that the more complex the model, the smaller the training loss, but the test loss may actually increase; see the sketch below.
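A sketch of this effect with numpy on synthetic data (the data, noise level, and train/test split are assumptions, not the course's Pokémon data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: a noisy linear relationship, split into train and test.
x = rng.uniform(0.0, 1.0, 30)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, 30)
x_tr, x_te = x[:15], x[15:]
y_tr, y_te = y[:15], y[15:]

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_tr, y_tr, degree)  # fit a degree-d polynomial
    train_err = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(degree, train_err, test_err)  # train error falls; test error can rise
```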

Regularization

y = b + \sum_{i} w_{i}x_{i}

L = \sum_{n}\left(\hat{y}^{n} - (b + \sum_{i} w_{i}x_{i}^{n})\right)^{2} + \lambda \sum_{i}(w_{i})^{2}

Smaller w_{i} means a smoother function (smoother: a small change in the input produces a small change in the output), so functions with smaller w_{i} are preferred; we believe a smoother function is more likely to be correct.

There is no need to apply regularization to the bias: the bias does not affect how smooth the function is.

λ has to be tuned by hand: the larger λ, the smoother the function, but the error may grow; see the sketch below.
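A sketch of gradient descent on the regularized loss; λ, the data, the learning rate, and the iteration count are illustrative. Note that the penalty term only shows up in the gradient for w, not for b:

```python
# Gradient descent on the regularized loss (single feature for simplicity).
# The data, lambda, learning rate, and iteration count are illustrative.
data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]
lam = 1.0    # regularization weight λ (to be tuned by hand)
eta = 0.01   # learning rate
w, b = 0.0, 0.0

for _ in range(10000):
    # the λ * w^2 term adds 2 * λ * w to dL/dw; b is deliberately not penalized
    dw = sum(2 * (y - (b + w * x)) * (-x) for x, y in data) + 2 * lam * w
    db = sum(2 * (y - (b + w * x)) * (-1) for x, y in data)
    w -= eta * dw
    b -= eta * db

print(w, b)  # w is pulled toward 0 compared with the unregularized fit
```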
