# 1 Regression

## Step 1: Model (function set)

### A set of functions

$f_1: y = 10.0 + 9.0 \cdot x_{cp}$

$f_2: y = 9.8 + 9.2 \cdot x_{cp}$

$f_3: y = -0.8 - 1.2 \cdot x_{cp}$

### Linear Model

$x_i$: an attribute (feature) of the input $x$
$w_i$: weight; $b$: bias

$y = b + \sum w_i x_i$
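A minimal sketch of this linear model, $y = b + \sum w_i x_i$. The weight, bias, and feature values below are made-up illustration numbers, not parameters from the notes.

```python
def linear_model(x, w, b):
    """Predict y = b + sum(w_i * x_i) for a feature vector x."""
    return b + sum(w_i * x_i for w_i, x_i in zip(w, x))

w = [9.0]    # one weight for the single feature x_cp
b = 10.0     # bias
x = [50.0]   # x_cp = 50
y = linear_model(x, w, b)
print(y)     # 10.0 + 9.0 * 50.0 = 460.0
```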

## Step 2: Goodness of function

$y = b + w \cdot x_{cp}$

$\hat{y}$ (y hat) denotes the correct value, i.e., the true observed output.

### Loss function

$L(f) = \sum_{n=1}^{10} \left(\hat{y}^n - f(x^n_{cp})\right)^2$

### The loss function is a function of functions

$L(f) \rightarrow L(w, b)$

$L(f) = L(w, b) = \sum_{n=1}^{10} \left(\hat{y}^n - (b + w \cdot x^n_{cp})\right)^2$
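A sketch of evaluating this squared-error loss $L(w, b)$ over training pairs. The tiny dataset below is invented for illustration.

```python
def loss(w, b, xs, ys):
    """L(w, b) = sum_n (y_hat^n - (b + w * x^n))^2"""
    return sum((y_hat - (b + w * x)) ** 2 for x, y_hat in zip(xs, ys))

xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]           # exactly y = 1 + 2x, so the loss is 0 there
print(loss(2.0, 1.0, xs, ys))  # 0.0
print(loss(1.0, 0.0, xs, ys))  # (3-1)^2 + (5-2)^2 + (7-3)^2 = 29.0
```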

## Step 3: Best function

### Pick the best function

$f^* = \arg\min_{f} L(f)$

$w^*, b^* = \arg\min_{w,b} L(w, b)$

$= \arg\min_{w,b} \sum_{n=1}^{10} \left(\hat{y}^n - (b + w \cdot x^n_{cp})\right)^2$

### One parameter

$w^* = \arg\min_{w} L(w)$

• Pick an initial value $w_0$
• Compute

$\left.\frac{dL}{dw}\right|_{w=w_0}$

$w_1 \leftarrow w_0 - \eta \left.\frac{dL}{dw}\right|_{w=w_0}$

• Compute

$\left.\frac{dL}{dw}\right|_{w=w_1}$

$w_2 \leftarrow w_1 - \eta \left.\frac{dL}{dw}\right|_{w=w_1}$
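The single-parameter update above can be sketched as plain gradient descent on $L(w) = \sum_n (\hat{y}^n - w \cdot x^n)^2$ (the bias is dropped here to keep a single parameter). The dataset and learning rate are illustrative.

```python
def dL_dw(w, xs, ys):
    # dL/dw = sum_n 2 * (y_hat^n - w*x^n) * (-x^n)
    return sum(2 * (y - w * x) * (-x) for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # generated by y = 2x, so the optimum is w* = 2
w = 0.0                # w0: initial value
eta = 0.01             # learning rate
for _ in range(100):
    w = w - eta * dL_dw(w, xs, ys)   # w_{t+1} <- w_t - eta * dL/dw |_{w=w_t}
print(round(w, 4))     # converges toward w* = 2
```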

### Two parameters

$w^*, b^* = \arg\min_{w,b} L(w, b)$

• Pick initial values $w_0$, $b_0$
• Compute (review how to take partial derivatives from calculus)

$\left.\frac{\partial L}{\partial w}\right|_{w=w_0, b=b_0}, \quad \left.\frac{\partial L}{\partial b}\right|_{w=w_0, b=b_0}$

$w_1 \leftarrow w_0 - \eta \left.\frac{\partial L}{\partial w}\right|_{w=w_0, b=b_0}$

$b_1 \leftarrow b_0 - \eta \left.\frac{\partial L}{\partial b}\right|_{w=w_0, b=b_0}$

• Compute

$\left.\frac{\partial L}{\partial w}\right|_{w=w_1, b=b_1}, \quad \left.\frac{\partial L}{\partial b}\right|_{w=w_1, b=b_1}$

$w_2 \leftarrow w_1 - \eta \left.\frac{\partial L}{\partial w}\right|_{w=w_1, b=b_1}$

$b_2 \leftarrow b_1 - \eta \left.\frac{\partial L}{\partial b}\right|_{w=w_1, b=b_1}$
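The two-parameter case can be sketched by updating $w$ and $b$ together with their partial derivatives at each step. The dataset and learning rate below are illustrative.

```python
def grads(w, b, xs, ys):
    # Partial derivatives of L(w, b) = sum_n (y^n - (b + w*x^n))^2
    dw = sum(2 * (y - (b + w * x)) * (-x) for x, y in zip(xs, ys))
    db = sum(2 * (y - (b + w * x)) * (-1) for x, y in zip(xs, ys))
    return dw, db

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]     # generated by y = 1 + 2x
w, b = 0.0, 0.0               # initial values w0, b0
eta = 0.05                    # learning rate
for _ in range(2000):
    dw, db = grads(w, b, xs, ys)
    w, b = w - eta * dw, b - eta * db
print(round(w, 3), round(b, 3))   # approaches w* = 2, b* = 1
```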

### Problem

• May get stuck at a local minimum instead of the global minimum
• Very slow progress on a plateau

### Learning Rate

The learning rate $\eta$ controls the step size, i.e., how fast we learn.

### another linear model

$y = b + w_1 \cdot x_{cp} + w_2 \cdot (x_{cp})^2$

$y = b + w_1 \cdot x_{cp} + w_2 \cdot (x_{cp})^2 + w_3 \cdot (x_{cp})^3$

$y = b + w_1 \cdot x_{cp} + w_2 \cdot (x_{cp})^2 + w_3 \cdot (x_{cp})^3 + w_4 \cdot (x_{cp})^4$

$y = b + w_1 \cdot x_{cp} + w_2 \cdot (x_{cp})^2 + w_3 \cdot (x_{cp})^3 + w_4 \cdot (x_{cp})^4 + w_5 \cdot (x_{cp})^5$
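The higher-order terms are still linear in the parameters, so each model above is a linear model over polynomial features of $x_{cp}$. A sketch with NumPy's least-squares `polyfit` on an invented dataset shows that training error never increases as the degree grows:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])   # roughly y = 2x; invented numbers

residuals = []
for degree in [1, 2, 3]:
    coeffs = np.polyfit(x, y, degree)               # least-squares polynomial fit
    residuals.append(np.sum((np.polyval(coeffs, x) - y) ** 2))
print(residuals)   # training error shrinks (or stays flat) as degree grows
```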

- A more complex model yields lower error on training data, provided we can truly find the best function.

### Model Selection

| Model | Training | Testing |
| --- | --- | --- |
| 1 | 31.9 | 35.0 |
| 2 | 15.4 | 18.4 |
| 3 | 15.3 | 18.1 |
| 4 | 14.9 | 28.2 |
| 5 | 12.8 | 232.1 |

- A more complex model does not always lead to better performance on testing data.
- This is overfitting.

## Back to step 1: Redesign the Model

$\text{if } x_s = \text{Pidgey}: \quad y = b_1 + w_1 \cdot x_{cp}$

$\text{if } x_s = \text{Weedle}: \quad y = b_2 + w_2 \cdot x_{cp}$

$\text{if } x_s = \text{Caterpie}: \quad y = b_3 + w_3 \cdot x_{cp}$

$\text{if } x_s = \text{Eevee}: \quad y = b_4 + w_4 \cdot x_{cp}$

$\downarrow$

$y = b_1 \cdot \delta(x_s = \text{Pidgey}) + w_1 \cdot \delta(x_s = \text{Pidgey}) \cdot x_{cp}$

$+\, b_2 \cdot \delta(x_s = \text{Weedle}) + w_2 \cdot \delta(x_s = \text{Weedle}) \cdot x_{cp}$

$+\, b_3 \cdot \delta(x_s = \text{Caterpie}) + w_3 \cdot \delta(x_s = \text{Caterpie}) \cdot x_{cp}$

$+\, b_4 \cdot \delta(x_s = \text{Eevee}) + w_4 \cdot \delta(x_s = \text{Eevee}) \cdot x_{cp}$
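A sketch of the indicator form above: $\delta(x_s = \text{species})$ is 1 when the species matches and 0 otherwise, so exactly one $(b, w)$ pair is active per input. The per-species parameter values are hypothetical.

```python
def delta(x_s, species):
    """Indicator: 1.0 if the species matches, else 0.0."""
    return 1.0 if x_s == species else 0.0

params = {  # hypothetical per-species (b, w) pairs, for illustration only
    "Pidgey":   (10.0, 9.0),
    "Weedle":   (5.0, 7.5),
    "Caterpie": (4.0, 7.0),
    "Eevee":    (20.0, 11.0),
}

def predict(x_s, x_cp):
    # y = sum over species of b * delta(...) + w * delta(...) * x_cp
    return sum(b * delta(x_s, s) + w * delta(x_s, s) * x_cp
               for s, (b, w) in params.items())

print(predict("Pidgey", 10.0))   # 10.0 + 9.0 * 10.0 = 100.0
```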

## Are there any other hidden factors?

The influence of HP, weight, and height on the CP value.

## Back to step 1: Redesign the Model Again

$\text{if } x_s = \text{Pidgey}: \quad y' = b_1 + w_1 \cdot x_{cp} + w_5 \cdot (x_{cp})^2$

$\text{if } x_s = \text{Weedle}: \quad y' = b_2 + w_2 \cdot x_{cp} + w_6 \cdot (x_{cp})^2$

$\text{if } x_s = \text{Caterpie}: \quad y' = b_3 + w_3 \cdot x_{cp} + w_7 \cdot (x_{cp})^2$

$\text{if } x_s = \text{Eevee}: \quad y' = b_4 + w_4 \cdot x_{cp} + w_8 \cdot (x_{cp})^2$

$\downarrow$

$y = y' + w_9 \cdot x_{hp} + w_{10} \cdot (x_{hp})^2 + w_{11} \cdot x_h + w_{12} \cdot (x_h)^2 + w_{13} \cdot x_w + w_{14} \cdot (x_w)^2$

## Back to step 2: Regularization

### Regularization: a method that is generally useful across many different tests

$L(f) = L(w, b) = \sum_{n} \left(\hat{y}^n - \left(b + \sum w_i \cdot x_i\right)\right)^2 + \lambda \sum (w_i)^2$

$y = b + \sum w_i x_i$

$y + \sum w_i \Delta x_i = b + \sum w_i (x_i + \Delta x_i)$

With smaller $w_i$, the output changes less when the input is perturbed by $\Delta x_i$; that is, the function is smoother.
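A sketch of the regularized loss: the $\lambda \sum (w_i)^2$ term penalizes large weights, preferring smoother functions. The dataset and $\lambda$ values below are illustrative.

```python
def regularized_loss(w, b, xs, ys, lam):
    # Squared error plus the regularization penalty lambda * w^2
    err = sum((y - (b + w * x)) ** 2 for x, y in zip(xs, ys))
    return err + lam * w ** 2

xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]                                 # exactly y = 1 + 2x
print(regularized_loss(2.0, 1.0, xs, ys, lam=0.0))   # 0.0: pure fit
print(regularized_loss(2.0, 1.0, xs, ys, lam=10.0))  # 40.0: penalty 10 * 2^2
```

Note that the bias $b$ is not penalized: shifting the whole function up or down does not affect its smoothness.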

| λ | Training | Testing |
| --- | --- | --- |
| 0 | 1.9 | 102.3 |
| 1 | 2.3 | 68.7 |
| 10 | 3.5 | 25.7 |
| 100 | 4.1 | 11.1 |
| 1000 | 5.6 | 12.8 |
| 10000 | 6.3 | 18.7 |
| 100000 | 8.5 | 26.8 |

As $\lambda$ increases, we find a smoother function. The larger $\lambda$ is, the less the training error is taken into account. Tune $\lambda$ and pick the value that minimizes the testing error.
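Picking $\lambda$ this way can be sketched as a lookup over the error table in the notes: keep the $\lambda$ with the lowest testing error.

```python
errors = {  # lambda -> (training error, testing error), from the table in the notes
    0: (1.9, 102.3), 1: (2.3, 68.7), 10: (3.5, 25.7), 100: (4.1, 11.1),
    1000: (5.6, 12.8), 10000: (6.3, 18.7), 100000: (8.5, 26.8),
}
best_lam = min(errors, key=lambda lam: errors[lam][1])  # minimize testing error
print(best_lam)   # 100
```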
