# Machine Learning (1.3)

## Machine Learning: Parameter Learning

### Gradient Descent: Definition

Have some function $J(\theta_0,\theta_1)$.
Want $\min_{\theta_0,\theta_1} J(\theta_0,\theta_1)$.

Outline:

• Start with some initial $\theta_0, \theta_1$; a common choice is to initialize both to zero, i.e. $\theta_0 = 0, \theta_1 = 0$.

• Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0,\theta_1)$ until we hopefully end up at a minimum.

repeat until convergence{

$\mathrm{temp0} := \theta_0 - \alpha \frac{\partial}{\partial \theta_0}J(\theta_0,\theta_1)$

$\mathrm{temp1} := \theta_1 - \alpha \frac{\partial}{\partial \theta_1}J(\theta_0,\theta_1)$

$\theta_0 := \mathrm{temp0}$

$\theta_1 := \mathrm{temp1}$

}
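The temp0/temp1 pattern above can be sketched in Python. This is a minimal illustration, not from the notes: `d_theta0` and `d_theta1` are assumed to be functions computing the two partial derivatives of $J$.

```python
def gradient_step(theta0, theta1, alpha, d_theta0, d_theta1):
    """One simultaneous gradient-descent update.

    d_theta0, d_theta1: hypothetical helpers returning the partial
    derivatives of J with respect to theta0 and theta1.
    """
    # Evaluate BOTH partials at the current (theta0, theta1) first...
    temp0 = theta0 - alpha * d_theta0(theta0, theta1)
    temp1 = theta1 - alpha * d_theta1(theta0, theta1)
    # ...then assign, so theta1's update never sees the new theta0.
    return temp0, temp1
```

For example, with $J(\theta_0,\theta_1)=\theta_0^2+\theta_1^2$ the partials are $2\theta_0$ and $2\theta_1$, and each step shrinks both parameters toward the minimum at the origin.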

### Gradient Descent: Intuition

Review:

repeat until convergence{

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j}J(\theta_0,\theta_1)$ (for $j=0$ and $j=1$)

}

- Effect of the partial derivative

• When the derivative is positive, $\theta_1$ decreases (moves to the left).

• When the derivative is negative, $\theta_1$ increases (moves to the right).

- Effect of the learning rate $\alpha$

$\theta_1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1}J(\theta_1)$

• If $\alpha$ is too small, gradient descent is very slow.

• If $\alpha$ is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
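The effect of $\alpha$ can be seen on a toy one-parameter objective. This sketch (illustrative, not from the notes) runs $\theta_1 := \theta_1 - \alpha \cdot J'(\theta_1)$ on $J(\theta_1)=\theta_1^2$, whose derivative is $2\theta_1$ and whose minimum is at $0$:

```python
def run_gd(theta1, alpha, steps):
    """Iterate theta1 := theta1 - alpha * J'(theta1) for J(theta1) = theta1**2."""
    for _ in range(steps):
        theta1 = theta1 - alpha * 2 * theta1  # J'(theta1) = 2 * theta1
    return theta1

# alpha too small: after 10 steps theta1 has barely moved toward 0
slow = run_gd(1.0, 0.001, 10)
# a reasonable alpha: theta1 is already close to 0 after 10 steps
good = run_gd(1.0, 0.1, 10)
# alpha too large: each step overshoots 0 and grows in magnitude (diverges)
diverged = run_gd(1.0, 1.5, 10)
```

Each step multiplies $\theta_1$ by $(1 - 2\alpha)$, so the iterates converge exactly when $|1 - 2\alpha| < 1$ and diverge otherwise, matching the two bullets above.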

### Gradient Descent for Linear Regression

Review:

repeat until convergence{

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j}J(\theta_0,\theta_1)$ (simultaneously for $j=0,1$)

}

Linear Regression Model:

${h}_{\theta }\left(x\right)={\theta }_{0}+{\theta }_{1}x$

$J\left({\theta }_{0},{\theta }_{1}\right)=\frac{1}{2m}\sum _{i=1}^{m}\left({h}_{\theta }\left({x}^{\left(i\right)}\right)-{y}^{\left(i\right)}{\right)}^{2}$
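The hypothesis and cost function above translate directly into code. A minimal Python sketch (the sample data below is illustrative):

```python
def h(theta0, theta1, x):
    """Hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = (1/2m) * sum of squared residuals."""
    m = len(xs)
    return sum((h(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# For points lying exactly on y = 2x, the parameters (0, 2) give zero cost,
# and any other parameters give a strictly positive cost.
```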

The key term to work out is the partial derivative:

$\frac{\mathrm{\partial }}{\mathrm{\partial }{\theta }_{j}}J\left({\theta }_{0},{\theta }_{1}\right)=\frac{\mathrm{\partial }}{\mathrm{\partial }{\theta }_{j}}\cdot \frac{1}{2m}\sum _{i=1}^{m}\left({h}_{\theta }\left({x}^{\left(i\right)}\right)-{y}^{\left(i\right)}{\right)}^{2}$

$=\frac{\mathrm{\partial }}{\mathrm{\partial }{\theta }_{j}}\frac{1}{2m}\sum _{i=1}^{m}\left({\theta }_{0}+{\theta }_{1}{x}^{\left(i\right)}-{y}^{\left(i\right)}{\right)}^{2}$

$j=0: \quad \frac{\partial}{\partial \theta_0}J(\theta_0,\theta_1) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)$

$j=1: \quad \frac{\partial}{\partial \theta_1}J(\theta_0,\theta_1) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)\cdot x^{(i)}$

repeat until convergence{

${\theta }_{0}:={\theta }_{0}-\alpha \frac{1}{m}\sum _{i=1}^{m}\left({h}_{\theta }\left({x}^{\left(i\right)}\right)-{y}^{\left(i\right)}\right)$

${\theta }_{1}:={\theta }_{1}-\alpha \frac{1}{m}\sum _{i=1}^{m}\left({h}_{\theta }\left({x}^{\left(i\right)}\right)-{y}^{\left(i\right)}\right)\cdot {x}^{\left(i\right)}$

}

repeat until convergence{

${\theta }_{0}:={\theta }_{0}-\alpha \frac{\mathrm{\partial }}{\mathrm{\partial }{\theta }_{0}}J\left({\theta }_{0},{\theta }_{1}\right)$

${\theta }_{1}:={\theta }_{1}-\alpha \frac{\mathrm{\partial }}{\mathrm{\partial }{\theta }_{1}}J\left({\theta }_{0},{\theta }_{1}\right)$

}

where $\theta_0$ and $\theta_1$ must be updated simultaneously.
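Putting the pieces together, batch gradient descent for linear regression can be sketched as follows. This is a minimal illustration of the two update rules above; the data and hyperparameters in the usage note are assumptions, not from the notes.

```python
def gradient_descent(xs, ys, alpha, iterations):
    """Batch gradient descent for h_theta(x) = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0  # common initialization: both zero
    for _ in range(iterations):
        # Residuals h_theta(x^(i)) - y^(i) at the CURRENT parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # The two partial derivatives derived above.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both new values use the same old parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1
```

For example, fitting points generated by $y = 1 + 2x$ should recover parameters close to $\theta_0 = 1$, $\theta_1 = 2$ for a suitably small $\alpha$.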