Machine Learning (1.3)

Dove_forehead

Category: Machine Learning. Tags: 18-03-13

Article link: https://blog.csdn.net/Dove_forehead/article/details/79545476

Machine Learning: Parameter Learning

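Gradient descent is an algorithm for minimizing a cost function. Here it is used for linear regression, where it minimizes the cost function $J$.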

Gradient Descent: Definition

Cost function:

Have some function $J(\theta_0,\theta_1)$
Want $\min_{\theta_0,\theta_1} J(\theta_0,\theta_1)$

Outline:

Start with some initial $\theta_0,\theta_1$. A common choice is to initialize both to 0, i.e. $\theta_0 = 0, \theta_1 = 0$.
Keep changing $\theta_0,\theta_1$ to reduce $J(\theta_0,\theta_1)$ until we hopefully end up at a minimum.

Gradient descent algorithm:

Figure: (image not available)

Different initial parameter values can lead gradient descent to different local minima.
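The dependence on initialization can be illustrated with a minimal sketch. The one-variable function $f(t) = t^4 - 3t^2 + t$, the step size, and the starting points are all assumptions chosen for illustration; none of them come from the original post. The function has two local minima, and plain gradient descent settles in whichever one lies downhill from the starting point.

```python
def descend(theta, alpha=0.01, steps=2000):
    """Plain gradient descent on the illustrative function f(t) = t**4 - 3*t**2 + t."""
    for _ in range(steps):
        grad = 4 * theta**3 - 6 * theta + 1  # f'(theta)
        theta = theta - alpha * grad         # step against the gradient
    return theta

left = descend(-2.0)   # start on the left side of the curve
right = descend(2.0)   # start on the right side of the curve
print(left, right)     # two different local minima
```

Starting from -2.0 the iterate ends up in the left-hand minimum, and starting from 2.0 it ends up in the right-hand one, even though the algorithm and step size are identical.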

Repeat until convergence {

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j}J(\theta_0,\theta_1) \quad (\text{for } j=0 \text{ and } j=1)$

}
where

$:=$ denotes assignment;

$\alpha$ is the learning rate. Note: $\theta_0$ and $\theta_1$ must be updated simultaneously.

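The correct simultaneous update is as follows. Both temporaries are computed from the old values of $\theta_0$ and $\theta_1$ before either parameter is overwritten: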

$temp0:= \theta_0 - \alpha \frac{\partial}{\partial \theta_0}J(\theta_0,\theta_1)$

$temp1:= \theta_1 - \alpha \frac{\partial}{\partial \theta_1}J(\theta_0,\theta_1)$

$\theta_0 := temp0$

$\theta_1 := temp1$
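To see why the order matters, here is a small sketch comparing the simultaneous update with a sequential one. The cost $J = \theta_0^2 + \theta_0\theta_1 + \theta_1^2$ is a hypothetical example picked only because each partial derivative depends on both parameters; the function names are likewise illustrative.

```python
def grad(theta0, theta1):
    """Partial derivatives of the illustrative cost J = theta0**2 + theta0*theta1 + theta1**2."""
    return 2 * theta0 + theta1, theta0 + 2 * theta1

def simultaneous_step(theta0, theta1, alpha):
    # Both gradients are evaluated at the OLD (theta0, theta1).
    g0, g1 = grad(theta0, theta1)
    temp0 = theta0 - alpha * g0
    temp1 = theta1 - alpha * g1
    return temp0, temp1

def sequential_step(theta0, theta1, alpha):
    # Incorrect variant: theta1's gradient sees the already-updated theta0.
    theta0 = theta0 - alpha * grad(theta0, theta1)[0]
    theta1 = theta1 - alpha * grad(theta0, theta1)[1]
    return theta0, theta1

sim = simultaneous_step(1.0, 1.0, 0.1)
seq = sequential_step(1.0, 1.0, 0.1)
print(sim, seq)  # the theta1 components differ
```

The two variants agree on $\theta_0$ but disagree on $\theta_1$, because the sequential version mixes old and new parameter values within a single step.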

Gradient Descent Intuition

Review:

Repeat until convergence {

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j}J(\theta_0,\theta_1) \quad (\text{for } j=0 \text{ and } j=1)$

}

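Simple example

Assume $\theta_0 = 0$ and minimize the cost function $J(\theta_1)$ over $\theta_1$ alone.

- Effect of the sign of the derivative

When the derivative is positive, $\theta_1$ decreases (moves to the left).
When the derivative is negative, $\theta_1$ increases (moves to the right).
In both cases, for a sufficiently small $\alpha$, the update moves $\theta_1$ in the direction that decreases $J$.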

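- Effect of $\alpha$

$\theta_1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1}J(\theta_1)$

If $\alpha$ is too small, gradient descent is slow.
If $\alpha$ is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.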
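Both regimes can be reproduced with a short sketch. It assumes the toy cost $J(\theta_1) = \theta_1^2$ (derivative $2\theta_1$, minimum at 0), a starting value of 1.0, and three hand-picked learning rates; these are illustrative choices, not values from the original post.

```python
def run(alpha, theta1=1.0, steps=20):
    """Run gradient descent on the illustrative cost J(theta1) = theta1**2."""
    for _ in range(steps):
        theta1 = theta1 - alpha * 2 * theta1  # dJ/dtheta1 = 2 * theta1
    return theta1

slow = run(alpha=0.01)     # tiny steps: still far from 0 after 20 iterations
good = run(alpha=0.4)      # moderate steps: very close to 0
diverged = run(alpha=1.1)  # overshoots each time: |theta1| keeps growing
print(slow, good, diverged)
```

For this cost each step multiplies $\theta_1$ by $(1 - 2\alpha)$, so the iterate shrinks slowly when $\alpha$ is small and grows in magnitude once $|1 - 2\alpha| > 1$.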

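Gradient Descent For Linear Regression

Gradient descent is a general-purpose algorithm and is not limited to linear regression. Here it is combined with the linear regression model and its squared-error cost function.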

Review

Gradient descent algorithm:

Repeat until convergence {

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j}J(\theta_0,\theta_1) \quad (\text{for } j=0 \text{ and } j=1)$

}

Linear Regression Model:

$h_\theta(x)=\theta_0 + \theta_1x$

$J(\theta_0, \theta_1)=\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2$

Goal:

Minimize the linear regression cost function using gradient descent.

Derivation:

$\frac{\partial}{\partial \theta_j}J(\theta_0,\theta_1) = \frac{\partial}{\partial \theta_j} \frac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2$

$= \frac{\partial}{\partial \theta_j}\frac{1}{2m}\sum_{i=1}^m(\theta_0 +\theta_1x^{(i)}- y^{(i)})^2$

By the chain rule, the factor 2 from the square cancels the $\frac{1}{2}$, and the inner derivative is 1 with respect to $\theta_0$ and $x^{(i)}$ with respect to $\theta_1$:

For $\theta_0$ ($j=0$): $\frac{\partial}{\partial \theta_0}J(\theta_0,\theta_1) = \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})$

For $\theta_1$ ($j=1$): $\frac{\partial}{\partial \theta_1}J(\theta_0,\theta_1) = \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})\cdot x^{(i)}$
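One way to sanity-check the two partial derivatives is to compare them with central finite differences of $J$. The sketch below does this on a tiny made-up dataset; the data, the test point $(0.5, -0.3)$, and the helper names are assumptions for illustration only.

```python
def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = 1/(2m) * sum of squared errors."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def analytic_gradient(theta0, theta1, xs, ys):
    """Partial derivatives from the derivation above."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    d0 = sum(errors) / m
    d1 = sum(e * x for e, x in zip(errors, xs)) / m
    return d0, d1

def numeric_gradient(theta0, theta1, xs, ys, eps=1e-6):
    """Central finite-difference approximation of the same derivatives."""
    d0 = (cost(theta0 + eps, theta1, xs, ys) - cost(theta0 - eps, theta1, xs, ys)) / (2 * eps)
    d1 = (cost(theta0, theta1 + eps, xs, ys) - cost(theta0, theta1 - eps, xs, ys)) / (2 * eps)
    return d0, d1

xs = [1.0, 2.0, 3.0]  # made-up inputs
ys = [1.0, 3.0, 2.0]  # made-up targets
analytic = analytic_gradient(0.5, -0.3, xs, ys)
numeric = numeric_gradient(0.5, -0.3, xs, ys)
print(analytic, numeric)  # the two pairs should agree closely
```

Agreement between the two pairs suggests the closed-form expressions match the cost function as defined.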

Therefore, the gradient descent algorithm for linear regression is:

Repeat until convergence {

$\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})$

$\theta_1 := \theta_1 - \alpha \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})\cdot x^{(i)}$

}
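The update rule can be sketched in a few lines of plain Python. This is a minimal illustration rather than a definitive implementation: the dataset (generated from $y = 1 + 2x$), the learning rate 0.05, and the fixed iteration count used in place of a convergence test are all assumptions.

```python
def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0  # initialize both parameters to 0
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed from the old values.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 1 + 2x, no noise
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)  # should approach 1 and 2
```

Because the data lie exactly on a line, the fitted parameters should approach the intercept and slope that generated them. With noisy data the same loop would instead approach the least-squares fit.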

In compact form:

Repeat until convergence {

$\theta_0 := \theta_0 - \alpha \frac{\partial}{\partial \theta_0}J(\theta_0,\theta_1)$

$\theta_1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1}J(\theta_0,\theta_1)$

}

and update $\theta_0$ and $\theta_1$ simultaneously.
