ML: Day 3

robertoXChen

Category: ML

Article link: https://blog.csdn.net/robertoXChen/article/details/75068910

Gradient Descent

Gradient descent is important throughout ML. Its job is to find the parameters $\theta_j$ that make the value of the cost function as small as possible.

Review of the cost function:

The cost function is:

$$J(\theta_0,\theta_1,\theta_2,\dots,\theta_n)=\dfrac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2 \tag{1}$$

Here $m$ is the number of training examples. (Why multiply by 1/2? It cancels the factor of 2 that appears when the square is differentiated, which simplifies the gradient descent update.)

where

$$h_\theta(x^{(i)})=\theta_0+\theta_1 x_1^{(i)}+\theta_2 x_2^{(i)}+\dots+\theta_n x_n^{(i)} \tag{2}$$

(the superscript $(i)$ denotes the $i$-th training example). This is the best-fit regression line we are looking for.
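Equations (1) and (2) translate almost directly into code. The following is a minimal sketch in plain Python; the function names `hypothesis` and `cost` and the three data points are made up for illustration and do not come from the original post.

```python
def hypothesis(theta, x):
    """Equation (2): theta_0 + theta_1*x_1 + ... + theta_n*x_n.

    `x` holds the features x_1..x_n of one example. The constant
    x_0 = 1 is implicit, so theta[0] acts as the intercept.
    """
    return theta[0] + sum(t * xj for t, xj in zip(theta[1:], x))


def cost(theta, xs, ys):
    """Equation (1): sum of squared errors divided by 2m."""
    m = len(xs)
    return sum((hypothesis(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Made-up data that lies exactly on the line y = 1 + 2*x_1.
xs = [[0.0], [1.0], [2.0]]
ys = [1.0, 3.0, 5.0]

print(cost([1.0, 2.0], xs, ys))  # parameters that fit the data perfectly
print(cost([0.0, 0.0], xs, ys))  # all-zero parameters give a larger cost
```

With the perfectly fitting parameters every residual is zero, so the cost is zero; any other choice of parameters gives a positive cost.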

About gradient descent:

Gradient descent repeatedly updates the parameters $\theta_0,\theta_1,\dots,\theta_n$ to minimize the cost function. The update rule is:

$$\theta_j :=\theta_j-\alpha \dfrac{\partial }{\partial \theta_j}J(\theta_0,\theta_1,\dots,\theta_n) \tag{3}$$

($\alpha$ is the learning rate.) Evaluating the partial derivative, this becomes:

$$\theta_j :=\theta_j-\alpha \dfrac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)} \tag{4}$$

or, updating all parameters at once in matrix form:

$$\theta:=\theta-\dfrac{\alpha}{m}X^T(X\theta-y) \tag{5}$$

$$X= \left[ \begin{matrix} x_{1,0}&x_{1,1} & x_{1,2} & \cdots &x_{1,n}\\ x_{2,0}&x_{2,1}&x_{2,2}&\cdots&x_{2,n}\\ \vdots &\vdots&\vdots&\ddots&\vdots\\ x_{m,0}&x_{m,1}&x_{m,2}&\cdots&x_{m,n} \end{matrix} \right], \quad \theta= \left[ \begin{matrix} \theta_0&\theta_1&\theta_2&\cdots&\theta_n \end{matrix} \right]^T, \quad y= \left[ \begin{matrix} y^{(1)}&y^{(2)}&\cdots&y^{(m)} \end{matrix} \right]^T$$

Here $x_{i,j}$ stands for $x_j^{(i)}$, and we define $x_0^{(i)}=1$.
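The matrix form of the update can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than a reference implementation: the names `gradient_step` and `gradient_descent`, the default learning rate and iteration count, and the data are all made up. `theta` and `y` are treated as one-dimensional arrays, and the first column of `X` is the constant $x_0 = 1$.

```python
import numpy as np


def gradient_step(theta, X, y, alpha):
    """One simultaneous update of every theta_j."""
    m = X.shape[0]
    residual = X @ theta - y        # h_theta(x^(i)) - y^(i) for every example i
    gradient = X.T @ residual / m   # partial derivative of J for every theta_j
    return theta - alpha * gradient


def gradient_descent(X, y, alpha=0.1, iterations=2000):
    """Start from all zeros and repeat the update a fixed number of times."""
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        theta = gradient_step(theta, X, y, alpha)
    return theta


# Made-up data on the line y = 1 + 2*x_1; the first column is x_0 = 1.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

theta = gradient_descent(X, y)
print(theta)  # should end up close to [1, 2]
```

Computing the whole gradient before touching `theta` keeps the update simultaneous: every $\theta_j$ is updated from the same old parameter values.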

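The idea behind gradient descent:

Take a quadratic function as an example:

$$y=ax^2+bx+c \qquad (a>0)$$

Start from an arbitrary value of $x$ and evaluate the derivative there. If it is greater than zero, $x$ must decrease for $y$ to get smaller; if it is less than zero, $x$ must increase. When $x$ sits exactly on the axis of symmetry, where the function value is smallest, the derivative equals zero.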
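The quadratic example can be tried out in a few lines. This is a small sketch assuming $a > 0$; the function name `minimize_quadratic`, the starting point, the step size and the coefficients are arbitrary choices for illustration.

```python
def minimize_quadratic(a, b, c, x=10.0, alpha=0.1, steps=200):
    """Walk downhill on y = a*x^2 + b*x + c (assumes a > 0).

    The constant c shifts the curve up or down but does not affect the slope.
    """
    for _ in range(steps):
        slope = 2 * a * x + b   # derivative dy/dx at the current x
        x = x - alpha * slope   # positive slope: x decreases; negative slope: x increases
    return x


# y = x^2 - 4x + 7 has its axis of symmetry at x = -b / (2a) = 2.
x_min = minimize_quadratic(a=1.0, b=-4.0, c=7.0)
print(x_min)  # should be very close to 2
```

Each step moves $x$ against the sign of the derivative, and the steps shrink automatically as the slope approaches zero near the axis of symmetry.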

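Learning Rate:

$\alpha$ is the learning rate. It sets the size of each gradient descent step. If $\alpha$ is too large, a step may overshoot the minimum, and the cost function can even grow from one iteration to the next. During training, watch how the cost function changes to check that gradient descent is working correctly, and adjust $\alpha$ accordingly.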
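To make the effect of the learning rate concrete, the sketch below records the cost after every step on the toy function $y = x^2$ and checks whether it ever goes up. The helper names `cost_history` and `is_decreasing` and the two values of $\alpha$ are hypothetical choices for this illustration.

```python
def cost_history(alpha, steps=20, x=10.0):
    """Cost y = x^2 after each gradient step with learning rate alpha."""
    history = []
    for _ in range(steps):
        x = x - alpha * 2 * x   # the derivative of x^2 is 2x
        history.append(x * x)
    return history


def is_decreasing(history):
    """True if the cost never goes up from one step to the next."""
    return all(later <= earlier for earlier, later in zip(history, history[1:]))


print(is_decreasing(cost_history(alpha=0.1)))  # small steps: cost keeps falling
print(is_decreasing(cost_history(alpha=1.1)))  # steps overshoot: cost grows
```

A check like `is_decreasing` is one simple way to monitor the cost: if it fails, a smaller $\alpha$ is worth trying.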

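Feature Scaling:

Besides the learning rate, features on very different scales can also make gradient descent misbehave. It is therefore worth rescaling each feature so that their ranges are similar, roughly within $[-1,1]$. The method is:

$$x_j^{(i)} := \dfrac{x_j^{(i)}-\bar{x}_j}{\max(x_j)-\min(x_j)}$$

where $\bar{x}_j$ is the mean of feature $j$ over the training set and the denominator is its range.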
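The scaling formula applies to one feature column at a time. Below is a minimal sketch; the function name `mean_normalize`, the handling of a constant column, and the sample values are assumptions made for illustration.

```python
def mean_normalize(column):
    """Scale one feature: (value - mean) / (max - min)."""
    mean = sum(column) / len(column)
    value_range = max(column) - min(column)
    if value_range == 0:
        # A constant feature has no spread; map it to zeros to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(value - mean) / value_range for value in column]


# Hypothetical feature values on a large scale.
sizes = [1000.0, 1500.0, 2000.0, 3000.0]
scaled = mean_normalize(sizes)
print(scaled)
```

After scaling, the values are centred on zero and span a total width of 1, so they always fall inside $[-1, 1]$.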

Another way to find the parameters $\theta_0,\theta_1,\dots,\theta_n$

In short (the normal equation):

$$\theta=(X^TX)^{-1}X^Ty$$

I do not know yet why this works.
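One way to see where the closed form comes from: at the minimum of the cost function every partial derivative is zero, which in matrix form reads $X^T(X\theta - y) = 0$, that is $X^TX\theta = X^Ty$. The sketch below solves that linear system with NumPy on made-up data, assuming $X^TX$ is invertible and that the first column of `X` is the constant $x_0 = 1$.

```python
import numpy as np

# Made-up data on the line y = 1 + 2*x_1; the first column is x_0 = 1.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

# Solve (X^T X) theta = X^T y instead of forming the inverse explicitly.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # should be close to [1, 2]
```

Unlike gradient descent, this needs no learning rate and no iterations. Solving the system with `np.linalg.solve` gives the same $\theta$ as the formula with the explicit inverse, and is usually the preferred way to compute it numerically.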
