Subscript Notation
| Notation | Size $x_1$ | Number of bedrooms $x_2$ | Number of floors $x_3$ | Years $x_4$ | Price $y$ |
|---|---|---|---|---|---|
| $x^{(1)} = 1^{st}$ training example | 2104 | 5 | 1 | 10 | 460 |
| $x^{(2)} = 2^{nd}$ training example | 1416 | 3 ($x^{(2)}_2$) | 2 | 8 | 232 |
| $x^{(3)} = 3^{rd}$ training example | 1534 | 3 | 2 | 5 | 315 |
| $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ |
$n$ = number of features = 4
$x^{(i)}$ = input (features) of the $i^{th}$ training example; here each training example is a $4 \times 1$ vector, defined as a column vector
$$x^{(2)} = \begin{pmatrix} 1416 \\ 3 \\ 2 \\ 8 \end{pmatrix}$$
$x^{(i)}_j$ = value of feature $j$ in the $i^{th}$ training example, a scalar
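The indexing above can be sketched in NumPy (the design matrix `X` below is built from the table; rows are training examples, columns are features):

```python
import numpy as np

# Design matrix from the table above: one row per training example,
# columns are (size, bedrooms, floors, years).
X = np.array([
    [2104, 5, 1, 10],
    [1416, 3, 2,  8],
    [1534, 3, 2,  5],
])

x_2 = X[1]       # x^{(2)}: the 2nd training example (0-based row 1)
x_2_2 = X[1, 1]  # x^{(2)}_2: value of feature 2 in the 2nd example

print(x_2)       # the column vector of the 2nd example, as a 1-D array
print(x_2_2)     # a scalar: 3
```

Note the off-by-one: the course notation is 1-based ($x^{(2)}_2$), while NumPy indexing is 0-based (`X[1, 1]`).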
Multivariate Representation
$$h_\theta(x) \;(\text{hypothesis}) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots$$
Define:
$$x_0 = 1$$
$$x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{pmatrix}$$
$$\theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}$$
$$h_\theta(x) = \theta^T \cdot x = \begin{pmatrix} \theta_0 & \theta_1 & \cdots & \theta_n \end{pmatrix} \cdot \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{pmatrix}$$
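A minimal sketch of the hypothesis as an inner product, with $x_0 = 1$ prepended to the feature vector (the parameter values below are purely illustrative):

```python
import numpy as np

theta = np.array([80.0, 0.1, 10.0, 5.0, -2.0])  # illustrative (n+1,) parameters
x_raw = np.array([1416.0, 3.0, 2.0, 8.0])       # features of x^{(2)} from the table
x = np.concatenate(([1.0], x_raw))              # prepend x_0 = 1

h = theta @ x  # h_theta(x) = theta^T · x
print(h)       # ~245.6 for these illustrative parameters
```

The `@` operator computes exactly the row-vector-times-column-vector product written above.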
Gradient Descent
Hypothesis:
$$h_\theta(x) = \theta^T \cdot x = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots$$
Parameters: $\theta$, which is an $(n+1) \times 1$ vector
Cost function:
$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
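The cost function can be sketched in vectorized form (the `cost` helper and toy data below are illustrative, not from the notes):

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2, vectorized."""
    m = len(y)
    residuals = X @ theta - y        # h_theta(x^(i)) - y^(i) for every i at once
    return residuals @ residuals / (2 * m)

# Toy check: X already includes the x_0 = 1 column. With theta = (0, 1),
# h(x) = x_1, which fits y exactly, so the cost is 0.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
print(cost(np.array([0.0, 1.0]), X, y))  # 0.0
print(cost(np.array([0.0, 0.0]), X, y))  # (1 + 4 + 9) / (2 * 3)
```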
Gradient Descent:
Repeat {
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
} (updating all $\theta_j$ simultaneously)
So,
for $j = 0$:
$$\frac{\partial}{\partial \theta_0} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)\frac{\partial\left(h_\theta(x^{(i)}) - y^{(i)}\right)}{\partial \theta_0}$$
$$= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)\frac{\partial\left(\theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \cdots\right)}{\partial \theta_0}$$
$$= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_0^{(i)}$$
$$= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) \qquad \left(\text{since } x_0^{(i)} = 1\right)$$
for $j = 1$:
$$\frac{\partial}{\partial \theta_1} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)\frac{\partial\left(h_\theta(x^{(i)}) - y^{(i)}\right)}{\partial \theta_1}$$
$$= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)\frac{\partial\left(\theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \cdots\right)}{\partial \theta_1}$$
$$= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_1^{(i)}$$
So:
Repeat {
$$\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$
}
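The full update loop above can be sketched in vectorized NumPy; the learning rate, iteration count, and toy data are illustrative choices, not prescribed by the notes:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for linear regression.

    X: (m, n+1) design matrix whose first column is all ones (x_0 = 1).
    Every theta_j is updated simultaneously each iteration.
    """
    m, n1 = X.shape
    theta = np.zeros(n1)
    for _ in range(iters):
        # (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i), for all j at once
        grad = X.T @ (X @ theta - y) / m
        theta = theta - alpha * grad
    return theta

# Toy data generated from y = 1 + 2*x; the loop should recover theta ≈ (1, 2).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = gradient_descent(X, y)
print(theta)  # ≈ [1. 2.]
```

Note that `X.T @ (X @ theta - y)` computes the sum over all $m$ examples for every $j$ in one step, which is why no inner loop over features is needed.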