Cost Function and Backpropagation
Cost Function
Neural Network (Classification)
Given a training set of $m$ examples $\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})\}$
$L$ = total number of layers in the network
$s_l$ = number of units (not counting the bias unit) in layer $l$
$\underline{\text{Binary classification}}$
$y = 0$ or $1$
1 output unit
$\underline{\text{Multi-class classification}}$ ($K$ classes)
$y \in \mathbb{R}^{K}$
$K$ output units
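For multi-class classification, each label $y^{(i)}$ is represented as a one-hot vector in $\mathbb{R}^K$. A minimal sketch (the helper name `one_hot` is illustrative, not from the notes):

```python
# One-hot encoding of integer class labels for K output units.
import numpy as np

def one_hot(labels, K):
    """Map integer class labels (0..K-1) to one-hot rows in R^K."""
    Y = np.zeros((len(labels), K))
    Y[np.arange(len(labels)), labels] = 1.0
    return Y
```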
Cost function
Logistic regression:
$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$
Neural network:
$h_\Theta(x)\in\mathbb{R}^K$, where $(h_\Theta(x))_i$ is the $i^{\text{th}}$ output
$J(\Theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{k=1}^{K} y_k^{(i)}\log(h_\Theta(x^{(i)}))_k + (1-y_k^{(i)})\log(1-(h_\Theta(x^{(i)}))_k)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}(\Theta_{ji}^{(l)})^2$
Note:
- the double sum simply adds up the logistic-regression costs calculated for each unit in the output layer
- the triple sum simply adds up the squares of all the individual $\Theta$s in the entire network
- the $i$ in the triple sum does not refer to training example $i$
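The cost above can be sketched in NumPy for a network with one hidden layer. This is a minimal sketch, not the course's reference implementation; the names (`nn_cost`, `Theta1`, `Theta2`) are illustrative.

```python
# Regularized neural-network cost J(Theta) for one hidden layer (L = 3).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Theta1, Theta2, X, Y, lam):
    """X: (m, n) inputs; Y: (m, K) one-hot labels; lam: regularization lambda."""
    m = X.shape[0]
    # Forward propagation; the column of ones is the bias unit.
    A1 = np.hstack([np.ones((m, 1)), X])
    A2 = np.hstack([np.ones((m, 1)), sigmoid(A1 @ Theta1.T)])
    H = sigmoid(A2 @ Theta2.T)                     # h_Theta(x), shape (m, K)
    # The double sum over examples i and output units k.
    J = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # The triple sum over all Theta entries, skipping the bias column j = 0.
    reg = (lam / (2 * m)) * (np.sum(Theta1[:, 1:] ** 2)
                             + np.sum(Theta2[:, 1:] ** 2))
    return J + reg
```

With all parameters zero, every output is $0.5$, so the unregularized cost reduces to $K\log 2$, which is a quick sanity check.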
Gradient computation: Backpropagation algorithm
$\delta_j^{(l)}$ = "error" of node $j$ in layer $l$
Backpropagation algorithm
Training set $\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})\}$
Set $\Delta_{ij}^{(l)} = 0$ (for all $l, i, j$) — used as accumulators for computing the partial derivatives
For $i = 1$ to $m$ (training example $(x^{(i)}, y^{(i)})$):
Set $a^{(1)} = x^{(i)}$
Perform forward propagation to compute $a^{(l)}$ for $l = 2, 3, \dots, L$
Using $y^{(i)}$, compute $\delta^{(L)} = a^{(L)} - y^{(i)}$
Compute $\delta^{(L-1)}, \delta^{(L-2)}, \dots, \delta^{(2)}$ using $\delta^{(l)} = ((\Theta^{(l)})^T \delta^{(l+1)}) \;.\!*\; a^{(l)} \;.\!*\; (1 - a^{(l)})$
$\Delta_{ij}^{(l)} := \Delta_{ij}^{(l)} + a_j^{(l)}\delta_i^{(l+1)}$ (with vectorization, $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)}(a^{(l)})^T$)
Then, outside the loop, compute the matrices $D^{(l)}$ from the accumulated $\Delta^{(l)}$:
- $D_{i,j}^{(l)} := \dfrac{1}{m}\left(\Delta_{i,j}^{(l)} + \lambda\Theta_{i,j}^{(l)}\right)$, if $j \neq 0$
- $D_{i,j}^{(l)} := \dfrac{1}{m}\Delta_{i,j}^{(l)}$, if $j = 0$
This gives $\dfrac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)}$
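The accumulation loop above can be sketched in NumPy for a one-hidden-layer network ($L = 3$). The per-example loop mirrors the notes (a fully vectorized version is usual in practice); the names (`backprop_gradients`, `Delta1`, `Delta2`) are illustrative.

```python
# Backpropagation: accumulate Delta over training examples, then compute D.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_gradients(Theta1, Theta2, X, Y, lam):
    """X: (m, n) inputs; Y: (m, K) one-hot labels. Returns D1, D2."""
    m = X.shape[0]
    Delta1 = np.zeros_like(Theta1)   # accumulators, one per Theta matrix
    Delta2 = np.zeros_like(Theta2)
    for i in range(m):
        # Forward propagation: a^(1) = x^(i), with the bias unit prepended.
        a1 = np.concatenate([[1.0], X[i]])
        a2 = np.concatenate([[1.0], sigmoid(Theta1 @ a1)])
        a3 = sigmoid(Theta2 @ a2)                 # output layer, L = 3
        # delta^(L) = a^(L) - y^(i)
        d3 = a3 - Y[i]
        # delta^(2) = ((Theta^(2))^T d3) .* a^(2) .* (1 - a^(2)); drop bias entry
        d2 = ((Theta2.T @ d3) * a2 * (1 - a2))[1:]
        # Delta^(l) := Delta^(l) + delta^(l+1) (a^(l))^T
        Delta2 += np.outer(d3, a2)
        Delta1 += np.outer(d2, a1)
    # D^(l): divide by m; regularize every column except the bias column j = 0.
    D1, D2 = Delta1 / m, Delta2 / m
    D1[:, 1:] += (lam / m) * Theta1[:, 1:]
    D2[:, 1:] += (lam / m) * Theta2[:, 1:]
    return D1, D2
```

With zero weights and a single example, $\delta^{(2)}$ vanishes and $D^{(2)}$ is just $\delta^{(3)}(a^{(2)})^T$, which makes the routine easy to sanity-check by hand.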
Backpropagation Intuition
Fixed procedure:
Forward propagation:
The first layer takes $(x^{(i)}, y^{(i)})$; when propagating to the first hidden layer, compute the weighted sum $z^{(2)}$ of the input units, then apply the sigmoid activation function to obtain the activations $a^{(2)}$, and continue forward propagation in the same way.
The weights from the first hidden layer to the second hidden layer are $\Theta^{(2)}$, which gives the relation:
Third layer: $z^{(3)} = \Theta_{10}^{(2)}\times 1 + \Theta_{11}^{(2)}\times a_1^{(2)} + \dots$
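This relation is just a dot product of $\Theta^{(2)}$ with the bias-augmented activations. A tiny numeric check, with made-up shapes and values:

```python
# z^(3) = Theta^(2)_{10}*1 + Theta^(2)_{11}*a^(2)_1 + Theta^(2)_{12}*a^(2)_2
import numpy as np

Theta2 = np.array([[0.1, 0.2, 0.3]])      # Theta^(2): 1 unit, 2 inputs + bias
a2 = np.array([0.5, 0.8])                 # activations a^(2)_1, a^(2)_2
z3 = Theta2 @ np.concatenate([[1.0], a2])  # prepend the bias unit's 1
```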
Backpropagation runs in the opposite direction.
In the cost function:
$J(\theta) = -\frac{1}{m}\sum_{t=1}^{m}\left[ y_k^{(t)}\log(h_\theta(x^{(t)}))_k + (1-y_k^{(t)})\log(1-(h_\theta(x^{(t)}))_k)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\theta_{j,i}^2$
$cost(t) = y^{(t)}\log(h_\Theta(x^{(t)})) + (1-y^{(t)})\log(1-h_\Theta(x^{(t)}))$ can be viewed as a kind of squared-error term ($cost(t) \approx (h_\Theta(x^{(t)}) - y^{(t)})^2$).
Intuitively:
$\delta_j^{(l)} = \dfrac{\partial}{\partial z_j^{(l)}} cost(t)$ (for $j \geq 0$)
That is, $\delta_j^{(l)}$ is the error in the activation of unit $j$ in layer $l$.
Computation: start from the last layer; for a four-layer network, $\delta_1^{(4)} = a^{(4)} - y$ (predicted value minus actual value).
Then propagate backward to compute $\delta^{(3)}$ and so on, using the same recurrence as above:
$\delta^{(3)} = ((\Theta^{(3)})^T \delta^{(4)}) \;.\!*\; a^{(3)} \;.\!*\; (1 - a^{(3)})$
Note: $\delta$ values are computed only for the output and hidden units, excluding the bias units.