Deep Learning 02 - Backpropagation (backward propagation)


itspollyyy


Article link: https://blog.csdn.net/weixin_43399179/article/details/134177641


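Purpose:

Compute the gradients of a neural network.

Method:

The chain rule. Traverse the network in reverse order, starting from the output layer, and compute the gradient of each intermediate variable and parameter in turn.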
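The following is a minimal sketch of this reverse traversal on a single neuron. For illustration only, it assumes $z = wx + b$, $a = \tanh(z)$ and a squared-error cost $C = \frac{1}{2}(a - y)^2$. `backward` starts from $\partial C / \partial a$ and multiplies by one local derivative per step back through the graph. A finite-difference check confirms the result. All function names are hypothetical.

```python
import math


def forward(w, b, x, y):
    """One neuron: z = w*x + b, a = tanh(z), C = 0.5 * (a - y)^2."""
    z = w * x + b
    a = math.tanh(z)
    cost = 0.5 * (a - y) ** 2
    return z, a, cost


def backward(w, b, x, y):
    """Walk the graph in reverse: C -> a -> z -> (w, b), multiplying local derivatives."""
    _, a, _ = forward(w, b, x, y)
    dC_da = a - y                 # d/da of 0.5 * (a - y)^2
    dC_dz = dC_da * (1 - a ** 2)  # tanh'(z) = 1 - tanh(z)^2
    dC_dw = dC_dz * x             # dz/dw = x
    dC_db = dC_dz * 1.0           # dz/db = 1
    return dC_dw, dC_db


def numeric_grad(w, b, x, y, eps=1e-6):
    """Central finite differences, used only to check backward()."""
    gw = (forward(w + eps, b, x, y)[2] - forward(w - eps, b, x, y)[2]) / (2 * eps)
    gb = (forward(w, b + eps, x, y)[2] - forward(w, b - eps, x, y)[2]) / (2 * eps)
    return gw, gb


print(backward(0.5, -0.1, 2.0, 1.0))
print(numeric_grad(0.5, -0.1, 2.0, 1.0))
```

The two printed pairs should agree to many decimal places. The numeric version needs two extra forward passes per parameter, which is why backpropagation, not finite differences, is used in practice.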

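Network layers:

In a 4-layer network:

Input layer

$x_i = a_i^{(1)},\ i \in \{1,2,3,4\}$

Second layer

$z^{(2)} = W^{(1)}x+b^{(1)};\ a^{(2)} = f(z^{(2)})$

where W is the weight, b is the bias, f is the activation function (tanh, ReLU, sigmoid, etc.), and a is the resulting activation (the layer's output).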

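Third layer

$z^{(3)} = W^{(2)}a^{(2)}+b^{(2)};\ a^{(3)}=f(z^{(3)})$

⚠️ Note: W is a matrix and b is a vector. The superscript in parentheses is the layer index.

For example, $W^{(1)}$ of the second layer maps the 4 inputs to 2 units. Entry $W_{jk}^{(1)}$ is the weight from input k to unit j:

$W^{(1)}=\begin{bmatrix} W_{11}^{(1)} &W_{12}^{(1)} &W_{13}^{(1)} &W_{14}^{(1)} \\ W_{21}^{(1)} &W_{22}^{(1)} & W_{23}^{(1)} & W_{24}^{(1)} \end{bmatrix}$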

Output layer

$s = W^{(3)}a^{(3)}$, where s is the predicted output.
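Here is a minimal plain-Python sketch of the forward pass above. It assumes tanh as the activation f. For concreteness it also assumes 4 inputs, 2 units in each hidden layer and a single output; the text only fixes the 4 inputs and the 2×4 shape of $W^{(1)}$. Matrices are lists of rows, and all names are illustrative. The sketch keeps every z and a because the backward pass reuses them.

```python
import math


def matvec(W, v):
    """Multiply a matrix (a list of rows) by a vector (a list)."""
    return [sum(w_jk * v_k for w_jk, v_k in zip(row, v)) for row in W]


def forward(x, W1, b1, W2, b2, W3, f=math.tanh):
    """Forward pass of the 4-layer network; returns every intermediate z and a, plus s."""
    a1 = list(x)                                          # a^(1) = x
    z2 = [zj + bj for zj, bj in zip(matvec(W1, a1), b1)]  # z^(2) = W^(1) x + b^(1)
    a2 = [f(z) for z in z2]                               # a^(2) = f(z^(2))
    z3 = [zj + bj for zj, bj in zip(matvec(W2, a2), b2)]  # z^(3) = W^(2) a^(2) + b^(2)
    a3 = [f(z) for z in z3]                               # a^(3) = f(z^(3))
    s = matvec(W3, a3)                                    # s = W^(3) a^(3), no bias as in the text
    return {"a1": a1, "z2": z2, "a2": a2, "z3": z3, "a3": a3, "s": s}


# Example shapes: W1 is 2x4 (as in the text), W2 is 2x2, W3 is 1x2
W1 = [[0.1, 0.2, 0.3, 0.4], [-0.1, 0.0, 0.1, 0.2]]
b1 = [0.0, 0.1]
W2 = [[1.0, 0.0], [0.0, 1.0]]
b2 = [0.0, 0.0]
W3 = [[1.0, -1.0]]
out = forward([1.0, 1.0, 1.0, 1.0], W1, b1, W2, b2, W3)
print(out["s"])
```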

Given a training example $(x, y)$, where x is the input data and y is the label (target output), compare s with y using a cost function $C = \mathrm{cost}(s, y)$. Cross-entropy is commonly used.
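Below is a minimal sketch of a cross-entropy cost. It assumes, beyond what the text states, that s holds one unnormalized score per class, that softmax turns those scores into probabilities, and that y is the integer index of the correct class. The gradient with respect to s, softmax(s) minus the one-hot label, is where backpropagation would start. The function names are illustrative.

```python
import math


def softmax(s):
    """Numerically stable softmax over a list of scores."""
    m = max(s)
    exps = [math.exp(v - m) for v in s]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(s, y):
    """C = -log p_y, where p = softmax(s) and y is the index of the true class."""
    return -math.log(softmax(s)[y])


def cross_entropy_grad(s, y):
    """dC/ds_k = p_k - 1[k == y]: the starting gradient for the backward pass."""
    p = softmax(s)
    return [p_k - (1.0 if k == y else 0.0) for k, p_k in enumerate(p)]


print(cross_entropy([2.0, 0.5, -1.0], 0))
print(cross_entropy_grad([2.0, 0.5, -1.0], 0))
```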

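Backpropagation computes the gradient of the cost with respect to every weight and bias. Gradient descent then uses these gradients to adjust the parameters so that the cost decreases:

$w := w-\epsilon \frac{\partial C}{\partial w}$

$b := b-\epsilon \frac{\partial C}{\partial b}$

where $\epsilon$ is the learning rate.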
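The update rules can be sketched as follows. The example assumes a linear model $s = wx + b$ with a mean squared-error cost; this is chosen only to keep the gradients short, since the text does not fix a model here. The chain rule gives the gradients, and `lr` plays the role of $\epsilon$. Names are illustrative.

```python
def gd_step(w, b, dC_dw, dC_db, lr):
    """One gradient-descent update: w := w - lr * dC/dw, b := b - lr * dC/db."""
    return w - lr * dC_dw, b - lr * dC_db


def train_line(data, lr=0.1, steps=1000):
    """Fit s = w*x + b to (x, y) pairs by minimizing C = mean of 0.5 * (s - y)^2."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Chain rule: dC/ds = (s - y), ds/dw = x, ds/db = 1, averaged over the data
        dC_dw = sum((w * x + b - y) * x for x, y in data) / n
        dC_db = sum((w * x + b - y) for x, y in data) / n
        w, b = gd_step(w, b, dC_dw, dC_db, lr)
    return w, b


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # points on y = 2x + 1
w, b = train_line(data)
print(round(w, 4), round(b, 4))
```

With this data and learning rate, the parameters approach w = 2 and b = 1. A learning rate that is too large would make the same loop diverge.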

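In this network, take the partial derivative of C with respect to $w_{22}^{(2)}$. This weight connects $a_2^{(2)}$ to $z_2^{(3)}$. It appears only in $z_2^{(3)} = \sum_j w_{2j}^{(2)} a_j^{(2)} + b_2^{(2)}$, so $\partial z_2^{(3)} / \partial w_{22}^{(2)} = a_2^{(2)}$. Applying the chain rule:

$\frac{\partial C}{\partial w_{22}^{(2)}} = \frac{\partial C}{\partial z_2^{(3)}}\cdot \frac{\partial z_2^{(3)}}{\partial w_{22}^{(2)}} = \frac{\partial C}{\partial a_2^{(3)}} \cdot \frac{\partial a_2^{(3)}}{\partial z_2^{(3)}}\cdot a_2^{(2)}$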
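This sketch checks the formula numerically. The plain-Python network below assumes tanh activations, 4 inputs, 2 units per hidden layer, one output and a squared-error cost $C = \frac{1}{2}\sum_k (s_k - y_k)^2$. All of these are assumptions; the text recommends cross-entropy. Python indices start at 0, so `W2[1][1]` is $w_{22}^{(2)}$. With $s = W^{(3)}a^{(3)}$, the first factor is $\partial C/\partial a_2^{(3)} = \sum_k (s_k - y_k) W^{(3)}_{k2}$. For tanh, the second factor is $\partial a_2^{(3)}/\partial z_2^{(3)} = 1 - (a_2^{(3)})^2$.

```python
import math


def forward(x, W1, b1, W2, b2, W3):
    """tanh network: a2 = tanh(W1 x + b1), a3 = tanh(W2 a2 + b2), s = W3 a3."""
    z2 = [sum(w * v for w, v in zip(row, x)) + bj for row, bj in zip(W1, b1)]
    a2 = [math.tanh(z) for z in z2]
    z3 = [sum(w * v for w, v in zip(row, a2)) + bj for row, bj in zip(W2, b2)]
    a3 = [math.tanh(z) for z in z3]
    s = [sum(w * v for w, v in zip(row, a3)) for row in W3]
    return a2, a3, s


def cost(s, y):
    """Squared-error cost (an assumption for this sketch)."""
    return 0.5 * sum((sk - yk) ** 2 for sk, yk in zip(s, y))


def grad_w22(x, y, W1, b1, W2, b2, W3):
    """dC/dw_22^(2) = dC/da_2^(3) * da_2^(3)/dz_2^(3) * a_2^(2), with 0-based index 1."""
    a2, a3, s = forward(x, W1, b1, W2, b2, W3)
    dC_da3 = sum((sk - yk) * W3[k][1] for k, (sk, yk) in enumerate(zip(s, y)))
    da3_dz3 = 1.0 - a3[1] ** 2  # tanh'(z) = 1 - tanh(z)^2
    return dC_da3 * da3_dz3 * a2[1]


def numeric_w22(x, y, W1, b1, W2, b2, W3, eps=1e-6):
    """Central finite difference obtained by nudging only W2[1][1]."""
    def cost_with(delta):
        W2_mod = [row[:] for row in W2]
        W2_mod[1][1] += delta
        return cost(forward(x, W1, b1, W2_mod, b2, W3)[2], y)
    return (cost_with(eps) - cost_with(-eps)) / (2 * eps)


x = [1.0, 0.5, -0.5, 2.0]
y = [0.5]
W1 = [[0.1, -0.2, 0.3, 0.05], [0.2, 0.1, -0.1, 0.3]]
b1 = [0.0, 0.1]
W2 = [[0.5, -0.4], [0.3, 0.8]]
b2 = [0.05, -0.05]
W3 = [[1.0, -0.7]]
print(grad_w22(x, y, W1, b1, W2, b2, W3), numeric_w22(x, y, W1, b1, W2, b2, W3))
```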

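The same chain rule applies to a convolutional layer. Below is a naive, loop-based backward pass written with PyTorch. It assumes that the forward pass stored the zero-padded input in `cache`, since the loop indexes x in padded coordinates.

```python
import torch


def conv_backward_naive(dout, cache):
    """
    Naive (loop-based) backward pass for a convolutional layer.

    Inputs:
    - dout: upstream gradient of shape (N, F, H_dout, W_dout)
      (N: number of samples, F: number of filters, H_dout/W_dout: output height/width)
    - cache: tuple (x, w, b, conv_param) saved by the forward pass:
      - x: input data, assumed to be stored already zero-padded,
        shape (N, C, H + 2 * pad, W + 2 * pad)
      - w: filter weights of shape (F, C, HH, WW)
      - b: biases of shape (F,)
      - conv_param: a dictionary with the following keys:
        - 'stride': the number of pixels between adjacent receptive fields
          in the horizontal and vertical directions
        - 'pad': the number of pixels used to zero-pad the input

    Returns (dx, dw, db), the gradients w.r.t. the unpadded x, w and b.
    """
    x, w, b, conv_param = cache
    pad = conv_param['pad']        # padding
    stride = conv_param['stride']  # stride
    N, F, H_dout, W_dout = dout.shape
    _, C, HH, WW = w.shape         # filter height HH and width WW
    H_pad, W_pad = x.shape[2], x.shape[3]

    # Gradients start at zero and are accumulated over every output position
    db = torch.zeros_like(b)
    dw = torch.zeros_like(w)
    dx = torch.zeros_like(x)

    for n in range(N):
        for f in range(F):
            for height in range(H_dout):
                for width in range(W_dout):
                    g = dout[n, f, height, width]
                    h0 = height * stride  # top-left corner of the receptive field
                    w0 = width * stride
                    # out[n, f, height, width] = sum(window * w[f]) + b[f]
                    db[f] += g
                    dw[f] += x[n, :, h0:h0 + HH, w0:w0 + WW] * g
                    dx[n, :, h0:h0 + HH, w0:w0 + WW] += w[f] * g

    # Delete the padded "pixels" so dx has the shape of the original input
    dx = dx[:, :, pad:H_pad - pad, pad:W_pad - pad]
    return dx, dw, db
```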
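The next sketch checks that this scatter-style loop produces the right gradients. It is a stripped-down analogue in plain Python with one sample, one channel and one filter, in 1D. This simplification is an assumption made so the example runs without PyTorch. The pattern is the same as above: pad the input, accumulate db, dw and the padded dx window by window, then crop the padding. The result is compared against finite differences of the scalar loss sum(dout * out). All function names are hypothetical.

```python
def pad1d(x, pad):
    """Zero-pad a 1D signal on both sides."""
    return [0.0] * pad + list(x) + [0.0] * pad


def conv1d_forward(x, w, b, stride, pad):
    """out[i] = sum_k xp[i*stride + k] * w[k] + b, where xp is the padded input."""
    xp = pad1d(x, pad)
    K = len(w)
    n_out = (len(xp) - K) // stride + 1
    return [sum(xp[i * stride + k] * w[k] for k in range(K)) + b for i in range(n_out)]


def conv1d_backward(dout, x, w, stride, pad):
    """Same loop pattern as the 2D code: each dout[i] is scattered into db, dw and dx."""
    xp = pad1d(x, pad)
    K = len(w)
    dxp = [0.0] * len(xp)  # gradient w.r.t. the padded input
    dw = [0.0] * K
    db = 0.0
    for i, g in enumerate(dout):
        start = i * stride  # left edge of the receptive field
        db += g
        for k in range(K):
            dw[k] += xp[start + k] * g
            dxp[start + k] += w[k] * g
    dx = dxp[pad:len(dxp) - pad]  # drop the padded border
    return dx, dw, db


def numeric_grads(dout, x, w, b, stride, pad, eps=1e-6):
    """Finite differences of the scalar loss L = sum_i dout[i] * out[i]."""
    def loss(x_, w_, b_):
        out = conv1d_forward(x_, w_, b_, stride, pad)
        return sum(g * o for g, o in zip(dout, out))

    def nudge(v, j, delta):
        v = list(v)
        v[j] += delta
        return v

    dx = [(loss(nudge(x, j, eps), w, b) - loss(nudge(x, j, -eps), w, b)) / (2 * eps)
          for j in range(len(x))]
    dw = [(loss(x, nudge(w, k, eps), b) - loss(x, nudge(w, k, -eps), b)) / (2 * eps)
          for k in range(len(w))]
    db = (loss(x, w, b + eps) - loss(x, w, b - eps)) / (2 * eps)
    return dx, dw, db


x, w, b = [1.0, -2.0, 3.0, 0.5, -1.0], [0.2, -0.5, 1.0], 0.3
dout = [1.0, -1.0, 0.5]  # stride 2, pad 1 -> (5 + 2 - 3) // 2 + 1 = 3 outputs
print(conv1d_backward(dout, x, w, stride=2, pad=1))
print(numeric_grads(dout, x, w, b, stride=2, pad=1))
```

Cropping with `pad:len(dxp) - pad` also works when pad is 0. A fixed `1:-1` slice would only be correct for pad = 1.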

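Appendix:

1. The role of the number of filters

The number of filters equals the number of neurons in the layer. Each neuron (filter) applies a different convolution to the same input.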

A feature map is the result of applying one filter to the input with the chosen padding and stride.

2. Multi-channel filters
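With a multi-channel input, each filter has as many channels as the input. One output value is the sum of the per-channel products over the receptive field, plus a single bias for that filter. Below is a plain-Python sketch of one output position. It assumes inputs are nested lists indexed `[channel][row][column]`; the layout and the name `conv_at` are illustrative.

```python
def conv_at(x, w, b, top, left):
    """One output value: sum over c, i, j of x[c][top+i][left+j] * w[c][i][j], plus b."""
    C = len(w)
    HH, WW = len(w[0]), len(w[0][0])
    assert len(x) == C, "the filter must have one channel per input channel"
    total = 0.0
    for c in range(C):          # sum over input channels
        for i in range(HH):
            for j in range(WW):
                total += x[c][top + i][left + j] * w[c][i][j]
    return total + b            # one bias per filter, not per channel


# Two input channels of size 3x3 and one 2-channel 2x2 filter
x = [
    [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0]],
    [[0.0, 1.0, 1.0], [1.0, 0.0, 2.0], [1.0, 1.0, 0.0]],
]
w = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
print(conv_at(x, w, 0.1, 0, 0))
```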

3. Padding & stride

See: Convolution layer, Padding, Stride, and Pooling in CNN
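The combined effect of the number of filters, padding and stride on a layer's output follows the usual shape rule. F filters give F output feature maps, and each spatial size becomes (H + 2 * pad - HH) // stride + 1. Below is a small helper for this rule. It assumes the (N, C, H, W) input layout and (F, C, HH, WW) filter layout used in the backward code earlier in this post. The function name is hypothetical.

```python
def conv_output_shape(x_shape, w_shape, stride, pad):
    """Output shape (N, F, H_out, W_out) of a convolution with F filters."""
    N, C, H, W = x_shape
    F, C_w, HH, WW = w_shape
    if C != C_w:
        raise ValueError("each filter must have as many channels as the input")
    H_out = (H + 2 * pad - HH) // stride + 1  # positions the filter can take vertically
    W_out = (W + 2 * pad - WW) // stride + 1  # positions the filter can take horizontally
    return (N, F, H_out, W_out)               # one output feature map per filter


print(conv_output_shape((2, 3, 32, 32), (10, 3, 3, 3), stride=1, pad=1))  # (2, 10, 32, 32)
print(conv_output_shape((2, 3, 32, 32), (10, 3, 3, 3), stride=2, pad=1))  # (2, 10, 16, 16)
```

With a 3x3 filter, pad = 1 and stride = 1 keep the spatial size unchanged. Stride 2 roughly halves it.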

