Purpose:
Compute the gradients of a neural network.
Method:
The chain rule: traverse the network in reverse order, starting from the output layer, and compute the gradient of each intermediate variable and parameter in turn.
Network layers:
In a 4-layer network:
Input layer: the input data $x$.
Second layer: $z^{(2)} = W^{(2)}x + b^{(2)}$, $a^{(2)} = f(z^{(2)})$,
where $W$ is the weight, $b$ is the bias, and $f$ is the activation function (tanh, ReLU, sigmoid, etc.) that produces the activation $a$.
Third layer: $z^{(3)} = W^{(3)}a^{(2)} + b^{(3)}$, $a^{(3)} = f(z^{(3)})$.
⚠️: $W$ and $b$ are matrices, and the superscript denotes the layer index; for example, $W^{(2)}$ is the weight matrix of the second layer.
Output layer: $s = W^{(4)}a^{(3)}$, where $s$ is the predicted output.
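As a quick illustration, here is a minimal sketch of this forward pass in PyTorch (the layer sizes and the choice of tanh are assumptions for the example, not from the original):

import torch

torch.manual_seed(0)
x = torch.randn(4)                          # input layer
W2, b2 = torch.randn(5, 4), torch.randn(5)  # second-layer weight and bias
W3, b3 = torch.randn(3, 5), torch.randn(3)  # third-layer weight and bias
W4 = torch.randn(2, 3)                      # output-layer weight

a2 = torch.tanh(W2 @ x + b2)                # z2 = W2 x + b2, a2 = f(z2)
a3 = torch.tanh(W3 @ a2 + b3)               # z3 = W3 a2 + b3, a3 = f(z3)
s = W4 @ a3                                 # output layer: s = W4 a3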
Given training data $(x, y)$, where $x$ is the input and $y$ is the label (target output), $y$ is compared with $s$ through a cost function; cross-entropy is commonly used.
Backpropagation adjusts the weights and biases so as to minimize the cost function $C$:
$W^{(l)} \leftarrow W^{(l)} - \epsilon \frac{\partial C}{\partial W^{(l)}}$, $b^{(l)} \leftarrow b^{(l)} - \epsilon \frac{\partial C}{\partial b^{(l)}}$, where $\epsilon$ is the learning rate.
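A minimal sketch of one such update step, with cross-entropy as the cost (the shapes and the learning-rate value are assumptions for the example):

import torch
import torch.nn.functional as F

eps = 0.01                                   # learning rate (assumed value)
W = torch.randn(2, 3, requires_grad=True)    # a trainable weight matrix
x = torch.randn(3)                           # input data
y = torch.tensor(0)                          # class label

s = W @ x                                    # predicted output
C = F.cross_entropy(s.unsqueeze(0), y.unsqueeze(0))  # cost: cross-entropy between s and y
C.backward()                                 # backpropagation fills W.grad with dC/dW
with torch.no_grad():
    W -= eps * W.grad                        # W <- W - eps * dC/dW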
Chain-rule computation:
In this network, consider computing the partial derivative of the cost $C$ with respect to $W^{(4)}$. $C$ is related to $W^{(4)}$ through $s$ (since $s = W^{(4)}a^{(3)}$), so by the chain rule
$$\frac{\partial C}{\partial W^{(4)}} = \frac{\partial C}{\partial s} \cdot \frac{\partial s}{\partial W^{(4)}} = \frac{\partial C}{\partial s}\,(a^{(3)})^{\top}.$$
Applying the same rule layer by layer (through $a^{(3)}, z^{(3)}, a^{(2)}, z^{(2)}$) gives the gradients of all earlier weights and biases.
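This derivative can be checked against autograd; a minimal sketch (the sizes are assumptions, and $\partial C/\partial s = \mathrm{softmax}(s) - \mathrm{onehot}(y)$ is the standard cross-entropy gradient):

import torch
import torch.nn.functional as F

a3 = torch.randn(3)                              # third-layer activations
W4 = torch.randn(2, 3, requires_grad=True)       # output-layer weights
y = torch.tensor(0)                              # class label

s = W4 @ a3                                      # s = W4 a3
C = F.cross_entropy(s.unsqueeze(0), y.unsqueeze(0))
C.backward()                                     # autograd's dC/dW4

dC_ds = F.softmax(s.detach(), dim=0)             # dC/ds = softmax(s) - onehot(y)
dC_ds[y] -= 1.0
dC_dW4 = torch.outer(dC_ds, a3)                  # chain rule: (dC/ds) * (ds/dW4) = (dC/ds) a3^T
print(torch.allclose(dC_dW4, W4.grad, atol=1e-6))  # True

The loop implementation below applies the same chain rule to a convolutional layer: every output element routes its upstream gradient to the bias, the filter weights, and the input patch that produced it.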
import torch

def conv_backward(dout, cache):
    '''
    Naive loop-based backward pass for a convolutional layer.
    dout: upstream gradient of shape (N, F, H_dout, W_dout)
    cache: (x, w, b, conv_param) saved by the forward pass
      x is the input data (already zero-padded, as the slicing below assumes)
      w is the weight
      b is the bias
      conv_param: a dictionary with the following keys:
      - 'stride': the number of pixels between adjacent receptive fields
        in the horizontal and vertical directions.
      - 'pad': the number of pixels used to zero-pad the input.
    '''
    x, w, b, conv_param = cache
    pad = conv_param['pad']            # padding
    stride = conv_param['stride']      # stride
    N, F, H_dout, W_dout = dout.shape  # N: number of inputs, F: number of filters,
                                       # H_dout / W_dout: height / width of dout
    F, C, HH, WW = w.shape             # filter weights of shape (F, C, HH, WW)
    # init db, dw, dx
    db = torch.zeros_like(b)
    dw = torch.zeros_like(w)
    dx = torch.zeros_like(x)
    for n in range(N):
        for f in range(F):
            for height in range(H_dout):
                for width in range(W_dout):
                    # dout[n, f, height, width] contributes to the bias, to the
                    # filter, and to its receptive field in x
                    db[f] += dout[n, f, height, width]
                    dw[f] += x[n, :, height * stride:height * stride + HH, width * stride:width * stride + WW] * dout[n, f, height, width]
                    dx[n, :, height * stride:height * stride + HH, width * stride:width * stride + WW] += w[f] * dout[n, f, height, width]
    # delete the zero-padded "pixels" (the original hard-coded 1:-1, which only works for pad == 1)
    dx = dx[:, :, pad:dx.shape[2] - pad, pad:dx.shape[3] - pad]
    return dx, dw, db
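To sanity-check the loops, the result can be compared against autograd's gradients for F.conv2d; a minimal sketch, assuming the forward pass cached the zero-padded input (all shapes are arbitrary assumptions):

import torch
import torch.nn.functional as F

N, C, H, W_in = 2, 3, 6, 6
F_out, HH, WW = 4, 3, 3
stride, pad = 1, 1

x0 = torch.randn(N, C, H, W_in, requires_grad=True)
w = torch.randn(F_out, C, HH, WW, requires_grad=True)
b = torch.randn(F_out, requires_grad=True)

out = F.conv2d(x0, w, b, stride=stride, padding=pad)
dout = torch.randn_like(out)
out.backward(dout)                                    # autograd's reference gradients

x_padded = F.pad(x0.detach(), (pad, pad, pad, pad))   # the cache holds the padded input
cache = (x_padded, w.detach(), b.detach(), {'stride': stride, 'pad': pad})
dx, dw, db = conv_backward(dout, cache)

print(torch.allclose(dx, x0.grad, atol=1e-4),
      torch.allclose(dw, w.grad, atol=1e-4),
      torch.allclose(db, b.grad, atol=1e-4))          # True True True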
Appendix:
1. The role of the number of filters
The number of filters equals the number of neurons: each neuron performs a different convolution on the input of the layer.
A feature map is the result obtained after applying a filter (mapping, stride) to the input.
2. Multi-channel filters: a filter has one channel per input channel (the C in the (F, C, HH, WW) weight shape above) and sums across them.
3. Mapping & stride (a shape sketch follows below)
Convolution layer, Padding, Stride, and Pooling in CNN
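A small sketch of points 1-3 (all shapes here are arbitrary assumptions): with 8 filters the output has 8 feature maps, each filter spans all 3 input channels, and stride and padding determine the output size.

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)            # (N, C, H, W): one 3-channel image
w = torch.randn(8, 3, 5, 5)              # 8 filters, each spanning all 3 input channels
stride, pad = 2, 2

out = F.conv2d(x, w, stride=stride, padding=pad)
print(out.shape)                         # torch.Size([1, 8, 16, 16]): one feature map per filter
# output spatial size: (H + 2*pad - HH) // stride + 1 = (32 + 4 - 5) // 2 + 1 = 16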
Reference:
1. Understanding Backpropagation Algorithm