Having built an intuitive understanding of CNNs and derived the backpropagation formulas in the previous posts, we can now implement a CNN by hand. Finishing it is very satisfying.
References: CNN backpropagation derivation, the deeplearning.ai CNN course, and the CS231n notes.
1.conv_forward_naive
First, read the filter dimensions (number of filters, channels, height, width) from the given shapes:
N, C, H, W = x.shape
F, C, HH, WW = w.shape
(1) Compute the output size (straight from the formula):
H_out = int(1 + (H + 2 * pad - HH) / stride)
W_out = int(1 + (W + 2 * pad - WW) / stride)
out = np.zeros((N, F, H_out, W_out))
(2) Zero padding
"""
np.pad(array, pad_width, mode)
@-array: the array to pad
@-pad_width: the number of values to pad on the edges of each axis.
Given as ((before_1, after_1), ..., (before_N, after_N)), where (before_1, after_1) means the first axis gets before_1 values padded in front and after_1 values padded behind.
@-mode: the padding mode
Padding modes:
'constant' --- pad with a constant value; each axis can specify its own values via constant_values=(x, y), x in front and y behind; the default fill is 0.
"""
x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant', constant_values=0)
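As a quick sanity check (a tiny standalone example), padding a 2×2 input with pad=1 leaves the batch and channel axes untouched and grows only the spatial axes:

```python
import numpy as np

x = np.arange(4, dtype=float).reshape(1, 1, 2, 2)  # N=1, C=1, H=W=2
pad = 1
# pad only the two spatial axes, leaving N and C as they are
x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)),
               mode='constant', constant_values=0)
print(x_pad.shape)  # (1, 1, 4, 4)
```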
(3) The convolution itself
On each channel, slide the kernel by stride, multiply it elementwise with the corresponding region, and sum.
In the forward pass, x, w, b, and conv_param are the function's arguments (the cache is only assembled at the end, for the backward pass), so we just read the hyperparameters:
stride, pad = conv_param['stride'], conv_param['pad']
Loop over the output positions, extract the matching window of x_pad, and convolve. Note that the sum runs over (C, H, W), i.e. axis=(1, 2, 3):
for i in range(H_out):
    for j in range(W_out):
        x_pad_mask = x_pad[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]  # (N, C, HH, WW)
        for k in range(F):
            #convolve: elementwise multiply with filter k, then sum
            out[:, k, i, j] = np.sum(x_pad_mask * w[k, :, :, :], axis=(1, 2, 3))
Finally, add the bias:
out += b[None, :, None, None]  # None is numpy.newaxis: reshape b to the output's number of dims, then broadcast
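Putting the pieces above together, a minimal sketch of the whole forward pass (following the CS231n conv_forward_naive signature, where the cache is returned for the backward pass):

```python
import numpy as np

def conv_forward_naive(x, w, b, conv_param):
    """Naive convolution forward pass.
    x: (N, C, H, W), w: (F, C, HH, WW), b: (F,)
    conv_param: dict with 'stride' and 'pad'."""
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    stride, pad = conv_param['stride'], conv_param['pad']
    H_out = 1 + (H + 2 * pad - HH) // stride
    W_out = 1 + (W + 2 * pad - WW) // stride
    out = np.zeros((N, F, H_out, W_out))
    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)),
                   mode='constant', constant_values=0)
    for i in range(H_out):
        for j in range(W_out):
            # window shared by all images and all filters at position (i, j)
            x_mask = x_pad[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]
            for k in range(F):
                out[:, k, i, j] = np.sum(x_mask * w[k], axis=(1, 2, 3))
    out += b[None, :, None, None]
    cache = (x, w, b, conv_param)
    return out, cache
```

With a 1×1 kernel, stride 1 and no padding, this reduces to a per-pixel scale-and-shift, which makes it easy to sanity-check by hand.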
2.conv_backward_naive
First, recover the shapes and hyperparameters from the cache:
(x, w, b, conv_param) = cache
stride, pad = conv_param['stride'], conv_param['pad']
N, C, H, W = x.shape
F, C, HH, WW = w.shape
H_out = int(1 + (H + 2 * pad - HH) / stride)
W_out = int(1 + (W + 2 * pad - WW) / stride)
#re-pad x and initialize the gradients (not the forward output)
x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant', constant_values=0)
dx_pad = np.zeros_like(x_pad)
dw = np.zeros_like(w)
db
db is the easiest: just sum dout over the batch and spatial axes, making sure the result matches b's shape:
#db must have the same shape as b, which is (F,)
db = np.sum(dout, axis=(0,2,3))
dx
In the previous post we derived the formulas for dx and dw. dx is the padded dout convolved with the kernel rotated by 180°; mind the shapes:
for i in range(H_out):
    for j in range(W_out):
        x_padded_mask = x_pad[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]  # (N, C, HH, WW)
        for n in range(N):
            dx_pad[n, :, i*stride:i*stride+HH, j*stride:j*stride+WW] += np.sum((dout[n, :, i, j])[:, None, None, None] * w, axis=0)
Finally, strip the padding from dx (this slice assumes pad > 0):
dx = dx_pad[:,:,pad:-pad,pad:-pad]
dw
From the derivation, dw is dout convolved with the corresponding regions of x (inside the same loops over i and j):
for i in range(H_out):
    for j in range(W_out):
        x_padded_mask = x_pad[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]
        for k in range(F):
            dw[k, :, :, :] += np.sum((dout[:, k, i, j])[:, None, None, None] * x_padded_mask, axis=0)
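The three gradients combine into one function. A minimal sketch (the `pad > 0` check guards the final slice, since `[pad:-pad]` is wrong when pad is 0):

```python
import numpy as np

def conv_backward_naive(dout, cache):
    """Naive convolution backward pass.
    dout: (N, F, H_out, W_out); cache comes from the forward pass."""
    x, w, b, conv_param = cache
    stride, pad = conv_param['stride'], conv_param['pad']
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    _, _, H_out, W_out = dout.shape
    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)),
                   mode='constant', constant_values=0)
    dx_pad = np.zeros_like(x_pad)
    dw = np.zeros_like(w)
    db = np.sum(dout, axis=(0, 2, 3))  # one scalar per filter
    for i in range(H_out):
        for j in range(W_out):
            x_mask = x_pad[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]
            for k in range(F):
                # dw: upstream gradient times the input window, summed over the batch
                dw[k] += np.sum(dout[:, k, i, j][:, None, None, None] * x_mask, axis=0)
            for n in range(N):
                # dx: upstream gradient routed back through the filter weights
                dx_pad[n, :, i*stride:i*stride+HH, j*stride:j*stride+WW] += np.sum(
                    dout[n, :, i, j][:, None, None, None] * w, axis=0)
    dx = dx_pad[:, :, pad:-pad, pad:-pad] if pad > 0 else dx_pad
    return dx, dw, db
```

Again the 1×1-kernel case gives analytic answers (out = w0*x + b, so dx = w0*dout and dw = sum(x*dout)), which is an easy correctness check.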
3.max_pool_forward_naive
Read the pooling hyperparameters and compute the output size:
pool_height, pool_width, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
N, C, H, W = x.shape
#shape of out
H_out = int(1 + (H - pool_height) / stride)
W_out = int(1 + (W - pool_width) / stride)
out = np.zeros((N, C, H_out, W_out))
Take the max over each pooling window:
for i in range(H_out):
    for j in range(W_out):
        #window at the current stride offset
        x_padded_mask = x[:, :, i*stride:i*stride+pool_height, j*stride:j*stride+pool_width]  # (N, C, pool_height, pool_width)
        #max over the window
        out[:, :, i, j] = np.max(x_padded_mask, axis=(2, 3))
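The full pooling forward pass in one piece (a minimal sketch, matching the CS231n signature):

```python
import numpy as np

def max_pool_forward_naive(x, pool_param):
    """Naive max-pooling forward pass. x: (N, C, H, W)."""
    N, C, H, W = x.shape
    ph, pw = pool_param['pool_height'], pool_param['pool_width']
    stride = pool_param['stride']
    H_out = 1 + (H - ph) // stride
    W_out = 1 + (W - pw) // stride
    out = np.zeros((N, C, H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            # max over each (ph, pw) window, for all images and channels at once
            window = x[:, :, i*stride:i*stride+ph, j*stride:j*stride+pw]
            out[:, :, i, j] = np.max(window, axis=(2, 3))
    return out, (x, pool_param)
```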
4.max_pool_backward_naive
Again get the shapes, then build a binary mask that is 1 at each window's max and 0 elsewhere; the backward pass then routes gradients just like ReLU.
(x, pool_param) = cache
pool_height, pool_width, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
N, C, H, W = x.shape
#shape of out
H_out = int(1 + (H - pool_height) / stride)
W_out = int(1 + (W - pool_width) / stride)
dx = np.zeros((N, C, H, W))
for i in range(H_out):
    for j in range(W_out):
        #window at the current stride offset
        x_padded_mask = x[:, :, i*stride:i*stride+pool_height, j*stride:j*stride+pool_width]  # (N, C, pool_height, pool_width)
        #max over the window
        max_mask = np.max(x_padded_mask, axis=(2, 3))
        #binary mask marking where each max sits
        temp_binary_mask = (x_padded_mask == (max_mask)[:, :, None, None])
        dx[:, :, i*stride:i*stride+pool_height, j*stride:j*stride+pool_width] += temp_binary_mask * (dout[:, :, i, j])[:, :, None, None]
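Assembled into one function (a minimal sketch; note that if a window has ties for the max, the mask routes gradient to every tied entry):

```python
import numpy as np

def max_pool_backward_naive(dout, cache):
    """Naive max-pooling backward: gradient flows only to each window's max."""
    x, pool_param = cache
    N, C, H, W = x.shape
    ph, pw = pool_param['pool_height'], pool_param['pool_width']
    stride = pool_param['stride']
    _, _, H_out, W_out = dout.shape
    dx = np.zeros_like(x)
    for i in range(H_out):
        for j in range(W_out):
            window = x[:, :, i*stride:i*stride+ph, j*stride:j*stride+pw]
            max_vals = np.max(window, axis=(2, 3))
            # 1 where the entry equals the window max, 0 elsewhere
            mask = (window == max_vals[:, :, None, None])
            dx[:, :, i*stride:i*stride+ph, j*stride:j*stride+pw] += \
                mask * dout[:, :, i, j][:, :, None, None]
    return dx
```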
5.spatial batch normalization
Just read the inline notes in the assignment; they already explain the computation quite clearly.
Spatial BN is really just ordinary BN: where plain BN works on (N, D), here we reshape (N, C, H, W) into (N*H*W, C) so each channel is treated as one feature, using np.transpose().
N, C, H, W = x.shape
# (N, C, H, W)->(N*H*W, C); transpose moves each original axis to the given position
#normalize over each feature channel
a, cache = batchnorm_forward(x.transpose(0,2,3,1).reshape((N*H*W,C)), gamma, beta, bn_param)
# (N*H*W, C)->(N, C, H, W)
out = a.reshape(N, H, W, C).transpose(0,3,1,2)
The backward pass does the same reshaping:
N, C, H, W = dout.shape
# (N, C, H, W)->(N*H*W, C)
dx_bn, dgamma, dbeta = batchnorm_backward(dout.transpose(0,2,3,1).reshape((N*H*W,C)), cache)
# (N*H*W, C) ->(N, C, H, W)
dx = dx_bn.reshape(N, H, W, C).transpose(0,3,1,2)
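Since batchnorm_forward lives in an earlier assignment file, here is a self-contained sketch of the same idea that inlines a basic training-mode batch norm (gamma and beta are per-channel; running statistics are omitted for brevity, so this is an illustration of the reshape trick, not the assignment's full API):

```python
import numpy as np

def spatial_batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Spatial BN: treat (N, C, H, W) as N*H*W samples of C features."""
    N, C, H, W = x.shape
    # (N, C, H, W) -> (N*H*W, C)
    x_flat = x.transpose(0, 2, 3, 1).reshape(N * H * W, C)
    # per-channel statistics over all samples and spatial positions
    mu = x_flat.mean(axis=0)
    var = x_flat.var(axis=0)
    x_hat = (x_flat - mu) / np.sqrt(var + eps)
    out_flat = gamma * x_hat + beta
    # (N*H*W, C) -> (N, C, H, W)
    return out_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2)
```

With gamma=1 and beta=0, each channel of the output should have mean 0 and variance 1 over the (N, H, W) axes, which is a quick way to verify the reshuffling is right.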
6.Results
Without any tuning, the first learning rate I tried already reached 54% accuracy, pretty solid.
Time for a break, have fun. Next I'll figure out how to use TensorFlow before continuing with the assignments.