CS231n -- assignment2 Convolutional Network

After building an intuitive understanding of CNNs and deriving the formulas in the previous posts, we can now implement a CNN by hand. Finishing it feels great.

References: CNN backpropagation derivation, deeplearning.ai CNN course, CS231n notes

1. conv_forward_naive

First, from the information given in the assignment, read off the input and filter shapes (number of filters, channels, height, width):

N, C, H, W = x.shape
F, C, HH, WW = w.shape

(1) Compute the output size (straight from the formula):

H_out = int(1 + (H + 2 * pad - HH) / stride)
W_out = int(1 + (W + 2 * pad - WW) / stride)
out = np.zeros((N, F, H_out, W_out))
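
As a quick worked example (the numbers below are purely illustrative), a 32x32 input with 7x7 filters, pad 3 and stride 1 keeps the spatial size:

    H, HH, pad, stride = 32, 7, 3, 1
    H_out = int(1 + (H + 2 * pad - HH) / stride)  # 1 + (32 + 6 - 7) / 1 = 32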

(2) Zero padding

    """
    np.pad(array, pad_width, mode)
    @-array: 要填充的数组
    @-pad_width: 表示每个轴(axis)边缘需要填充的数值数目。
    参数输入方式为:((before_1, after_1),...(before_N, after_N)),其中(before_1,after_1)表示第一轴两边缘分别填充before_1个和after_1个数值。
    @-mode: 表示填充的方式
      填充方式:
      'constant'--- 表示连续填充相同的值,每个轴可以分别指定填充值,constant_values=(x, y)时前面用x填充,后面用y填充,缺省值充0
    """
    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant', constant_values=0)
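
As a quick sanity check with toy shapes (purely illustrative), only the two spatial axes grow:

    import numpy as np

    x = np.zeros((2, 3, 4, 4))  # (N, C, H, W)
    pad = 1
    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant', constant_values=0)
    print(x_pad.shape)          # (2, 3, 6, 6): H and W each grow by 2 * pad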

(3) Convolution

On each channel, slide the kernel by stride, take the elementwise product of the kernel with the corresponding region, and sum:

    # in the forward pass, x, w, b and conv_param are the function's arguments
    stride, pad = conv_param['stride'], conv_param['pad']
    N, C, H, W = x.shape
    F, C, HH, WW = w.shape

    H_out = int(1 + (H + 2 * pad - HH) / stride)
    W_out = int(1 + (W + 2 * pad - WW) / stride)
    out = np.zeros((N, F, H_out, W_out))

Get the corresponding region:

# inside the loops over i in range(H_out) and j in range(W_out)
x_pad_mask = x_pad[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]  # (N, C, HH, WW)

Note that the sum runs over (C, H, W), so axis=(1, 2, 3):

for k in range(F):
  # convolve: elementwise product with filter k, then sum over (C, HH, WW)
  out[:, k, i, j] = np.sum(x_pad_mask * w[k, :, :, :], axis=(1,2,3))

Finally add the bias term:

out += b[None, :, None, None]  # None is numpy.newaxis: lift b to the output's ndim, then broadcast
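
Putting the fragments together, the whole forward pass can look like the sketch below (assembled from the snippets above, following the assignment's conv_forward_naive signature; treat it as one possible arrangement rather than the reference solution):

    def conv_forward_naive(x, w, b, conv_param):
        stride, pad = conv_param['stride'], conv_param['pad']
        N, C, H, W = x.shape
        F, C, HH, WW = w.shape

        H_out = int(1 + (H + 2 * pad - HH) / stride)
        W_out = int(1 + (W + 2 * pad - WW) / stride)
        out = np.zeros((N, F, H_out, W_out))

        x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant', constant_values=0)

        for i in range(H_out):
            for j in range(W_out):
                # receptive field at this output position, for every example and channel
                x_pad_mask = x_pad[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]
                for k in range(F):
                    # elementwise product with filter k, summed over (C, HH, WW)
                    out[:, k, i, j] = np.sum(x_pad_mask * w[k, :, :, :], axis=(1, 2, 3))

        out += b[None, :, None, None]
        cache = (x, w, b, conv_param)
        return out, cache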

2. conv_backward_naive

First, as before, gather the shape information (and initialize the gradients):

(x, w, b, conv_param) = cache
stride, pad = conv_param['stride'], conv_param['pad']
N, C, H, W = x.shape
F, C, HH, WW = w.shape

H_out = int(1 + (H + 2 * pad - HH) / stride)
W_out = int(1 + (W + 2 * pad - WW) / stride)
# initialize the gradients; dx is accumulated in padded coordinates
x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant', constant_values=0)
dx_pad = np.zeros_like(x_pad)
dw = np.zeros_like(w)

db

db is the easiest: just sum dout, making sure the shape matches:

# db has the same shape as b, which is (F,), so sum dout over every other axis
db = np.sum(dout, axis=(0,2,3))

dx

In the previous post we already derived the formulas for dx and dw.

dx is dout, zero-padded and convolved with the kernel rotated by 180°; keep an eye on the shapes:

# inside the loops over i in range(H_out) and j in range(W_out)
x_padded_mask = x_pad[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]  # (N, C, HH, WW)
for n in range(N):
    dx_pad[n, :, i*stride:i*stride+HH, j*stride:j*stride+WW] += np.sum((dout[n, :, i, j])[:, None, None, None] * w, axis=0)

Finally strip the padding from dx:

dx = dx_pad[:, :, pad:-pad, pad:-pad]

dw

From the derivation, dw is the convolution of dout with the corresponding region of x:

for k in range(F):
    dw[k, :, :, :] += np.sum((dout[:, k, i, j])[:, None, None, None] * x_padded_mask, axis=0)
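
For reference, here is how the db, dx and dw pieces above can be combined into one conv_backward_naive (again just a sketch, mirroring the loop structure of the forward pass):

    def conv_backward_naive(dout, cache):
        (x, w, b, conv_param) = cache
        stride, pad = conv_param['stride'], conv_param['pad']
        N, C, H, W = x.shape
        F, C, HH, WW = w.shape
        H_out = int(1 + (H + 2 * pad - HH) / stride)
        W_out = int(1 + (W + 2 * pad - WW) / stride)

        x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant', constant_values=0)
        dx_pad = np.zeros_like(x_pad)
        dw = np.zeros_like(w)
        db = np.sum(dout, axis=(0, 2, 3))  # (F,)

        for i in range(H_out):
            for j in range(W_out):
                x_padded_mask = x_pad[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]
                # dw: upstream gradient times the input window, summed over the batch
                for k in range(F):
                    dw[k, :, :, :] += np.sum((dout[:, k, i, j])[:, None, None, None] * x_padded_mask, axis=0)
                # dx: upstream gradient times the filters, accumulated in padded coordinates
                for n in range(N):
                    dx_pad[n, :, i*stride:i*stride+HH, j*stride:j*stride+WW] += np.sum((dout[n, :, i, j])[:, None, None, None] * w, axis=0)

        dx = dx_pad[:, :, pad:-pad, pad:-pad]  # strip the padding
        return dx, dw, db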

3. max_pool_forward_naive

Read off the shapes and compute the output size:

  pool_height, pool_width, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
  N, C, H, W = x.shape
  #shape of out
  H_out = int(1 + (H - pool_height) / stride)
  W_out = int(1 + (W - pool_width) / stride)
  out = np.zeros((N, C, H_out, W_out))

Pick the maximum value in each pooling window:

    for i in range(H_out):
        for j in range(W_out):
            # slide by stride to get the current window
            x_padded_mask = x[:, :, i*stride:i*stride+pool_height, j*stride:j*stride+pool_width]  # (N, C, pool_height, pool_width)
            # max over the window
            out[:, :, i, j] = np.max(x_padded_mask, axis=(2,3))
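
A tiny toy check of the windowed max (values are only illustrative):

    x = np.arange(16).reshape(1, 1, 4, 4).astype(float)
    window = x[:, :, 0:2, 0:2]           # top-left 2x2 block: [[0, 1], [4, 5]]
    print(np.max(window, axis=(2, 3)))   # [[5.]] -> this becomes out[:, :, 0, 0]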

4. max_pool_backward_naive

Again gather and compute the shapes, then find where the maximum sits: that position gets a 1 and everything else a 0, after which the backward pass works just like ReLU's.

    (x, pool_param) = cache
    pool_height, pool_width, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
    N, C, H, W = x.shape
    # shape of out
    H_out = int(1 + (H - pool_height) / stride)
    W_out = int(1 + (W - pool_width) / stride)
    dx = np.zeros((N, C, H, W))

    for i in range(H_out):
        for j in range(W_out):
            # slide by stride to get the current window
            x_padded_mask = x[:, :, i*stride:i*stride+pool_height, j*stride:j*stride+pool_width]  # (N, C, pool_height, pool_width)
            # find the max of each window
            max_mask = np.max(x_padded_mask, axis=(2,3))
            # binary mask marking where the max sits
            temp_binary_mask = (x_padded_mask == (max_mask)[:,:,None,None])
            dx[:, :, i*stride:i*stride+pool_height, j*stride:j*stride+pool_width] += temp_binary_mask * (dout[:,:,i,j])[:,:,None,None]
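
To see how the binary mask routes the gradient, consider a single toy window (hypothetical values):

    window = np.array([[[[1., 3.],
                         [2., 0.]]]])              # (N=1, C=1, 2, 2)
    max_val = np.max(window, axis=(2, 3))          # [[3.]]
    mask = (window == max_val[:, :, None, None])   # True only at the max position
    # multiplying dout[:, :, i, j] by this mask sends the upstream gradient to the
    # max position only, just like ReLU passing gradient only where the input was positive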

5. spatial batch normalization

Just read the introduction given in the assignment; it already explains fairly clearly how to do the computation.

It is really the same BN we already wrote for (N, D) inputs: here we reshape (N, C, H, W) into (N*H*W, C), which is where np.transpose() comes in.

    N, C, H, W = x.shape
    # (N, C, H, W) -> (N*H*W, C); transpose says where each original axis should go,
    # so that BN is computed per feature channel
    a, cache = batchnorm_forward(x.transpose(0,2,3,1).reshape((N*H*W,C)), gamma, beta, bn_param)
    # (N*H*W, C) -> (N, C, H, W)
    out = a.reshape(N, H, W, C).transpose(0,3,1,2)
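
A quick way to check the result (toy shapes, and it assumes the batchnorm_forward from the earlier part of the assignment is already working): with gamma = 1 and beta = 0, each channel of the output should have roughly zero mean and unit variance over (N, H, W).

    N, C, H, W = 4, 3, 5, 5
    x = 10 * np.random.randn(N, C, H, W) + 7
    gamma, beta = np.ones(C), np.zeros(C)
    out, _ = spatial_batchnorm_forward(x, gamma, beta, {'mode': 'train'})
    print(out.mean(axis=(0, 2, 3)))  # ~0 per channel
    print(out.std(axis=(0, 2, 3)))   # ~1 per channel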

The backward pass is computed the same way:

    N, C, H, W = dout.shape
    # (N, C, H, W) -> (N*H*W, C)
    dx_bn, dgamma, dbeta = batchnorm_backward(dout.transpose(0,2,3,1).reshape((N*H*W,C)), cache)
    # (N*H*W, C) -> (N, C, H, W)
    dx = dx_bn.reshape(N, H, W, C).transpose(0,3,1,2)

6. Results

Without any tuning, a learning rate picked off the top of my head already gave 54% accuracy, which is not bad at all.

Time for a break, have fun. Next I'll look at how to use TensorFlow and then continue with the assignments.


