CS231n Assignment 2 Q4 Notes

Conv Forward

The first sub-problem asks us to implement the convolution operation by hand, brute-forcing it with four nested loops over the output. The code is as follows:


def conv_forward_naive(x, w, b, conv_param):
    """
    A naive implementation of the forward pass for a convolutional layer.

    The input consists of N data points, each with C channels, height H and
    width W. We convolve each input with F different filters, where each filter
    spans all C channels and has height HH and width WW.

    Input:
    - x: Input data of shape (N, C, H, W)
    - w: Filter weights of shape (F, C, HH, WW)
    - b: Biases, of shape (F,)
    - conv_param: A dictionary with the following keys:
      - 'stride': The number of pixels between adjacent receptive fields in the
        horizontal and vertical directions.
      - 'pad': The number of pixels that will be used to zero-pad the input.

    During padding, 'pad' zeros should be placed symmetrically (i.e. equally on both
    sides) along the height and width axes of the input. Be careful not to modify the
    original input x directly.

    Returns a tuple of:
    - out: Output data, of shape (N, F, H', W') where H' and W' are given by
      H' = 1 + (H + 2 * pad - HH) / stride
      W' = 1 + (W + 2 * pad - WW) / stride
    - cache: (x, w, b, conv_param)
    """
    out = None
    ###########################################################################
    # TODO: Implement the convolutional forward pass.                         #
    # Hint: you can use the function np.pad for padding.                      #
    ###########################################################################
    N, C, H, W = x.shape
    F, C, HH, WW = w.shape
    pad = conv_param['pad']
    stride = conv_param['stride']
    H_conv = 1 + (H + 2 * pad - HH) // stride
    W_conv = 1 + (W + 2 * pad - WW) // stride
    out = np.zeros((N, F, H_conv, W_conv))
    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant', constant_values=0)
    for i in range(N):
        for f in range(F):
            for j in range(H_conv):
                for k in range(W_conv):
                    out[i, f, j, k] = np.sum(x_pad[i, :, j*stride : j*stride + HH, k*stride : k*stride + WW] * w[f]) + b[f]
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    cache = (x, w, b, conv_param)
    return out, cache

Note the usage of np.pad: you have to pass a (before, after) padding width for every dimension, not just the two spatial ones.
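As a minimal standalone illustration (the tiny input here is made up), padding only the two spatial axes of an (N, C, H, W) array looks like this:

import numpy as np

x = np.arange(4, dtype=float).reshape(1, 1, 2, 2)   # (N, C, H, W) = (1, 1, 2, 2)
# one (before, after) pair per axis; only H and W get padded
x_pad = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)), mode='constant', constant_values=0)
print(x_pad.shape)     # (1, 1, 4, 4)
print(x_pad[0, 0])     # the original 2x2 block surrounded by a ring of zeros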

Conv Backward

Next we implement the backward pass of the convolution. Since the forward pass computes one output element at a time, the backward pass likewise propagates the gradient of each output element back one at a time. The code is as follows:

def conv_backward_naive(dout, cache):
    """
    A naive implementation of the backward pass for a convolutional layer.

    Inputs:
    - dout: Upstream derivatives.
    - cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive

    Returns a tuple of:
    - dx: Gradient with respect to x
    - dw: Gradient with respect to w
    - db: Gradient with respect to b
    """
    dx, dw, db = None, None, None
    ###########################################################################
    # TODO: Implement the convolutional backward pass.                        #
    ###########################################################################
    x, w, b, conv_param = cache
    pad, stride = conv_param['pad'], conv_param['stride']
    N, C, H, W = x.shape
    F, C, HH, WW = w.shape
    H_conv = 1 + (H + 2 * pad - HH) // stride
    W_conv = 1 + (W + 2 * pad - WW) // stride
    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant', constant_values=0)
    dw = np.zeros_like(w)
    db = np.zeros_like(b)
    dx_pad = np.zeros_like(x_pad)
    for i in range(N):
        for f in range(F):
            for j in range(H_conv):
                for k in range(W_conv):
                    db[f] += dout[i, f, j, k]
                    dw[f] += dout[i, f, j, k] * x_pad[i, :, j*stride : j*stride + HH, k*stride : k*stride + WW]
                    dx_pad[i, :, j*stride : j*stride + HH, k*stride : k*stride + WW] += dout[i, f, j, k] * w[f]
    dx = dx_pad[:, :, pad:pad+H, pad:pad+W]  # crop away the padding (also safe when pad == 0)
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx, dw, db

Note that the backward pass actually computes dx_pad; to get the dx we want, we simply crop the padded border off all four sides (slicing with pad:pad+H and pad:pad+W also works when pad is 0, where pad:-pad would return an empty array).
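As a quick numerical sanity check (a sketch, assuming conv_forward_naive and conv_backward_naive above are in scope), one entry of the analytic dw can be compared against a central finite-difference estimate of d(sum(out * dout)) / dw:

import numpy as np

np.random.seed(0)
x = np.random.randn(2, 3, 5, 5)
w = np.random.randn(4, 3, 3, 3)
b = np.random.randn(4)
conv_param = {'stride': 1, 'pad': 1}

out, cache = conv_forward_naive(x, w, b, conv_param)
dout = np.random.randn(*out.shape)
dx, dw, db = conv_backward_naive(dout, cache)

# central finite difference for a single weight entry
idx, h = (1, 2, 0, 1), 1e-5
w_plus, w_minus = w.copy(), w.copy()
w_plus[idx] += h
w_minus[idx] -= h
f_plus = np.sum(conv_forward_naive(x, w_plus, b, conv_param)[0] * dout)
f_minus = np.sum(conv_forward_naive(x, w_minus, b, conv_param)[0] * dout)
print(dw[idx], (f_plus - f_minus) / (2 * h))   # the two numbers should match closely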

Maxpool Forward

The next problem is the pooling layer, which is exactly the same idea as convolution, except that each sub-matrix is reduced by taking its maximum. Straight to the code:

def max_pool_forward_naive(x, pool_param):
    """
    A naive implementation of the forward pass for a max-pooling layer.

    Inputs:
    - x: Input data, of shape (N, C, H, W)
    - pool_param: dictionary with the following keys:
      - 'pool_height': The height of each pooling region
      - 'pool_width': The width of each pooling region
      - 'stride': The distance between adjacent pooling regions

    No padding is necessary here.

    Returns a tuple of:
    - out: Output data, of shape (N, C, H', W') where H' and W' are given by
      H' = 1 + (H - pool_height) / stride
      W' = 1 + (W - pool_width) / stride
    - cache: (x, pool_param)
    """
    out = None
    ###########################################################################
    # TODO: Implement the max-pooling forward pass                            #
    ###########################################################################
    N, C, H, W = x.shape
    pool_height, pool_width, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
    H_pool = 1 + (H - pool_height) // stride
    W_pool = 1 + (W - pool_width) // stride
    out = np.zeros((N, C, H_pool, W_pool))
    for i in range(N):
        for c in range(C):
            for j in range(H_pool):
                for k in range(W_pool):
                    out[i, c, j, k] = np.max(x[i, c, j*stride : j*stride+pool_height, k*stride : k*stride+pool_width])
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    cache = (x, pool_param)
    return out, cache
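As a tiny worked example (a sketch, assuming max_pool_forward_naive above is in scope), a 4x4 input pooled with a 2x2 window and stride 2 keeps the maximum of each quadrant:

import numpy as np

x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
out, _ = max_pool_forward_naive(x, pool_param)
print(out[0, 0])
# [[ 5.  7.]
#  [13. 15.]]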

Maxpool Backward

Next comes the backward pass of the pooling layer, which follows the same pattern. The code is as follows:

def max_pool_backward_naive(dout, cache):
    """
    A naive implementation of the backward pass for a max-pooling layer.

    Inputs:
    - dout: Upstream derivatives
    - cache: A tuple of (x, pool_param) as in the forward pass.

    Returns:
    - dx: Gradient with respect to x
    """
    dx = None
    ###########################################################################
    # TODO: Implement the max-pooling backward pass                           #
    ###########################################################################
    x, pool_param = cache
    pool_height, pool_width, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
    N, C, H, W = x.shape
    H_pool = 1 + (H - pool_height) // stride
    W_pool = 1 + (W - pool_width) // stride
    dx = np.zeros_like(x)
    for i in range(N):
        for c in range(C):
            for j in range(H_pool):
                for k in range(W_pool):
                    pos = np.argmax(x[i, c, j*stride : j*stride+pool_height, k*stride : k*stride+pool_width])
                    indices = np.unravel_index(pos, (pool_height, pool_width))  # flat index -> (row, col) in the window
                    dx[i, c, j*stride+indices[0], k*stride+indices[1]] += dout[i, c, j, k]
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx

One fairly important point here: when backpropagating through the max function, only the maximum value has a local derivative of 1 and all the others are 0, so we need to locate that maximum. We first use np.argmax to find the index pos of the maximum inside the current sub-matrix, but note that pos is a flat integer index; we have to convert it into a two-dimensional coordinate that tells us where the value sits inside the sub-matrix, and that is where np.unravel_index comes in. This function maps a flat index to the coordinates of the corresponding position in an array of the given shape. Once we have the 2D coordinates indices, we know that, starting from the current output position (j, k), the input element at (j*stride + indices[0], k*stride + indices[1]) is the one that receives the upstream gradient. Note: if the pos we pass in is an array of flat indices, the function returns the corresponding coordinates for every entry.
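A tiny standalone example of that conversion (values made up for illustration):

import numpy as np

window = np.array([[3., 9.],
                   [5., 1.]])
pos = np.argmax(window)                          # flat index into the window -> 1
row, col = np.unravel_index(pos, window.shape)   # -> (0, 1), the location of 9.0
print(pos, (row, col))

# with an array of flat indices, one coordinate array is returned per axis
rows, cols = np.unravel_index(np.array([0, 3]), (2, 2))
print(rows, cols)                                # [0 1] [0 1]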

The assignment then shows off its fast implementations of convolution and pooling. I did not really follow how they work… but we are not asked to implement them; the short version is that they are dramatically faster than our naive versions.

Three-layer ConvNet

The next task is to build a three-layer convolutional network out of the functions we implemented above; it is quite similar to building the fully connected networks earlier. The code is as follows:

class ThreeLayerConvNet(object):
    """
    A three-layer convolutional network with the following architecture:

    conv - relu - 2x2 max pool - affine - relu - affine - softmax

    The network operates on minibatches of data that have shape (N, C, H, W)
    consisting of N images, each with height H and width W and with C input
    channels.
    """

    def __init__(self, input_dim=(3, 32, 32), num_filters=32, filter_size=7,
                 hidden_dim=100, num_classes=10, weight_scale=1e-3, reg=0.0,
                 dtype=np.float32):
        """
        Initialize a new network.

        Inputs:
        - input_dim: Tuple (C, H, W) giving size of input data
        - num_filters: Number of filters to use in the convolutional layer
        - filter_size: Width/height of filters to use in the convolutional layer
        - hidden_dim: Number of units to use in the fully-connected hidden layer
        - num_classes: Number of scores to produce from the final affine layer.
        - weight_scale: Scalar giving standard deviation for random initialization
          of weights.
        - reg: Scalar giving L2 regularization strength
        - dtype: numpy datatype to use for computation.
        """
        self.params = {}
        self.reg = reg
        self.dtype = dtype

        ############################################################################
        # TODO: Initialize weights and biases for the three-layer convolutional    #
        # network. Weights should be initialized from a Gaussian centered at 0.0   #
        # with standard deviation equal to weight_scale; biases should be          #
        # initialized to zero. All weights and biases should be stored in the      #
        #  dictionary self.params. Store weights and biases for the convolutional  #
        # layer using the keys 'W1' and 'b1'; use keys 'W2' and 'b2' for the       #
        # weights and biases of the hidden affine layer, and keys 'W3' and 'b3'    #
        # for the weights and biases of the output affine layer.                   #
        #                                                                          #
        # IMPORTANT: For this assignment, you can assume that the padding          #
        # and stride of the first convolutional layer are chosen so that           #
        # **the width and height of the input are preserved**. Take a look at      #
        # the start of the loss() function to see how that happens.                #                           
        ############################################################################
        C, H, W = input_dim
        self.params['W1'] = weight_scale * np.random.randn(num_filters, C, filter_size, filter_size)
        self.params['b1'] = np.zeros(num_filters)
        # after the 2x2 max pool the spatial size is halved, so the flattened
        # input to the hidden affine layer has num_filters * (H/2) * (W/2) features
        fc_input_dim = num_filters * (H // 2) * (W // 2)
        self.params['W2'] = weight_scale * np.random.randn(fc_input_dim, hidden_dim)
        self.params['b2'] = np.zeros(hidden_dim)
        self.params['W3'] = weight_scale * np.random.randn(hidden_dim, num_classes)
        self.params['b3'] = np.zeros(num_classes)
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        for k, v in self.params.items():
            self.params[k] = v.astype(dtype)


    def loss(self, X, y=None):
        """
        Evaluate loss and gradient for the three-layer convolutional network.

        Input / output: Same API as TwoLayerNet in fc_net.py.
        """
        W1, b1 = self.params['W1'], self.params['b1']
        W2, b2 = self.params['W2'], self.params['b2']
        W3, b3 = self.params['W3'], self.params['b3']

        # pass conv_param to the forward pass for the convolutional layer
        # Padding and stride chosen to preserve the input spatial size
        filter_size = W1.shape[2]
        conv_param = {'stride': 1, 'pad': (filter_size - 1) // 2}

        # pass pool_param to the forward pass for the max-pooling layer
        pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}

        scores = None
        ############################################################################
        # TODO: Implement the forward pass for the three-layer convolutional net,  #
        # computing the class scores for X and storing them in the scores          #
        # variable.                                                                #
        #                                                                          #
        # Remember you can use the functions defined in cs231n/fast_layers.py and  #
        # cs231n/layer_utils.py in your implementation (already imported).         #
        ############################################################################
        out, cache1 = conv_relu_pool_forward(X, W1, b1, conv_param, pool_param)
        out, cache2 = affine_relu_forward(out, W2, b2)
        out, cache3 = affine_forward(out, W3, b3)
        scores = out
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        if y is None:
            return scores

        loss, grads = 0, {}
        ############################################################################
        # TODO: Implement the backward pass for the three-layer convolutional net, #
        # storing the loss and gradients in the loss and grads variables. Compute  #
        # data loss using softmax, and make sure that grads[k] holds the gradients #
        # for self.params[k]. Don't forget to add L2 regularization!               #
        #                                                                          #
        # NOTE: To ensure that your implementation matches ours and you pass the   #
        # automated tests, make sure that your L2 regularization includes a factor #
        # of 0.5 to simplify the expression for the gradient.                      #
        ############################################################################
        loss, dout = softmax_loss(scores, y)
        reg_loss = 0.5 * self.reg * np.sum(W1 * W1) + 0.5 * self.reg * np.sum(W2 * W2) + 0.5 * self.reg * np.sum(W3 * W3)
        loss += reg_loss
        dx, grads['W3'], grads['b3'] = affine_backward(dout, cache3)
        dx, grads['W2'], grads['b2'] = affine_relu_backward(dx, cache2)
        dx, grads['W1'], grads['b1'] = conv_relu_pool_backward(dx, cache1)
        grads['W1'] += self.reg * W1
        grads['W2'] += self.reg * W2
        grads['W3'] += self.reg * W3
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        return loss, grads
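A quick way to exercise the class (a sketch, assuming the assignment's Solver class from cs231n/solver.py with its usual constructor arguments; the data here is random noise and the hyperparameters are placeholders — on real data, overfitting a small subset is the usual sanity check):

import numpy as np
from cs231n.solver import Solver

data = {
    'X_train': np.random.randn(100, 3, 32, 32),
    'y_train': np.random.randint(10, size=100),
    'X_val': np.random.randn(20, 3, 32, 32),
    'y_val': np.random.randint(10, size=20),
}

model = ThreeLayerConvNet(weight_scale=1e-2, hidden_dim=100, reg=1e-3)
solver = Solver(model, data,
                update_rule='adam',
                optim_config={'learning_rate': 1e-3},
                num_epochs=1, batch_size=50,
                print_every=10, verbose=True)
solver.train()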

Spatial Batch Normalization

The next problem is Spatial Batch Normalization. The assignment explains what "spatial" means: the batch normalization we implemented originally works on inputs of shape (N, D), computing the mean and variance of each feature dimension over the N samples in a batch. The spatial version instead treats C as the feature dimension and computes each channel's mean and variance over all of the N, H, and W axes. So all we need to do is transpose/reshape the input, call the BN functions we already have, and transpose back; the backward pass does the same reshaping in reverse. The code is as follows:

def spatial_batchnorm_forward(x, gamma, beta, bn_param):
    """
    Computes the forward pass for spatial batch normalization.

    Inputs:
    - x: Input data of shape (N, C, H, W)
    - gamma: Scale parameter, of shape (C,)
    - beta: Shift parameter, of shape (C,)
    - bn_param: Dictionary with the following keys:
      - mode: 'train' or 'test'; required
      - eps: Constant for numeric stability
      - momentum: Constant for running mean / variance. momentum=0 means that
        old information is discarded completely at every time step, while
        momentum=1 means that new information is never incorporated. The
        default of momentum=0.9 should work well in most situations.
      - running_mean: Array of shape (D,) giving running mean of features
      - running_var: Array of shape (D,) giving running variance of features

    Returns a tuple of:
    - out: Output data, of shape (N, C, H, W)
    - cache: Values needed for the backward pass
    """
    out, cache = None, None

    ###########################################################################
    # TODO: Implement the forward pass for spatial batch normalization.       #
    #                                                                         #
    # HINT: You can implement spatial batch normalization by calling the      #
    # vanilla version of batch normalization you implemented above.           #
    # Your implementation should be very short; ours is less than five lines. #
    ###########################################################################
    N, C, H, W = x.shape
    x_reshape = x.transpose(0, 2, 3, 1).reshape(-1, C)
    out, cache = batchnorm_forward(x_reshape, gamma, beta, bn_param)
    out = out.reshape(N, H, W, C).transpose(0, 3, 1, 2)
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################

    return out, cache


def spatial_batchnorm_backward(dout, cache):
    """
    Computes the backward pass for spatial batch normalization.

    Inputs:
    - dout: Upstream derivatives, of shape (N, C, H, W)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient with respect to inputs, of shape (N, C, H, W)
    - dgamma: Gradient with respect to scale parameter, of shape (C,)
    - dbeta: Gradient with respect to shift parameter, of shape (C,)
    """
    dx, dgamma, dbeta = None, None, None

    ###########################################################################
    # TODO: Implement the backward pass for spatial batch normalization.      #
    #                                                                         #
    # HINT: You can implement spatial batch normalization by calling the      #
    # vanilla version of batch normalization you implemented above.           #
    # Your implementation should be very short; ours is less than five lines. #
    ###########################################################################
    N, C, H, W = dout.shape
    dout = dout.transpose(0, 2, 3, 1).reshape(-1, C)
    dx, dgamma, dbeta = batchnorm_backward(dout, cache)
    dx = dx.reshape(N, H, W, C).transpose(0, 3, 1, 2)
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################

    return dx, dgamma, dbeta
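A small sanity check of the reshape/transpose trick (a sketch, assuming batchnorm_forward from earlier in the assignment is in scope): with gamma = 1 and beta = 0, each channel of the output should have mean close to 0 and std close to 1 over the (N, H, W) axes.

import numpy as np

N, C, H, W = 4, 3, 8, 8
x = 5 + 2 * np.random.randn(N, C, H, W)
gamma, beta = np.ones(C), np.zeros(C)
out, _ = spatial_batchnorm_forward(x, gamma, beta, {'mode': 'train'})
print(out.mean(axis=(0, 2, 3)))   # roughly zeros
print(out.std(axis=(0, 2, 3)))    # roughly ones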

Group Normalization

The last sub-problem is Group Normalization. Group normalization builds on Layer Normalization: the features along the C dimension are split into G groups (and the corresponding parameters are split the same way), layer normalization is applied to each group separately, and the results are concatenated back together. That gives the following code:

def spatial_groupnorm_forward(x, gamma, beta, G, gn_param):
    """
    Computes the forward pass for spatial group normalization.
    In contrast to layer normalization, group normalization splits each entry 
    in the data into G contiguous pieces, which it then normalizes independently.
    Per feature shifting and scaling are then applied to the data, in a manner identical to that of batch normalization and layer normalization.

    Inputs:
    - x: Input data of shape (N, C, H, W)
    - gamma: Scale parameter, of shape (C,)
    - beta: Shift parameter, of shape (C,)
    - G: Integer number of groups to split into, should be a divisor of C
    - gn_param: Dictionary with the following keys:
      - eps: Constant for numeric stability

    Returns a tuple of:
    - out: Output data, of shape (N, C, H, W)
    - cache: Values needed for the backward pass
    """
    out, cache = None, None
    eps = gn_param.get('eps', 1e-5)
    ###########################################################################
    # TODO: Implement the forward pass for spatial group normalization.       #
    # This will be extremely similar to the layer norm implementation.        #
    # In particular, think about how you could transform the matrix so that   #
    # the bulk of the code is similar to both train-time batch normalization  #
    # and layer normalization!                                                # 
    ###########################################################################
    N, C, H, W = x.shape
    x_reshape = x.transpose(0, 2, 3, 1).reshape(-1, C)
    batch = C // G  # number of channels in each group
    cache = []
    out = np.zeros_like(x_reshape)
    for i in range(G):
        x_sample = x_reshape[:, i * batch : (i + 1) * batch]
        gamma_sample = gamma.reshape(C)[i * batch : (i + 1) * batch]
        beta_sample = beta.reshape(C)[i * batch : (i + 1) * batch]
        res, cache_sample = layernorm_forward(x_sample, gamma_sample, beta_sample, gn_param)
        out[:, i * batch : (i + 1) * batch] = res
        cache.append(cache_sample)
    out = out.reshape(N, H, W, C).transpose(0, 3, 1, 2)
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return out, cache


def spatial_groupnorm_backward(dout, cache):
    """
    Computes the backward pass for spatial group normalization.

    Inputs:
    - dout: Upstream derivatives, of shape (N, C, H, W)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient with respect to inputs, of shape (N, C, H, W)
    - dgamma: Gradient with respect to scale parameter, of shape (C,)
    - dbeta: Gradient with respect to shift parameter, of shape (C,)
    """
    dx, dgamma, dbeta = None, None, None

    ###########################################################################
    # TODO: Implement the backward pass for spatial group normalization.      #
    # This will be extremely similar to the layer norm implementation.        #
    ###########################################################################
    N, C, H, W = dout.shape
    dout = dout.transpose(0, 2, 3, 1).reshape(-1, C)
    dx = np.zeros_like(dout)
    dgamma, dbeta = np.zeros(C), np.zeros(C)
    G = len(cache)
    batch = C // G  # number of channels in each group
    for i in range(G - 1, -1, -1):
        cache_sample = cache[i]
        dout_sample = dout[:, i * batch : (i + 1) * batch]
        dx_sample, dgamma_sample, dbeta_sample = layernorm_backward(dout_sample, cache_sample)
        dx[:, i * batch : (i + 1) * batch] = dx_sample
        dgamma[i * batch : (i + 1) * batch] = dgamma_sample
        dbeta[i * batch : (i + 1) * batch] = dbeta_sample
    dx = dx.reshape(N, H, W, C).transpose(0, 3, 1, 2)
    dgamma = dgamma.reshape(1, C, 1, 1)
    dbeta = dbeta.reshape(1, C, 1, 1)
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx, dgamma, dbeta

One small gotcha here: the gamma and beta parameters passed in by the assignment have shape (1, C, 1, 1), whereas I naively assumed they were (C,). So before slicing the parameters into groups we first have to reshape them, and after the backward pass dgamma and dbeta have to be reshaped back to (1, C, 1, 1).
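A minimal shape check for that pitfall (a sketch, assuming layernorm_forward is in scope so that spatial_groupnorm_forward above runs):

import numpy as np

N, C, H, W, G = 2, 6, 4, 4, 3
x = np.random.randn(N, C, H, W)
gamma = np.ones((1, C, 1, 1))    # note the shape: (1, C, 1, 1), not (C,)
beta = np.zeros((1, C, 1, 1))
out, cache = spatial_groupnorm_forward(x, gamma, beta, G, {'eps': 1e-5})
print(out.shape)                 # (2, 6, 4, 4), same as the input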

And with that, the fourth notebook is finally done. We have implemented the forward and backward passes of the main layers in a convolutional network, and I came away with a much deeper understanding of how they really work. Very rewarding!
