NNDL Assignment 7: Chapter 5 Exercises (1×1 Convolution Kernels | CNN BP)

Latest recommended article published on 2024-07-23 22:38:20

凉堇

Tags: cnn, deep learning, neural networks

Article link: https://blog.csdn.net/m0_57250370/article/details/127610449

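Contents

Chapter 5 Exercises
Additional Problems
- Additional 1: Derive backpropagation for a CNN.
- Additional 2: Build a simple CNN model, implement the backward operators of the convolution and pooling layers with both NumPy and PyTorch, and test them on numerical values.
Reference Links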

Chapter 5 Exercises

Exercise 5-2: Prove that wide convolution is commutative.

[Figure: the proof was given as an image in the original post]
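As a numerical sanity check of the claim (not a proof), the sketch below implements wide (full) 2D convolution directly in NumPy and compares the result of convolving X with W against convolving W with X. The helper name `wide_conv2d` and the random test sizes are assumptions chosen for illustration.

```python
import numpy as np


def wide_conv2d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Wide (full) 2D convolution; output shape is (H + U - 1, W + V - 1)."""
    h, wd = x.shape
    u, v = w.shape
    out = np.zeros((h + u - 1, wd + v - 1))
    # Each product x[i, j] * w[p, q] contributes to output position (i + p, j + q),
    # i.e. y[s, t] = sum over i + p = s, j + q = t of x[i, j] * w[p, q].
    for i in range(h):
        for j in range(wd):
            out[i:i + u, j:j + v] += x[i, j] * w
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 4))
w = rng.standard_normal((3, 2))

xw = wide_conv2d(x, w)
wx = wide_conv2d(w, x)
print(xw.shape, wx.shape)     # (7, 5) (7, 5)
print(np.allclose(xw, wx))    # True
```

The defining sum is symmetric in the two index pairs (i, j) and (p, q), which is why swapping the roles of the input and the kernel leaves the result unchanged.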

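Exercise 5-3: Analyze the role of 1×1 convolution kernels in convolutional neural networks.

1. Dimensionality reduction / expansion

A 1×1 convolution can reduce or increase the channel dimension by controlling the number of kernels.
[Figure: image from the original post]

As the figure shows, the number of channels in the output feature map equals the number of kernels. So to increase or reduce the dimension, it is enough to change the number of kernels.

Example: if the input has 3 channels and there are 4 kernels, the feature map has 4 channels, which increases the dimension. If the input has 3 channels and there is 1 kernel, the feature map has 1 channel, which reduces the dimension. In other words, only the channels dimension of height × width × channels changes.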
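The example above can be sketched in NumPy as follows. This is a minimal illustration, assuming the height × width × channels layout used in the text; the helper name `conv1x1` and the 8×8 spatial size are hypothetical.

```python
import numpy as np


def conv1x1(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Apply a 1x1 convolution.

    x:       feature map of shape (H, W, C_in)
    kernels: weights of shape (C_out, C_in), one row per 1x1 kernel
    returns: feature map of shape (H, W, C_out)
    """
    # Each output channel is a weighted sum of the input channels at the same pixel.
    return np.einsum('hwc,oc->hwo', x, kernels)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))                # input with 3 channels

up = conv1x1(x, rng.standard_normal((4, 3)))      # 4 kernels: 3 -> 4 channels
down = conv1x1(x, rng.standard_normal((1, 3)))    # 1 kernel:  3 -> 1 channel
print(up.shape)      # (8, 8, 4)
print(down.shape)    # (8, 8, 1)
```

Height and width are untouched in both cases; only the last axis follows the number of kernels.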

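2. Increasing network depth (adding nonlinearity)
Each use of a 1×1 convolution adds one more convolutional layer, so the network becomes deeper. A 1×1 convolution keeps the spatial size of the feature map equal to that of its input, and a convolutional layer includes an activation function, so it adds nonlinearity.

Because nonlinearity is added without changing the input size, the expressive power of the whole network increases.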

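3. Cross-channel information interaction (channel transformation)
Reducing or increasing the dimension with a 1×1 convolution is really a linear recombination of information across channels.

For example, if a 1×1 convolution with 28 kernels is appended after a 3×3 convolution with 64 output channels, the two together act like a 3×3 convolution with 28 output channels (exactly so when no nonlinearity sits between them). The original 64 channels are linearly combined across channels into 28 channels, and this is the cross-channel information interaction.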
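To make the 64-to-28 example concrete, the sketch below checks numerically that a 3×3 convolution followed by a 1×1 convolution (with no activation in between) equals a single 3×3 convolution whose kernels are linear combinations of the original ones. The helper `conv2d_valid`, the 6×6 input and the 8 input channels are assumptions for illustration only.

```python
import numpy as np


def conv2d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Stride-1, unpadded convolution. x: (H, W, C_in), w: (C_out, F, F, C_in)."""
    h, wd, _ = x.shape
    c_out, f, _, _ = w.shape
    out = np.empty((h - f + 1, wd - f + 1, c_out))
    for i in range(h - f + 1):
        for j in range(wd - f + 1):
            patch = x[i:i + f, j:j + f, :]
            out[i, j] = np.einsum('pqc,opqc->o', patch, w)
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6, 8))          # 8 input channels (arbitrary choice)
w3 = rng.standard_normal((64, 3, 3, 8))     # 3x3 convolution, 64 output channels
w1 = rng.standard_normal((28, 1, 1, 64))    # 1x1 convolution, 28 output channels

two_step = conv2d_valid(conv2d_valid(x, w3), w1)

# Fold the 1x1 weights into the 3x3 weights: each of the 28 merged kernels
# is a linear combination of the 64 original 3x3 kernels.
merged = np.einsum('ko,opqc->kpqc', w1[:, 0, 0, :], w3)
one_step = conv2d_valid(x, merged)

print(two_step.shape)                      # (4, 4, 28)
print(np.allclose(two_step, one_step))     # True
```

With an activation between the two layers the equivalence no longer holds, which is the source of the extra nonlinearity described in point 2.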

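Exercise 5-4: For a convolutional layer whose input is a 100×100×256 group of feature maps, which uses 3×3 kernels and outputs a 100×100×256 group of feature maps, find its time and space complexity. If a 1×1 convolution is introduced to first obtain 100×100×64 feature maps, followed by a 3×3 convolution that produces a 100×100×256 group of feature maps, find the time and space complexity.

Time complexity: the number of operations (multiply-accumulates) the model performs.

Time complexity of a single convolutional layer: Time~O(M^2 * K^2 * Cin * Cout)

M: side length of the output feature map.
K: side length of the kernel.
Cin: number of input channels.
Cout: number of output channels.
Note 1: To keep the number of variables small, the input and the kernel are assumed to be square. If they are not, replace M^2 with the product of the feature map's height and width.

Note 2: Each convolutional layer also has bias parameters, which are ignored here. Including them, the time complexity becomes O(M^2 * K^2 * Cin * Cout + M^2 * Cout), since the bias is added once per output element.

Space complexity: here, the number of parameters plus the size of the output feature map.

Space complexity of a single convolution: Space~O(K^2 * Cin * Cout+M^2*Cout)

The parameter term K^2 * Cin * Cout depends only on the kernel size K and the channel counts, not on the size of the input image; the feature-map term M^2 * Cout does grow with the output size. When a model needs to be pruned, the kernel size is usually already small, and network depth is closely tied to model capacity and should not be cut too much, so the number of channels is usually the first thing to reduce.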

Part 1:

Time complexity = 100×100×3×3×256×256 = 5898240000

Space complexity = 3×3×256×256 + 100×100×256 = 3149824

Part 2:

Time complexity = 100×100×1×1×256×64 + 100×100×3×3×64×256 = 1638400000

Space complexity = 1×1×256×64 + 100×100×64 + 3×3×64×256 + 100×100×256 = 3363840
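The arithmetic for both parts can be reproduced with a few lines of Python. This is a small sketch that plugs the exercise's numbers into the two formulas M^2 * K^2 * Cin * Cout and K^2 * Cin * Cout + M^2 * Cout; the function names `conv_time` and `conv_space` are hypothetical, and the bias is ignored as in the text.

```python
def conv_time(m: int, k: int, c_in: int, c_out: int) -> int:
    """Multiply-accumulate count of one conv layer: M^2 * K^2 * Cin * Cout."""
    return m * m * k * k * c_in * c_out


def conv_space(m: int, k: int, c_in: int, c_out: int) -> int:
    """Weights plus output feature map: K^2 * Cin * Cout + M^2 * Cout."""
    return k * k * c_in * c_out + m * m * c_out


# Part 1: a single 3x3 convolution, 256 -> 256 channels.
time_1 = conv_time(100, 3, 256, 256)
space_1 = conv_space(100, 3, 256, 256)

# Part 2: a 1x1 convolution 256 -> 64, then a 3x3 convolution 64 -> 256.
time_2 = conv_time(100, 1, 256, 64) + conv_time(100, 3, 64, 256)
space_2 = conv_space(100, 1, 256, 64) + conv_space(100, 3, 64, 256)

print(time_1, space_1)    # 5898240000 3149824
print(time_2, space_2)    # 1638400000 3363840
print(time_1 / time_2)    # 3.6
```

The 1×1 bottleneck cuts the operation count by a factor of 3.6, while the space figure rises slightly because the intermediate 100×100×64 feature map must also be stored.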

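Exercise 5-7: Ignoring the activation function, show that the forward computation and the backward propagation of a convolutional layer are transposes of each other.

[Figure: the derivation was given as an image in the original post]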
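One way to see the transpose relationship numerically is to write the convolution as a matrix product z = C a on flattened arrays; the backward pass is then multiplication by the transpose of C. The sketch below assumes a 3×3 input, a 2×2 kernel, stride 1 and no padding; `conv_matrix` is a hypothetical helper, and the finite-difference comparison is only a sanity check.

```python
import numpy as np


def conv_matrix(w: np.ndarray, in_size: int) -> np.ndarray:
    """Build C so that vec(z) = C @ vec(a) for a stride-1, unpadded convolution."""
    f = w.shape[0]
    out_size = in_size - f + 1
    c = np.zeros((out_size * out_size, in_size * in_size))
    for i in range(out_size):
        for j in range(out_size):
            for p in range(f):
                for q in range(f):
                    c[i * out_size + j, (i + p) * in_size + (j + q)] = w[p, q]
    return c


rng = np.random.default_rng(0)
a = rng.standard_normal((3, 3))
w = rng.standard_normal((2, 2))
C = conv_matrix(w, in_size=3)                  # shape (4, 9)

# Forward: z = C a, compared with the sliding-window definition.
z = (C @ a.ravel()).reshape(2, 2)
z_direct = np.array([[np.sum(a[i:i + 2, j:j + 2] * w) for j in range(2)]
                     for i in range(2)])
print(np.allclose(z, z_direct))                # True

# Backward: for L = sum(delta * z), the gradient is dL/da = C^T delta.
delta = rng.standard_normal((2, 2))
grad_a = (C.T @ delta.ravel()).reshape(3, 3)

# Central finite differences as an independent estimate of dL/da.
eps = 1e-6
grad_num = np.zeros_like(a)
for i in range(3):
    for j in range(3):
        step = np.zeros_like(a)
        step[i, j] = eps
        plus = np.sum(delta * (C @ (a + step).ravel()).reshape(2, 2))
        minus = np.sum(delta * (C @ (a - step).ravel()).reshape(2, 2))
        grad_num[i, j] = (plus - minus) / (2 * eps)
print(np.allclose(grad_a, grad_num, atol=1e-6))    # True
```

The forward pass maps 9 values to 4 through C, and the backward pass maps 4 error values back to 9 through the transpose of C.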

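Additional Problems

Additional 1: Derive backpropagation for a CNN.

Given the error of a pooling layer, derive the error of the previous hidden layer
In forward propagation, the pooling layer usually applies MAX or Average pooling to its input, and the size of the pooling region is known. Now we go in the opposite direction: from the reduced-size error δ^l, we recover the error of the larger region before pooling.
In backpropagation, we first restore every submatrix of δ^l to the size it had before pooling. For MAX pooling, the value of each pooling region in each submatrix of δ^l is placed at the position where the forward pass found the maximum.
For Average pooling, the value of each pooling region in each submatrix of δ^l is averaged and placed at every position of the restored region. This process is usually called upsample.
Suppose the pooling region is 2x2. The k-th submatrix of δ^l is:
[Figure: the matrix was given as an image in the original post]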

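For MAX pooling, suppose the positions of the maxima recorded in the forward pass are top-left, bottom-right, top-right and bottom-left. The converted matrix is then:

For Average pooling, the values are averaged, and the converted matrix is:

Here the upsample function carries out both the enlargement of the pooled error matrix and the redistribution of the error.
To summarize, for the tensor δ^(l−1) we have:
[Figure: the formula was given as an image in the original post]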
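The upsample step described above can be sketched in NumPy. The values of the 2x2 error matrix below are assumed purely for illustration, and `upsample_max` and `upsample_avg` are hypothetical helper names. In the usual formulation, the result of upsample is then multiplied elementwise by the derivative of the activation at z^(l−1); that factor is left out here.

```python
import numpy as np


def upsample_max(delta: np.ndarray, argmax: np.ndarray, k: int) -> np.ndarray:
    """Route each pooled error to the recorded max position of its k x k region.

    argmax[i, j] is the flat row-major index (0 .. k*k-1) of the maximum
    inside region (i, j), as recorded during the forward pass.
    """
    out = np.zeros((delta.shape[0] * k, delta.shape[1] * k))
    for i in range(delta.shape[0]):
        for j in range(delta.shape[1]):
            r, c = divmod(int(argmax[i, j]), k)
            out[i * k + r, j * k + c] = delta[i, j]
    return out


def upsample_avg(delta: np.ndarray, k: int) -> np.ndarray:
    """Spread each pooled error evenly over its k x k region."""
    return np.kron(delta, np.ones((k, k))) / (k * k)


# Assumed example values for the 2x2 error matrix.
delta = np.array([[2.0, 8.0],
                  [4.0, 6.0]])
# Max positions per region: top-left, bottom-right, top-right, bottom-left.
argmax = np.array([[0, 3],
                   [1, 2]])

print(upsample_max(delta, argmax, 2).tolist())
# [[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 8.0], [0.0, 4.0, 0.0, 0.0], [0.0, 0.0, 6.0, 0.0]]
print(upsample_avg(delta, 2).tolist())
# [[0.5, 0.5, 2.0, 2.0], [0.5, 0.5, 2.0, 2.0], [1.0, 1.0, 1.5, 1.5], [1.0, 1.0, 1.5, 1.5]]
```

In both cases the total error is preserved: every entry of the pooled matrix is either moved to a single position (MAX) or split into four equal parts (Average).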
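Given the error of a convolutional layer, derive the error of the previous hidden layer
The forward propagation formula of the convolutional layer:

Here n_in is the number of input submatrices from the previous hidden layer. From this we can derive: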
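The two formulas this passage refers to are not present in the text. A reconstruction in the commonly used notation might read as follows, assuming an activation function σ, with * denoting convolution, ⊙ the elementwise product and rot180 a 180-degree rotation of the kernel. The symbols σ and rot180 are assumptions, not taken from the text.

```latex
% Forward propagation of the convolutional layer
a^{l} = \sigma(z^{l}) = \sigma\left(a^{l-1} * W^{l} + b^{l}\right)

% Error of the previous hidden layer
\delta^{l-1}
  = \left(\frac{\partial z^{l}}{\partial z^{l-1}}\right)^{T} \delta^{l}
  = \delta^{l} * \mathrm{rot180}\left(W^{l}\right) \odot \sigma'\left(z^{l-1}\right)
```

The rotation is what the worked 3x3 example that follows makes visible.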
　
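Suppose the output a^(l−1) of layer l−1 is a 3x3 matrix, the kernel W^l is a 2x2 matrix, the stride is 1 and the bias is zero, so that the output z^l is a 2x2 matrix.
The matrix expressions of a, W and z are as follows:
$$
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} * \begin{pmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{pmatrix} = \begin{pmatrix} z_{11} & z_{12} \\ z_{21} & z_{22} \end{pmatrix}
$$
From the definition of convolution, it is easy to obtain: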
z11=a11w11+a12w12+a21w21+a22w22
z12=a12w11+a13w12+a22w21+a23w22
z21=a21w11+a22w12+a31w21+a32w22
z22=a22w11+a23w12+a32w21+a33w22
Next we simulate the backward derivative:

From the equations above, we obtain:
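The equations themselves are missing at this point. Applying the chain rule to the four expressions for z11, z12, z21 and z22 above gives the following sketch, where ∇a_ij stands for ∂J/∂a_ij and δ_ij for ∂J/∂z_ij (this indexed notation is assumed here):

```latex
\begin{aligned}
\nabla a_{11} &= \delta_{11} w_{11} \\
\nabla a_{12} &= \delta_{11} w_{12} + \delta_{12} w_{11} \\
\nabla a_{13} &= \delta_{12} w_{12} \\
\nabla a_{21} &= \delta_{11} w_{21} + \delta_{21} w_{11} \\
\nabla a_{22} &= \delta_{11} w_{22} + \delta_{12} w_{21} + \delta_{21} w_{12} + \delta_{22} w_{11} \\
\nabla a_{23} &= \delta_{12} w_{22} + \delta_{22} w_{12} \\
\nabla a_{31} &= \delta_{21} w_{21} \\
\nabla a_{32} &= \delta_{21} w_{22} + \delta_{22} w_{21} \\
\nabla a_{33} &= \delta_{22} w_{22}
\end{aligned}

% Matrix form: zero-pad delta by one on every side and slide the rotated kernel over it
\begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & \delta_{11} & \delta_{12} & 0 \\
0 & \delta_{21} & \delta_{22} & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
*
\begin{pmatrix} w_{22} & w_{21} \\ w_{12} & w_{11} \end{pmatrix}
=
\begin{pmatrix}
\nabla a_{11} & \nabla a_{12} & \nabla a_{13} \\
\nabla a_{21} & \nabla a_{22} & \nabla a_{23} \\
\nabla a_{31} & \nabla a_{32} & \nabla a_{33}
\end{pmatrix}
```

Each a_ij collects one term for every z entry it appears in, and the kernel in the matrix form is W rotated by 180 degrees.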

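Given the error of a convolutional layer, derive the gradients of W and b for that layer
The relationship between z and W, b in a convolutional layer is:
$$z^{l} = a^{l-1} * W^{l} + b^{l}$$
Then for layer l, the derivative with respect to a given kernel matrix W can be expressed as: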

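Suppose the input a is a 4x4 matrix, the kernel W is a 3x3 matrix and the output z is a 2x2 matrix. Then the backpropagated gradient error δ of z is also a 2x2 matrix.
Then, according to the formula above, we have:
[Figure: the equations were given as an image in the original post]
In the end we obtain 9 equations in total. Arranged in matrix form, they give: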
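For the 4x4 input, 3x3 kernel and 2x2 error just described, and with z defined by the same sliding-window sum used in the earlier 3x3 example, the nine equations and their matrix form can be sketched as below. The indexed symbols are notation assumed here, and the bias gradient is added for completeness.

```latex
% General pattern (p, q = 1, 2, 3)
\frac{\partial J}{\partial W_{pq}}
  = \sum_{i=1}^{2} \sum_{j=1}^{2} \delta_{ij}\, a_{i+p-1,\; j+q-1}

% Three of the nine equations
\begin{aligned}
\frac{\partial J}{\partial W_{11}} &= a_{11}\delta_{11} + a_{12}\delta_{12} + a_{21}\delta_{21} + a_{22}\delta_{22} \\
\frac{\partial J}{\partial W_{12}} &= a_{12}\delta_{11} + a_{13}\delta_{12} + a_{22}\delta_{21} + a_{23}\delta_{22} \\
\frac{\partial J}{\partial W_{33}} &= a_{33}\delta_{11} + a_{34}\delta_{12} + a_{43}\delta_{21} + a_{44}\delta_{22}
\end{aligned}

% Matrix form: slide delta over a, with no rotation
\frac{\partial J}{\partial W}
=
\begin{pmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{pmatrix}
*
\begin{pmatrix} \delta_{11} & \delta_{12} \\ \delta_{21} & \delta_{22} \end{pmatrix}

% Bias gradient
\frac{\partial J}{\partial b} = \sum_{i,j} \delta_{ij}
```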

This makes it clear why no flip (180-degree rotation) is needed this time.
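Both statements, that the weight gradient needs no rotation while the input gradient does, can be checked numerically. The sketch below assumes stride 1, no padding and a single channel; `correlate_valid` is a hypothetical helper that slides a kernel over an array without flipping it.

```python
import numpy as np


def correlate_valid(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Slide k over x without flipping it (stride 1, no padding)."""
    f_h, f_w = k.shape
    out = np.empty((x.shape[0] - f_h + 1, x.shape[1] - f_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + f_h, j:j + f_w] * k)
    return out


rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))        # input a
w = rng.standard_normal((3, 3))        # kernel W
delta = rng.standard_normal((2, 2))    # error of z

# Reference gradients, accumulated term by term from z[i, j] = sum(window * w).
dw_ref = np.zeros_like(w)
da_ref = np.zeros_like(a)
for i in range(2):
    for j in range(2):
        dw_ref += a[i:i + 3, j:j + 3] * delta[i, j]
        da_ref[i:i + 3, j:j + 3] += w * delta[i, j]

# Weight gradient: slide delta over a, no rotation.
dw = correlate_valid(a, delta)

# Input gradient: zero-pad delta by F - 1 = 2 and slide the 180-degree-rotated kernel.
da = correlate_valid(np.pad(delta, 2), np.rot90(w, 2))

print(np.allclose(dw, dw_ref))    # True
print(np.allclose(da, da_ref))    # True
```

The reference loop is the same accumulation used in `conv2d_backward` further down, restricted to one channel.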

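Additional 2: Build a simple CNN model, implement the backward operators of the convolution and pooling layers with both NumPy and PyTorch, and test them on numerical values.

Backward pass for convolution: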

from typing import Dict, Tuple

import numpy as np
import pytest
import torch


def conv2d_forward(input: np.ndarray, weight: np.ndarray, bias: np.ndarray,
                   stride: int, padding: int) -> Dict[str, np.ndarray]:
    """2D Convolution Forward Implemented with NumPy

    Args:
        input (np.ndarray): The input NumPy array of shape (H, W, C).
        weight (np.ndarray): The weight NumPy array of shape
            (C', F, F, C).
        bias (np.ndarray): The bias NumPy array of shape (C').
        stride (int): Stride for convolution.
        padding (int): The count of zeros to pad on both sides.

    Outputs:
        Dict[str, np.ndarray]: Cached data for backward prop.
    """
    h_i, w_i, c_i = input.shape
    c_o, f, f_2, c_k = weight.shape

    assert (f == f_2)
    assert (c_i == c_k)
    assert (bias.shape[0] == c_o)

    input_pad = np.pad(input, [(padding, padding), (padding, padding), (0, 0)])

    def cal_new_sidelength(sl, s, f, p):
        # Output side length for input side sl, stride s, kernel f, padding p
        return (sl + 2 * p - f) // s + 1

    h_o = cal_new_sidelength(h_i, stride, f, padding)
    w_o = cal_new_sidelength(w_i, stride, f, padding)

    output = np.empty((h_o, w_o, c_o), dtype=input.dtype)

    for i_h in range(h_o):
        for i_w in range(w_o):
            for i_c in range(c_o):
                h_lower = i_h * stride
                h_upper = i_h * stride + f
                w_lower = i_w * stride
                w_upper = i_w * stride + f
                input_slice = input_pad[h_lower:h_upper, w_lower:w_upper, :]
                kernel_slice = weight[i_c]
                output[i_h, i_w, i_c] = np.sum(input_slice * kernel_slice)
                output[i_h, i_w, i_c] += bias[i_c]

    cache = dict()
    cache['Z'] = output
    cache['W'] = weight
    cache['b'] = bias
    cache['A_prev'] = input
    return cache


def conv2d_backward(dZ: np.ndarray, cache: Dict[str, np.ndarray], stride: int,
                    padding: int) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """2D Convolution Backward Implemented with NumPy

    Args:
        dZ: (np.ndarray): The derivative of the output of conv.
        cache (Dict[str, np.ndarray]): Record output 'Z', weight 'W', bias 'b'
            and input 'A_prev' of forward function.
        stride (int): Stride for convolution.
        padding (int): The count of zeros to pad on both sides.

    Outputs:
        Tuple[np.ndarray, np.ndarray, np.ndarray]: The derivative of W, b,
            A_prev.
    """
    W = cache['W']
    b = cache['b']
    A_prev = cache['A_prev']
    dW = np.zeros(W.shape)
    db = np.zeros(b.shape)
    dA_prev = np.zeros(A_prev.shape)

    _, _, c_i = A_prev.shape
    c_o, f, f_2, c_k = W.shape
    h_o, w_o, c_o_2 = dZ.shape

    assert (f == f_2)
    assert (c_i == c_k)
    assert (c_o == c_o_2)

    A_prev_pad = np.pad(A_prev, [(padding, padding), (padding, padding),
                                 (0, 0)])
    dA_prev_pad = np.pad(dA_prev, [(padding, padding), (padding, padding),
                                   (0, 0)])

    for i_h in range(h_o):
        for i_w in range(w_o):
            for i_c in range(c_o):
                h_lower = i_h * stride
                h_upper = i_h * stride + f
                w_lower = i_w * stride
                w_upper = i_w * stride + f

                input_slice = A_prev_pad[h_lower:h_upper, w_lower:w_upper, :]
                # forward
                # kernel_slice = W[i_c]
                # Z[i_h, i_w, i_c] = np.sum(input_slice * kernel_slice)
                # Z[i_h, i_w, i_c] += b[i_c]

                # backward
                dW[i_c] += input_slice * dZ[i_h, i_w, i_c]
                dA_prev_pad[h_lower:h_upper,
                            w_lower:w_upper, :] += W[i_c] * dZ[i_h, i_w, i_c]
                db[i_c] += dZ[i_h, i_w, i_c]

    if padding > 0:
        dA_prev = dA_prev_pad[padding:-padding, padding:-padding, :]
    else:
        dA_prev = dA_prev_pad
    return dW, db, dA_prev


@pytest.mark.parametrize('c_i, c_o', [(3, 6), (2, 2)])
@pytest.mark.parametrize('kernel_size', [3, 5])
@pytest.mark.parametrize('stride', [1, 2])
@pytest.mark.parametrize('padding', [0, 1])
def test_conv(c_i: int, c_o: int, kernel_size: int, stride: int, padding: int):

    # Preprocess
    input = np.random.randn(20, 20, c_i)
    weight = np.random.randn(c_o, kernel_size, kernel_size, c_i)
    bias = np.random.randn(c_o)

    torch_input = torch.from_numpy(np.transpose(
        input, (2, 0, 1))).unsqueeze(0).requires_grad_()
    torch_weight = torch.from_numpy(np.transpose(
        weight, (0, 3, 1, 2))).requires_grad_()
    torch_bias = torch.from_numpy(bias).requires_grad_()

    # forward
    torch_output_tensor = torch.conv2d(torch_input, torch_weight, torch_bias,
                                       stride, padding)
    torch_output = np.transpose(
        torch_output_tensor.detach().numpy().squeeze(0), (1, 2, 0))

    cache = conv2d_forward(input, weight, bias, stride, padding)
    numpy_output = cache['Z']

    assert np.allclose(torch_output, numpy_output)

    # backward
    torch_sum = torch.sum(torch_output_tensor)
    torch_sum.backward()
    torch_dW = np.transpose(torch_weight.grad.numpy(), (0, 2, 3, 1))
    torch_db = torch_bias.grad.numpy()
    torch_dA_prev = np.transpose(torch_input.grad.numpy().squeeze(0),
                                 (1, 2, 0))

    dZ = np.ones(numpy_output.shape)
    dW, db, dA_prev = conv2d_backward(dZ, cache, stride, padding)

    assert np.allclose(dW, torch_dW)
    assert np.allclose(db, torch_db)
    assert np.allclose(dA_prev, torch_dA_prev)

Backward pass for pooling:

import numpy as np
import torch.nn as nn
 
 
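class MaxPooling(nn.Module):
    """Max pooling on NumPy arrays of shape (N, C, H, W).

    backward() assumes non-overlapping windows (ksize == stride) and
    H, W divisible by stride.
    """

    def __init__(self, ksize=2, stride=2):
        super(MaxPooling, self).__init__()
        self.ksize = ksize
        self.stride = stride

    def forward(self, x):
        n, c, h, w = x.shape
        assert self.ksize == self.stride
        assert h % self.stride == 0 and w % self.stride == 0
        out = np.zeros([n, c, h // self.stride, w // self.stride])
        # self.index is a 0/1 mask marking the argmax position of each window
        self.index = np.zeros_like(x)
        for b in range(n):
            for d in range(c):
                for i in range(h // self.stride):
                    for j in range(w // self.stride):
                        _x = i * self.stride
                        _y = j * self.stride
                        window = x[b, d, _x:_x + self.ksize, _y:_y + self.ksize]
                        out[b, d, i, j] = np.max(window)
                        # argmax is a flat index into the ksize x ksize window
                        index = np.argmax(window)
                        self.index[b, d, _x + index // self.ksize,
                                   _y + index % self.ksize] = 1
        return out

    def backward(self, grad_out):
        # Upsample the gradient to the input size, then keep it only at the
        # recorded argmax positions.
        return np.repeat(np.repeat(grad_out, self.stride, axis=2),
                         self.stride, axis=3) * self.index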
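The max-pooling backward rule can be tested numerically in the same spirit as the convolution test above. The sketch below is a NumPy-only re-implementation rather than the class itself, so that it runs without PyTorch; the function names `maxpool_forward` and `maxpool_backward` are hypothetical, and it assumes non-overlapping windows with no ties inside a window. It compares the analytic gradient with a central finite-difference estimate.

```python
import numpy as np


def maxpool_forward(x: np.ndarray, k: int):
    """Non-overlapping k x k max pooling on (N, C, H, W); returns output and mask."""
    n, c, h, w = x.shape
    windows = x.reshape(n, c, h // k, k, w // k, k)
    out = windows.max(axis=(3, 5))
    # mask is 1 where an input entry equals the maximum of its window
    mask = (windows == out[:, :, :, None, :, None]).reshape(x.shape)
    return out, mask.astype(x.dtype)


def maxpool_backward(grad_out: np.ndarray, mask: np.ndarray, k: int) -> np.ndarray:
    """Upsample grad_out by k and keep it only at the max positions."""
    return np.repeat(np.repeat(grad_out, k, axis=2), k, axis=3) * mask


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 2, 4, 4))
out, mask = maxpool_forward(x, 2)
grad_out = rng.standard_normal(out.shape)
grad_x = maxpool_backward(grad_out, mask, 2)

# Numerical gradient of L = sum(grad_out * maxpool(x)) by central differences.
eps = 1e-6
grad_num = np.zeros_like(x)
for idx in np.ndindex(*x.shape):
    step = np.zeros_like(x)
    step[idx] = eps
    plus = np.sum(grad_out * maxpool_forward(x + step, 2)[0])
    minus = np.sum(grad_out * maxpool_forward(x - step, 2)[0])
    grad_num[idx] = (plus - minus) / (2 * eps)

print(out.shape)                                   # (1, 2, 2, 2)
print(int(mask.sum()))                             # one max per 2x2 window
print(np.allclose(grad_x, grad_num, atol=1e-6))
```

Only the entry that won the maximum in each window receives gradient; every other entry gets zero, which matches the MAX case of the upsample rule in Additional 1.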