Convolutional Layers Explained

修修爪哇

Category: Deep Learning. Tags: cnn, artificial intelligence, neural networks, python, deep learning, pytorch, pycharm

Article link: https://blog.csdn.net/ss20211121/article/details/136636413

Convolutional Neural Networks

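A convolutional neural network (CNN) is a deep learning model used mainly for image recognition, speech recognition, and natural language processing. A CNN is built from components such as convolutional layers, pooling layers, and fully connected layers.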

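The convolutional layer is the core of a CNN. It slides filters over the input image and applies a convolution at each position, extracting feature information from the image. Because a layer holds several different filters, it can extract several different kinds of features, such as edges, textures, and shapes.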
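To make the idea of a filter extracting edges concrete, here is a minimal sketch. It assumes a hand-written Sobel kernel and a toy 6x6 image (neither comes from the article); in a trained CNN the kernel values would be learned rather than fixed.

```python
import torch
import torch.nn.functional as F

# Toy 6x6 grayscale image: left half dark (0), right half bright (1).
image = torch.zeros(1, 1, 6, 6)
image[:, :, :, 3:] = 1.0

# Hand-written 3x3 Sobel kernel that responds to horizontal intensity
# changes, i.e. vertical edges. Shape: (out_channels, in_channels, kH, kW).
sobel_x = torch.tensor([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]]).reshape(1, 1, 3, 3)

# No padding and stride 1, so the 6x6 input yields a 4x4 output.
edges = F.conv2d(image, sobel_x)
print(edges.shape)   # torch.Size([1, 1, 4, 4])
print(edges[0, 0])   # every row is [0., 4., 4., 0.]
```

The response is non-zero only in the two output columns whose 3x3 window straddles the dark-to-bright boundary, which is what "detecting an edge" means here.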

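CNNs are usually trained with backpropagation: the network parameters are adjusted repeatedly to minimize a loss function, so that the model fits the training data better. In practice, CNNs have become a mainstream model in areas such as image classification, object detection, and speech recognition.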
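A single training step can be sketched as follows. The tiny model, the random batch, and the learning rate are assumptions chosen only for illustration; they are not the article's setup.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Tiny model: one conv layer, global average pooling, one linear classifier.
model = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(6, 10),
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Random stand-ins for a batch of 32x32 RGB images and their class labels.
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

weight_before = model[0].weight.detach().clone()

optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()    # backpropagation fills .grad for every parameter
optimizer.step()   # gradient descent update of all parameters

weight_changed = not torch.equal(weight_before, model[0].weight)
print(loss.item(), weight_changed)
```

Repeating this step over many batches is what "continuously adjusting the parameters" amounts to; the convolution kernels are updated in exactly the same way as the linear layer's weights.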

Convolutional Layer

This article focuses on the most commonly used variant, Conv2d, which operates on two-dimensional images.

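The main formula:

In the formula below, ⋆ is the valid 2D cross-correlation operator, N is the batch size, C is the number of channels, H is the height of the input plane in pixels, and W is its width in pixels.

$\mathrm{out}(N_i,C_{\mathrm{out}_j})=\mathrm{bias}(C_{\mathrm{out}_j})+\sum_{k=0}^{C_{\mathrm{in}}-1}\mathrm{weight}(C_{\mathrm{out}_j},k)\star\mathrm{input}(N_i,k)$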
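The formula can be checked numerically. The sketch below (tensor sizes and variable names are arbitrary assumptions) rebuilds each output plane as the bias plus a sum of single-channel cross-correlations, then compares the result with `F.conv2d`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, C_in, C_out = 2, 3, 4
x = torch.randn(N, C_in, 8, 8)
weight = torch.randn(C_out, C_in, 3, 3)
bias = torch.randn(C_out)

reference = F.conv2d(x, weight, bias)   # shape (2, 4, 6, 6)

manual = torch.zeros_like(reference)
for i in range(N):                       # sample index N_i
    for j in range(C_out):               # output channel C_out_j
        acc = torch.full((6, 6), bias[j].item())
        for k in range(C_in):            # sum over input channels
            # Cross-correlate input(N_i, k) with weight(C_out_j, k).
            plane = x[i, k].reshape(1, 1, 8, 8)
            kernel = weight[j, k].reshape(1, 1, 3, 3)
            acc = acc + F.conv2d(plane, kernel)[0, 0]
        manual[i, j] = acc

# The two results should agree up to floating-point rounding.
print(torch.allclose(manual, reference, atol=1e-4))
```

Note that the operation is a cross-correlation: the kernel is not flipped, unlike the convolution defined in signal processing.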

Parameters:

The constructor parameters listed in the official PyTorch documentation:

torch.nn.Conv2d(in_channels, 
                out_channels, 
                kernel_size, 
                stride=1, 
                padding=0, 
                dilation=1, 
                groups=1, 
                bias=True, 
                padding_mode='zeros', 
                device=None, 
                dtype=None)

Meaning of each parameter:

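Parameter	Meaning
in_channels	Number of input channels, i.e. the depth of the input image.
out_channels	Number of output channels, i.e. the number of convolution kernels (filters); it determines the depth of the layer's output.
kernel_size	Size of the convolution kernel; an integer or a tuple (H, W), where H and W are the kernel height and width.
stride	Stride of the convolution; an integer or a tuple (S_H, S_W), where S_H and S_W are the strides along height and width. Default: 1.
padding	Amount of padding added to the input; an integer, a tuple (P_H, P_W) giving the padding along height and width, or the string 'valid' or 'same'. Default: 0.
dilation	Spacing between kernel elements (dilation rate). Default: 1. Values greater than 1 widen the gaps between kernel elements, which enlarges the receptive field.
groups	Number of blocked connections from input channels to output channels; an integer. Default: 1. With groups=1, every output channel is computed from all input channels. With groups equal to in_channels, each input channel is convolved with its own set of filters (depthwise convolution). Both in_channels and out_channels must be divisible by groups.
bias	Whether to add a learnable bias. Default: True. If False, no bias term is added to the output.
padding_mode	Padding mode: 'zeros', 'reflect', 'replicate', or 'circular'. Default: 'zeros' (zero padding).
device	Device (CPU or GPU) on which the layer's parameters are created.
dtype	Data type of the layer's parameters.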
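The effect of these parameters is easiest to see in the resulting shapes. The sketch below uses a hypothetical helper, `conv2d_out_size` (not part of PyTorch), that applies the output-size formula documented for Conv2d along one dimension, and compares it with what the layer actually produces.

```python
import torch
from torch import nn

def conv2d_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension (hypothetical helper)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

x = torch.randn(1, 4, 32, 32)

configs = [
    dict(kernel_size=3),                       # 32 -> 30
    dict(kernel_size=3, padding=1),            # 32 -> 32
    dict(kernel_size=3, stride=2, padding=1),  # 32 -> 16
    dict(kernel_size=3, dilation=2),           # 32 -> 28
]
for cfg in configs:
    out = nn.Conv2d(4, 8, **cfg)(x)
    print(cfg, tuple(out.shape), conv2d_out_size(32, **cfg))

# groups changes the weight shape: (out_channels, in_channels / groups, kH, kW).
standard = nn.Conv2d(4, 8, kernel_size=3, groups=1)
depthwise = nn.Conv2d(4, 8, kernel_size=3, groups=4)
print(tuple(standard.weight.shape))    # (8, 4, 3, 3)
print(tuple(depthwise.weight.shape))   # (8, 1, 3, 3)
```

With `groups=4` each filter sees a single input channel, so the layer holds a quarter of the weights of the standard version.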

Example code:

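import torch
import torchvision
from torch import nn
from torch.nn import Conv2d
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

# Load the CIFAR10 test set (downloaded automatically if missing)
dataset = torchvision.datasets.CIFAR10("../data",
                                       train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=True)
# Wrap the dataset in a DataLoader that yields batches of 64 images
dataloader = DataLoader(dataset, batch_size=64)

# Define the Touch model
class Touch(nn.Module):
    def __init__(self):
        super(Touch, self).__init__()
        # 2D convolution: 3 input channels -> 6 output channels, 3x3 kernel
        self.conv1 = Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x

# Instantiate the Touch model
touch = Touch()

# Write the convolution results to a log directory that TensorBoard can read
writer = SummaryWriter("../../logs")

# Initialize the step counter
step = 0
# Log the convolution output of every batch
for data in dataloader:
    imgs, targets = data
    output = touch(imgs)

    # Print the shapes of imgs and output
    print(imgs.shape)    # torch.Size([64, 3, 32, 32]) for a full batch
    print(output.shape)  # torch.Size([64, 6, 30, 30]) for a full batch

    writer.add_images("input", imgs, step)

    # add_images expects 3-channel images, so fold the 6 output channels
    # into the batch dimension: (64, 6, 30, 30) -> (128, 3, 30, 30)
    output = torch.reshape(output, (-1, 3, 30, 30))
    writer.add_images("output", output, step)

    # Advance the step counter
    step = step + 1

# Flush pending events and close the log file
writer.close()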
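The shapes in the loop above can be checked without downloading the dataset. This sketch substitutes a random tensor for one CIFAR10 batch (an assumption for illustration) and shows what the reshape to 3 channels actually does.

```python
import torch
from torch import nn

conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

imgs = torch.randn(64, 3, 32, 32)   # random stand-in for one CIFAR10 batch
output = conv(imgs)
print(output.shape)                  # torch.Size([64, 6, 30, 30])

# Folding 6 channels into groups of 3 doubles the batch dimension.
reshaped = torch.reshape(output, (-1, 3, 30, 30))
print(reshaped.shape)                # torch.Size([128, 3, 30, 30])

# Each original image becomes two pseudo-RGB images:
# channels 0-2 first, then channels 3-5.
print(torch.equal(reshaped[0], output[0, 0:3]))   # True
print(torch.equal(reshaped[1], output[0, 3:6]))   # True
```

The images shown in TensorBoard are therefore a visualization aid only: each one packs three unrelated feature maps into the red, green, and blue channels.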

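Point TensorBoard at the log directory to visualize the results (adjust the path so that it resolves to the directory passed to SummaryWriter):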

tensorboard --logdir=logs


Framework source code:

class Conv2d(_ConvNd):
    __doc__ = r"""Applies a 2D convolution over an input signal composed of several input
    planes.

    In the simplest case, the output value of the layer with input size
    :math:`(N, C_{\text{in}}, H, W)` and output :math:`(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})`
    can be precisely described as:

    .. math::
        \text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) +
        \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)


    where :math:`\star` is the valid 2D `cross-correlation`_ operator,
    :math:`N` is a batch size, :math:`C` denotes a number of channels,
    :math:`H` is a height of input planes in pixels, and :math:`W` is
    width in pixels.
    """ + r"""

    This module supports :ref:`TensorFloat32<tf32_on_ampere>`.

    On certain ROCm devices, when using float16 inputs this module will use :ref:`different precision<fp16_on_mi200>` for backward.

    * :attr:`stride` controls the stride for the cross-correlation, a single
      number or a tuple.

    * :attr:`padding` controls the amount of padding applied to the input. It
      can be either a string {{'valid', 'same'}} or an int / a tuple of ints giving the
      amount of implicit padding applied on both sides.

    * :attr:`dilation` controls the spacing between the kernel points; also
      known as the à trous algorithm. It is harder to describe, but this `link`_
      has a nice visualization of what :attr:`dilation` does.

    {groups_note}

    The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:

        - a single ``int`` -- in which case the same value is used for the height and width dimension
        - a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension,
          and the second `int` for the width dimension

    Note:
        {depthwise_separable_note}

    Note:
        {cudnn_reproducibility_note}

    Note:
        ``padding='valid'`` is the same as no padding. ``padding='same'`` pads
        the input so the output has the shape as the input. However, this mode
        doesn't support any stride values other than 1.

    Note:
        This module supports complex data types i.e. ``complex32, complex64, complex128``.

    Args:
        in_channels (int): Number of channels in the input image
        out_channels (int): Number of channels produced by the convolution
        kernel_size (int or tuple): Size of the convolving kernel
        stride (int or tuple, optional): Stride of the convolution. Default: 1
        padding (int, tuple or str, optional): Padding added to all four sides of
            the input. Default: 0
        padding_mode (str, optional): ``'zeros'``, ``'reflect'``,
            ``'replicate'`` or ``'circular'``. Default: ``'zeros'``
        dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
        groups (int, optional): Number of blocked connections from input
            channels to output channels. Default: 1
        bias (bool, optional): If ``True``, adds a learnable bias to the
            output. Default: ``True``
    """.format(**reproducibility_notes, **convolution_notes) + r"""

    Shape:
        - Input: :math:`(N, C_{in}, H_{in}, W_{in})` or :math:`(C_{in}, H_{in}, W_{in})`
        - Output: :math:`(N, C_{out}, H_{out}, W_{out})` or :math:`(C_{out}, H_{out}, W_{out})`, where

          .. math::
              H_{out} = \left\lfloor\frac{H_{in}  + 2 \times \text{padding}[0] - \text{dilation}[0]
                        \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor

          .. math::
              W_{out} = \left\lfloor\frac{W_{in}  + 2 \times \text{padding}[1] - \text{dilation}[1]
                        \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor

    Attributes:
        weight (Tensor): the learnable weights of the module of shape
            :math:`(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}},`
            :math:`\text{kernel\_size[0]}, \text{kernel\_size[1]})`.
            The values of these weights are sampled from
            :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
            :math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
        bias (Tensor):   the learnable bias of the module of shape
            (out_channels). If :attr:`bias` is ``True``,
            then the values of these weights are
            sampled from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
            :math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`

    Examples:

        >>> # With square kernels and equal stride
        >>> m = nn.Conv2d(16, 33, 3, stride=2)
        >>> # non-square kernels and unequal stride and with padding
        >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
        >>> # non-square kernels and unequal stride and with padding and dilation
        >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
        >>> input = torch.randn(20, 16, 50, 100)
        >>> output = m(input)

    .. _cross-correlation:
        https://en.wikipedia.org/wiki/Cross-correlation

    .. _link:
        https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
    """

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: _size_2_t,
        stride: _size_2_t = 1,
        padding: Union[str, _size_2_t] = 0,
        dilation: _size_2_t = 1,
        groups: int = 1,
        bias: bool = True,
        padding_mode: str = 'zeros',  # TODO: refine this type
        device=None,
        dtype=None
    ) -> None:
        factory_kwargs = {'device': device, 'dtype': dtype}
        kernel_size_ = _pair(kernel_size)
        stride_ = _pair(stride)
        padding_ = padding if isinstance(padding, str) else _pair(padding)
        dilation_ = _pair(dilation)
        super().__init__(
            in_channels, out_channels, kernel_size_, stride_, padding_, dilation_,
            False, _pair(0), groups, bias, padding_mode, **factory_kwargs)

    def _conv_forward(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]):
        if self.padding_mode != 'zeros':
            return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
                            weight, bias, self.stride,
                            _pair(0), self.dilation, self.groups)
        return F.conv2d(input, weight, bias, self.stride,
                        self.padding, self.dilation, self.groups)

    def forward(self, input: Tensor) -> Tensor:
        return self._conv_forward(input, self.weight, self.bias)
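The `_conv_forward` branch for a non-zero `padding_mode` can be reproduced by hand: pad the input explicitly with `F.pad`, then convolve with no padding. The sketch below assumes an arbitrary layer with `padding=(1, 2)` and reflect padding, and uses only the public API instead of the private `_reversed_padding_repeated_twice` attribute.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
layer = nn.Conv2d(3, 6, kernel_size=3, padding=(1, 2), padding_mode="reflect")
x = torch.randn(2, 3, 16, 16)

# F.pad lists padding from the last dimension backwards:
# (left, right, top, bottom). padding=(1, 2) means 1 row on the top and
# bottom and 2 columns on the left and right, hence (2, 2, 1, 1).
padded = F.pad(x, (2, 2, 1, 1), mode="reflect")
manual = F.conv2d(padded, layer.weight, layer.bias, stride=1, padding=0)

print(torch.allclose(layer(x), manual))
print(layer(x).shape)   # torch.Size([2, 6, 16, 18])
```

The reversed order of the padding tuple is why the source stores a separate "reversed, repeated twice" copy of `padding` for use with `F.pad`.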