卷积层详解

卷积神经网络

卷积神经网络(Convolutional Neural Network,CNN)是一种深度学习模型,主要用于图像识别、语音识别和自然语言处理等领域。CNN 通过卷积层、池化层和全连接层等组件构成。

卷积层是 CNN 的核心部分,它通过滤波器(Filter)对输入图像进行卷积操作,提取出图像的特征信息。卷积层通过多个不同的滤波器,可以提取出不同的特征信息,例如边缘、纹理和形状等。

CNN 通常使用反向传播算法进行训练,通过优化损失函数,不断调整网络参数,使其能够更好地拟合训练数据。在实际应用中,CNN 已经取得了许多优秀的成果,例如在图像分类、物体检测和语音识别等领域中,CNN 已经成为了一种主流的模型。

卷积层

本文主要讲最常用的Conv2d,二维图像

主要的数学公式:

⋆ 是有效的二维互相关运算符,N 是批量大小,C表示通道数,H 是输入平面的高度(以像素为单位),并且W 是以像素为单位的宽度。

o u t ( N i , C o u t j ) = b i a s ( C o u t j ) + ∑ k = 0 C i n − 1 w e i g h t ( C o u t j , k ) ⋆ i n p u t ( N i , k ) \mathrm{out}(N_i,C_{\mathrm{out}_j})=\mathrm{bias}(C_{\mathrm{out}_j})+\sum_{k=0}^{C_{\mathrm{in}-1}}\mathrm{weight}(C_{\mathrm{out}_j},k)\star\mathrm{input}(N_i,k) out(Ni,Coutj)=bias(Coutj)+k=0Cin1weight(Coutj,k)input(Ni,k)

参数:

pytorch官网给出的框架中可设置的参数:

torch.nn.Conv2d(in_channels, 
                out_channels, 
                kernel_size, 
                stride=1, 
                padding=0, 
                dilation=1, 
                groups=1, 
                bias=True, 
                padding_mode='zeros', 
                device=None, 
                dtype=None)

参数的含义:

参数名称含义
in_channels输入的通道数,也就是输入图像的深度(channel)
out_channel输出的通道数,也就是卷积核(滤波器)的个数,决定了卷积层的深度。
kernel_size卷积核的大小,可以是一个整数或者一个元组 (H, W),其中 H 和 W 分别表示卷积核的高度和宽度。
stride卷积的步长,可以是一个整数或者一个元组 (S_H, S_W),其中 S_H 和 S_W 分别表示在高度和宽度方向上的步长,默认值为 1。
padding输入的零填充(zero-padding)的大小,可以是一个整数或者一个元组 (P_H, P_W),其中 P_H 和 P_W 分别表示在高度和宽度方向上的填充大小,默认值为 0。
dilation卷积核的扩展率(dilation rate),默认值为 1。如果设置为大于 1 的值,将会增加卷积核内部元素之间的间距,从而改变卷积操作的感受野大小。
groups输入和输出之间的连接方式,可以是一个整数,默认值为 1。当 groups 等于输入通道数时,表示每个输入通道对应一个输出通道;当 groups 等于 1 时,表示所有输入通道共享一个卷积核。
bias是否使用偏置项,默认值为 True。如果设置为 False,卷积操作中将不添加偏置项。
padding_mode填充模式,默认为 ‘zeros’,表示使用零填充。
device指定张量所在的设备(CPU 或 GPU)。
dtype指定张量的数据类型。
示例代码:
import torch
import torchvision
from torch import nn
from torch.nn import Conv2d
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWr

//加载CIFAR10数据集(自动下载)
dataset = torchvision.datasets.CIFAR10("../data",
                                       train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=True)
//加载数据集
dataloader = DataLoader(dataset, batch_size=64)

//创建Touch模型
class Touch(nn.Module):
    def __init__(self):
        super(Touch,self).__init__()
        //传入二维卷积所需要的参数
        self.conv1 = Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x
//实例化Touch模型
touch = Touch()

//将下列卷积输出到日志文件中,并载入tensorboard模块可视化展示
writer = SummaryWriter("../../logs")

//初始化step
step = 0
//通过for循环将每一个卷积输出结果写入日志文件中
for data in dataloader:
    imgs, targets = data
    output = touch(imgs)

    //打印输出imgs,output的数据形状
    print(imgs.shape)
    print(output.shape)

    writer.add_images("input", imgs, step)

    //将输出的数据形状转化为模型所需要形状
    output = torch.reshape(output, (-1, 3, 30, 30))
    writer.add_images("output", output, step)

    //持续走步
    step = step + 1

将输出的日志文件传入tensorboard模块进行可视化展示:

tensorboard --logdir=logs

可视化展示:
在这里插入图片描述

框架源码:
class Conv2d(_ConvNd):
    __doc__ = r"""Applies a 2D convolution over an input signal composed of several input
    planes.

    In the simplest case, the output value of the layer with input size
    :math:`(N, C_{\text{in}}, H, W)` and output :math:`(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})`
    can be precisely described as:

    .. math::
        \text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) +
        \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)


    where :math:`\star` is the valid 2D `cross-correlation`_ operator,
    :math:`N` is a batch size, :math:`C` denotes a number of channels,
    :math:`H` is a height of input planes in pixels, and :math:`W` is
    width in pixels.
    """ + r"""

    This module supports :ref:`TensorFloat32<tf32_on_ampere>`.

    On certain ROCm devices, when using float16 inputs this module will use :ref:`different precision<fp16_on_mi200>` for backward.

    * :attr:`stride` controls the stride for the cross-correlation, a single
      number or a tuple.

    * :attr:`padding` controls the amount of padding applied to the input. It
      can be either a string {{'valid', 'same'}} or an int / a tuple of ints giving the
      amount of implicit padding applied on both sides.

    * :attr:`dilation` controls the spacing between the kernel points; also
      known as the à trous algorithm. It is harder to describe, but this `link`_
      has a nice visualization of what :attr:`dilation` does.

    {groups_note}

    The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:

        - a single ``int`` -- in which case the same value is used for the height and width dimension
        - a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension,
          and the second `int` for the width dimension

    Note:
        {depthwise_separable_note}

    Note:
        {cudnn_reproducibility_note}

    Note:
        ``padding='valid'`` is the same as no padding. ``padding='same'`` pads
        the input so the output has the shape as the input. However, this mode
        doesn't support any stride values other than 1.

    Note:
        This module supports complex data types i.e. ``complex32, complex64, complex128``.

    Args:
        in_channels (int): Number of channels in the input image
        out_channels (int): Number of channels produced by the convolution
        kernel_size (int or tuple): Size of the convolving kernel
        stride (int or tuple, optional): Stride of the convolution. Default: 1
        padding (int, tuple or str, optional): Padding added to all four sides of
            the input. Default: 0
        padding_mode (str, optional): ``'zeros'``, ``'reflect'``,
            ``'replicate'`` or ``'circular'``. Default: ``'zeros'``
        dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
        groups (int, optional): Number of blocked connections from input
            channels to output channels. Default: 1
        bias (bool, optional): If ``True``, adds a learnable bias to the
            output. Default: ``True``
    """.format(**reproducibility_notes, **convolution_notes) + r"""

    Shape:
        - Input: :math:`(N, C_{in}, H_{in}, W_{in})` or :math:`(C_{in}, H_{in}, W_{in})`
        - Output: :math:`(N, C_{out}, H_{out}, W_{out})` or :math:`(C_{out}, H_{out}, W_{out})`, where

          .. math::
              H_{out} = \left\lfloor\frac{H_{in}  + 2 \times \text{padding}[0] - \text{dilation}[0]
                        \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor

          .. math::
              W_{out} = \left\lfloor\frac{W_{in}  + 2 \times \text{padding}[1] - \text{dilation}[1]
                        \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor

    Attributes:
        weight (Tensor): the learnable weights of the module of shape
            :math:`(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}},`
            :math:`\text{kernel\_size[0]}, \text{kernel\_size[1]})`.
            The values of these weights are sampled from
            :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
            :math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
        bias (Tensor):   the learnable bias of the module of shape
            (out_channels). If :attr:`bias` is ``True``,
            then the values of these weights are
            sampled from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
            :math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`

    Examples:

        >>> # With square kernels and equal stride
        >>> m = nn.Conv2d(16, 33, 3, stride=2)
        >>> # non-square kernels and unequal stride and with padding
        >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
        >>> # non-square kernels and unequal stride and with padding and dilation
        >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
        >>> input = torch.randn(20, 16, 50, 100)
        >>> output = m(input)

    .. _cross-correlation:
        https://en.wikipedia.org/wiki/Cross-correlation

    .. _link:
        https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
    """

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: _size_2_t,
        stride: _size_2_t = 1,
        padding: Union[str, _size_2_t] = 0,
        dilation: _size_2_t = 1,
        groups: int = 1,
        bias: bool = True,
        padding_mode: str = 'zeros',  # TODO: refine this type
        device=None,
        dtype=None
    ) -> None:
        factory_kwargs = {'device': device, 'dtype': dtype}
        kernel_size_ = _pair(kernel_size)
        stride_ = _pair(stride)
        padding_ = padding if isinstance(padding, str) else _pair(padding)
        dilation_ = _pair(dilation)
        super().__init__(
            in_channels, out_channels, kernel_size_, stride_, padding_, dilation_,
            False, _pair(0), groups, bias, padding_mode, **factory_kwargs)

    def _conv_forward(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]):
        if self.padding_mode != 'zeros':
            return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
                            weight, bias, self.stride,
                            _pair(0), self.dilation, self.groups)
        return F.conv2d(input, weight, bias, self.stride,
                        self.padding, self.dilation, self.groups)

    def forward(self, input: Tensor) -> Tensor:
        return self._conv_forward(input, self.weight, self.bias)
  • 46
    点赞
  • 58
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值