[PyTorch] The torch.nn.Conv2d Class: 2D Convolutional Layer

Latest recommended article published on 2025-04-28 16:07:48

彬彬侠

Category: PyTorch Basics. Tags: Conv2d, 2D convolutional layer, CNN, pytorch, machine learning, python

Article link: https://blog.csdn.net/u013172930/article/details/146287322

`torch.nn.Conv2d`

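torch.nn.Conv2d is PyTorch's implementation of a 2D convolutional layer. It is used mainly in computer vision tasks such as image classification and object detection, where it extracts spatial features from the input and increases the model's representational capacity.

1. `torch.nn.Conv2d` Syntax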

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

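Parameter	Description
`in_channels`	Number of input channels (1 for grayscale images, 3 for RGB images)
`out_channels`	Number of convolution kernels, which equals the number of output channels
`kernel_size`	Kernel size, given as an int such as `3` or a tuple such as `(3,3)`
`stride`	Step by which the kernel slides over the input (default `1`)
`padding`	Padding added to each side of the input (default `0`); for example, `1` preserves the spatial size for a 3x3 kernel with `stride=1`
`dilation`	Spacing between kernel elements (default `1`); values above `1` give a dilated convolution
`groups`	Number of groups for grouped convolution (default `1`, a standard convolution)
`bias`	Whether to add a learnable bias term (default `True`)
`padding_mode`	Padding mode: `'zeros'` (default), `'reflect'`, `'replicate'`, or `'circular'`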
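
The four `padding_mode` values differ only in what gets written into the padded border. The following is a minimal pure-Python sketch on a single row of values; `pad_1d` is a hypothetical helper written for illustration, not a PyTorch function, and it assumes the same padding width on both sides.

```python
def pad_1d(row, pad, mode="zeros"):
    """Pad a 1D list on both sides, imitating the four padding modes."""
    row = list(row)
    n = len(row)
    if mode == "zeros":
        left, right = [0] * pad, [0] * pad
    elif mode == "reflect":
        # Mirror around the edge element without repeating it (needs pad < n).
        left = [row[i] for i in range(pad, 0, -1)]
        right = [row[n - 1 - i] for i in range(1, pad + 1)]
    elif mode == "replicate":
        # Repeat the edge element.
        left, right = [row[0]] * pad, [row[-1]] * pad
    elif mode == "circular":
        # Wrap around: the left border comes from the end of the row.
        left, right = row[n - pad:], row[:pad]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return left + row + right


for mode in ("zeros", "reflect", "replicate", "circular"):
    print(mode, pad_1d([1, 2, 3, 4], 2, mode))
# zeros [0, 0, 1, 2, 3, 4, 0, 0]
# reflect [3, 2, 1, 2, 3, 4, 3, 2]
# replicate [1, 1, 1, 2, 3, 4, 4, 4]
# circular [3, 4, 1, 2, 3, 4, 1, 2]
```

In a real `Conv2d` layer the same idea is applied along both the height and the width of each channel.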

2. Example: Defining a Basic `Conv2d`

import torch
import torch.nn as nn

# Define a 2D convolutional layer
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)

# Assume an input of shape (batch_size=1, channels=3, height=32, width=32)
input_tensor = torch.randn(1, 3, 32, 32)

# Compute the output
output = conv(input_tensor)
print(output.shape)  # Output: torch.Size([1, 16, 32, 32])

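Explanation

in_channels=3: the input has 3 channels (an RGB image).
out_channels=16: 16 kernels, hence 16 output channels.
kernel_size=3: a 3x3 kernel is used.
padding=1: with a 3x3 kernel and stride=1, this keeps the size unchanged (H_out = H_in).
stride=1: the kernel slides one pixel at a time.
The output shape is (1, 16, 32, 32), where:
- 1 is the batch_size.
- 16 is the number of output channels (one per kernel).
- 32 × 32 is the feature-map size.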
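
To make "16 kernels" concrete, it helps to count the learnable parameters of this layer. The sketch below assumes a standard convolution (groups=1) with a square kernel, where the weight tensor has shape (out_channels, in_channels, K, K) and there is one bias value per output channel; `conv2d_param_count` is a hypothetical helper, not part of PyTorch.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of a standard (groups=1) conv with a square kernel."""
    # One K x K filter per (output channel, input channel) pair.
    weight = out_channels * in_channels * kernel_size * kernel_size
    # One bias value per output channel, if enabled.
    return weight + (out_channels if bias else 0)


print(conv2d_param_count(3, 16, 3))              # 448 = 16*3*3*3 + 16
print(conv2d_param_count(3, 16, 3, bias=False))  # 432
```

Note that the count does not depend on the 32 × 32 input size: the same kernels are reused at every spatial position.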

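3. Computing the Convolution Output Size

The output size is given by:
$H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} + 2P - K}{S} \right\rfloor + 1$
$W_{\text{out}} = \left\lfloor \frac{W_{\text{in}} + 2P - K}{S} \right\rfloor + 1$
where:

$H_{\text{in}}, W_{\text{in}}$ are the height and width of the input feature map.
$P$ is the padding.
$K$ is the kernel size (kernel_size).
$S$ is the stride.
$\lfloor \cdot \rfloor$ rounds down, which matters whenever the division by $S$ is not exact. The formula assumes dilation=1; for a dilation $D$, replace $K$ with the effective kernel size $D(K - 1) + 1$.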
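
The formula can be checked with a few lines of plain Python. This is a minimal sketch for one spatial dimension, assuming the same padding on both sides and using floor division; `conv2d_output_size` is a hypothetical helper name, and the optional `dilation` argument uses the effective kernel size dilation * (kernel_size - 1) + 1.

```python
def conv2d_output_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a 2D convolution."""
    # A dilated kernel spans dilation * (kernel_size - 1) + 1 input positions.
    effective_kernel = dilation * (kernel_size - 1) + 1
    # Floor division handles strides that do not divide evenly.
    return (size_in + 2 * padding - effective_kernel) // stride + 1


print(conv2d_output_size(32, 3, stride=1, padding=0))  # 30
print(conv2d_output_size(32, 3, stride=1, padding=1))  # 32
print(conv2d_output_size(32, 3, stride=2, padding=1))  # 16 (31 // 2 + 1)
```

The last case shows why the floor is needed: (32 + 2 - 3) / 2 + 1 would be 16.5 without it.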

4. The Effect of `padding` and `stride`

Example 1: Default `padding=0`

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=0)
input_tensor = torch.randn(1, 3, 32, 32)
output = conv(input_tensor)
print(output.shape)  # torch.Size([1, 16, 30, 30])

With padding=0, each spatial dimension shrinks by kernel_size - 1 = 2, from 32 to 30.

Example 2: `padding=1` (size preserved)

conv = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
input_tensor = torch.randn(1, 3, 32, 32)
output = conv(input_tensor)
print(output.shape)  # torch.Size([1, 16, 32, 32])

With a 3x3 kernel and stride=1, padding=1 keeps the input and output sizes the same.

Example 3: `stride=2` (downsampling)

conv = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)
input_tensor = torch.randn(1, 3, 32, 32)
output = conv(input_tensor)
print(output.shape)  # torch.Size([1, 16, 16, 16])

stride=2 halves each spatial dimension of the output, from 32 to 16 here.
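
Because each stride-2 layer halves the feature map, stacking several of them shrinks it quickly. The sketch below traces the size through three such layers using the output-size formula with kernel_size=3 and padding=1; `out_size` is a hypothetical helper, and the three-layer stack is an assumption chosen for illustration.

```python
def out_size(size_in, kernel_size=3, stride=2, padding=1):
    """Output length along one dimension (dilation=1), using floor division."""
    return (size_in + 2 * padding - kernel_size) // stride + 1


sizes = [32]
for _ in range(3):  # three stacked stride-2 convolutions
    sizes.append(out_size(sizes[-1]))
print(sizes)        # [32, 16, 8, 4]

print(out_size(7))  # 4: an odd input size is rounded up to ceil(7 / 2)
```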

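5. `dilation` (Dilated Convolution)

dilation controls the spacing between kernel elements. It enlarges the receptive field without adding parameters, which suits tasks that benefit from a larger receptive field (such as object detection).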

conv = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=2, dilation=2)
input_tensor = torch.randn(1, 3, 32, 32)
output = conv(input_tensor)
print(output.shape)  # torch.Size([1, 16, 32, 32])

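dilation=2 enlarges the receptive field: the 3x3 kernel now covers an effective 5x5 region. padding=2 compensates for this, so the output size stays unchanged.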
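
The choice of padding=2 in this example follows from the effective kernel size. A minimal sketch, assuming stride=1 and an odd effective kernel size; both helper names are hypothetical and not part of PyTorch.

```python
def effective_kernel_size(kernel_size, dilation):
    """Number of input positions spanned by a dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def same_padding(kernel_size, dilation=1):
    """Padding that preserves the size at stride=1 (odd effective kernel)."""
    return (effective_kernel_size(kernel_size, dilation) - 1) // 2


for d in (1, 2, 3):
    print(d, effective_kernel_size(3, d), same_padding(3, d))
# 1 3 1
# 2 5 2
# 3 7 3
```

For a 3x3 kernel, the size-preserving padding therefore equals the dilation.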

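6. `groups` (Grouped Convolution)

groups > 1 enables grouped convolution: the input and output channels are split into `groups` groups that are convolved independently, so both in_channels and out_channels must be divisible by groups. The depthwise convolutions in MobileNet are the special case groups = in_channels.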

conv = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, groups=2)
input_tensor = torch.randn(1, 8, 32, 32)
output = conv(input_tensor)
print(output.shape)  # torch.Size([1, 16, 30, 30])
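
Grouping changes the weight shape and the parameter count, not the output shape. The sketch below compares groups=1 and groups=2 for the layer above, assuming a square kernel and a weight tensor of shape (out_channels, in_channels // groups, K, K); `grouped_conv2d_params` is a hypothetical helper, not a PyTorch function.

```python
from math import prod


def grouped_conv2d_params(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Return (weight_shape, parameter_count) for a grouped 2D convolution."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    # Each output channel only sees the in_channels // groups inputs of its group.
    weight_shape = (out_channels, in_channels // groups, kernel_size, kernel_size)
    count = prod(weight_shape) + (out_channels if bias else 0)
    return weight_shape, count


print(grouped_conv2d_params(8, 16, 3, groups=1))  # ((16, 8, 3, 3), 1168)
print(grouped_conv2d_params(8, 16, 3, groups=2))  # ((16, 4, 3, 3), 592)
```

With groups=2 the weight count is halved (576 instead of 1152), while the 16 bias values are unchanged.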