When designing a network architecture, or reading an existing one, choices such as the number of convolution layers, the kernel sizes, and the channel counts all have a purpose. This post covers the basics of how to compute the input and output sizes of a convolutional neural network, which will come in handy when you design your own networks later (then again, whether a solo design can beat what a whole team came up with is another question).
1. Fundamentals of CNN Size Calculation
1.1 Convolution Layer Output Size Formula
The output size of a convolution layer is determined by the following formulas (a small helper that implements them is sketched after the parameter list below):
H_out = floor((H_in + 2*padding - dilation*(kernel_size-1) -1)/stride + 1)
W_out = floor((W_in + 2*padding - dilation*(kernel_size-1) -1)/stride + 1)
where:
- H_in / W_in: height/width of the input feature map
- padding: number of padding pixels
- dilation: dilation rate (for dilated convolutions)
- kernel_size: convolution kernel size
- stride: convolution stride
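To make the formula concrete, here is a minimal sketch (the helper name conv_output_size is my own) that applies it to one spatial dimension; the two example calls already predict the first two layers of the ResNet18 analyzed below:

import math

def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    # Output size of one spatial dimension for a conv (or pooling) layer
    return math.floor((size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

print(conv_output_size(224, kernel_size=7, stride=2, padding=3))  # 112 (7x7 conv, stride 2)
print(conv_output_size(112, kernel_size=3, stride=2, padding=1))  # 56  (3x3 max pool, stride 2)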
1.2 Pooling Layer Output Size
The pooling layer output size uses the same formula as the convolution layer, typically with (a quick shape check follows this list):
- kernel_size = 2
- stride = 2
- padding = 0
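As a quick sanity check (a minimal sketch, not from the original text), a 2×2, stride-2, padding-0 max pool halves each spatial dimension:

import torch
import torch.nn as nn

# floor((56 + 0 - 1*(2-1) - 1)/2 + 1) = 28
pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
print(pool(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 28, 28])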
2. ResNet18 Network Structure
ResNet18 is built from the following main components:
- Initial convolution layer
- 4 residual stages (each containing 2 BasicBlocks)
- Global average pooling
- Fully connected layer
2.1 BasicBlock Structure
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
def __init__(self, in_channels, out_channels, stride=1):
super().__init__()
self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
stride=stride, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(out_channels)
self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
stride=1, padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(out_channels)
        # Identity shortcut by default; replaced by a 1x1 projection when the shape changes
        self.shortcut = nn.Sequential()
if stride != 1 or in_channels != out_channels:
self.shortcut = nn.Sequential(
nn.Conv2d(in_channels, out_channels, kernel_size=1,
stride=stride, bias=False),
nn.BatchNorm2d(out_channels)
)
def forward(self, x):
out = F.relu(self.bn1(self.conv1(x)))
out = self.bn2(self.conv2(out))
out += self.shortcut(x)
return F.relu(out)
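Assuming the class and imports above (plus import torch), a quick shape check shows that a stride-2 block halves the spatial size while the 1×1 shortcut matches the channel count:

import torch

block = BasicBlock(64, 64, stride=1)             # identity shortcut, shape unchanged
print(block(torch.randn(1, 64, 56, 56)).shape)   # torch.Size([1, 64, 56, 56])

block = BasicBlock(64, 128, stride=2)            # projection shortcut, downsampling
print(block(torch.randn(1, 64, 56, 56)).shape)   # torch.Size([1, 128, 28, 28])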
3. Complete ResNet18 Calculation Walkthrough
import torch
import torch.nn as nn
class BasicBlock(nn.Module):
expansion = 1
def __init__(self, in_channels, out_channels, stride=1, downsample=None):
super().__init__()
self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
stride=stride, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(out_channels)
self.relu = nn.ReLU(inplace=True)
self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
stride=1, padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(out_channels)
self.downsample = downsample
self.stride = stride
def forward(self, x):
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
class ResNet18(nn.Module):
def __init__(self, num_classes=1000):
super().__init__()
self.in_channels = 64
        # Initial convolution layer (stem)
self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
self.bn1 = nn.BatchNorm2d(64)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # 4 residual stages
self.layer1 = self._make_layer(64, 64, blocks=2, stride=1)
self.layer2 = self._make_layer(64, 128, blocks=2, stride=2)
self.layer3 = self._make_layer(128, 256, blocks=2, stride=2)
self.layer4 = self._make_layer(256, 512, blocks=2, stride=2)
        # Classification head
self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
self.fc = nn.Linear(512 * BasicBlock.expansion, num_classes)
def _make_layer(self, in_channels, out_channels, blocks, stride):
downsample = None
if stride != 1 or in_channels != out_channels * BasicBlock.expansion:
downsample = nn.Sequential(
nn.Conv2d(in_channels, out_channels * BasicBlock.expansion,
kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(out_channels * BasicBlock.expansion)
)
layers = []
layers.append(BasicBlock(in_channels, out_channels, stride, downsample))
for _ in range(1, blocks):
layers.append(BasicBlock(out_channels * BasicBlock.expansion, out_channels))
return nn.Sequential(*layers)
def forward(self, x):
        # Initial convolution
x = self.conv1(x) # [3,224,224] -> [64,112,112]
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x) # [64,112,112] -> [64,56,56]
        # Residual stages
x = self.layer1(x) # [64,56,56] -> [64,56,56]
x = self.layer2(x) # [64,56,56] -> [128,28,28]
x = self.layer3(x) # [128,28,28] -> [256,14,14]
x = self.layer4(x) # [256,14,14] -> [512,7,7]
        # Classification
x = self.avgpool(x) # [512,7,7] -> [512,1,1]
x = torch.flatten(x, 1) # [512,1,1] -> [512]
x = self.fc(x) # [512] -> [num_classes]
return x
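A dummy forward pass confirms the shape annotations in the comments (a quick sketch using the ResNet18 class defined above):

model = ResNet18(num_classes=1000)
model.eval()  # avoid updating BatchNorm running stats during the shape check
x = torch.randn(1, 3, 224, 224)
print(model(x).shape)  # torch.Size([1, 1000])

# Intermediate shapes can be inspected the same way, e.g. after the stem:
stem = model.maxpool(model.relu(model.bn1(model.conv1(x))))
print(stem.shape)      # torch.Size([1, 64, 56, 56])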
Assume the input image size is 224×224×3. Substituting into the formulas:
H_out = floor((H_in + 2*padding - dilation*(kernel_size-1) -1)/stride + 1)
W_out = floor((W_in + 2*padding - dilation*(kernel_size-1) -1)/stride + 1)
3.1 Initial Convolution Layer
self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
Calculation:
H_out = floor((224 + 2*3 - 1*(7-1) -1)/2 + 1) = floor(112.5) = 112
W_out = floor((224 + 2*3 - 1*(7-1) -1)/2 + 1) = 112
Output size: 112×112×64
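This can be double-checked against an actual layer (a quick sanity check, assuming torch and torch.nn are imported as above):

conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
print(conv1(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 64, 112, 112])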
3.2 MaxPool Layer
maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
Calculation:
H_out = floor((112 + 2*1 - 1*(3-1) -1)/2 + 1) = floor(56.5) = 56
W_out = 56
Output size: 56×56×64
3.3 Residual Stage 1
Contains 2 BasicBlocks and does not change the spatial size:
- conv1: 3×3, stride=1
- conv2: 3×3, stride=1
Output size stays at 56×56×64
3.4 Residual Stage 2
The first BasicBlock downsamples:
BasicBlock(64, 128, stride=2)
Main path calculation:
- conv1: stride=2
H_out = floor((56 + 2*1 - 1*(3-1) -1)/2 + 1) = 28
- conv2: stride=1
H_out = floor((28 + 2*1 - 1*(3-1) -1)/1 + 1) = 28
Shortcut path:
nn.Conv2d(64, 128, kernel_size=1, stride=2)
Output size: 28×28×128
3.5 Residual Stage 3
Downsampled in the same way:
BasicBlock(128, 256, stride=2)
Output size: 14×14×256
3.6 Residual Stage 4
BasicBlock(256, 512, stride=2)
Output size: 7×7×512
3.7 Global Average Pooling
nn.AdaptiveAvgPool2d((1,1))
Output size: 1×1×512
3.8 Fully Connected Layer
nn.Linear(512, num_classes)
Final output: a num_classes-dimensional vector
4. Complete Size Transformation Summary
Layer | Type | Parameters | Output size |
---|---|---|---|
Input image | - | - | 224×224×3 |
Initial conv | Conv2d | k=7, s=2, p=3 | 112×112×64 |
MaxPool | MaxPool2d | k=3, s=2, p=1 | 56×56×64 |
Residual stage 1 | 2×BasicBlock | stride=1 | 56×56×64 |
Residual stage 2 | 2×BasicBlock | first block stride=2 | 28×28×128 |
Residual stage 3 | 2×BasicBlock | first block stride=2 | 14×14×256 |
Residual stage 4 | 2×BasicBlock | first block stride=2 | 7×7×512 |
Global average pooling | AdaptiveAvgPool2d | output 1×1 | 1×1×512 |
Fully connected | Linear | 512→num_classes | num_classes |
5. Appendix: Parameter Count and FLOPs
Parameter count
For a convolution layer:
Params = (kernel_w × kernel_h × in_channels + bias) × out_channels
where bias is 1 if the layer uses a bias term (note that the convolutions in ResNet use bias=False)
For a fully connected layer:
Params = (in_features + bias) × out_features
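These formulas are easy to verify against PyTorch's own parameter counts (a minimal sketch; count_params is my own helper name):

import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False)
print(count_params(conv), 3 * 3 * 64 * 128)  # 73728 73728

fc = nn.Linear(512, 1000)
print(count_params(fc), (512 + 1) * 1000)    # 513000 513000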
Full ResNet18 parameter count (convolution and FC layers; conv biases are omitted since ResNet uses bias=False)
# Initial conv layer (3->64)
conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
Params = 7×7×3×64 = 9,408
# Stage 1 (2 BasicBlocks, 64->64)
# Each block has two 3×3 convs:
conv = nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False)  # 3×3×64×64 = 36,864
Params += 2 × (36,864 + 36,864) = 147,456
# Stage 2 (2 BasicBlocks, 64->128)
# BasicBlock1 (with downsample):
conv1_2 = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False)  # 3×3×64×128 = 73,728
conv2_2 = nn.Conv2d(128, 128, kernel_size=3, padding=1, bias=False)           # 3×3×128×128 = 147,456
downsample_2 = nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)        # 1×1×64×128 = 8,192
# BasicBlock2: two 3×3 convs of 147,456 each
Params += (73,728 + 147,456 + 8,192) + 2 × 147,456 = 524,288
# Stage 3 (2 BasicBlocks, 128->256)
# BasicBlock1 (with downsample):
conv1_3 = nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1, bias=False)  # 3×3×128×256 = 294,912
conv2_3 = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)            # 3×3×256×256 = 589,824
downsample_3 = nn.Conv2d(128, 256, kernel_size=1, stride=2, bias=False)        # 1×1×128×256 = 32,768
# BasicBlock2: two 3×3 convs of 589,824 each
Params += (294,912 + 589,824 + 32,768) + 2 × 589,824 = 2,097,152
# Stage 4 (2 BasicBlocks, 256->512)
# BasicBlock1 (with downsample):
conv1_4 = nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1, bias=False)  # 3×3×256×512 = 1,179,648
conv2_4 = nn.Conv2d(512, 512, kernel_size=3, padding=1, bias=False)            # 3×3×512×512 = 2,359,296
downsample_4 = nn.Conv2d(256, 512, kernel_size=1, stride=2, bias=False)        # 1×1×256×512 = 131,072
# BasicBlock2: two 3×3 convs of 2,359,296 each
Params += (1,179,648 + 2,359,296 + 131,072) + 2 × 2,359,296 = 8,388,608
# Fully connected layer (with bias)
fc = nn.Linear(512, num_classes)  # (512 + 1)×num_classes
Params += 513 × num_classes       # for 1000 classes: 513,000
Adding up all the stages:
Initial conv layer: 9,408
Stage 1: 147,456
Stage 2: 524,288
Stage 3: 2,097,152
Stage 4: 8,388,608
Fully connected layer: 513,000 (for 1000 classes)
Conv + FC subtotal: 11,679,912
BatchNorm layers (2 parameters per channel) add another 9,600
Total: 11,689,512 ≈ 11.7M
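This matches torchvision's implementation, which can be used as a cross-check (assuming torchvision is installed):

import torchvision

model = torchvision.models.resnet18(num_classes=1000)
print(sum(p.numel() for p in model.parameters()))  # 11689512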
FLOPs calculation
For a convolution layer:
FLOPs = out_h × out_w × kernel_w × kernel_h × in_channels × out_channels × 2
(each multiply and each add counts as one operation, hence the ×2)
For a fully connected layer:
FLOPs = in_features × out_features × 2
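The per-layer arithmetic below is easier to reproduce with a small helper (a sketch; the function names are my own):

def conv_flops(out_h, out_w, kernel_h, kernel_w, in_c, out_c):
    # one multiply-add per kernel element per output element, x2 to count multiplies and adds separately
    return out_h * out_w * kernel_h * kernel_w * in_c * out_c * 2

def fc_flops(in_features, out_features):
    return in_features * out_features * 2

print(conv_flops(112, 112, 7, 7, 3, 64))  # 236027904 (~0.24 GFLOPs)
print(conv_flops(56, 56, 3, 3, 64, 64))   # 231211008 (~0.23 GFLOPs)
print(fc_flops(512, 1000))                # 1024000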
ResNet18 FLOPs example:
# Initial conv layer (input 224×224×3, output 112×112×64)
FLOPs = 112×112×7×7×3×64×2 ≈ 236M
# First BasicBlock (input 56×56×64)
conv1 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
FLOPs += 56×56×3×3×64×64×2 ≈ 231M
conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
FLOPs += 56×56×3×3×64×64×2 ≈ 231M
# Downsample conv (input 56×56×64, output 28×28×128)
downsample = nn.Conv2d(64, 128, kernel_size=1, stride=2)
FLOPs += 28×28×1×1×64×128×2 ≈ 12.8M
# Fully connected layer
fc = nn.Linear(512, 1000)
FLOPs += 512×1000×2 ≈ 1.02M
The total for a 224×224 input comes to roughly 3.6 GFLOPs (about 1.8G multiply-accumulates). Of course, you can also compute this automatically with tools:
from torchsummary import summary
import torchvision
model = torchvision.models.resnet18()
summary(model, (3, 224, 224), device='cpu')
# 或者使用thop
from thop import profile
input = torch.randn(1, 3, 224, 224)
flops, params = profile(model, inputs=(input,))
print(f'FLOPs: {flops/1e9}G, Params: {params/1e6}M')
6. Practical Tips
- Input size changes: when the input size changes, recompute each layer's output size
- Downsampling strategy: downsampling is usually done at the start of each residual stage
- Size verification: PyTorch's torchsummary tool can be used to verify each layer's output
from torchsummary import summary
model = ResNet18()
summary(model, (3, 224, 224))
Understanding these calculations is essential for network design and debugging, especially when handling inputs of different sizes or modifying the network structure.