Deep Learning: Computing Sizes in Convolutional Neural Networks

When we design or use a network architecture, the number of convolutions in each layer, the kernel sizes, the channel counts, and so on all serve a purpose. This post covers the basics of how to compute the input and output sizes of a convolutional neural network, which comes in handy later when you design your own networks (then again, how likely is it that your own design will beat one produced by a whole team?).

1. Fundamentals of CNN Size Calculation

1.1 Convolution Layer Output Size Formula

The output size of a convolution layer is determined by the following formula:

H_out = floor((H_in + 2*padding - dilation*(kernel_size-1) -1)/stride + 1)
W_out = floor((W_in + 2*padding - dilation*(kernel_size-1) -1)/stride + 1)

where:

  • H_in / W_in: height / width of the input feature map
  • padding: number of padding pixels
  • dilation: dilation rate (for dilated/atrous convolution)
  • kernel_size: convolution kernel size
  • stride: convolution stride
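To make the formula concrete, here is a minimal sketch (the helper name conv_output_size is mine, not a library function) that implements it and cross-checks the result against an actual nn.Conv2d layer:

import torch
import torch.nn as nn

def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    # Same formula as above, applied to one spatial dimension
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
x = torch.randn(1, 3, 224, 224)
print(conv(x).shape)                                   # torch.Size([1, 64, 112, 112])
print(conv_output_size(224, 7, stride=2, padding=3))   # 112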

1.2 Pooling Layer Output Size

The output size of a pooling layer follows the same formula as a convolution layer. Typical settings are:

  • kernel_size = 2
  • stride = 2
  • padding = 0
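As a quick sanity check (a minimal sketch using the default settings above), a 2×2 pooling with stride 2 halves the spatial size:

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
print(pool(torch.randn(1, 64, 56, 56)).shape)   # torch.Size([1, 64, 28, 28])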

2. Analysis of the ResNet18 Architecture

ResNet18 consists of the following main components:

  1. An initial convolution layer
  2. 4 residual stages (each containing 2 BasicBlocks)
  3. Global average pooling
  4. A fully connected layer

2.1 BasicBlock Structure

import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Main path: two 3x3 convolutions; the first may downsample via its stride
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # Shortcut path: identity, or a 1x1 convolution when shape/channels change
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)   # residual addition requires matching shapes
        return F.relu(out)
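A quick shape check of this block (a hypothetical usage sketch, assuming the class above) shows how the shortcut keeps the residual addition valid in both the identity and the downsampling case:

import torch

block = BasicBlock(64, 64)                        # identity shortcut
print(block(torch.randn(1, 64, 56, 56)).shape)    # torch.Size([1, 64, 56, 56])

block = BasicBlock(64, 128, stride=2)             # 1x1-conv shortcut
print(block(torch.randn(1, 64, 56, 56)).shape)    # torch.Size([1, 128, 28, 28])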

3. Full ResNet18 Size Calculation Walkthrough

import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    expansion = 1
    
    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, 
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                              stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)
        return out

class ResNet18(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.in_channels = 64
        
        # Initial convolution layer (stem)
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        
        # 4 residual stages
        self.layer1 = self._make_layer(64, 64, blocks=2, stride=1)
        self.layer2 = self._make_layer(64, 128, blocks=2, stride=2)
        self.layer3 = self._make_layer(128, 256, blocks=2, stride=2)
        self.layer4 = self._make_layer(256, 512, blocks=2, stride=2)
        
        # Classification head
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * BasicBlock.expansion, num_classes)

    def _make_layer(self, in_channels, out_channels, blocks, stride):
        downsample = None
        if stride != 1 or in_channels != out_channels * BasicBlock.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(in_channels, out_channels * BasicBlock.expansion, 
                         kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * BasicBlock.expansion)
            )
            
        layers = []
        layers.append(BasicBlock(in_channels, out_channels, stride, downsample))
        for _ in range(1, blocks):
            layers.append(BasicBlock(out_channels * BasicBlock.expansion, out_channels))
            
        return nn.Sequential(*layers)

    def forward(self, x):
        # Stem
        x = self.conv1(x)    # [3,224,224] -> [64,112,112]
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)  # [64,112,112] -> [64,56,56]
        
        # Residual stages
        x = self.layer1(x)   # [64,56,56] -> [64,56,56]
        x = self.layer2(x)   # [64,56,56] -> [128,28,28]
        x = self.layer3(x)   # [128,28,28] -> [256,14,14]
        x = self.layer4(x)   # [256,14,14] -> [512,7,7]
        
        # Classification
        x = self.avgpool(x)  # [512,7,7] -> [512,1,1]
        x = torch.flatten(x, 1)  # [512,1,1] -> [512]
        x = self.fc(x)       # [512] -> [num_classes]
        
        return x
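A minimal sanity check of the model above (assuming the code in this section) confirms the end-to-end shapes:

model = ResNet18(num_classes=1000)
x = torch.randn(1, 3, 224, 224)
print(model(x).shape)   # torch.Size([1, 1000])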

Assume an input image of size 224×224×3.
Substituting into the formula:

H_out = floor((H_in + 2*padding - dilation*(kernel_size-1) -1)/stride + 1)
W_out = floor((W_in + 2*padding - dilation*(kernel_size-1) -1)/stride + 1)

3.1 Initial Convolution Layer

self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

Calculation:

H_out = floor((224 + 2*3 - 1*(7-1) -1)/2 + 1) = floor(112.5) = 112
W_out = floor((224 + 2*3 - 1*(7-1) -1)/2 + 1) = 112

Output size: 112×112×64

3.2 MaxPool Layer

maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

Calculation:

H_out = floor((112 + 2*1 - 1*(3-1) -1)/2 + 1) = floor(56.5) = 56  
W_out = 56

Output size: 56×56×64
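The two stem layers can be verified together with a small sketch (hypothetical, using the same hyperparameters as above):

import torch
import torch.nn as nn

stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)
print(stem(torch.randn(1, 3, 224, 224)).shape)   # torch.Size([1, 64, 56, 56])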

3.3 Residual Stage 1

Contains 2 BasicBlocks; the spatial size is unchanged:

  • conv1: 3×3, stride=1
  • conv2: 3×3, stride=1

Output size remains: 56×56×64

3.4 Residual Stage 2

The first BasicBlock downsamples:

BasicBlock(64, 128, stride=2)

Main path calculation:

  1. conv1: stride=2
    H_out = floor((56 + 2*1 - 1*(3-1) -1)/2 + 1) = 28
    
  2. conv2: stride=1
    H_out = floor((28 + 2*1 - 1*(3-1) -1)/1 + 1) = 28
    

Shortcut path:

nn.Conv2d(64, 128, kernel_size=1, stride=2)

Output size: 28×28×128
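The shortcut must produce exactly the same 28×28×128 shape as the main path so that the residual addition works. A minimal check of just the shortcut branch (a hypothetical sketch mirroring the downsample layers above):

import torch
import torch.nn as nn

shortcut = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)
print(shortcut(torch.randn(1, 64, 56, 56)).shape)   # torch.Size([1, 128, 28, 28])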

3.5 Residual Stage 3

Downsampled in the same way:

BasicBlock(128, 256, stride=2)

Output size: 14×14×256

3.6 Residual Stage 4

BasicBlock(256, 512, stride=2) 

Output size: 7×7×512

3.7 Global Average Pooling

nn.AdaptiveAvgPool2d((1,1))

Output size: 1×1×512

3.8 Fully Connected Layer

nn.Linear(512, num_classes)

Final output: a num_classes-dimensional vector

4. Complete Size-Change Summary

| Layer                  | Type            | Parameters    | Output Size  |
| ---------------------- | --------------- | ------------- | ------------ |
| Input image            | -               | -             | 224×224×3    |
| Initial convolution    | Conv2d          | k=7, s=2, p=3 | 112×112×64   |
| MaxPool                | MaxPool2d       | k=3, s=2, p=1 | 56×56×64     |
| Residual stage 1       | 2×BasicBlock    | -             | 56×56×64     |
| Residual stage 2       | 2×BasicBlock    | stride=2      | 28×28×128    |
| Residual stage 3       | 2×BasicBlock    | stride=2      | 14×14×256    |
| Residual stage 4       | 2×BasicBlock    | stride=2      | 7×7×512      |
| Global average pooling | AdaptiveAvgPool | -             | 1×1×512      |
| Fully connected        | Linear          | -             | num_classes  |
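The whole table can be reproduced automatically by hooking the top-level modules of torchvision's reference implementation and printing each output shape (a hypothetical sketch; torchvision.models.resnet18 matches the structure above):

import torch
import torchvision

model = torchvision.models.resnet18()
hooks = []
for name, module in model.named_children():
    # The default argument pins the current value of `name` inside the lambda
    hooks.append(module.register_forward_hook(
        lambda m, inp, out, name=name: print(f'{name:10s} -> {tuple(out.shape)}')))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))

for h in hooks:
    h.remove()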

5. Appendix: Parameter Count and FLOPs Calculation

Parameter count calculation

For a convolution layer:

Params = (kernel_w × kernel_h × in_channels + bias) × out_channels

where bias is 1 if the layer uses a bias term. Note that the convolutions in ResNet are created with bias=False, so the bias term is 0 in the calculation below.
For a fully connected layer:

Params = (in_features + bias) × out_features

Full ResNet18 parameter count

# Initial convolution layer (3 -> 64); bias=False, so no +1 term
conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
Params = 7×7×3×64 = 9,408

# Stage 1 (2 BasicBlocks, 64 -> 64)
# BasicBlock1:
conv1_1 = nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False)  # 3×3×64×64 = 36,864
conv2_1 = nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False)  # 3×3×64×64 = 36,864
# BasicBlock2: same as above
Params += 4 × 36,864 = 147,456

# Stage 2 (2 BasicBlocks, 64 -> 128)
# BasicBlock1 (with downsample):
conv1_2 = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False)  # 3×3×64×128 = 73,728
conv2_2 = nn.Conv2d(128, 128, kernel_size=3, padding=1, bias=False)           # 3×3×128×128 = 147,456
downsample_2 = nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)        # 1×1×64×128 = 8,192
# BasicBlock2: two 3×3×128×128 convolutions = 2 × 147,456 = 294,912
Params += (73,728 + 147,456 + 8,192) + 294,912 = 524,288

# Stage 3 (2 BasicBlocks, 128 -> 256)
# BasicBlock1 (with downsample):
conv1_3 = nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1, bias=False)  # 3×3×128×256 = 294,912
conv2_3 = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)            # 3×3×256×256 = 589,824
downsample_3 = nn.Conv2d(128, 256, kernel_size=1, stride=2, bias=False)        # 1×1×128×256 = 32,768
# BasicBlock2: two 3×3×256×256 convolutions = 2 × 589,824 = 1,179,648
Params += (294,912 + 589,824 + 32,768) + 1,179,648 = 2,097,152

# Stage 4 (2 BasicBlocks, 256 -> 512)
# BasicBlock1 (with downsample):
conv1_4 = nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1, bias=False)  # 3×3×256×512 = 1,179,648
conv2_4 = nn.Conv2d(512, 512, kernel_size=3, padding=1, bias=False)            # 3×3×512×512 = 2,359,296
downsample_4 = nn.Conv2d(256, 512, kernel_size=1, stride=2, bias=False)        # 1×1×256×512 = 131,072
# BasicBlock2: two 3×3×512×512 convolutions = 2 × 2,359,296 = 4,718,592
Params += (1,179,648 + 2,359,296 + 131,072) + 4,718,592 = 8,388,608

# Fully connected layer (this one does have a bias)
fc = nn.Linear(512, num_classes)  # (512 + 1) × num_classes
Params += 513 × num_classes       # for 1000 classes: 513,000

# All BatchNorm layers together (weight + bias, i.e. 2 parameters per channel) add 9,600

Adding up all parts (for 1000 classes):
Initial convolution: 9,408
Stage 1: 147,456
Stage 2: 524,288
Stage 3: 2,097,152
Stage 4: 8,388,608
Fully connected: 513,000
BatchNorm: 9,600

Total: 11,689,512 ≈ 11.7M
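This total can be cross-checked against torchvision's reference implementation (a minimal verification sketch):

import torchvision

model = torchvision.models.resnet18(num_classes=1000)
print(sum(p.numel() for p in model.parameters()))   # 11689512, i.e. ≈ 11.7M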

FLOPs (computational cost) calculation

For a convolution layer:

FLOPs = out_h × out_w × kernel_w × kernel_h × in_channels × out_channels × 2
(a multiply and an add each count as one operation, hence the ×2)

For a fully connected layer:

FLOPs = in_features × out_features × 2
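These two formulas can be written as small helper functions (conv_flops / fc_flops are hypothetical names of mine, not a library API) to check the per-layer numbers in the example below:

def conv_flops(out_h, out_w, kernel_size, in_channels, out_channels):
    # Counts a multiply and an add as two operations, hence the factor 2; bias ignored
    return out_h * out_w * kernel_size * kernel_size * in_channels * out_channels * 2

def fc_flops(in_features, out_features):
    return in_features * out_features * 2

print(conv_flops(112, 112, 7, 3, 64))   # 236027904  (initial convolution, ≈ 0.24G)
print(fc_flops(512, 1000))              # 1024000    (≈ 1.02M)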

ResNet18 FLOPs calculation example:

# Initial convolution layer (input 224×224×3, output 112×112×64)
FLOPs = 112×112×7×7×3×64×2 ≈ 236M

# First BasicBlock (input 56×56×64)
conv1 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
FLOPs += 56×56×3×3×64×64×2 = 231M
conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
FLOPs += 56×56×3×3×64×64×2 = 231M

# Downsampling (shortcut) convolution (input 56×56×64, output 28×28×128)
downsample = nn.Conv2d(64, 128, kernel_size=1, stride=2)
FLOPs += 28×28×1×1×64×128×2 ≈ 12.8M

# Fully connected layer
fc = nn.Linear(512, 1000)
FLOPs += 512×1000×2 = 1.02M

The total is roughly 3.6G FLOPs for a 224×224 input (note that some tools count a multiply-add as a single operation, so reported numbers may differ by a factor of 2). You can also compute this automatically with tools:

import torch
import torchvision
from torchsummary import summary

model = torchvision.models.resnet18()
summary(model, (3, 224, 224), device='cpu')   # per-layer output shapes and parameter counts

# Or use thop to estimate FLOPs and parameters
from thop import profile
input = torch.randn(1, 3, 224, 224)
flops, params = profile(model, inputs=(input,))
print(f'FLOPs: {flops/1e9}G, Params: {params/1e6}M')

6. Practical Tips

  1. Adjusting the input size: when the input size changes, recompute the output size of every layer
  2. Downsampling strategy: downsampling is usually done at the start of each residual stage
  3. Size verification: use PyTorch's torchsummary tool to verify each layer's output, for example:
from torchsummary import summary
model = ResNet18()
summary(model, (3, 224, 224))

Understanding these calculations is essential for network design and debugging, especially when handling inputs of different sizes or modifying the network structure.
