AdaptiveAvgPool2d issues when exporting a PyTorch model to ONNX

1. torchvision.models.vgg11_bn

from torchsummary import summary
import torch
from torchvision import models


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = models.vgg11_bn(num_classes=2).to(device)

# Print the model structure
backbone1 = summary(model, (3, 128, 128))
backbone2 = summary(model, (3, 224, 224))
  • With an input size of (3, 224, 224), the model summary is:
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]           1,792
       BatchNorm2d-2         [-1, 64, 224, 224]             128
              ReLU-3         [-1, 64, 224, 224]               0
         MaxPool2d-4         [-1, 64, 112, 112]               0
            Conv2d-5        [-1, 128, 112, 112]          73,856
       BatchNorm2d-6        [-1, 128, 112, 112]             256
              ReLU-7        [-1, 128, 112, 112]               0
         MaxPool2d-8          [-1, 128, 56, 56]               0
            Conv2d-9          [-1, 256, 56, 56]         295,168
      BatchNorm2d-10          [-1, 256, 56, 56]             512
             ReLU-11          [-1, 256, 56, 56]               0
           Conv2d-12          [-1, 256, 56, 56]         590,080
      BatchNorm2d-13          [-1, 256, 56, 56]             512
             ReLU-14          [-1, 256, 56, 56]               0
        MaxPool2d-15          [-1, 256, 28, 28]               0
           Conv2d-16          [-1, 512, 28, 28]       1,180,160
      BatchNorm2d-17          [-1, 512, 28, 28]           1,024
             ReLU-18          [-1, 512, 28, 28]               0
           Conv2d-19          [-1, 512, 28, 28]       2,359,808
      BatchNorm2d-20          [-1, 512, 28, 28]           1,024
             ReLU-21          [-1, 512, 28, 28]               0
        MaxPool2d-22          [-1, 512, 14, 14]               0
           Conv2d-23          [-1, 512, 14, 14]       2,359,808
      BatchNorm2d-24          [-1, 512, 14, 14]           1,024
             ReLU-25          [-1, 512, 14, 14]               0
           Conv2d-26          [-1, 512, 14, 14]       2,359,808
      BatchNorm2d-27          [-1, 512, 14, 14]           1,024
             ReLU-28          [-1, 512, 14, 14]               0
        MaxPool2d-29            [-1, 512, 7, 7]               0
AdaptiveAvgPool2d-30            [-1, 512, 7, 7]               0
           Linear-31                 [-1, 4096]     102,764,544
             ReLU-32                 [-1, 4096]               0
          Dropout-33                 [-1, 4096]               0
           Linear-34                 [-1, 4096]      16,781,312
             ReLU-35                 [-1, 4096]               0
          Dropout-36                 [-1, 4096]               0
           Linear-37                    [-1, 2]           8,194
================================================================
Total params: 128,780,034
Trainable params: 128,780,034
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 182.02
Params size (MB): 491.26
Estimated Total Size (MB): 673.85
----------------------------------------------------------------
  • With an input size of (3, 128, 128), the model summary is:
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 128, 128]           1,792
       BatchNorm2d-2         [-1, 64, 128, 128]             128
              ReLU-3         [-1, 64, 128, 128]               0
         MaxPool2d-4           [-1, 64, 64, 64]               0
            Conv2d-5          [-1, 128, 64, 64]          73,856
       BatchNorm2d-6          [-1, 128, 64, 64]             256
              ReLU-7          [-1, 128, 64, 64]               0
         MaxPool2d-8          [-1, 128, 32, 32]               0
            Conv2d-9          [-1, 256, 32, 32]         295,168
      BatchNorm2d-10          [-1, 256, 32, 32]             512
             ReLU-11          [-1, 256, 32, 32]               0
           Conv2d-12          [-1, 256, 32, 32]         590,080
      BatchNorm2d-13          [-1, 256, 32, 32]             512
             ReLU-14          [-1, 256, 32, 32]               0
        MaxPool2d-15          [-1, 256, 16, 16]               0
           Conv2d-16          [-1, 512, 16, 16]       1,180,160
      BatchNorm2d-17          [-1, 512, 16, 16]           1,024
             ReLU-18          [-1, 512, 16, 16]               0
           Conv2d-19          [-1, 512, 16, 16]       2,359,808
      BatchNorm2d-20          [-1, 512, 16, 16]           1,024
             ReLU-21          [-1, 512, 16, 16]               0
        MaxPool2d-22            [-1, 512, 8, 8]               0
           Conv2d-23            [-1, 512, 8, 8]       2,359,808
      BatchNorm2d-24            [-1, 512, 8, 8]           1,024
             ReLU-25            [-1, 512, 8, 8]               0
           Conv2d-26            [-1, 512, 8, 8]       2,359,808
      BatchNorm2d-27            [-1, 512, 8, 8]           1,024
             ReLU-28            [-1, 512, 8, 8]               0
        MaxPool2d-29            [-1, 512, 4, 4]               0
AdaptiveAvgPool2d-30            [-1, 512, 7, 7]               0
           Linear-31                 [-1, 4096]     102,764,544
             ReLU-32                 [-1, 4096]               0
          Dropout-33                 [-1, 4096]               0
           Linear-34                 [-1, 4096]      16,781,312
             ReLU-35                 [-1, 4096]               0
           Dropout-36                 [-1, 4096]               0
           Linear-37                    [-1, 2]           8,194
================================================================
Total params: 128,780,034
Trainable params: 128,780,034
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.19
Forward/backward pass size (MB): 59.69
Params size (MB): 491.26
Estimated Total Size (MB): 551.14
----------------------------------------------------------------

2. Comparison

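  • Both input sizes train without problems. 224 is the size used by the official torchvision models: they were trained on ImageNet at that resolution, and pretrained weights are provided. With an input size of 128 the pretrained weights can still be used, but note that the shape changes between the last MaxPool2d and AdaptiveAvgPool2d (from 4x4 to 7x7). This is because AdaptiveAvgPool2d derives its pooling windows (effective kernel size and stride) from the input size at run time so that the output always has the requested size, which lets the model adapt to different input sizes.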
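  • During training this layer behaves normally: it produces output in the forward pass and gradients flow through it in the backward pass. The restriction only concerns ONNX export, so a model that runs into limitations or errors when exported can still be trained with gradient descent as usual.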
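
To make the adaptive behavior concrete, here is a minimal pure-Python sketch of how the pooling windows along one spatial axis can be derived. It assumes the usual rule for adaptive pooling, start = floor(i * in / out) and end = ceil((i + 1) * in / out); the helper name `adaptive_windows` is made up for this illustration and is not part of PyTorch.

```python
def adaptive_windows(in_size, out_size):
    """Return the [start, end) input range averaged for each output index.

    Assumption: windows follow start = floor(i * in / out),
    end = ceil((i + 1) * in / out), computed here with integer arithmetic.
    """
    windows = []
    for i in range(out_size):
        start = (i * in_size) // out_size
        end = -(-((i + 1) * in_size) // out_size)  # integer ceiling division
        windows.append((start, end))
    return windows


# 224x224 input: the last MaxPool2d yields 7, pooled to 7 (one element per window)
print(adaptive_windows(7, 7))
# 128x128 input: the last MaxPool2d yields 4, pooled to 7 (uneven, overlapping windows)
print(adaptive_windows(4, 7))
# An evenly divisible case: every window has the same size and step
print(adaptive_windows(8, 4))
```

In the 4 to 7 case the window widths alternate between 1 and 2, so no single fixed kernel size and stride reproduces them. That is consistent with the exporter only accepting output sizes that are a factor of the input size.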

3. Conclusions

  • During ONNX export, if the output size of AdaptiveAvgPool2d is not a factor of its input size, the exporter raises an error:
raise errors.SymbolicValueError(
torch.onnx.errors.SymbolicValueError: Unsupported: ONNX export of operator adaptive_avg_pool2d, output size that are not factor of input size. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues  [Caused by the value '100 defined in (%100 : Long(2, strides=[1], device=cpu) = onnx::Constant[value= 7  7 [ CPULongType{2} ]]()
)' (type 'Tensor') in the TorchScript graph. The containing node has kind 'onnx::Constant'.] 
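  • AdaptiveAvgPool2d therefore has to be modified so that the model can be exported correctly.
  • Many dynamic layers are not supported during ONNX export and have to be changed to produce a fixed output.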
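  • A side note on torchsummary: when summary is called without a batch size, its batch_size argument defaults to -1, which is why the output shapes above start with -1. Internally, torchsummary runs the forward pass on a random dummy input with a batch size of 2, so that layers such as BatchNorm work.

So when you call summary(model, (3, 128, 128)), torchsummary builds a dummy batch of two (3, 128, 128) inputs, passes it through the model, and records each layer's output shape and parameter count.

If you want the summary to report a specific batch size, pass it explicitly, for example: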

backbone1 = summary(model, input_size=(3, 128, 128), batch_size=4)

The batch_size argument changes the batch dimension shown in the output shapes and the estimated memory sizes. The parameter count does not depend on the batch size.
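
Returning to the export error at the top of this section, the condition it names ("output size that are not factor of input size") can be sketched as a small check. This is only an approximation inferred from the error message: the helper `fixed_pool_params` is hypothetical, and the real exporter logic may differ between PyTorch versions.

```python
def fixed_pool_params(in_hw, out_hw):
    """Return (kernel_size, stride) for an equivalent fixed average pool,
    or None when the output size is not a factor of the input size.

    Assumption: export is only possible when every input dimension is
    evenly divisible by the matching output dimension.
    """
    if any(i % o != 0 for i, o in zip(in_hw, out_hw)):
        return None
    kernel = tuple(i // o for i, o in zip(in_hw, out_hw))
    return kernel, kernel  # kernel size and stride are the same


# (feature map entering the pool, requested output size)
cases = {
    "224 input, pool to 7x7": ((7, 7), (7, 7)),
    "128 input, pool to 7x7": ((4, 4), (7, 7)),
    "128 input, pool to 4x4": ((4, 4), (4, 4)),
    "224 input, pool to 4x4": ((7, 7), (4, 4)),
}
for name, (in_hw, out_hw) in cases.items():
    print(name, "->", fixed_pool_params(in_hw, out_hw))
```

Under this assumption, the stock model passes the check only for the 224 input, and a pool resized to (4, 4) passes only for the 128 input. The pooled size therefore has to be chosen for the input resolution you intend to export with.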

4. Modifying the model

from torchsummary import summary
import torch
from torchvision import models
from torch import nn


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = models.vgg11_bn(num_classes=2)
model.avgpool = nn.AdaptiveAvgPool2d((4, 4))
model.classifier = nn.Sequential(
            nn.Linear(512 * 4 * 4, 4096),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 2),
        )
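model.to(device)   # Move the model to the device only after replacing the layers; otherwise the new layers stay on the CPU and an error is raised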

# Print the model structure
backbone1 = summary(model, (3, 128, 128))
backbone2 = summary(model, (3, 224, 224))
  • With an input size of (3, 224, 224), the model summary is:
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]           1,792
       BatchNorm2d-2         [-1, 64, 224, 224]             128
              ReLU-3         [-1, 64, 224, 224]               0
         MaxPool2d-4         [-1, 64, 112, 112]               0
            Conv2d-5        [-1, 128, 112, 112]          73,856
       BatchNorm2d-6        [-1, 128, 112, 112]             256
              ReLU-7        [-1, 128, 112, 112]               0
         MaxPool2d-8          [-1, 128, 56, 56]               0
            Conv2d-9          [-1, 256, 56, 56]         295,168
      BatchNorm2d-10          [-1, 256, 56, 56]             512
             ReLU-11          [-1, 256, 56, 56]               0
           Conv2d-12          [-1, 256, 56, 56]         590,080
      BatchNorm2d-13          [-1, 256, 56, 56]             512
             ReLU-14          [-1, 256, 56, 56]               0
        MaxPool2d-15          [-1, 256, 28, 28]               0
           Conv2d-16          [-1, 512, 28, 28]       1,180,160
      BatchNorm2d-17          [-1, 512, 28, 28]           1,024
             ReLU-18          [-1, 512, 28, 28]               0
           Conv2d-19          [-1, 512, 28, 28]       2,359,808
      BatchNorm2d-20          [-1, 512, 28, 28]           1,024
             ReLU-21          [-1, 512, 28, 28]               0
        MaxPool2d-22          [-1, 512, 14, 14]               0
           Conv2d-23          [-1, 512, 14, 14]       2,359,808
      BatchNorm2d-24          [-1, 512, 14, 14]           1,024
             ReLU-25          [-1, 512, 14, 14]               0
           Conv2d-26          [-1, 512, 14, 14]       2,359,808
      BatchNorm2d-27          [-1, 512, 14, 14]           1,024
             ReLU-28          [-1, 512, 14, 14]               0
        MaxPool2d-29            [-1, 512, 7, 7]               0
AdaptiveAvgPool2d-30            [-1, 512, 4, 4]               0
           Linear-31                 [-1, 4096]      33,558,528
             ReLU-32                 [-1, 4096]               0
          Dropout-33                 [-1, 4096]               0
           Linear-34                 [-1, 4096]      16,781,312
             ReLU-35                 [-1, 4096]               0
          Dropout-36                 [-1, 4096]               0
           Linear-37                    [-1, 2]           8,194
================================================================
Total params: 59,574,018
Trainable params: 59,574,018
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 181.89
Params size (MB): 227.26
Estimated Total Size (MB): 409.73
----------------------------------------------------------------
  • With an input size of (3, 128, 128), the model summary is:
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 128, 128]           1,792
       BatchNorm2d-2         [-1, 64, 128, 128]             128
              ReLU-3         [-1, 64, 128, 128]               0
         MaxPool2d-4           [-1, 64, 64, 64]               0
            Conv2d-5          [-1, 128, 64, 64]          73,856
       BatchNorm2d-6          [-1, 128, 64, 64]             256
              ReLU-7          [-1, 128, 64, 64]               0
         MaxPool2d-8          [-1, 128, 32, 32]               0
            Conv2d-9          [-1, 256, 32, 32]         295,168
      BatchNorm2d-10          [-1, 256, 32, 32]             512
             ReLU-11          [-1, 256, 32, 32]               0
           Conv2d-12          [-1, 256, 32, 32]         590,080
      BatchNorm2d-13          [-1, 256, 32, 32]             512
             ReLU-14          [-1, 256, 32, 32]               0
        MaxPool2d-15          [-1, 256, 16, 16]               0
           Conv2d-16          [-1, 512, 16, 16]       1,180,160
      BatchNorm2d-17          [-1, 512, 16, 16]           1,024
             ReLU-18          [-1, 512, 16, 16]               0
           Conv2d-19          [-1, 512, 16, 16]       2,359,808
      BatchNorm2d-20          [-1, 512, 16, 16]           1,024
             ReLU-21          [-1, 512, 16, 16]               0
        MaxPool2d-22            [-1, 512, 8, 8]               0
           Conv2d-23            [-1, 512, 8, 8]       2,359,808
      BatchNorm2d-24            [-1, 512, 8, 8]           1,024
             ReLU-25            [-1, 512, 8, 8]               0
           Conv2d-26            [-1, 512, 8, 8]       2,359,808
      BatchNorm2d-27            [-1, 512, 8, 8]           1,024
             ReLU-28            [-1, 512, 8, 8]               0
        MaxPool2d-29            [-1, 512, 4, 4]               0
AdaptiveAvgPool2d-30            [-1, 512, 4, 4]               0
           Linear-31                 [-1, 4096]      33,558,528
             ReLU-32                 [-1, 4096]               0
          Dropout-33                 [-1, 4096]               0
           Linear-34                 [-1, 4096]      16,781,312
             ReLU-35                 [-1, 4096]               0
          Dropout-36                 [-1, 4096]               0
           Linear-37                    [-1, 2]           8,194
================================================================
Total params: 59,574,018
Trainable params: 59,574,018
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.19
Forward/backward pass size (MB): 59.56
Params size (MB): 227.26
Estimated Total Size (MB): 287.01
----------------------------------------------------------------
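
The drop in total parameters between the two versions comes entirely from the first Linear layer, whose input shrinks from 512 * 7 * 7 to 512 * 4 * 4 features. A quick arithmetic check, where `linear_params` is a hypothetical helper that counts the weights and biases of a fully connected layer:

```python
def linear_params(in_features, out_features):
    """Number of parameters in a fully connected layer: weights plus biases."""
    return in_features * out_features + out_features


original = linear_params(512 * 7 * 7, 4096)   # classifier input after a 7x7 pool
modified = linear_params(512 * 4 * 4, 4096)   # classifier input after a 4x4 pool

print(original)             # 102764544
print(modified)             # 33558528
print(original - modified)  # 69206016
```

These values match the Linear-31 rows in the summaries above, and the difference equals the gap between the two totals (128,780,034 and 59,574,018).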