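[Notes] torchsummary.summary() vs. count_parameters(model): the former counts every layer call in forward without checking for shared parameters; the latter counts the parameters of every object constructed in __init__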

Latest recommended article published on 2024-06-05 19:59:44

程序猿的探索之路

Original link: https://zhuanlan.zhihu.com/p/64425750

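This article looks at a parameter-sharing pitfall when counting network parameters in PyTorch with torchsummary. It works through three situations: no parameter sharing, parameter sharing, and layers that are initialized but never used. With shared parameters, torchsummary's total can be wrong, while count_parameters gives the correct count as long as every layer built in __init__ is actually used. The takeaway is that when sharing parameters you should not leave unused layer objects in the model, so that the parameter count stays accurate.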

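Counting the parameters of a convolutional neural network matters, because the count reflects the network's capacity and feeds into how its performance is judged. In PyTorch, torchsummary can print the layer structure and count the parameters. A few days ago, however, I wrote a network with shared parameters and torchsummary reported the wrong total. After some further experiments I worked out the rules for counting parameters correctly.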

Let's go through some examples:

case1: no parameter sharing (the most common case)

# -*- coding: utf-8 -*-
import torch
import torch.nn as nn
import torchsummary
from torch.nn import init
class BaseNet(nn.Module):
    def __init__(self):
        super(BaseNet,self).__init__()
        self.conv1=nn.Conv2d(in_channels=1,out_channels=1,kernel_size=3,stride=1,padding=1,bias=False)
        self.conv2=nn.Conv2d(in_channels=1,out_channels=1,kernel_size=3,stride=1,padding=1,bias=False)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_uniform_(m.weight.data)
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                init.normal_(m.weight.data, 1.0, 0.02)
                init.constant_(m.bias.data, 0.0)
    def forward(self,x):
        x=self.conv1(x)
        out_map=self.conv2(x)
        return out_map
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)




model = BaseNet()
torchsummary.summary(model, (1, 512, 512))
print('parameters_count:',count_parameters(model))

Output:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1          [-1, 1, 512, 512]               9
            Conv2d-2          [-1, 1, 512, 512]               9
================================================================
Total params: 18
Trainable params: 18
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 1.00
Forward/backward pass size (MB): 4.00
Params size (MB): 0.00
Estimated Total Size (MB): 5.00
----------------------------------------------------------------
parameters_count: 18

First, what the count_parameters(model) function does:

return sum(p.numel() for p in model.parameters() if p.requires_grad)

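Here model.parameters() yields the model's parameters, and `if p.requires_grad` keeps only those that require gradients. When you define a network, almost all parameters require gradients, including the convolution-layer and BN-layer parameters, so those are the ones we count. numel() is a torch.Tensor method that returns the number of elements in the tensor. The result is then easy to check: we defined two 3x3 convolution layers with no bias, so there are 3x3x2=18 parameters, and the network has two layers. So far everything is correct. Now look at case2.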
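
The arithmetic above can be checked by hand. The snippet below is a minimal sketch in plain Python; `conv2d_param_count` is a hypothetical helper introduced only for illustration (it is not part of PyTorch or torchsummary) and assumes a square kernel.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True, groups=1):
    """Number of learnable scalars in a 2-D convolution with a square kernel.

    Hypothetical helper for illustration only.
    """
    # Each output channel has one kernel per (grouped) input channel.
    weight = out_channels * (in_channels // groups) * kernel_size * kernel_size
    # An optional bias adds one scalar per output channel.
    return weight + (out_channels if bias else 0)


per_layer = conv2d_param_count(1, 1, 3, bias=False)
print(per_layer)      # 9
print(2 * per_layer)  # 18, matching the total for the two layers in case1
```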

case2: parameter sharing

# -*- coding: utf-8 -*-
import torch
import torch.nn as nn
import torchsummary
from torch.nn import init
class BaseNet(nn.Module):
    def __init__(self):
        super(BaseNet,self).__init__()
        self.conv1=nn.Conv2d(in_channels=1,out_channels=1,kernel_size=3,stride=1,padding=1,bias=False)
        # self.conv2=nn.Conv2d(in_channels=1,out_channels=1,kernel_size=3,stride=1,padding=1,bias=False)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_uniform_(m.weight.data)
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                init.normal_(m.weight.data, 1.0, 0.02)
                init.constant_(m.bias.data, 0.0)
    def forward(self,x):
        x=self.conv1(x)
        out_map=self.conv1(x)  # note: conv1 is called here instead of conv2
        return out_map
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)




model = BaseNet()
torchsummary.summary(model, (1, 512, 512))
print('parameters_count:',count_parameters(model))

Output:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1          [-1, 1, 512, 512]               9
            Conv2d-2          [-1, 1, 512, 512]               9
================================================================
Total params: 18
Trainable params: 18
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 1.00
Forward/backward pass size (MB): 4.00
Params size (MB): 0.00
Estimated Total Size (MB): 5.00
----------------------------------------------------------------
parameters_count: 9

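What happened here? count_parameters reports 9 parameters while torchsummary reports 18. To find the reason, look at how the network is built. We constructed only one convolution object, conv1, and then called it twice in forward. According to the official PyTorch tutorials this is how parameter sharing is done, so the layers listed as Conv2d-1 and Conv2d-2 share conv1's parameters. Only one convolution layer's worth of parameters actually exists, so count_parameters is correct.

Why does torchsummary arrive at 18? It records one entry for every layer call made during the forward pass and then sums the parameter counts of all entries. It does not check whether Conv2d-1 and Conv2d-2 hold the same parameters; it lists and counts both because both appear in the structure. So when parameters are shared, the torchsummary total is wrong. Is count_parameters always right, then? No. See the next example.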
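
The difference between the two totals can be reproduced without torchsummary. The snippet below is a minimal sketch that imitates per-call accounting with a forward hook; it is an assumption about the general mechanism, not torchsummary's actual code, and `SharedNet`, `record_call`, and the 8x8 input size are names and values chosen only for this illustration.

```python
import torch
import torch.nn as nn


class SharedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        # The same module object is applied twice, so its weight is shared.
        return self.conv1(self.conv1(x))


model = SharedNet()
calls = []  # one entry per hook invocation, i.e. one entry per layer call


def record_call(module, inputs, output):
    # Record the parameter count of the module that just ran.
    calls.append(sum(p.numel() for p in module.parameters(recurse=False)))


handle = model.conv1.register_forward_hook(record_call)
with torch.no_grad():
    model(torch.zeros(1, 1, 8, 8))
handle.remove()

per_call_total = sum(calls)                                # sums every call
unique_total = sum(p.numel() for p in model.parameters())  # each tensor once
print(per_call_total, unique_total)  # 18 9
```

The hook fires once per call, so the shared weight is added twice, whereas model.parameters() yields each parameter tensor only once.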

case3: a layer is initialized but never called

# -*- coding: utf-8 -*-
import torch
import torch.nn as nn
import torchsummary
from torch.nn import init
class BaseNet(nn.Module):
    def __init__(self):
        super(BaseNet,self).__init__()
        self.conv1=nn.Conv2d(in_channels=1,out_channels=1,kernel_size=3,stride=1,padding=1,bias=False)
        self.conv2=nn.Conv2d(in_channels=1,out_channels=1,kernel_size=3,stride=1,padding=1,bias=False)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_uniform_(m.weight.data)
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                init.normal_(m.weight.data, 1.0, 0.02)
                init.constant_(m.bias.data, 0.0)
    def forward(self,x):
        x=self.conv1(x)
        out_map=self.conv1(x)
        return out_map
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)




model = BaseNet()
torchsummary.summary(model, (1, 512, 512))
print('parameters_count:',count_parameters(model))

Output:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1          [-1, 1, 512, 512]               9
            Conv2d-2          [-1, 1, 512, 512]               9
================================================================
Total params: 18
Trainable params: 18
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 1.00
Forward/backward pass size (MB): 4.00
Params size (MB): 0.00
Estimated Total Size (MB): 5.00
----------------------------------------------------------------
parameters_count: 18

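The forward pass here is the same as in case2: both calls share conv1's parameters. The difference is that we also constructed a second convolution object, conv2, and then left it unused. This time count_parameters is wrong as well and reports 18. The reason is that conv2 is registered in the BaseNet class, so even though forward never calls it, its weights are still part of model.parameters(), which brings the total to 18. The 18 from torchsummary and the 18 from count_parameters mean different things, but for the network you actually intended to build, both are wrong. If you are not convinced, compare case3 with case4 below.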
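
One way to get the intended number in this situation is to count only the parameters of modules that actually run in a forward pass, and to count each tensor once. The snippet below is a minimal sketch of that idea, not something the original article proposes; `count_used_parameters` and `Case3Net` are hypothetical names, and the approach assumes every parameter is owned by a module that is called as a module (parameters used through functional calls, or control flow that depends on the input, are not handled).

```python
import torch
import torch.nn as nn


def count_used_parameters(model, example_input):
    """Count trainable parameters owned by modules that run in one forward pass.

    Hypothetical helper for illustration only.
    """
    used = {}  # id(tensor) -> tensor, so a shared tensor is counted once
    handles = []

    def remember(module, inputs, output):
        for p in module.parameters(recurse=False):
            if p.requires_grad:
                used[id(p)] = p

    for m in model.modules():
        handles.append(m.register_forward_hook(remember))

    was_training = model.training
    model.eval()  # avoid updating BatchNorm statistics during the probe
    try:
        with torch.no_grad():
            model(example_input)
    finally:
        for h in handles:
            h.remove()
        model.train(was_training)
    return sum(p.numel() for p in used.values())


class Case3Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 1, 3, padding=1, bias=False)
        self.conv2 = nn.Conv2d(1, 1, 3, padding=1, bias=False)  # never called

    def forward(self, x):
        return self.conv1(self.conv1(x))


print(count_used_parameters(Case3Net(), torch.zeros(1, 1, 8, 8)))  # 9
```

The simpler fix, and the one the cases here point to, is to delete layer objects that forward never uses; then the plain count_parameters is already correct.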

case4:

# -*- coding: utf-8 -*-
import torch
import torch.nn as nn
import torchsummary
from torch.nn import init
class BaseNet(nn.Module):
    def __init__(self):
        super(BaseNet,self).__init__()
        self.conv1=nn.Conv2d(in_channels=1,out_channels=1,kernel_size=3,stride=1,padding=1,bias=False)
        self.conv2=nn.Conv2d(in_channels=1,out_channels=1,kernel_size=3,stride=1,padding=1,bias=False)
        self.conv3=nn.Conv2d(in_channels=1,out_channels=1,kernel_size=3,stride=1,padding=1,bias=False)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_uniform_(m.weight.data)
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                init.normal_(m.weight.data, 1.0, 0.02)
                init.constant_(m.bias.data, 0.0)
    def forward(self,x):
        x=self.conv1(x)
        out_map=self.conv1(x)
        return out_map
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)




model = BaseNet()
torchsummary.summary(model, (1, 512, 512))
print('parameters_count:',count_parameters(model))

Output:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1          [-1, 1, 512, 512]               9
            Conv2d-2          [-1, 1, 512, 512]               9
================================================================
Total params: 18
Trainable params: 18
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 1.00
Forward/backward pass size (MB): 4.00
Params size (MB): 0.00
Estimated Total Size (MB): 5.00
----------------------------------------------------------------
parameters_count: 27