ESANet encoder code and parameter comparison

This post compares the parameter count of the CoaT encoder against ResNet-50 for semantic segmentation, and looks at how ResNet-34 with the Non-Bottleneck-1D block (from ERFNet) reduces computation. It walks through instantiating ResNet34 with a different block type, applying the NonBottleneck1D block, and loading pretrained weights for different numbers of input channels.

Code

After reading the CoaT code, I mainly looked at the encoder's parameter count, wondering whether it could serve as a backbone for semantic segmentation. Its structure turns out to be very similar to ResNet-50: both use a [3, 4, 6, 3] layer layout.

First, I checked the parameter counts of ResNet-34 and ResNet-50 with an input of (3, 480, 640):

from torchsummary import summary
from torchvision.models import resnet34, resnet50

# ResNet-34 parameter count for a 480x640 RGB input
model = resnet34()
summary(model, input_size=[(3, 480, 640)], device="cpu")

# ResNet-50 parameter count for the same input
model = resnet50()
summary(model, input_size=[(3, 480, 640)], device="cpu")

The results are close: roughly 21.8M parameters for ResNet-34 versus 25.6M for ResNet-50.

Next is the parameter count of the full model, which includes the encoder and decoder and has two branches.

Then I tested CMX, which I had looked at earlier. CMX uses the SegFormer encoder with two branches, where the second branch takes the depth image; the B2 variant is used here.

def main():
    # mit_b2 is the B2 variant of CMX's RGBXTransformer encoder;
    # pass constructor arguments here as needed
    model = mit_b2()
    summary(model, input_size=[(3, 480, 640), (1, 480, 640)], device='cpu')

if __name__ == '__main__':
    main()

Next up is ESANet. Let's first look at ESANet's configuration, and then its parameters:

[Screenshots of the ESANet configuration and torchsummary output omitted.]

As the summary shows, with the original ResNet-34 the encoder alone already has about 21M parameters. This is why ESANet uses the R34-NonBottleneck-1D block to reduce computation. Let's look at the structure of the NonBottleneck1D block, which was proposed by ERFNet.
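
A quick back-of-the-envelope check of why the factorized block is cheaper (a sketch; the channel count C is just an illustrative value):

# Weights per residual block, ignoring biases and BN, with C input/output channels:
C = 64
basic_block = 2 * (3 * 3 * C * C)        # two 3x3 convs: 18 * C^2
non_bottleneck_1d = 4 * (3 * 1 * C * C)  # four factorized 3x1/1x3 convs: 12 * C^2
print(basic_block, non_bottleneck_1d)    # 73728 49152 -> one third fewer weights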

Here is the model's encoder composition: every ResNet layer is built from NonBottleneck1D blocks.

import warnings

import torch.nn as nn


class NonBottleneck1D(nn.Module):
    """
    ERFNet-Block
    Paper:
    http://www.robesafe.es/personal/eduardo.romera/pdfs/Romera17tits.pdf
    Implementation from:
    https://github.com/Eromera/erfnet_pytorch/blob/master/train/erfnet.py
    """
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=None, dilation=1, norm_layer=None,
                 activation=nn.ReLU(inplace=True), residual_only=False):
        super().__init__()
        warnings.warn('parameters groups, base_width and norm_layer are '
                      'ignored in NonBottleneck1D')
        dropprob = 0
        self.conv3x1_1 = nn.Conv2d(inplanes, planes, (3, 1),
                                   stride=(stride, 1), padding=(1, 0),
                                   bias=True)
        self.conv1x3_1 = nn.Conv2d(planes, planes, (1, 3),
                                   stride=(1, stride), padding=(0, 1),
                                   bias=True)
        self.bn1 = nn.BatchNorm2d(planes, eps=1e-03)
        self.act = activation
        self.conv3x1_2 = nn.Conv2d(planes, planes, (3, 1),
                                   padding=(1 * dilation, 0), bias=True,
                                   dilation=(dilation, 1))
        self.conv1x3_2 = nn.Conv2d(planes, planes, (1, 3),
                                   padding=(0, 1 * dilation), bias=True,
                                   dilation=(1, dilation))
        self.bn2 = nn.BatchNorm2d(planes, eps=1e-03)
        self.dropout = nn.Dropout2d(dropprob)
        self.downsample = downsample
        self.stride = stride
        self.residual_only = residual_only

    def forward(self, input):
        output = self.conv3x1_1(input)
        output = self.act(output)
        output = self.conv1x3_1(output)
        output = self.bn1(output)
        output = self.act(output)

        output = self.conv3x1_2(output)
        output = self.act(output)
        output = self.conv1x3_2(output)
        output = self.bn2(output)

        if self.dropout.p != 0:
            output = self.dropout(output)

        if self.downsample is None:
            identity = input
        else:
            identity = self.downsample(input)

        if self.residual_only:
            return output
        # +input = identity (residual connection)
        return self.act(output + identity)

The input x first goes through a 3x1 convolution with stride (stride, 1) and padding (1, 0), then ReLU; next a 1x3 convolution with stride (1, stride) and padding (0, 1), followed by BN and ReLU. Then comes another 3x1 convolution plus ReLU, and a 1x3 convolution plus BN. Finally the (possibly downsampled) input is added back as the residual, and the result passes through ReLU.
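
A minimal sanity check of the block (assuming the NonBottleneck1D class above is in scope):

import torch

block = NonBottleneck1D(inplanes=64, planes=64)
x = torch.randn(1, 64, 120, 160)
print(block(x).shape)  # torch.Size([1, 64, 120, 160]) -- stride=1 preserves the shape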

Next comes the ResNet instantiation.

NonBottleneck1D can be chosen as the building block just like BasicBlock and the Bottleneck block:

    model = ResNet34(block='NonBottleneck1D', pretrained_on_imagenet=True,
                     with_se=True, dilation=[1]*4)

We instantiate ResNet34 and select NonBottleneck1D via the block argument.

Inside ResNet34: the input channel count is either 1 or 3. We instantiate the ResNet as a resnet34 and then call load_pretrained_with_different_encoder_block, which loads different weights depending on whether the input has 3 channels or 1.

def ResNet34(pretrained_on_imagenet=False,
             pretrained_dir='./trained_models/imagenet',
             **kwargs):
    if 'block' not in kwargs:
        kwargs['block'] = BasicBlock
    else:
        if kwargs['block'] in globals():
            # convert string to block object
            kwargs['block'] = globals()[kwargs['block']]
        else:
            raise NotImplementedError('Block {} is not implemented'
                                      ''.format(kwargs['block']))
    if 'input_channels' in kwargs and kwargs['input_channels'] == 1:
        input_channels = 1
    else:
        input_channels = 3
    model = ResNet([3, 4, 6, 3], **kwargs)
    if kwargs['block'] != BasicBlock and pretrained_on_imagenet:
        model = load_pretrained_with_different_encoder_block(
            model, kwargs['block'].__name__,
            input_channels, 'r34',
            pretrained_dir=pretrained_dir
        )
    elif pretrained_on_imagenet:
        weights = model_zoo.load_url(model_urls['resnet34'], model_dir='./')
        if input_channels == 1:
            # sum the weights of the first convolution
            weights['conv1.weight'] = torch.sum(weights['conv1.weight'],
                                                axis=1, keepdim=True)
        weights.pop('fc.weight')
        weights.pop('fc.bias')
        model.load_state_dict(weights, strict=True)
        print('Loaded ResNet34 pretrained on ImageNet')
    return model
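
Putting it together, the two encoders in ESANet can then be built like this (a sketch based on the snippets above; the keyword arguments follow the function signature shown):

encoder_rgb = ResNet34(block='NonBottleneck1D', pretrained_on_imagenet=True)
encoder_depth = ResNet34(block='NonBottleneck1D', pretrained_on_imagenet=True,
                         input_channels=1)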

In the main framework we use both encoder_rgb and encoder_depth:

        rgb = self.encoder_rgb.forward_layer1(rgb)
        depth = self.encoder_depth.forward_layer1(depth)
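
For context, in the full model each stage's depth features are fused into the RGB branch before the next layer (ESANet's SE-based fusion). A schematic continuation, where se_layer1 is a hypothetical name for the fusion module:

        fuse = self.se_layer1(rgb, depth)  # hypothetical fusion-module name
        # the fused features then feed the RGB encoder's next layer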

encoder_rgb calls ResNet34 with 3 input channels, and we load the renamed weights:

    if torch.cuda.is_available():
        checkpoint = torch.load(ckpt_path)
    else:
        checkpoint = torch.load(ckpt_path, map_location=torch.device('cpu'))
    checkpoint['state_dict2'] = OrderedDict()

    # rename keys and leave out the last fully connected layer
    # split() cuts the string at the given separator and returns a list;
    # taking [-1] keeps everything after the 'encoder.' prefix
    for key in checkpoint['state_dict']:
        if 'encoder' in key:
            checkpoint['state_dict2'][key.split('encoder.')[-1]] = \
                checkpoint['state_dict'][key]
    weights = checkpoint['state_dict2']
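
A quick illustration of the key renaming (the key name here is hypothetical, for demonstration only):

key = 'encoder.layer1.0.conv3x1_1.weight'
print(key.split('encoder.')[-1])  # layer1.0.conv3x1_1.weight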

encoder_depth calls ResNet34 with 1 input channel; the first convolution's pretrained weights are summed over the 3-channel input dimension to form a single channel:

    if input_channels == 1:
        # sum the weights of the first convolution
        weights['conv1.weight'] = torch.sum(weights['conv1.weight'],
                                            axis=1,
                                            keepdim=True)

    model.load_state_dict(weights, strict=False)
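
A standalone illustration of this channel-summing trick (toy tensor, not the actual checkpoint):

import torch

w = torch.randn(64, 3, 7, 7)                 # conv1 weight pretrained on RGB
w_depth = torch.sum(w, dim=1, keepdim=True)  # collapse the RGB dimension
print(w_depth.shape)                         # torch.Size([64, 1, 7, 7])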
