ESANet encoder code and parameter comparison

This post compares the parameter count of the CoaT encoder against ResNet-50 for semantic segmentation, and looks at how ResNet-34 with the Non-Bottleneck-1D block (from ERFNet) reduces computation. It walks through instantiating ResNet34 with a different block type, applying the NonBottleneck1D block, and loading pretrained weights for different numbers of input channels.

Code

After reading the CoaT code, I mainly looked at the encoder's parameter count, wondering whether it could serve as a backbone for semantic segmentation. Its structure turns out to be very similar to ResNet-50: both use a [3, 4, 6, 3] layer layout.

First, I checked the parameter counts of ResNet-34 and ResNet-50 with an input of (3, 480, 640):

from torchsummary import summary
from torchvision.models import resnet34, resnet50

# ResNet-34 parameter count for a 480x640 RGB input
model = resnet34()
summary(model, input_size=[(3, 480, 640)], device="cpu")

# ResNet-50 parameter count for the same input
model = resnet50()
summary(model, input_size=[(3, 480, 640)], device="cpu")

The results are close: roughly 21.8M parameters for ResNet-34 versus 25.6M for ResNet-50.

Next is the parameter count of the full model, which includes the encoder and decoder and has two branches.

Then I tested CMX, which I had looked at earlier. CMX uses the SegFormer encoder with two branches, where the second branch takes the depth image; the B2 variant is used here.

def main():
    # mit_b2 is the B2 variant of CMX's RGBXTransformer encoder;
    # pass constructor arguments here as needed
    model = mit_b2()
    summary(model, input_size=[(3, 480, 640), (1, 480, 640)], device='cpu')

if __name__ == '__main__':
    main()

Next up is ESANet. Let's first look at ESANet's configuration, and then its parameters:

[Screenshots of the ESANet configuration and torchsummary output omitted.]

As the summary shows, with the original ResNet-34 the encoder alone already has about 21M parameters. This is why ESANet uses the R34-NonBottleneck-1D block to reduce computation. Let's look at the structure of the NonBottleneck1D block, which was proposed by ERFNet.
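
A quick back-of-the-envelope check of why the factorized block is cheaper (a sketch; the channel count C is just an illustrative value):

# Weights per residual block, ignoring biases and BN, with C input/output channels:
C = 64
basic_block = 2 * (3 * 3 * C * C)        # two 3x3 convs: 18 * C^2
non_bottleneck_1d = 4 * (3 * 1 * C * C)  # four factorized 3x1/1x3 convs: 12 * C^2
print(basic_block, non_bottleneck_1d)    # 73728 49152 -> one third fewer weights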

Here is the model's encoder composition: every ResNet layer is built from NonBottleneck1D blocks.

import warnings

import torch.nn as nn


class NonBottleneck1D(nn.Module):
    """
    ERFNet-Block
    Paper:
    http://www.robesafe.es/personal/eduardo.romera/pdfs/Romera17tits.pdf
    Implementation from:
    https://github.com/Eromera/erfnet_pytorch/blob/master/train/erfnet.py
    """
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=None, dilation=1, norm_layer=None,
                 activation=nn.ReLU(inplace=True), residual_only=False):
        super().__init__()
        warnings.warn('parameters groups, base_width and norm_layer are '
                      'ignored in NonBottleneck1D')
        dropprob = 0
        self.conv3x1_1 = nn.Conv2d(inplanes, planes, (3, 1),
                                   stride=(stride, 1), padding=(1, 0),
                                   bias=True)
        self.conv1x3_1 = nn.Conv2d(planes, planes, (1, 3),
                                   stride=(1, stride), padding=(0, 1),
                                   bias=True)
        self.bn1 = nn.BatchNorm2d(planes, eps=1e-03)
        self.act = activation
        self.conv3x1_2 = nn.Conv2d(planes, planes, (3, 1),
                                   padding=(1 * dilation, 0), bias=True,
                                   dilation=(dilation, 1))
        self.conv1x3_2 = nn.Conv2d(planes, planes, (1, 3),
                                   padding=(0, 1 * dilation), bias=True,
                                   dilation=(1, dilation))
        self.bn2 = nn.BatchNorm2d(planes, eps=1e-03)
        self.dropout = nn.Dropout2d(dropprob)
        self.downsample = downsample
        self.stride = stride
        self.residual_only = residual_only

    def forward(self, input):
        output = self.conv3x1_1(input)
        output = self.act(output)
        output = self.conv1x3_1(output)
        output = self.bn1(output)
        output = self.act(output)

        output = self.conv3x1_2(output)
        output = self.act(output)
        output = self.conv1x3_2(output)
        output = self.bn2(output)

        if self.dropout.p != 0:
            output = self.dropout(output)

        if self.downsample is None:
            identity = input
        else:
            identity = self.downsample(input)

        if self.residual_only:
            return output
        # +input = identity (residual connection)
        return self.act(output + identity)

The input x first goes through a 3x1 convolution with stride (stride, 1) and padding (1, 0), then ReLU; next a 1x3 convolution with stride (1, stride) and padding (0, 1), followed by BN and ReLU. Then comes another 3x1 convolution plus ReLU, and a 1x3 convolution plus BN. Finally the (possibly downsampled) input is added back as the residual, and the result passes through ReLU.
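
A minimal sanity check of the block (assuming the NonBottleneck1D class above is in scope):

import torch

block = NonBottleneck1D(inplanes=64, planes=64)
x = torch.randn(1, 64, 120, 160)
print(block(x).shape)  # torch.Size([1, 64, 120, 160]) -- stride=1 preserves the shape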

Next comes the ResNet instantiation.

NonBottleneck1D can be chosen as the building block just like BasicBlock and the Bottleneck block:

    model = ResNet34(block='NonBottleneck1D', pretrained_on_imagenet=True,
                     with_se=True, dilation=[1]*4)

We instantiate ResNet34 and select NonBottleneck1D via the block argument.

Inside ResNet34: the input channel count is either 1 or 3. We instantiate the ResNet as a resnet34 and then call load_pretrained_with_different_encoder_block, which loads different weights depending on whether the input has 3 channels or 1.

def ResNet34(pretrained_on_imagenet=False,
             pretrained_dir='./trained_models/imagenet',
             **kwargs):
    if 'block' not in kwargs:
        kwargs['block'] = BasicBlock
    else:
        if kwargs['block'] in globals():
            # convert string to block object
            kwargs['block'] = globals()[kwargs['block']]
        else:
            raise NotImplementedError('Block {} is not implemented'
                                      ''.format(kwargs['block']))
    if 'input_channels' in kwargs and kwargs['input_channels'] == 1:
        input_channels = 1
    else:
        input_channels = 3
    model = ResNet([3, 4, 6, 3], **kwargs)
    if kwargs['block'] != BasicBlock and pretrained_on_imagenet:
        model = load_pretrained_with_different_encoder_block(
            model, kwargs['block'].__name__,
            input_channels, 'r34',
            pretrained_dir=pretrained_dir
        )
    elif pretrained_on_imagenet:
        weights = model_zoo.load_url(model_urls['resnet34'], model_dir='./')
        if input_channels == 1:
            # sum the weights of the first convolution
            weights['conv1.weight'] = torch.sum(weights['conv1.weight'],
                                                axis=1, keepdim=True)
        weights.pop('fc.weight')
        weights.pop('fc.bias')
        model.load_state_dict(weights, strict=True)
        print('Loaded ResNet34 pretrained on ImageNet')
    return model
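
Putting it together, the two encoders in ESANet can then be built like this (a sketch based on the snippets above; the keyword arguments follow the function signature shown):

encoder_rgb = ResNet34(block='NonBottleneck1D', pretrained_on_imagenet=True)
encoder_depth = ResNet34(block='NonBottleneck1D', pretrained_on_imagenet=True,
                         input_channels=1)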

In the main framework we use both encoder_rgb and encoder_depth:

        rgb = self.encoder_rgb.forward_layer1(rgb)
        depth = self.encoder_depth.forward_layer1(depth)
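
For context, in the full model each stage's depth features are fused into the RGB branch before the next layer (ESANet's SE-based fusion). A schematic continuation, where se_layer1 is a hypothetical name for the fusion module:

        fuse = self.se_layer1(rgb, depth)  # hypothetical fusion-module name
        # the fused features then feed the RGB encoder's next layer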

encoder_rgb calls ResNet34 with 3 input channels, and we load the renamed weights:

    if torch.cuda.is_available():
        checkpoint = torch.load(ckpt_path)
    else:
        checkpoint = torch.load(ckpt_path, map_location=torch.device('cpu'))
    checkpoint['state_dict2'] = OrderedDict()

    # rename keys and leave out the last fully connected layer
    # split() cuts the string at the given separator and returns a list;
    # taking [-1] keeps everything after the 'encoder.' prefix
    for key in checkpoint['state_dict']:
        if 'encoder' in key:
            checkpoint['state_dict2'][key.split('encoder.')[-1]] = \
                checkpoint['state_dict'][key]
    weights = checkpoint['state_dict2']
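
A quick illustration of the key renaming (the key name here is hypothetical, for demonstration only):

key = 'encoder.layer1.0.conv3x1_1.weight'
print(key.split('encoder.')[-1])  # layer1.0.conv3x1_1.weight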

encoder_depth calls ResNet34 with 1 input channel; the first convolution's pretrained weights are summed over the 3-channel input dimension to form a single channel:

    if input_channels == 1:
        # sum the weights of the first convolution
        weights['conv1.weight'] = torch.sum(weights['conv1.weight'],
                                            axis=1,
                                            keepdim=True)

    model.load_state_dict(weights, strict=False)
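
A standalone illustration of this channel-summing trick (toy tensor, not the actual checkpoint):

import torch

w = torch.randn(64, 3, 7, 7)                 # conv1 weight pretrained on RGB
w_depth = torch.sum(w, dim=1, keepdim=True)  # collapse the RGB dimension
print(w_depth.shape)                         # torch.Size([64, 1, 7, 7])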
