ShuffleNet
ShuffleNet v1
Although group convolution (GConv) reduces both the number of parameters and the amount of computation, information does not flow between the different groups. To address this, the authors introduce the concept of Channel Shuffle.
In ResNeXt, ordinary 1×1 convolutions account for 93.4% of the computation, so in ShuffleNet v1 the authors replace the 1×1 convolutions with group convolutions (GConv). Figure (b) corresponds to the stride = 1 case and figure (c) to the stride = 2 case.
ShuffleNet v1 network architecture
Comparison of theoretical compute (FLOPs)
The ShuffleNet block requires the fewest FLOPs compared with the ResNet and ResNeXt blocks.
ShuffleNet v2
The authors point out that computational complexity cannot be measured by FLOPs alone: MAC (memory access cost), the degree of parallelism, and the target platform all affect the actual speed.
Based on this, they give four guidelines for designing efficient networks.
- MAC is minimized when the input and output feature maps of a convolutional layer have an equal number of channels (with FLOPs held constant; derived for 1×1 convolutions).
With FLOPs held constant, the ratio c1/c2 is varied: the further the ratio is from 1, the slower the inference, and the effect also differs across hardware platforms (a small numerical sketch follows the summary below).
- Increasing the number of groups in GConv (with FLOPs held constant) increases MAC.
- The more fragmented (branched) the network design, the slower it runs.
- Element-wise operations (ReLU, add, etc.) have a non-negligible cost.
Summary: use "balanced" convolutions, keeping the ratio of input to output channels as close to 1 as possible; be aware of the cost of group convolutions; reduce the degree of network fragmentation; and reduce element-wise operations.
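To make the first guideline concrete, here is a small numerical sketch of my own (not from the original experiments). For a 1×1 convolution on an h×w feature map, the theoretical MAC is h·w·(c1 + c2) + c1·c2; holding c1·c2 (and therefore the FLOPs) fixed while skewing the channel ratio increases MAC:

# Theoretical memory access cost (MAC) of a 1x1 convolution:
# input feature map (h*w*c1) + output feature map (h*w*c2) + weights (c1*c2)
def mac_1x1(h: int, w: int, c1: int, c2: int) -> int:
    return h * w * (c1 + c2) + c1 * c2

h = w = 56
# c1 * c2 is identical for all three pairs, so the FLOPs (h*w*c1*c2) stay fixed
for c1, c2 in [(128, 128), (64, 256), (32, 512)]:
    print(f"c1:c2 = {c1}:{c2}, MAC = {mac_1x1(h, w, c1, c2)}")
# MAC grows as the ratio moves away from 1:1, so equal channel counts are cheapest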
Following these guidelines, the ShuffleNet v1 block is modified to obtain the ShuffleNet v2 block.
Figure (d) shows the downsampling (stride = 2) case.
ShuffleNet v2 network architecture
Building ShuffleNet V2 with PyTorch
The block module. Depending on whether stride is 1 or 2, the corresponding left branch is created; the depthwise (DW) convolution is implemented as a static method.
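The block calls a channel_shuffle helper that is not shown in this excerpt. Below is a minimal sketch following the torchvision-style implementation, together with the imports assumed by the code in this section:

import torch
import torch.nn as nn
from torch import Tensor
from typing import Callable, List


def channel_shuffle(x: Tensor, groups: int) -> Tensor:
    batch_size, num_channels, height, width = x.size()
    channels_per_group = num_channels // groups

    # reshape: (batch, channels, h, w) -> (batch, groups, channels_per_group, h, w)
    x = x.view(batch_size, groups, channels_per_group, height, width)

    # swap the group and channel dimensions, then flatten back so that
    # channels from different groups are interleaved
    x = torch.transpose(x, 1, 2).contiguous()
    x = x.view(batch_size, -1, height, width)

    return x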
class InvertedResidual(nn.Module):
    def __init__(self, input_c: int, output_c: int, stride: int):
        super(InvertedResidual, self).__init__()

        if stride not in [1, 2]:
            raise ValueError("illegal stride value.")
        self.stride = stride

        assert output_c % 2 == 0
        branch_features = output_c // 2
        # when stride is 1, input_channel must be twice branch_features
        # '<<' is the left-shift operator, a quick way to multiply by 2
        assert (self.stride != 1) or (input_c == branch_features << 1)

        if self.stride == 2:
            self.branch1 = nn.Sequential(
                self.depthwise_conv(input_c, input_c, kernel_s=3, stride=self.stride, padding=1),
                nn.BatchNorm2d(input_c),
                nn.Conv2d(input_c, branch_features, kernel_size=1, stride=1, padding=0, bias=False),
                nn.BatchNorm2d(branch_features),
                nn.ReLU(inplace=True)
            )
        else:
            self.branch1 = nn.Sequential()

        self.branch2 = nn.Sequential(
            nn.Conv2d(input_c if self.stride > 1 else branch_features, branch_features, kernel_size=1,
                      stride=1, padding=0, bias=False),
            nn.BatchNorm2d(branch_features),
            nn.ReLU(inplace=True),
            self.depthwise_conv(branch_features, branch_features, kernel_s=3, stride=self.stride, padding=1),
            nn.BatchNorm2d(branch_features),
            nn.Conv2d(branch_features, branch_features, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(branch_features),
            nn.ReLU(inplace=True)
        )

    @staticmethod
    def depthwise_conv(input_c: int,
                       output_c: int,
                       kernel_s: int,
                       stride: int = 1,
                       padding: int = 0,
                       bias: bool = False) -> nn.Conv2d:
        # groups == in_channels turns this into a depthwise (DW) convolution
        return nn.Conv2d(in_channels=input_c, out_channels=output_c, kernel_size=kernel_s,
                         stride=stride, padding=padding, bias=bias, groups=input_c)

    def forward(self, x: Tensor) -> Tensor:
        if self.stride == 1:
            # channel split: the left half is passed through, the right half goes through branch2
            x1, x2 = x.chunk(2, dim=1)
            out = torch.cat((x1, self.branch2(x2)), dim=1)
        else:
            out = torch.cat((self.branch1(x), self.branch2(x)), dim=1)

        out = channel_shuffle(out, 2)
        return out
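A quick shape check of the block (my own usage example, relying on the channel_shuffle sketch above):

block = InvertedResidual(116, 116, stride=1)    # stride-1 block: input channels are split in half
x = torch.randn(1, 116, 28, 28)
print(block(x).shape)                           # torch.Size([1, 116, 28, 28])

down = InvertedResidual(116, 232, stride=2)     # stride-2 block: spatial size halves, channels double
print(down(x).shape)                            # torch.Size([1, 232, 14, 14])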
The complete ShuffleNet v2 network:
class ShuffleNetV2(nn.Module):
    def __init__(self,
                 stages_repeats: List[int],
                 stages_out_channels: List[int],
                 num_classes: int = 1000,
                 inverted_residual: Callable[..., nn.Module] = InvertedResidual):
        super(ShuffleNetV2, self).__init__()

        if len(stages_repeats) != 3:
            raise ValueError("expected stages_repeats as list of 3 positive ints")
        if len(stages_out_channels) != 5:
            raise ValueError("expected stages_out_channels as list of 5 positive ints")
        self._stage_out_channels = stages_out_channels

        # input RGB image
        input_channels = 3
        output_channels = self._stage_out_channels[0]

        self.conv1 = nn.Sequential(
            nn.Conv2d(input_channels, output_channels, kernel_size=3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(output_channels),
            nn.ReLU(inplace=True)
        )
        input_channels = output_channels

        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Static annotations for mypy
        self.stage2: nn.Sequential
        self.stage3: nn.Sequential
        self.stage4: nn.Sequential

        stage_names = ["stage{}".format(i) for i in [2, 3, 4]]
        for name, repeats, output_channels in zip(stage_names, stages_repeats,
                                                  self._stage_out_channels[1:]):
            # the first block of each stage downsamples (stride = 2), the rest keep stride = 1
            seq = [inverted_residual(input_channels, output_channels, 2)]
            for i in range(repeats - 1):
                seq.append(inverted_residual(output_channels, output_channels, 1))
            setattr(self, name, nn.Sequential(*seq))
            input_channels = output_channels

        output_channels = self._stage_out_channels[-1]
        self.conv5 = nn.Sequential(
            nn.Conv2d(input_channels, output_channels, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(output_channels),
            nn.ReLU(inplace=True)
        )

        self.fc = nn.Linear(output_channels, num_classes)

    def _forward_impl(self, x: Tensor) -> Tensor:
        # See note [TorchScript super()]
        x = self.conv1(x)
        x = self.maxpool(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.stage4(x)
        x = self.conv5(x)
        x = x.mean([2, 3])  # global pool
        x = self.fc(x)
        return x

    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)
Definitions and official pretrained weights are available for each of the width variants: 0.5x, 1x, 1.5x and 2x.
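For reference, a sketch of how the 1x variant can be instantiated; the stage configuration below follows the torchvision implementation (the other widths change only stages_out_channels):

def shufflenet_v2_x1_0(num_classes: int = 1000) -> ShuffleNetV2:
    # 1x width: 4/8/4 blocks in stages 2-4, output channels 24, 116, 232, 464, 1024
    return ShuffleNetV2(stages_repeats=[4, 8, 4],
                        stages_out_channels=[24, 116, 232, 464, 1024],
                        num_classes=num_classes)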
Using the 1x variant with the corresponding official pretrained weights, and training only the final fully connected layer, the loss drops to 0.863 and the accuracy reaches 0.856 after 29 epochs.
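A minimal transfer-learning sketch of my own showing this setup (the weights file name and the 5-class flower dataset are assumptions, not from the original post):

model = shufflenet_v2_x1_0(num_classes=1000)                       # factory sketched above
weights = torch.load("shufflenet_v2_x1.pth", map_location="cpu")   # hypothetical weights path
model.load_state_dict(weights, strict=False)

# replace the classifier for the target dataset (assumed 5 classes) and freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 5)
for name, param in model.named_parameters():
    if "fc" not in name:
        param.requires_grad = False

params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.01, momentum=0.9)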
Prediction result on a tulip image
EfficientNet
The EfficientNet-B0 network
Every convolutional layer is followed by batch normalization (BN) and the Swish activation function.
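A minimal sketch of such a Conv-BN-Activation building block (my own illustration; PyTorch's nn.SiLU implements Swish, x * sigmoid(x)):

import torch.nn as nn


class ConvBNActivation(nn.Sequential):
    # convolution -> BatchNorm -> Swish (SiLU)
    def __init__(self, in_c: int, out_c: int, kernel_size: int = 3, stride: int = 1, groups: int = 1):
        padding = (kernel_size - 1) // 2   # keep the "same" spatial size for stride 1
        super().__init__(
            nn.Conv2d(in_c, out_c, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_c),
            nn.SiLU()
        )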
MBConv
n is the expansion factor; the number following "MBConv" in the architecture table is this n.
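For example, MBConv6 applied to a 16-channel input first expands to 6 × 16 = 96 channels with a 1×1 convolution. A hypothetical fragment of my own, reusing the ConvBNActivation sketch above:

input_c, n = 16, 6                                                    # MBConv6 on a 16-channel input
expand_conv = ConvBNActivation(input_c, input_c * n, kernel_size=1)   # 1x1 conv: 16 -> 96 channels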
SE module
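A minimal sketch of the squeeze-and-excitation (SE) module as it is commonly implemented for EfficientNet (my own illustration; the squeeze width is computed from the MBConv input channels, and the two activations are SiLU and Sigmoid):

from torch import Tensor


class SqueezeExcitation(nn.Module):
    def __init__(self, input_c: int, expand_c: int, squeeze_factor: int = 4):
        super().__init__()
        squeeze_c = input_c // squeeze_factor              # squeeze relative to the MBConv input
        self.fc1 = nn.Conv2d(expand_c, squeeze_c, kernel_size=1)
        self.ac1 = nn.SiLU()                               # Swish
        self.fc2 = nn.Conv2d(squeeze_c, expand_c, kernel_size=1)
        self.ac2 = nn.Sigmoid()

    def forward(self, x: Tensor) -> Tensor:
        scale = x.mean([2, 3], keepdim=True)               # global average pooling (squeeze)
        scale = self.ac1(self.fc1(scale))
        scale = self.ac2(self.fc2(scale))
        return x * scale                                   # reweight the channels (excitation)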
Scaling factors for EfficientNet-B0 through B7
Building EfficientNet-B0 with PyTorch
After 29 epochs the loss drops to 0.249 and the accuracy reaches 91.9%; the predicted class probabilities on test images are also high.
Self-Attention and Multi-Head Attention in Transformers
Self-Attention
Multi-Head Attention