|Building Networks from Scratch| ShuffleNet Series: Explanation and Implementation

愿望会实现吧

Last modified: 2024-06-06 12:59:35


Category: Deep Learning. Tags: network, deep learning, pycharm

First published: 2024-04-18 14:56:55

Article link: https://blog.csdn.net/qq_58202678/article/details/137876524



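🌜 Preface 🌛

After covering the MobileNet series and MnasNet, the next important family of lightweight networks is ShuffleNet. The ideas it introduced have been very helpful for later lightweight architectures. In the original ShuffleNetV1 paper the network is compared with AlexNet: it runs far faster while keeping comparable accuracy. This post explains the ShuffleNet series from two angles: how it works, and how to reproduce it in PyTorch.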

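🌜 ShuffleNetV1 Explained 🌛

Original paper: https://arxiv.org/pdf/1707.01083v2.pdf
ShuffleNet is another lightweight network following the MobileNet series. It keeps the same 'low cost, high efficiency' philosophy and can perform image classification efficiently when computing resources are limited. It is a compact network that is designed and trained directly.
According to the abstract of the paper, ShuffleNet achieves a lower top-1 error than MobileNet on ImageNet classification under a 40 MFLOPs budget, and on an ARM-based mobile device it is roughly 13x faster than AlexNet at comparable accuracy.
The abstract also introduces the two key ideas of the paper, pointwise group convolution and channel shuffle. The rest of this section focuses on these two ideas.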

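🌜 ShuffleNetV1 Innovations 🌛

The two innovations of the paper are group convolution and channel shuffle. Group convolution greatly reduces the computation and the number of parameters, lowering model complexity. Channel shuffle adds information exchange across channel groups without introducing any extra parameters, which improves the representational power of the model.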

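🌜 Group Convolution 🌛

The group convolution used in the paper is closely related to the depthwise separable convolution of MobileNet. The ShuffleNet paper points out that the 1x1 convolutions account for most of the computation, so it replaces the dense 1x1 convolutions with grouped ones. Let's first compare the rough form of standard convolution, depthwise separable convolution and group convolution. Standard and depthwise separable convolution are not covered in depth here; for details see the 'depthwise separable convolution' part of the earlier post |Building Networks from Scratch| MobileNet Series: Explanation and Implementation.
First, standard convolution: every kernel has one slice per input channel, and the number of kernels equals the number of output channels. Its computational cost is high.
Second, depthwise separable convolution: in its depthwise step every kernel has a depth of 1 and only processes a single input channel, so the number of channels is unchanged by this step. It is usually paired with a 1x1 convolution, and together they greatly reduce the cost of the convolution.
Finally, the focus of this paper, group convolution: each subset of the kernels only processes a subset of the input channels. That sounds abstract, so let's walk through an example.

Suppose the input has 6 channels and we split it into 3 groups of 2 channels each, using three 2x2 kernels, one per group. Each kernel then only has to process the 2 channels of its own group, and the number of output channels still equals the number of kernels. By comparison, the depthwise convolution can be understood as a group convolution in which every channel forms its own group.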
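To make the comparison concrete, here is a minimal sketch that builds the three variants with PyTorch's `groups` argument and prints their weight shapes. The 3x3 kernel size, the 6 output channels and the 8x8 input are illustrative assumptions, not values from the paper.

```python
import torch

# 6 input channels, 6 output channels, 3x3 kernels (sizes chosen for illustration only)
standard = torch.nn.Conv2d(6, 6, kernel_size=3, padding=1, groups=1, bias=False)
grouped = torch.nn.Conv2d(6, 6, kernel_size=3, padding=1, groups=3, bias=False)
depthwise = torch.nn.Conv2d(6, 6, kernel_size=3, padding=1, groups=6, bias=False)

# the weight shape is (out_channels, in_channels // groups, kH, kW)
shapes = {name: tuple(m.weight.shape) for name, m in
          [("standard", standard), ("grouped", grouped), ("depthwise", depthwise)]}
print(shapes)

# all three variants map a 6-channel input to a 6-channel output of the same size
x = torch.randn(1, 6, 8, 8)
out_shapes = [tuple(m(x).shape) for m in (standard, grouped, depthwise)]
print(out_shapes)
```

The second entry of each weight shape is the number of input channels a single kernel sees: all 6 for the standard convolution, 2 for the grouped one, and 1 for the depthwise one.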
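Compared with standard convolution this greatly reduces the computation, and compared with depthwise separable convolution it lowers the FLOPs spent on the dense 1x1 convolutions. It has a drawback, however, which the paper also points out.
With group convolution, information only flows between channels inside the same group; channels in different groups never interact. It is a bit like inbreeding: each group can only build on its own lineage, which limits what the network can represent.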
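The missing cross-group interaction can be checked directly. The sketch below is only an illustration under assumed sizes (6 channels, 3 groups, a 1x1 `Conv1d`): it perturbs one input channel of the first group and records which output channels react.

```python
import torch

torch.manual_seed(0)
conv = torch.nn.Conv1d(6, 6, kernel_size=1, groups=3, bias=False)

x = torch.randn(1, 6, 10)
x_perturbed = x.clone()
x_perturbed[:, 0] += 1.0      # modify a single input channel, which belongs to group 0

with torch.no_grad():
    # largest absolute change per output channel
    diff = (conv(x_perturbed) - conv(x)).abs().amax(dim=(0, 2))

changed = (diff > 1e-6).tolist()
print(changed)
```

Only the two output channels of the first group change; the outputs of the other two groups never see the perturbed channel.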
This is where the second innovation, channel shuffle, comes in.

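🌜 Channel Shuffle 🌛

Let's first look at the figure the paper uses to illustrate channel shuffle.
In that figure GConv stands for group convolution. Panel (a) shows two stacked group convolutions with the same number of groups, where each output channel only depends on the input channels of its own group. Panel (b) shows that input and output channels become fully related if the second group convolution takes its input from different groups of the first. Panel (c) is an equivalent implementation of (b) using channel shuffle.
So how is channel shuffle implemented? One diagram is enough to understand the principle.
Suppose 12 channels are split into 3 groups. First reshape the channels into a matrix with g rows and n columns (g is the number of groups, n the number of channels per group), so that 1 2 3 4 form one group, 5 6 7 8 the second, and 9 10 11 12 the third. Then transpose the matrix and flatten it again. The flattened result, 1 5 9 2 6 10 3 7 11 4 8 12, interleaves channels from every group.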
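The reshape, transpose and flatten steps can be tried on a plain index vector before touching real feature maps. This is a minimal sketch in which the integers 1 to 12 simply stand in for channel indices.

```python
import torch

g, n = 3, 4                              # 3 groups, 4 channels per group
channels = torch.arange(1, g * n + 1)    # 1..12 stand in for channel indices

# reshape to (g, n), transpose to (n, g), flatten back to a vector
shuffled = channels.reshape(g, n).t().reshape(-1)
print(shuffled.tolist())
# [1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12]
```

After the shuffle, every run of three consecutive channels contains one channel from each original group, so a following group convolution receives input from all groups.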
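When group convolution and channel shuffle are used together, we get information exchange across channel groups for feature extraction while still paying far less computation than the other convolution types.
For example, mapping a 12-channel input to a 6-channel output with six 3x3 kernels takes 12 × 6 × 3 × 3 = 648 parameters with a standard convolution, but only 216 when the convolution is split into three groups, which cuts the cost to one third.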
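The two parameter counts can be reproduced by counting the weights of the corresponding PyTorch layers. This is a small sketch with the bias disabled so that only the kernel weights are counted; `n_weights` is a helper name introduced here for illustration.

```python
import torch

def n_weights(module):
    # total number of learnable values in the layer
    return sum(p.numel() for p in module.parameters())

standard = torch.nn.Conv2d(12, 6, kernel_size=3, bias=False)
grouped = torch.nn.Conv2d(12, 6, kernel_size=3, groups=3, bias=False)

print(n_weights(standard), n_weights(grouped))
# 648 216
```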

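🌜 Building ShuffleNetV1 and Reproducing It in PyTorch 🌛

Below I analyse the ShuffleNetV1 structure step by step and reproduce it in PyTorch. The 1D version of the network is shown first, followed by the 2D version.

🌜 1D Channel Shuffle 🌛

We start with channel shuffle, following the 'reshape, transpose, flatten' steps described above. Here is the code: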

import torch
import torch.nn.functional as F

class channel_shuffle(torch.nn.Module):
    def __init__(self,groups):
        super().__init__()
        self.groups = groups

    def forward(self,x):
        b,c,l = x.size()
        group_channel = c // self.groups
        x = x.reshape(b,self.groups,group_channel,l)
        x = x.permute(0,2,1,3).contiguous()
        x = x.reshape(b,c,l)
        return x

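The constructor only stores groups, the number of groups used for the shuffle.
In forward we first read the three dimensions of the input: b (batch), c (channel) and l (length). We then reshape the input into a 4-D tensor of shape (b, groups, group_channel, l), use permute to swap dimensions 1 and 2, call contiguous so that the permuted tensor is laid out contiguously in memory, and finally reshape it back to (b, c, l).

permute is a PyTorch tensor operation that rearranges the dimensions of a tensor in the order you specify, which changes the shape of the tensor.

The following example shows how permute is used.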

import torch

x = torch.randn(size = (1,2,3,4))
print(x.shape)
# torch.Size([1, 2, 3, 4])
x = x.permute(0,2,1,3)
print(x.shape)
# torch.Size([1, 3, 2, 4])

The code first creates a random 4-D tensor of shape (1, 2, 3, 4). Calling permute(0, 2, 1, 3) then swaps dimensions 1 and 2, so the shape becomes (1, 3, 2, 4).
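It may also help to see why the shuffle code calls contiguous after permute. The sketch below, using the same illustrative shape, shows that permute only returns a view with reordered strides, and that `view` refuses to merge the swapped dimensions until the data has been copied into contiguous memory.

```python
import torch

x = torch.randn(1, 2, 3, 4)
y = x.permute(0, 2, 1, 3)          # same storage, different strides

print(y.is_contiguous())           # permute only returns a view
print(y.contiguous().is_contiguous())

# view() needs a compatible memory layout, so it is rejected on the permuted view
try:
    y.view(1, 6, 4)
    view_failed = False
except RuntimeError:
    view_failed = True

flat = y.contiguous().view(1, 6, 4)            # works after the copy
same = torch.equal(flat, y.reshape(1, 6, 4))   # reshape copies automatically when needed
print(view_failed, same)
```

Since reshape falls back to a copy on its own, the explicit contiguous call in the shuffle module is not strictly required, but it makes the copy visible.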

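🌜 1D Group Convolution Block 🌛

Next we reproduce the 1D ShuffleNet unit built from group convolutions. First look at the unit structure given in the paper.
The unit has two forms depending on the stride. With stride 1 it is form (b) in the figure: the input goes through a 1x1 group convolution, BN and ReLU, then channel shuffle, then a 3x3 DW convolution followed by BN, then a second 1x1 group convolution followed by BN. The result is added to the input through a shortcut connection.
With stride 2 the right branch is the same, except that the DW convolution in the middle uses stride 2. The left branch applies a 3x3 average pooling with stride 2, and instead of a shortcut addition the two branches are concatenated along the channel dimension.
Two points need attention. First, the number of output channels of the middle layers on the right branch is not listed in the architecture table, but the paper states it in the text: the bottleneck channels are set to 1/4 of the unit's output channels. The code below uses input_channel // 4 instead, which gives the same value for the stride-1 units. Second, the concat in the stride-2 unit adds up the channel counts of the two branches, so when building the unit the conv branch only outputs half of the block's output channels.
Now for the code: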

class shuffleblock(torch.nn.Module):
    def __init__(self,input_channel,mid_channel,output_channel,stride = 2,groups = 3):
        super().__init__()
        self.stride = stride
        self.conv = torch.nn.Sequential(
            torch.nn.Conv1d(input_channel,mid_channel,1,1,0,groups = groups),
            torch.nn.BatchNorm1d(mid_channel),
            torch.nn.ReLU(),
            channel_shuffle(groups=groups),
            torch.nn.Conv1d(mid_channel,mid_channel,3,self.stride,1,groups=mid_channel),
            torch.nn.BatchNorm1d(mid_channel),
            torch.nn.Conv1d(mid_channel,output_channel,1,1,0,groups=groups),
            torch.nn.BatchNorm1d(output_channel)
        )
        self.down = torch.nn.AvgPool1d(3,2,1)

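    def forward(self,x):
        out = self.conv(x)
        if self.stride == 1:
            # shortcut addition; requires input_channel == output_channel
            out = out + x
        else:
            # reuse the conv branch computed above instead of running it a second time
            out = torch.cat((self.down(x),out),1)
        return F.relu(out)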

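The constructor takes the input, middle and output channel counts, with stride defaulting to 2 and groups to 3, and builds the two branches. forward handles two cases: with stride 1 the shortcut is added and ReLU is applied; with stride 2 the two branches are concatenated and ReLU is applied.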
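The channel bookkeeping of the stride-2 path can be checked in isolation. In this sketch a single depthwise convolution stands in for the whole conv branch, and the input size (120 channels, length 50) is an assumption made for illustration.

```python
import torch

x = torch.randn(1, 120, 50)                      # (batch, channels, length)

# left branch: average pooling keeps the 120 channels and halves the length
pool_branch = torch.nn.AvgPool1d(3, 2, 1)
# stand-in for the conv branch: a stride-2 depthwise convolution with 120 outputs
dw_stride2 = torch.nn.Conv1d(120, 120, 3, 2, 1, groups=120)

left = pool_branch(x)
right = dw_stride2(x)
out = torch.cat((left, right), dim=1)            # channels add up, length must match
print(left.shape, right.shape, out.shape)
```

Both branches use kernel 3, stride 2 and padding 1, so their output lengths agree and the concatenation yields 240 channels from two 120-channel branches.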

🌜 1D ShuffleNetV1 🌛

First look at the ShuffleNetV1 network architecture:
Here we implement the g = 3 configuration, assembling the network step by step from the block defined above. The code is as follows:

class shufflenetv1(torch.nn.Module):
    def __init__(self,in_channels,classes):
        super().__init__()
        self.classes = classes
        self.features = torch.nn.Sequential(
            torch.nn.Conv1d(in_channels, 24, 3, 2, 1),
            torch.nn.Conv1d(24, 120, 3, 2, 1),
            torch.nn.MaxPool1d(3, 2, 1),
            shuffleblock(120, 120 // 4, 120, 2, 3),
            shuffleblock(240, 240 // 4,240,1),
            shuffleblock(240, 240 // 4, 240, 1),
            shuffleblock(240, 240 // 4, 240, 1),

            shuffleblock(240,240 // 4,240),
            shuffleblock(480,480 // 4,480,1),
            shuffleblock(480, 480 // 4, 480,1),
            shuffleblock(480, 480 // 4, 480,1),
            shuffleblock(480, 480 // 4, 480,1),
            shuffleblock(480, 480 // 4, 480,1),
            shuffleblock(480, 480 // 4, 480,1),
            shuffleblock(480, 480 // 4, 480,1),

            shuffleblock(480,480 // 4,480),
            shuffleblock(960,960 // 4,960,1),
            shuffleblock(960, 960 // 4, 960,1),
            shuffleblock(960, 960 // 4, 960,1),

            torch.nn.AdaptiveAvgPool1d(1)

        )
        self.classifier = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(960,self.classes)
        )

    def forward(self,x):
        x = self.features(x)
        x = self.classifier(x)
        return x

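The only thing to watch is that for stride 2, that is, when concat is used, the conv branch outputs half of the block's actual output channels. For example, the first block (stage 2) actually outputs 240 channels while the code sets its output channels to 120, because the concatenation inside the block brings the total to 240. Note that the second stem convolution, which widens 24 channels to 120 so that this concatenation yields 240, is a choice of this implementation and is not part of the paper's architecture table.
Complete code: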

import torch
import torch.nn.functional as F

class channel_shuffle(torch.nn.Module):
    def __init__(self,groups):
        super().__init__()
        self.groups = groups

    def forward(self,x):
        b,c,l = x.size()
        group_channel = c // self.groups
        x = x.reshape(b,self.groups,group_channel,l)
        x = x.permute(0,2,1,3).contiguous()
        x = x.reshape(b,c,l)
        return x

class shuffleblock(torch.nn.Module):
    def __init__(self,input_channel,mid_channel,output_channel,stride = 2,groups = 3):
        super().__init__()
        self.stride = stride
        self.conv = torch.nn.Sequential(
            torch.nn.Conv1d(input_channel,mid_channel,1,1,0,groups = groups),
            torch.nn.BatchNorm1d(mid_channel),
            torch.nn.ReLU(),
            channel_shuffle(groups=groups),
            torch.nn.Conv1d(mid_channel,mid_channel,3,self.stride,1,groups=mid_channel),
            torch.nn.BatchNorm1d(mid_channel),
            torch.nn.Conv1d(mid_channel,output_channel,1,1,0,groups=groups),
            torch.nn.BatchNorm1d(output_channel)
        )
        self.down = torch.nn.AvgPool1d(3,2,1)

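    def forward(self,x):
        out = self.conv(x)
        if self.stride == 1:
            # shortcut addition; requires input_channel == output_channel
            out = out + x
        else:
            # reuse the conv branch computed above instead of running it a second time
            out = torch.cat((self.down(x),out),1)
        return F.relu(out)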

class shufflenetv1(torch.nn.Module):
    def __init__(self,in_channels,classes):
        super().__init__()
        self.classes = classes
        self.features = torch.nn.Sequential(
            torch.nn.Conv1d(in_channels, 24, 3, 2, 1),
            torch.nn.Conv1d(24, 120, 3, 2, 1),
            torch.nn.MaxPool1d(3, 2, 1),
            shuffleblock(120, 120 // 4, 120, 2, 3),
            shuffleblock(240, 240 // 4,240,1),
            shuffleblock(240, 240 // 4, 240, 1),
            shuffleblock(240, 240 // 4, 240, 1),

            shuffleblock(240,240 // 4,240),
            shuffleblock(480,480 // 4,480,1),
            shuffleblock(480, 480 // 4, 480,1),
            shuffleblock(480, 480 // 4, 480,1),
            shuffleblock(480, 480 // 4, 480,1),
            shuffleblock(480, 480 // 4, 480,1),
            shuffleblock(480, 480 // 4, 480,1),
            shuffleblock(480, 480 // 4, 480,1),

            shuffleblock(480,480 // 4,480),
            shuffleblock(960,960 // 4,960,1),
            shuffleblock(960, 960 // 4, 960,1),
            shuffleblock(960, 960 // 4, 960,1),

            torch.nn.AdaptiveAvgPool1d(1)

        )
        self.classifier = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(960,self.classes)
        )

    def forward(self,x):
        x = self.features(x)
        x = self.classifier(x)
        return x

if __name__ == '__main__':
    x = torch.randn(1,1,200)
    model = shufflenetv1(1,10)
    y = model(x)
    print(y.shape)
    #print(model)

🌜 2D Channel Shuffle 🌛

For 2D data the input has shape (batch, channel, height, width), so the reshape and permute have to handle one more dimension. The code is as follows:

import torch
import torch.nn.functional as F

class channel_shuffle(torch.nn.Module):
    def __init__(self,groups):
        super().__init__()
        self.groups = groups

    def forward(self,x):
        b,c,h,w = x.size()
        group_channel = c // self.groups
        x = x.reshape(b,self.groups,group_channel,h,w)
        x = x.permute(0,2,1,3,4).contiguous()
        x = x.reshape(b,c,h,w)
        return x
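As a quick sanity check of the 2D shuffle, the sketch below applies the same reshape, permute and reshape steps in a standalone function (`shuffle_2d` is a name introduced here, not part of the post's code) to a tensor whose channels are filled with their own index.

```python
import torch

def shuffle_2d(x, groups):
    # functional form of the reshape / permute / reshape steps
    b, c, h, w = x.size()
    x = x.reshape(b, groups, c // groups, h, w)
    x = x.permute(0, 2, 1, 3, 4).contiguous()
    return x.reshape(b, c, h, w)

# fill every channel with its own index so the new order can be read off directly
x = torch.arange(6, dtype=torch.float32).reshape(1, 6, 1, 1).expand(1, 6, 2, 2)
y = shuffle_2d(x, groups=3)

order = y[0, :, 0, 0].tolist()
print(order)
# [0.0, 2.0, 4.0, 1.0, 3.0, 5.0]
```

Only the channel order changes; the height and width dimensions are carried along untouched.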

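🌜 2D Group Convolution Block 🌛

For the 2D block we only need to replace the 1D convolution, normalization and pooling layers with their 2D counterparts.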

class shuffleblock(torch.nn.Module):
    def __init__(self,input_channel,mid_channel,output_channel,stride = 2,groups = 3):
        super().__init__()
        self.stride = stride
        self.conv = torch.nn.Sequential(
            torch.nn.Conv2d(input_channel,mid_channel,1,1,0,groups = groups),
            torch.nn.BatchNorm2d(mid_channel),
            torch.nn.ReLU(),
            channel_shuffle(groups=groups),
            torch.nn.Conv2d(mid_channel,mid_channel,3,self.stride,1,groups=mid_channel),
            torch.nn.BatchNorm2d(mid_channel),
            torch.nn.Conv2d(mid_channel,output_channel,1,1,0,groups=groups),
            torch.nn.BatchNorm2d(output_channel)
        )
        self.down = torch.nn.AvgPool2d(3,2,1)

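    def forward(self,x):
        out = self.conv(x)
        if self.stride == 1:
            # shortcut addition; requires input_channel == output_channel
            out = out + x
        else:
            # reuse the conv branch computed above instead of running it a second time
            out = torch.cat((self.down(x),out),1)
        # ReLU follows both the add and the concat, matching the 1D block
        return F.relu(out)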

🌜 2D ShuffleNetV1 🌛

Again the layers are replaced with ones that handle 2D data. The complete code is given directly below:

import torch
import torch.nn.functional as F

class channel_shuffle(torch.nn.Module):
    def __init__(self,groups):
        super().__init__()
        self.groups = groups

    def forward(self,x):
        b,c,h,w = x.size()
        group_channel = c // self.groups
        x = x.reshape(b,self.groups,group_channel,h,w)
        x = x.permute(0,2,1,3,4).contiguous()
        x = x.reshape(b,c,h,w)
        return x

class shuffleblock(torch.nn.Module):
    def __init__(self,input_channel,mid_channel,output_channel,stride = 2,groups = 3):
        super().__init__()
        self.stride = stride
        self.conv = torch.nn.Sequential(
            torch.nn.Conv2d(input_channel,mid_channel,1,1,0,groups = groups),
            torch.nn.BatchNorm2d(mid_channel),
            torch.nn.ReLU(),
            channel_shuffle(groups=groups),
            torch.nn.Conv2d(mid_channel,mid_channel,3,self.stride,1,groups=mid_channel),
            torch.nn.BatchNorm2d(mid_channel),
            torch.nn.Conv2d(mid_channel,output_channel,1,1,0,groups=groups),
            torch.nn.BatchNorm2d(output_channel)
        )
        self.down = torch.nn.AvgPool2d(3,2,1)

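    def forward(self,x):
        out = self.conv(x)
        if self.stride == 1:
            # shortcut addition; requires input_channel == output_channel
            out = out + x
        else:
            # reuse the conv branch computed above instead of running it a second time
            out = torch.cat((self.down(x),out),1)
        # ReLU follows both the add and the concat, matching the 1D block
        return F.relu(out)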

class shufflenetv1(torch.nn.Module):
    def __init__(self,in_channels,classes):
        super().__init__()
        self.classes = classes
        self.features = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels, 24, 3, 2, 1),
            torch.nn.Conv2d(24, 120, 3, 2, 1),
            torch.nn.MaxPool2d(3, 2, 1),
            shuffleblock(120, 120 // 4, 120, 2, 3),
            shuffleblock(240, 240 // 4,240,1),
            shuffleblock(240, 240 // 4, 240, 1),
            shuffleblock(240, 240 // 4, 240, 1),

            shuffleblock(240,240 // 4,240),
            shuffleblock(480,480 // 4,480,1),
            shuffleblock(480, 480 // 4, 480,1),
            shuffleblock(480, 480 // 4, 480,1),
            shuffleblock(480, 480 // 4, 480,1),
            shuffleblock(480, 480 // 4, 480,1),
            shuffleblock(480, 480 // 4, 480,1),
            shuffleblock(480, 480 // 4, 480,1),

            shuffleblock(480,480 // 4,480),
            shuffleblock(960,960 // 4,960,1),
            shuffleblock(960, 960 // 4, 960,1),
            shuffleblock(960, 960 // 4, 960,1),

            torch.nn.AdaptiveAvgPool2d(1)

        )
        self.classifier = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(960,self.classes)
        )

    def forward(self,x):
        x = self.features(x)
        x = self.classifier(x)
        return x

if __name__ == '__main__':
    x = torch.randn(1,2,224,224)
    model = shufflenetv1(2,10)
    y = model(x)
    print(y.shape)
    #print(model)

This concludes the explanation and implementation of ShuffleNetV1.

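🐏ShuffleNetV2 Explained 🐑

Original paper: https://arxiv.org/pdf/1807.11164v1.pdf
The ShuffleNetV2 paper is also very important; many of the ideas in it are helpful for building later networks. Let's start with the abstract:

The abstract first argues that FLOPs, an indirect metric of computational complexity, is not enough to guide network design. The direct metric, such as speed, also depends on other factors like memory access cost and platform characteristics. So what exactly are FLOPs? See the note below.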

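FLOPS: all capitals, floating-point operations per second. It can be understood as computing speed and is an indicator of hardware performance. (hardware)
FLOPs: lowercase s, the number of floating-point operations. It can be understood as the amount of computation and is used to measure the complexity of an algorithm or model. (model)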
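To get a feeling for the model-side quantity, the sketch below counts the operations of a single convolution layer. It assumes the common convention of counting one multiply-add as one operation and ignores the bias; the helper name `conv2d_madds` and the 28x28, 240-channel sizes are illustrative choices.

```python
def conv2d_madds(h_out, w_out, c_in, c_out, k, groups=1):
    """Multiply-add count of one 2D convolution (bias ignored)."""
    # every output value needs (c_in / groups) * k * k multiply-adds
    return h_out * w_out * c_out * (c_in // groups) * k * k

dense_1x1 = conv2d_madds(28, 28, 240, 240, k=1)
grouped_1x1 = conv2d_madds(28, 28, 240, 240, k=1, groups=3)
depthwise_3x3 = conv2d_madds(28, 28, 240, 240, k=3, groups=240)

print(dense_1x1, grouped_1x1, depthwise_3x3)
# 45158400 15052800 1693440
```

With these sizes the dense 1x1 convolution costs far more than the 3x3 depthwise one, which is why ShuffleNetV1 targets the 1x1 layers.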

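The metric we usually use for models is FLOPs with a lowercase s, which measures the complexity of a model or algorithm. The paper then proposes four guidelines for efficient network design and, based on these four guidelines, a new block design.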

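🐏ShuffleNetV2 Innovations 🐑

Before designing the new block, ShuffleNetV2 lays out four guidelines for efficient network design. These guidelines are the main contribution of the paper and give later lightweight networks a solid basis to build on. They are: Equal channel width minimizes memory access cost (MAC), Excessive group convolution increases MAC, Network fragmentation reduces degree of parallelism, and Element-wise operations are non-negligible.

🐏Equal channel width minimizes memory access cost (MAC) 🐑

The first guideline: with FLOPs held fixed, MAC is smallest when the input and output feature maps of a convolution layer have the same number of channels. In other words, we should keep the input and output channel counts equal in most layers of the network in order to keep the memory access cost low.
According to the paper's measurements, GPU throughput is highest when the ratio of input to output channels is 1:1; at a ratio of 1:12 it drops to roughly half.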
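The first guideline can be illustrated with the MAC expression the paper uses for a 1x1 convolution, MAC = hw(c1 + c2) + c1·c2, where h and w are the feature-map size and c1, c2 the input and output channels. The sketch below keeps c1·c2 constant, so FLOPs stay fixed, and varies the ratio; the concrete sizes are assumptions for illustration.

```python
def mac_1x1(h, w, c1, c2):
    # feature-map reads and writes plus the kernel weights
    return h * w * (c1 + c2) + c1 * c2

h = w = 56
# every pair has c1 * c2 == 128 * 128, so FLOPs (h * w * c1 * c2) stay fixed
pairs = [(128, 128), (64, 256), (32, 512), (16, 1024)]
macs = [mac_1x1(h, w, c1, c2) for c1, c2 in pairs]

for (c1, c2), m in zip(pairs, macs):
    print(f"c1={c1:4d} c2={c2:4d} MAC={m}")
```

The balanced 1:1 case gives the smallest MAC, and the value grows steadily as the ratio becomes more lopsided.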

🐏Excessive group convolution increases MAC 🐑

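The second guideline: with FLOPs held fixed, MAC grows as the number of groups in a GConv increases. So for better efficiency the number of groups should be chosen carefully rather than made as large as possible.
According to the paper's measurements, GPU throughput is highest with 1 group, while with 8 groups it is only about 1/4 of that.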
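The same kind of estimate works for the second guideline. For a 1x1 group convolution the kernel holds c1·c2/g weights, so MAC = hw(c1 + c2) + c1·c2/g. The sketch below widens the channels as g grows so that FLOPs stay roughly constant; the base width of 128 and the rounding of the width to a multiple of g are assumptions made for illustration.

```python
import math

def mac_gconv_1x1(h, w, c1, c2, g):
    # with g groups the kernel holds c1 * c2 / g weights
    return h * w * (c1 + c2) + c1 * c2 // g

h = w = 56
base = 128
rows = []
for g in (1, 2, 4, 8):
    # widen channels by about sqrt(g) so that FLOPs = h*w*c*c/g stay almost constant
    c = int(round(base * math.sqrt(g) / g)) * g     # rounded to a multiple of g
    flops = h * w * c * c // g
    rows.append((g, c, flops, mac_gconv_1x1(h, w, c, c, g)))

for g, c, flops, mac in rows:
    print(f"g={g} c={c} FLOPs={flops} MAC={mac}")
```

At a nearly constant FLOPs budget the MAC estimate keeps rising with the number of groups, which matches the trend reported in the paper.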

🐏Network fragmentation reduces degree of parallelism 🐑

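The third guideline: the more fragmented a network design is, the slower it runs. Put simply, the more branches a network has, the lower its computational efficiency.
As the two figures from the paper show, when we stack multi-branch structures to extract richer features, the degree of fragmentation is often what drags the computational efficiency down.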

🐏Element-wise operations are non-negligible 🐑

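The last guideline is that the cost of element-wise operations cannot be ignored. The element-wise operations here include the shortcut addition, the ReLU function and similar operations.
According to the paper's measurements, GPU throughput is highest when neither the shortcut connection nor ReLU is used.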

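🐏Building ShuffleNetV2 and Implementing It in PyTorch 🐑

The overall structure of ShuffleNetV2 is largely the same as that of ShuffleNetV1, especially the channel shuffle part, so here we focus on the ShuffleNetV2 block and the overall network.

🐏1D ShuffleNetV2 Block 🐑

First look at the block structure.
Panels (a) and (b) are the ShuffleNetV1 units, and (c) and (d) are the ShuffleNetV2 units. The biggest change with stride 1 is that ShuffleNetV2 drops the shortcut addition and uses concat instead; to keep the channel count unchanged, the input is split into two halves before entering the two branches. With stride 2, the left branch becomes a 3x3 DW convolution followed by an ordinary 1x1 convolution. Group convolution is removed throughout and replaced by ordinary 1x1 convolutions and DW convolutions. The PyTorch implementation is shown below.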

class shuffle_block(torch.nn.Module):
    def __init__(self,in_channels,mid_channels,out_channels,stride):
        super().__init__()
        self.in_channels = in_channels
        self.mid_channels = mid_channels
        self.out_channels = out_channels
        self.stride = stride

        if self.stride == 2:
            self.left = torch.nn.Sequential(
                torch.nn.Conv1d(self.in_channels,self.in_channels,3,2,1,groups=self.in_channels),
                torch.nn.BatchNorm1d(self.in_channels),
                torch.nn.Conv1d(self.in_channels,self.out_channels,1),
                torch.nn.BatchNorm1d(self.out_channels),
                torch.nn.ReLU()
            )
        else:
            self.in_channels = self.in_channels // 2
            self.out_channels = self.out_channels // 2
            self.mid_channels = self.mid_channels // 2
            self.left = torch.nn.Sequential()

        self.right = torch.nn.Sequential(
            torch.nn.Conv1d(self.in_channels, self.mid_channels, 1),
            torch.nn.BatchNorm1d(self.mid_channels),
            torch.nn.ReLU(),
            torch.nn.Conv1d(self.mid_channels, self.mid_channels, 3, self.stride, 1, groups=self.mid_channels),
            torch.nn.BatchNorm1d(self.mid_channels),
            torch.nn.Conv1d(self.mid_channels, self.out_channels, 1),
            torch.nn.BatchNorm1d(self.out_channels),
            torch.nn.ReLU()
        )
        # two groups, so that channels from the two branches are interleaved
        self.shuffle = channel_shuffle(groups=2)

    def forward(self,x):
        if self.stride == 2:
            x_left = self.left(x)
            x_right = self.right(x)
            out = torch.cat((x_left,x_right),dim = 1)
        else:
            xl,xr = x.chunk(2,dim=1)
            x_left = self.left(xl)
            x_right = self.right(xr)
            out = torch.cat((x_left,x_right),dim = 1)
        return self.shuffle(out)
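The channel arithmetic of the two cases can be verified on its own. In the sketch below single convolutions stand in for the full branches, and the input size (244 channels, length 28) is an illustrative assumption.

```python
import torch

x = torch.randn(1, 244, 28)                 # (batch, channels, length)

# stride 1: split, transform one half, concatenate -> channel count unchanged
xl, xr = x.chunk(2, dim=1)
right = torch.nn.Conv1d(122, 122, 1)(xr)    # stand-in for the right branch
out_s1 = torch.cat((xl, right), dim=1)      # the left half passes through untouched

# stride 2: both branches see the full input and each emits 244 channels
left_s2 = torch.nn.Conv1d(244, 244, 3, 2, 1, groups=244)(x)
right_s2 = torch.nn.Conv1d(244, 244, 3, 2, 1, groups=244)(x)
out_s2 = torch.cat((left_s2, right_s2), dim=1)

print(out_s1.shape, out_s2.shape)
```

The stride-1 case keeps 244 channels and the length, while the stride-2 case doubles the channels to 488 and halves the length.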

🐏1D ShuffleNetV2 🐑

The overall approach is the same as for the 1D ShuffleNetV1. First look at the overall network architecture:
Taking the 2x configuration as an example, the complete code is:

import torch

class channel_shuffle(torch.nn.Module):
    def __init__(self,groups = 2):
        super().__init__()
        self.groups = groups

    def forward(self,x):
        b,c,l = x.size()
        group_channels = c // self.groups
        x = x.reshape(b,self.groups,group_channels,l)
        x = x.permute(0,2,1,3).contiguous()
        x = x.reshape(b,c,l)
        return x

class shuffle_block(torch.nn.Module):
    def __init__(self,in_channels,mid_channels,out_channels,stride):
        super().__init__()
        self.in_channels = in_channels
        self.mid_channels = mid_channels
        self.out_channels = out_channels
        self.stride = stride

        if self.stride == 2:
            self.left = torch.nn.Sequential(
                torch.nn.Conv1d(self.in_channels,self.in_channels,3,2,1,groups=self.in_channels),
                torch.nn.BatchNorm1d(self.in_channels),
                torch.nn.Conv1d(self.in_channels,self.out_channels,1),
                torch.nn.BatchNorm1d(self.out_channels),
                torch.nn.ReLU()
            )
        else:
            self.in_channels = self.in_channels // 2
            self.out_channels = self.out_channels // 2
            self.mid_channels = self.mid_channels // 2
            self.left = torch.nn.Sequential()

        self.right = torch.nn.Sequential(
            torch.nn.Conv1d(self.in_channels, self.mid_channels, 1),
            torch.nn.BatchNorm1d(self.mid_channels),
            torch.nn.ReLU(),
            torch.nn.Conv1d(self.mid_channels, self.mid_channels, 3, self.stride, 1, groups=self.mid_channels),
            torch.nn.BatchNorm1d(self.mid_channels),
            torch.nn.Conv1d(self.mid_channels, self.out_channels, 1),
            torch.nn.BatchNorm1d(self.out_channels),
            torch.nn.ReLU()
        )
        self.shuffle = channel_shuffle()

    def forward(self,x):
        if self.stride == 2:
            x_left = self.left(x)
            x_right = self.right(x)
            out = torch.cat((x_left,x_right),dim = 1)
        else:
            xl,xr = x.chunk(2,dim=1)
            x_left = self.left(xl)
            x_right = self.right(xr)
            out = torch.cat((x_left,x_right),dim = 1)
        return self.shuffle(out)

class shuffleNetV2(torch.nn.Module):
    def __init__(self,in_channels,classes):
        super().__init__()
        self.in_channels = in_channels
        self.classes = classes
        self.conv_1 = torch.nn.Sequential(
            torch.nn.Conv1d(self.in_channels,24,3,2,1),
            # padding 1 halves the length exactly (112 -> 56), as in the ShuffleNetV1 code above
            torch.nn.MaxPool1d(3,2,1)
        )
        self.stage_2 = torch.nn.Sequential(
            shuffle_block(24,24 // 4,122,2),
            shuffle_block(244,244 // 4,244,1),
            shuffle_block(244,244 // 4,244,1),
            shuffle_block(244, 244 // 4, 244, 1),
        )
        self.stage_3 = torch.nn.Sequential(
            shuffle_block(244,244 // 4,244,2),
            shuffle_block(488,488 // 4,488,1),
            shuffle_block(488, 488 // 4, 488, 1),
            shuffle_block(488, 488 // 4, 488, 1),
            shuffle_block(488, 488 // 4, 488, 1),
            shuffle_block(488, 488 // 4, 488, 1),
            shuffle_block(488, 488 // 4, 488, 1),
            shuffle_block(488, 488 // 4, 488, 1),
        )
        self.stage_4 = torch.nn.Sequential(
            shuffle_block(488,488 // 4,488,2),
            shuffle_block(976,976 // 4,976,1),
            shuffle_block(976, 976 // 4, 976, 1),
            shuffle_block(976, 976 // 4, 976, 1),
        )
        self.conv_5 = torch.nn.Sequential(
            torch.nn.Conv1d(976,2048,1),
            torch.nn.AdaptiveAvgPool1d(1)
        )
        self.classfier = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(2048,self.classes)
        )
    def forward(self,x):
        x = self.conv_1(x)
        x = self.stage_2(x)
        x = self.stage_3(x)
        x = self.stage_4(x)
        x = self.conv_5(x)
        x = self.classfier(x)
        return x

if __name__ == '__main__':
    x = torch.randn(1,2,224)
    model = shuffleNetV2(2,10)
    print(model)
    y = model(x)
    print(y.shape)
    for layers in model.children():
        x = layers(x)
        print(layers.__class__.__name__,'output shape:',x.shape)

🐏2D ShuffleNetV2 🐑

Going from 1D to 2D works the same way as for ShuffleNetV1, so without further explanation here is the 2D ShuffleNetV2 code:

import torch

class channel_shuffle(torch.nn.Module):
    def __init__(self,groups = 2):
        super().__init__()
        self.groups = groups

    def forward(self,x):
        b,c,h,w = x.size()
        group_channels = c // self.groups
        x = x.reshape(b,self.groups,group_channels,h,w)
        x = x.permute(0,2,1,3,4).contiguous()
        x = x.reshape(b,c,h,w)
        return x

class shuffle_block(torch.nn.Module):
    def __init__(self,in_channels,mid_channels,out_channels,stride):
        super().__init__()
        self.in_channels = in_channels
        self.mid_channels = mid_channels
        self.out_channels = out_channels
        self.stride = stride

        if self.stride == 2:
            self.left = torch.nn.Sequential(
                torch.nn.Conv2d(self.in_channels,self.in_channels,3,2,1,groups=self.in_channels),
                torch.nn.BatchNorm2d(self.in_channels),
                torch.nn.Conv2d(self.in_channels,self.out_channels,1),
                torch.nn.BatchNorm2d(self.out_channels),
                torch.nn.ReLU()
            )
        else:
            self.in_channels = self.in_channels // 2
            self.out_channels = self.out_channels // 2
            self.mid_channels = self.mid_channels // 2
            self.left = torch.nn.Sequential()

        self.right = torch.nn.Sequential(
            torch.nn.Conv2d(self.in_channels, self.mid_channels, 1),
            torch.nn.BatchNorm2d(self.mid_channels),
            torch.nn.ReLU(),
            torch.nn.Conv2d(self.mid_channels, self.mid_channels, 3, self.stride, 1, groups=self.mid_channels),
            torch.nn.BatchNorm2d(self.mid_channels),
            torch.nn.Conv2d(self.mid_channels, self.out_channels, 1),
            torch.nn.BatchNorm2d(self.out_channels),
            torch.nn.ReLU()
        )
        self.shuffle = channel_shuffle()

    def forward(self,x):
        if self.stride == 2:
            x_left = self.left(x)
            x_right = self.right(x)
            out = torch.cat((x_left,x_right),dim = 1)
        else:
            xl,xr = x.chunk(2,dim=1)
            x_left = self.left(xl)
            x_right = self.right(xr)
            out = torch.cat((x_left,x_right),dim = 1)
        return self.shuffle(out)

class shuffleNetV2(torch.nn.Module):
    def __init__(self,in_channels,classes):
        super().__init__()
        self.in_channels = in_channels
        self.classes = classes
        self.conv_1 = torch.nn.Sequential(
            torch.nn.Conv2d(self.in_channels,24,3,2,1),
            # padding 1 halves the feature map exactly (112 -> 56), as in the ShuffleNetV1 code above
            torch.nn.MaxPool2d(3,2,1)
        )
        self.stage_2 = torch.nn.Sequential(
            shuffle_block(24,24 // 4,122,2),
            shuffle_block(244,244 // 4,244,1),
            shuffle_block(244,244 // 4,244,1),
            shuffle_block(244, 244 // 4, 244, 1),
        )
        self.stage_3 = torch.nn.Sequential(
            shuffle_block(244,244 // 4,244,2),
            shuffle_block(488,488 // 4,488,1),
            shuffle_block(488, 488 // 4, 488, 1),
            shuffle_block(488, 488 // 4, 488, 1),
            shuffle_block(488, 488 // 4, 488, 1),
            shuffle_block(488, 488 // 4, 488, 1),
            shuffle_block(488, 488 // 4, 488, 1),
            shuffle_block(488, 488 // 4, 488, 1),
        )
        self.stage_4 = torch.nn.Sequential(
            shuffle_block(488,488 // 4,488,2),
            shuffle_block(976,976 // 4,976,1),
            shuffle_block(976, 976 // 4, 976, 1),
            shuffle_block(976, 976 // 4, 976, 1),
        )
        self.conv_5 = torch.nn.Sequential(
            torch.nn.Conv2d(976,2048,1),
            torch.nn.AdaptiveAvgPool2d(1)
        )
        self.classfier = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(2048,self.classes)
        )
    def forward(self,x):
        x = self.conv_1(x)
        x = self.stage_2(x)
        x = self.stage_3(x)
        x = self.stage_4(x)
        x = self.conv_5(x)
        x = self.classfier(x)
        return x

if __name__ == '__main__':
    x = torch.randn(1,2,224,224)
    model = shuffleNetV2(2,10)
    print(model)
    y = model(x)
    print(y.shape)
    for layers in model.children():
        x = layers(x)
        print(layers.__class__.__name__,'output shape:',x.shape)