Paper: ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
Download: https://arxiv.org/pdf/1707.01083.pdf
Main text
ShuffleNet is a lightweight network proposed by Face++ (Megvii), following Google's MobileNet. The paper introduces new techniques for reducing the computational cost of neural networks, aimed at deployment on mobile and embedded devices.
1. Principle
An earlier post covered the design of MobileNet (if you are not yet familiar with it, see the MobileNet post first). MobileNet pushes grouped convolution to its extreme, depthwise convolution (one group per channel), and relies on pointwise convolutions to let information flow between channels. ShuffleNet likewise uses grouped convolution to reduce the number of parameters, but proposes a new way to let different channel groups exchange information: the channel shuffle operation. If you understood the MobileNet post, the idea behind this paper should be easy to follow.
channel shuffle
Figure 1 in the paper illustrates channel shuffle clearly. In (a), the feature map is formed by stacking the outputs of a grouped convolution; outputs from different groups are independent of one another, which blocks information flow between groups and weakens the representational power. To fix this, as in (b), each group's output is subdivided into smaller pieces, and pieces from different groups are interleaved, yielding the channel shuffle operation shown in (c).
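The interleaving above is just a fixed permutation of channel indices: reshape into (groups, channels_per_group), transpose, flatten. A minimal sketch on bare index lists (no tensors involved) makes the permutation explicit:

```python
def channel_shuffle_indices(num_channels, groups):
    # Reshape channel indices into (groups, channels_per_group),
    # transpose, then flatten -- the same permutation channel shuffle
    # applies to feature-map channels.
    per_group = num_channels // groups
    grid = [list(range(g * per_group, (g + 1) * per_group)) for g in range(groups)]
    return [c for row in zip(*grid) for c in row]

print(channel_shuffle_indices(6, 2))  # -> [0, 3, 1, 4, 2, 5]
```

With 6 channels in 2 groups, channel 3 (the first channel of group 2) moves next to channel 0, so every group's output is spread across all groups of the next layer.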
2. Network architecture
ShuffleNet Unit
To make the network easy to assemble, the paper defines a building block (unit), shown below. (a) is a standard bottleneck unit with a 3x3 depthwise convolution; (b) is ShuffleNet's unit, which replaces the 1x1 convolutions with grouped convolutions (GConv) and adds channel shuffle; (c) is the variant designed for stride=2.
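The reason for grouping the 1x1 convolutions is cost: a grouped convolution has 1/g the weights of a dense one. A quick parameter count illustrates this (the 240-channel width here is just an illustrative choice, not a value from the unit diagram):

```python
def conv_params(in_c, out_c, k, groups=1):
    # Weight count of a Conv2d layer (bias omitted): each of the `groups`
    # groups maps in_c/groups input channels to out_c/groups outputs.
    return (in_c // groups) * (out_c // groups) * k * k * groups

print(conv_params(240, 240, 1))            # dense 1x1:   57600
print(conv_params(240, 240, 1, groups=3))  # grouped 1x1: 19200
```

Without channel shuffle, that 3x saving would come at the price of three sub-networks that never talk to each other.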
Overall architecture
The body of the network is built in three stages:
- The first block of each stage uses stride 2, and the channel count doubles from one stage to the next
- Within a stage, all hyperparameters other than the stride stay the same
- Each ShuffleNet unit's bottleneck width is 1/4 of its output channels (the same setting as ResNet)
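Concretely, the 1/4 rule fixes the bottleneck widths once the stage widths are chosen. Using the g=3 stage widths from the paper's architecture table (240/480/960, quoted here as an assumption):

```python
# Stage output widths for the g=3 ShuffleNet variant (assumed from the paper).
stage_out = [240, 480, 960]

# Bottleneck width of each unit is 1/4 of its stage's output channels.
bottleneck = [c // 4 for c in stage_out]
print(bottleneck)  # -> [60, 120, 240]
```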
3. ShuffleNetV1 implementation
Unit
import torch
import torch.nn as nn
import torch.nn.functional as F

def shuffle(x, groups):
    # Channel shuffle: reshape to (N, groups, C/groups, H, W), swap the
    # group and channel axes, then flatten back to (N, C, H, W).
    N, C, H, W = x.size()
    out = (x.view(N, groups, C // groups, H, W)
            .permute(0, 2, 1, 3, 4)
            .contiguous()
            .view(N, C, H, W))
    return out

class Bottleneck(nn.Module):
    def __init__(self, in_channels, out_channels, stride, groups):
        super().__init__()
        mid_channels = out_channels // 4  # bottleneck width = 1/4 of output
        # The first unit of stage 2 takes the 24-channel stem output,
        # which is too thin to split into groups, so use groups=1 there.
        if in_channels == 24:
            self.groups = 1
        else:
            self.groups = groups
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, 1, groups=self.groups, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True)
        )
        # 3x3 depthwise convolution (groups == channels); following the
        # paper, no ReLU is applied after it.
        self.conv2 = nn.Sequential(
            nn.Conv2d(mid_channels, mid_channels, 3, stride=stride, padding=1,
                      groups=mid_channels, bias=False),
            nn.BatchNorm2d(mid_channels)
        )
        self.conv3 = nn.Sequential(
            nn.Conv2d(mid_channels, out_channels, 1, groups=groups, bias=False),
            nn.BatchNorm2d(out_channels)
        )
        # Stride-2 shortcut: 3x3 average pooling, later concatenated
        # with the conv branch.
        self.shortcut = nn.AvgPool2d(3, stride=2, padding=1)
        self.stride = stride

    def forward(self, x):
        out = self.conv1(x)
        out = shuffle(out, self.groups)
        out = self.conv2(out)
        out = self.conv3(out)
        if self.stride == 2:
            res = self.shortcut(x)
            out = F.relu(torch.cat([out, res], 1))
        else:
            out = F.relu(out + x)
        return out
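Note the channel bookkeeping in the stride-2 branch: since the average-pooled input is concatenated onto the conv branch, a stride-2 unit is constructed with a branch width of out_channels - in_channels, and the concat restores the full stage width. With the g=3 widths assumed above:

```python
# Stride-2 unit of stage 2: 24-channel stem output -> 240-channel stage width.
in_c, out_c = 24, 240
branch_c = out_c - in_c        # channels the conv branch must produce
print(branch_c, branch_c + in_c)  # -> 216 240 (concat restores full width)
```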
ShuffleNetV1
class ShuffleNet(nn.Module):
    def __init__(self, groups, channel_num, class_num=1000):
        # class_num defaults to 1000 (ImageNet); set it to your dataset's
        # number of classes.
        super().__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 24, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(24),
            nn.ReLU(inplace=True)
        )
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.stage2 = self.make_layers(24, channel_num[0], 4, 2, groups)
        self.stage3 = self.make_layers(channel_num[0], channel_num[1], 8, 2, groups)
        self.stage4 = self.make_layers(channel_num[1], channel_num[2], 4, 2, groups)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channel_num[2], class_num)

    def make_layers(self, input_channels, output_channels, layers_num, stride, groups):
        layers = []
        # First unit of the stage uses stride 2; its conv branch only
        # produces output_channels - input_channels because the shortcut
        # is concatenated back on.
        layers.append(Bottleneck(input_channels, output_channels - input_channels,
                                 stride, groups))
        input_channels = output_channels
        for _ in range(layers_num - 1):
            layers.append(Bottleneck(input_channels, output_channels, 1, groups))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.stage4(x)
        x = self.avgpool(x)
        x = x.flatten(1)
        x = self.fc(x)
        return x
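As a sanity check on the architecture, we can trace the spatial resolution: every downsampling layer here is a 3x3 conv or pool with stride 2 and padding 1, so a 224x224 input (an assumed ImageNet-sized input) shrinks by half five times before the final average pool takes the 7x7 map down to 1x1:

```python
def out_size(size, k=3, stride=2, pad=1):
    # Output spatial size of a 3x3 conv/pool with stride 2, padding 1.
    return (size + 2 * pad - k) // stride + 1

s = 224
sizes = {}
for name in ["conv1", "maxpool", "stage2", "stage3", "stage4"]:
    s = out_size(s)
    sizes[name] = s
print(sizes)  # -> {'conv1': 112, 'maxpool': 56, 'stage2': 28, 'stage3': 14, 'stage4': 7}
```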