ShuffleNet
ShuffleNet v1
Although group convolution (GConv) reduces both the number of parameters and the amount of computation, information does not flow between the different groups. To address this, the authors introduce the concept of Channel Shuffle.
In ResNeXt, ordinary 1×1 convolutions account for 93.4% of the computation, so in ShuffleNet v1 the authors replace the 1×1 convolutions with group convolutions (GConv). Figure (b) corresponds to the stride = 1 case and figure (c) to the stride = 2 case.
ShuffleNet v1 network architecture
Comparison of theoretical compute (FLOPs)
The ShuffleNet block requires the fewest FLOPs compared with the ResNet and ResNeXt blocks.
ShuffleNet v2
The authors point out that computational complexity cannot be measured by FLOPs alone: MAC (memory access cost), the degree of parallelism, and the target platform all affect the actual speed.
Based on this, they give four guidelines for designing efficient networks.
- MAC is minimized when the input and output feature maps of a convolutional layer have an equal number of channels (with FLOPs held constant; derived for 1×1 convolutions).
With FLOPs held constant, the ratio c1/c2 is varied: the further the ratio is from 1, the slower the inference, and the effect also differs across hardware platforms (a small numerical sketch follows the summary below).
- Increasing the number of groups in GConv (with FLOPs held constant) increases MAC.
- The more fragmented (branched) the network design, the slower it runs.
- Element-wise operations (ReLU, add, etc.) have a non-negligible cost.
Summary: use "balanced" convolutions, keeping the ratio of input to output channels as close to 1 as possible; be aware of the cost of group convolutions; reduce the degree of network fragmentation; and reduce element-wise operations.
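To make the first guideline concrete, here is a small numerical sketch of my own (not from the original experiments). For a 1×1 convolution on an h×w feature map, the theoretical MAC is h·w·(c1 + c2) + c1·c2; holding c1·c2 (and therefore the FLOPs) fixed while skewing the channel ratio increases MAC:

# Theoretical memory access cost (MAC) of a 1x1 convolution:
# input feature map (h*w*c1) + output feature map (h*w*c2) + weights (c1*c2)
def mac_1x1(h: int, w: int, c1: int, c2: int) -> int:
    return h * w * (c1 + c2) + c1 * c2

h = w = 56
# c1 * c2 is identical for all three pairs, so the FLOPs (h*w*c1*c2) stay fixed
for c1, c2 in [(128, 128), (64, 256), (32, 512)]:
    print(f"c1:c2 = {c1}:{c2}, MAC = {mac_1x1(h, w, c1, c2)}")
# MAC grows as the ratio moves away from 1:1, so equal channel counts are cheapest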
Following these guidelines, the ShuffleNet v1 block is modified to obtain the ShuffleNet v2 block.
Figure (d) shows the downsampling (stride = 2) case.
ShuffleNet v2 network architecture
Building ShuffleNet V2 with PyTorch
The block module. Depending on whether stride is 1 or 2, the corresponding left branch is created; the depthwise (DW) convolution is implemented as a static method.
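The block calls a channel_shuffle helper that is not shown in this excerpt. Below is a minimal sketch following the torchvision-style implementation, together with the imports assumed by the code in this section:

import torch
import torch.nn as nn
from torch import Tensor
from typing import Callable, List


def channel_shuffle(x: Tensor, groups: int) -> Tensor:
    batch_size, num_channels, height, width = x.size()
    channels_per_group = num_channels // groups

    # reshape: (batch, channels, h, w) -> (batch, groups, channels_per_group, h, w)
    x = x.view(batch_size, groups, channels_per_group, height, width)

    # swap the group and channel dimensions, then flatten back so that
    # channels from different groups are interleaved
    x = torch.transpose(x, 1, 2).contiguous()
    x = x.view(batch_size, -1, height, width)

    return x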
class InvertedResidual(nn.Module):
    def __init__(self, input_c: int, output_c: int, stride: int):
        super(InvertedResidual, self).__init__()

        if stride not in [1, 2]:
            raise ValueError("illegal stride value.")
        self.stride = stride

        assert output_c % 2 == 0
        branch_features = output_c // 2
        # when stride is 1, input_channel must be twice branch_features
        # '<<' is the left-shift operator, a quick way to multiply by 2
        assert (self.stride != 1) or (input_c == branch_features << 1)

        if self.stride == 2:
            self.branch1 = nn.Sequential(
                self.depthwise_conv(input_c, input_c, kernel_s=3, stride=self.stride, padding=1),
                nn.BatchNorm2d(input_c),
                nn.Conv2d(input_c, branch_features, kernel_size=1, stride=1, padding=0, bias=False),
                nn.BatchNorm2d(branch_features),
                nn.ReLU(inplace=True)
            )
        else:
            self.branch1 = nn.Sequential()

        self.branch2 = nn.Sequential(
            nn.Conv2d(input_c if self.stride > 1 else branch_features, branch_features, kernel_size=1,
                      stride=1, padding=0, bias=False),
            nn.BatchNorm2d(branch_features),
            nn.ReLU(inplace=True),
            self.depthwise_conv(branch_features, branch_features, kernel_s=3, stride=self.stride, padding=1),
            nn.BatchNorm2d(branch_features),
            nn.Conv2d(branch_features, branch_features, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(branch_features),
            nn.ReLU(inplace=True)
        )

    @staticmethod
    def depthwise_conv(input_c: int,
                       output_c: int,
                       kernel_s: int,
                       stride: int = 1,
                       padding: int = 0,
                       bias: bool = False) -> nn.Conv2d:
        # groups == in_channels turns this into a depthwise (DW) convolution
        return nn.Conv2d(in_channels=input_c, out_channels=output_c, kernel_size=kernel_s,
                         stride=stride, padding=padding, bias=bias, groups=input_c)

    def forward(self, x: Tensor) -> Tensor:
        if self.stride == 1:
            # channel split: the left half is passed through, the right half goes through branch2
            x1, x2 = x.chunk(2, dim=1)
            out = torch.cat((x1, self.branch2(x2)), dim=1)
        else:
            out = torch.cat((self.branch1(x), self.branch2(x)), dim=1)

        out = channel_shuffle(out, 2)
        return out
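A quick shape check of the block (my own usage example, relying on the channel_shuffle sketch above):

block = InvertedResidual(116, 116, stride=1)    # stride-1 block: input channels are split in half
x = torch.randn(1, 116, 28, 28)
print(block(x).shape)                           # torch.Size([1, 116, 28, 28])

down = InvertedResidual(116, 232, stride=2)     # stride-2 block: spatial size halves, channels double
print(down(x).shape)                            # torch.Size([1, 232, 14, 14])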
The complete ShuffleNet v2 network:
class ShuffleNetV2(nn.Module):
    def __init__(self,
                 stages_repeats: List[int],
                 stages_out_channels: List[int],
                 num_classes: int = 1000,
                 inverted_residual: Callable[..., nn.Module] = InvertedResidual):
        super(ShuffleNetV2, self).__init__()

        if len(stages_repeats) != 3:
            raise ValueError("expected stages_repeats as list of 3 positive ints")
        if len(stages_out_channels) != 5:
            raise ValueError("expected stages_out_channels as list of 5 positive ints")
        self._stage_out_channels = stages_out_channels

        # input RGB image
        input_channels = 3
        output_channels = self._stage_out_channels[0]

        self.conv1 = nn.Sequential(
            nn.Conv2d(input_channels, output_channels, kernel_size=3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(output_channels),
            nn.ReLU(inplace=True)
        )
        input_channels = output_channels

        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Static annotations for mypy
        self.stage2: nn.Sequential
        self.stage3: nn.Sequential
        self.stage4: nn.Sequential

        stage_names = ["stage{}".format(i) for i in [2, 3, 4]]
        for name, repeats, output_channels in zip(stage_names, stages_repeats,
                                                  self._stage_out_channels[1:]):
            # the first block of each stage downsamples (stride = 2), the rest keep stride = 1
            seq = [inverted_residual(input_channels, output_channels, 2)]
            for i in range(repeats - 1):
                seq.append(inverted_residual(output_channels, output_channels, 1))
            setattr(self, name, nn.Sequential(*seq))
            input_channels = output_channels

        output_channels = self._stage_out_channels[-1]
        self.conv5 = nn.Sequential(
            nn.Conv2d(input_channels, output_channels, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(output_channels),
            nn.ReLU(inplace=True)
        )

        self.fc = nn.Linear(output_channels, num_classes)

    def _forward_impl(self, x: Tensor) -> Tensor:
        # See note [TorchScript super()]
        x = self.conv1(x)
        x = self.maxpool(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.stage4(x)
        x = self.conv5(x)
        x = x.mean([2, 3])  # global pool
        x = self.fc(x)
        return x

    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)
Definitions and official pretrained weights are available for each of the width variants: 0.5x, 1x, 1.5x and 2x.
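For reference, a sketch of how the 1x variant can be instantiated; the stage configuration below follows the torchvision implementation (the other widths change only stages_out_channels):

def shufflenet_v2_x1_0(num_classes: int = 1000) -> ShuffleNetV2:
    # 1x width: 4/8/4 blocks in stages 2-4, output channels 24, 116, 232, 464, 1024
    return ShuffleNetV2(stages_repeats=[4, 8, 4],
                        stages_out_channels=[24, 116, 232, 464, 1024],
                        num_classes=num_classes)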
Using the 1x variant with the corresponding official pretrained weights, and training only the final fully connected layer, the loss drops to 0.863 and the accuracy reaches 0.856 after 29 epochs.
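A minimal transfer-learning sketch of my own showing this setup (the weights file name and the 5-class flower dataset are assumptions, not from the original post):

model = shufflenet_v2_x1_0(num_classes=1000)                       # factory sketched above
weights = torch.load("shufflenet_v2_x1.pth", map_location="cpu")   # hypothetical weights path
model.load_state_dict(weights, strict=False)

# replace the classifier for the target dataset (assumed 5 classes) and freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 5)
for name, param in model.named_parameters():
    if "fc" not in name:
        param.requires_grad = False

params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.01, momentum=0.9)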
Prediction result on a tulip image
EfficientNet
The EfficientNet-B0 network
Every convolutional layer is followed by batch normalization (BN) and the Swish activation function.
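A minimal sketch of such a Conv-BN-Activation building block (my own illustration; PyTorch's nn.SiLU implements Swish, x * sigmoid(x)):

import torch.nn as nn


class ConvBNActivation(nn.Sequential):
    # convolution -> BatchNorm -> Swish (SiLU)
    def __init__(self, in_c: int, out_c: int, kernel_size: int = 3, stride: int = 1, groups: int = 1):
        padding = (kernel_size - 1) // 2   # keep the "same" spatial size for stride 1
        super().__init__(
            nn.Conv2d(in_c, out_c, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_c),
            nn.SiLU()
        )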
MBConv
n is the expansion factor; the number following "MBConv" in the architecture table is this n.
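For example, MBConv6 applied to a 16-channel input first expands to 6 × 16 = 96 channels with a 1×1 convolution. A hypothetical fragment of my own, reusing the ConvBNActivation sketch above:

input_c, n = 16, 6                                                    # MBConv6 on a 16-channel input
expand_conv = ConvBNActivation(input_c, input_c * n, kernel_size=1)   # 1x1 conv: 16 -> 96 channels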
SE module
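A minimal sketch of the squeeze-and-excitation (SE) module as it is commonly implemented for EfficientNet (my own illustration; the squeeze width is computed from the MBConv input channels, and the two activations are SiLU and Sigmoid):

from torch import Tensor


class SqueezeExcitation(nn.Module):
    def __init__(self, input_c: int, expand_c: int, squeeze_factor: int = 4):
        super().__init__()
        squeeze_c = input_c // squeeze_factor              # squeeze relative to the MBConv input
        self.fc1 = nn.Conv2d(expand_c, squeeze_c, kernel_size=1)
        self.ac1 = nn.SiLU()                               # Swish
        self.fc2 = nn.Conv2d(squeeze_c, expand_c, kernel_size=1)
        self.ac2 = nn.Sigmoid()

    def forward(self, x: Tensor) -> Tensor:
        scale = x.mean([2, 3], keepdim=True)               # global average pooling (squeeze)
        scale = self.ac1(self.fc1(scale))
        scale = self.ac2(self.fc2(scale))
        return x * scale                                   # reweight the channels (excitation)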
Scaling factors for EfficientNet-B0 through B7
Building EfficientNet-B0 with PyTorch
After 29 epochs the loss drops to 0.249 and the accuracy reaches 91.9%; the predicted class probabilities on test images are also high.
Self-Attention and Multi-Head Attention in Transformers
Self-Attention
Multi-Head Attention