Introduction
SqueezeNet is a lightweight model benchmarked against AlexNet: it achieves comparable accuracy with 50x fewer parameters and a model size under 0.5MB.
To reduce the parameter count while maintaining accuracy, SqueezeNet proposes three design strategies (a short parameter-count sketch follows the list):
- Strategy 1. Replace 3x3 filters with 1x1 filters.
Replacing a 3x3 convolution with a 1x1 convolution cuts that filter's parameters by 9x.
- Strategy 2. Decrease the number of input channels to 3x3 filters.
A 3x3 conv layer holds (number of input channels) x (number of filters) x 3 x 3 parameters, so besides using fewer 3x3 filters, shrinking the number of input channels feeding those filters also cuts parameters.
- Strategy 3. Downsample late in the network so that convolution layers have large activation maps.
Downsampling is deferred until late in the network so that the convolution layers operate on large activation maps, which helps accuracy; most modern network designs follow this pattern as well.
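To make Strategies 1 and 2 concrete, here is a minimal parameter-count sketch (my own illustration; conv_params is a hypothetical helper, and biases are ignored):

def conv_params(in_channels, filters, k):
    """Weight count of a conv layer, ignoring biases: in_channels * filters * k * k."""
    return in_channels * filters * k * k

# Strategy 1: a 1x1 filter has 9x fewer weights than a 3x3 filter.
print(conv_params(128, 64, 3))  # 73728
print(conv_params(128, 64, 1))  # 8192

# Strategy 2: squeezing the input from 128 to 16 channels before the
# 3x3 filters cuts their weights by a further 8x.
print(conv_params(16, 64, 3))   # 9216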
The Fire Module
In the Fire module the authors introduce three hyperparameters: $s_{1\times1}$, $e_{1\times1}$, and $e_{3\times3}$. These are, respectively, the number of 1x1 filters in the squeeze layer, the number of 1x1 filters in the expand layer, and the number of 3x3 filters in the expand layer. The authors set $s_{1\times1} < (e_{1\times1} + e_{3\times3})$, so the squeeze layer limits the number of input channels reaching the 3x3 filters, which is an embodiment of Strategy 2.
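Plugging in the fire2 configuration used in the code below (96 input channels, $s_{1\times1}=16$, $e_{1\times1}=e_{3\times3}=64$) shows how much the squeeze layer saves (weights only, biases ignored; this comparison is my own addition):

squeeze   = 96 * 16          # 1x1 squeeze: 1536 weights
expand1x1 = 16 * 64          # 1x1 expand: 1024 weights
expand3x3 = 16 * 64 * 3 * 3  # 3x3 expand sees only 16 channels: 9216 weights
print(squeeze + expand1x1 + expand3x3)  # 11776

# A plain 3x3 conv mapping 96 -> 128 channels would need ~9x more:
print(96 * 128 * 3 * 3)                 # 110592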
Network Architecture
The code below follows the SqueezeNet v1.0 macroarchitecture: a standalone conv layer (conv1), eight Fire modules (fire2 through fire9), and a final 1x1 conv classifier (conv10), with stride-2 max-pooling after conv1, fire4, and fire8.
Code
Fire_block
We implement the expand layer with two separate convolution layers: a layer with 1x1 filters, and a layer with 3x3 filters. Then, we concatenate the outputs of these layers together in the channel dimension.
import torch
import torch.nn as nn

class Fire_block(nn.Module):
    def __init__(self, in_channel, S1x1, E1x1, E3x3):
        super(Fire_block, self).__init__()
        # Squeeze: 1x1 conv that shrinks the channel count to S1x1
        self.squeeze = nn.Conv2d(in_channel, S1x1, 1)
        self.squeeze_activation = nn.ReLU(inplace=True)
        # Expand: parallel 1x1 and 3x3 convs over the squeezed features
        self.expand1x1 = nn.Conv2d(S1x1, E1x1, 1)
        self.expand1x1_activation = nn.ReLU(inplace=True)
        self.expand3x3 = nn.Conv2d(S1x1, E3x3, 3, 1, 1)  # padding=1 keeps the spatial size
        self.expand3x3_activation = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.squeeze(x)
        x = self.squeeze_activation(x)
        # Run the two expand branches, then concat along the channel dim
        expand1x1 = self.expand1x1_activation(self.expand1x1(x))
        expand3x3 = self.expand3x3_activation(self.expand3x3(x))
        return torch.cat([expand1x1, expand3x3], dim=1)
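A quick shape check (my addition): with the fire2 hyperparameters, the block maps 96 channels to $e_{1\times1} + e_{3\times3} = 128$ channels while keeping the spatial size unchanged.

fire2 = Fire_block(96, 16, 64, 64)
out = fire2(torch.randn(1, 96, 56, 56))
print(out.shape)  # torch.Size([1, 128, 56, 56])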
Main Network
class SqueezeNet(nn.Module):
    def __init__(self, num_classes=5):
        super(SqueezeNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, 7, 2, 2),       # conv1
            nn.MaxPool2d(3, 2),
            Fire_block(96, 16, 64, 64),      # fire2
            Fire_block(128, 16, 64, 64),     # fire3
            Fire_block(128, 32, 128, 128),   # fire4
            nn.MaxPool2d(3, 2),
            Fire_block(256, 32, 128, 128),   # fire5
            Fire_block(256, 48, 192, 192),   # fire6
            Fire_block(384, 48, 192, 192),   # fire7
            Fire_block(384, 64, 256, 256),   # fire8
            nn.MaxPool2d(3, 2),
            Fire_block(512, 64, 256, 256),   # fire9
        )
        # conv10 replaces a fully-connected head; global average pooling
        # turns each class map into a single score. Note: drop the Softmax
        # when training with nn.CrossEntropyLoss, which expects raw logits.
        self.classifier = nn.Sequential(
            nn.Conv2d(512, num_classes, 1),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Softmax(dim=1),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x
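A minimal smoke test of the full network (my addition, assuming a 224x224 input):

model = SqueezeNet(num_classes=5)
probs = model(torch.randn(1, 3, 224, 224))
print(probs.shape)                                 # torch.Size([1, 5])
print(sum(p.numel() for p in model.parameters()))  # ~0.74M parameters with num_classes=5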
Summary
Advantages:
- Improves inference speed without a large drop in accuracy
- Very few parameters
- Deployable on FPGAs and embedded devices
- No exotic layers, so the model is easy to port and transfer
Disadvantages:
- With so few parameters, it can struggle on complex tasks
- Although it has 50x fewer parameters than AlexNet, that reduction has little to do with SqueezeNet's design: AlexNet's fully-connected layers are themselves hugely over-parameterized, and most of the 50x comes from dropping them. With the fully-connected layers removed, a roughly 3x reduction is the fairer comparison. Fully-connected layers hold many parameters but contribute little to accuracy, and nowadays they are usually replaced by pooling (see the quick numeric check after this list)
- SqueezeNet compensates for its small parameter count with greater depth, which can actually increase inference time; this conflicts with the real-time requirements of edge deployment!
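A quick numeric check of that claim, assuming torchvision's reference implementations (this comparison is my addition, not from the original post):

from torchvision import models

def count(m):
    return sum(p.numel() for p in m.parameters())

alexnet = models.alexnet()        # randomly initialized; weights don't matter here
squeezenet = models.squeezenet1_0()

fc = count(alexnet.classifier)    # AlexNet's fully-connected head
print(count(alexnet))             # ~61.1M total
print(count(squeezenet))          # ~1.25M, i.e. ~50x fewer
print(count(alexnet) - fc)        # ~2.5M conv-only weights: on this count the
                                  # conv-vs-conv gap is only a few x, so most
                                  # of the 50x indeed comes from dropping the FC layers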
These are study notes; please contact me for removal if anything here infringes!