Preface
This article records some classic convolutional network architectures together with the corresponding PyTorch code.
1. EfficientNetV2
1.1 Network architecture
EfficientNetV2-S network architecture
Note: in the source code, stage 6 outputs 256 channels and stage 7 outputs 1280 channels (the table in the paper lists 272 and 1792).
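To make the stage layout easier to follow, here is a small sketch of the EfficientNetV2-S stage table written as plain Python data. The channel and layer counts follow the source-code variant mentioned in the note (256 channels after stage 6, 1280 in the head) as I recall them from the torchvision implementation; the `StageCfg` tuple, the `EFFNETV2_S` name and the helper function are hypothetical names introduced only for illustration.

```python
from typing import List, NamedTuple


class StageCfg(NamedTuple):
    """One stage of the network (hypothetical container, for illustration only)."""
    block: str         # "fused" = Fused-MBConv, "mb" = MBConv with SE
    expand_ratio: int  # channel expansion inside the block
    kernel: int        # spatial kernel size
    stride: int        # stride of the first block in the stage
    in_c: int          # input channels
    out_c: int         # output channels
    num_layers: int    # how many times the block is repeated


# Stages 1-6 of EfficientNetV2-S (stage 0 is the 3x3 stem conv with 24 channels).
EFFNETV2_S: List[StageCfg] = [
    StageCfg("fused", 1, 3, 1, 24, 24, 2),
    StageCfg("fused", 4, 3, 2, 24, 48, 4),
    StageCfg("fused", 4, 3, 2, 48, 64, 4),
    StageCfg("mb", 4, 3, 2, 64, 128, 6),
    StageCfg("mb", 6, 3, 1, 128, 160, 9),
    StageCfg("mb", 6, 3, 2, 160, 256, 15),
]
HEAD_CHANNELS = 1280  # output channels of the final 1x1 conv (stage 7)


def channels_chain_correctly(stages: List[StageCfg]) -> bool:
    """Check that each stage's input width equals the previous stage's output width."""
    return all(prev.out_c == nxt.in_c for prev, nxt in zip(stages, stages[1:]))


if __name__ == "__main__":
    print("channels chain correctly:", channels_chain_correctly(EFFNETV2_S))
    print("total blocks:", sum(s.num_layers for s in EFFNETV2_S))
```

Note that the first three stages use Fused-MBConv and only the later, deeper stages use MBConv with SE.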
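Fused-MBConv module
Note: the source code has no SE module in this block, and the corresponding Dropout does not deactivate individual units at random; it drops the entire main branch of the block (stochastic depth), so only the shortcut path remains for that sample.
Likewise, the shortcut branch exists only when stride=1 and the input and output channel counts are equal.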
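A minimal sketch of a Fused-MBConv block that is consistent with the two notes above: no SE, stochastic depth applied to the whole residual branch, and a shortcut only when stride=1 and the channel counts match. The class names `DropPath` and `FusedMBConv`, the SiLU activation and the default expansion ratio are assumptions made for illustration; this is not the official implementation.

```python
import torch
import torch.nn as nn
from torch import Tensor


class DropPath(nn.Module):
    """Stochastic depth: zeroes the whole residual branch for randomly chosen samples."""

    def __init__(self, drop_prob: float = 0.0):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x: Tensor) -> Tensor:
        if self.drop_prob == 0.0 or not self.training:
            return x
        keep_prob = 1.0 - self.drop_prob
        # One Bernoulli draw per sample, broadcast over channels and spatial dims.
        mask = torch.rand(x.shape[0], 1, 1, 1, device=x.device) < keep_prob
        # Rescale the kept samples so the expected value is unchanged.
        return x * mask.to(x.dtype) / keep_prob


class FusedMBConv(nn.Module):
    """3x3 expansion conv (replacing 1x1 expand + depthwise 3x3), then 1x1 projection."""

    def __init__(self, in_c: int, out_c: int, expand_ratio: int = 4,
                 stride: int = 1, drop_prob: float = 0.0):
        super().__init__()
        self.use_shortcut = stride == 1 and in_c == out_c
        hidden_c = in_c * expand_ratio
        if expand_ratio != 1:
            layers = [
                nn.Conv2d(in_c, hidden_c, 3, stride, 1, bias=False),
                nn.BatchNorm2d(hidden_c),
                nn.SiLU(),
                nn.Conv2d(hidden_c, out_c, 1, bias=False),  # linear projection
                nn.BatchNorm2d(out_c),
            ]
        else:
            # Without expansion a single 3x3 conv does all the work.
            layers = [
                nn.Conv2d(in_c, out_c, 3, stride, 1, bias=False),
                nn.BatchNorm2d(out_c),
                nn.SiLU(),
            ]
        self.block = nn.Sequential(*layers)
        # Dropping the branch only makes sense when a shortcut can carry the signal.
        self.drop_path = DropPath(drop_prob) if self.use_shortcut else nn.Identity()

    def forward(self, x: Tensor) -> Tensor:
        out = self.block(x)
        if self.use_shortcut:
            out = x + self.drop_path(out)
        return out


if __name__ == "__main__":
    x = torch.randn(2, 24, 32, 32)
    print(FusedMBConv(24, 24, stride=1, drop_prob=0.2)(x).shape)
    print(FusedMBConv(24, 48, stride=2)(x).shape)
```

In evaluation mode `DropPath` is the identity, so the block behaves like an ordinary residual block at inference time.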
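1.1.1 Key ideas of the paper
The authors aim to jointly optimise training speed and parameter efficiency by combining NAS with network scaling.
During training they also use progressive learning, which adapts the regularisation strength (dropout, data augmentation) to the image size in order to speed up training while limiting the accuracy loss this would otherwise cause.
In their study the authors found that:
1. Training with very large image sizes is slow.
2. Depthwise convolutions are slow in the shallow layers of the network.
3. Scaling up every stage by the same ratio is sub-optimal.
Based on this analysis, the authors propose the Fused-MBConv structure and progressive learning.
For progressive learning, the authors argue that different image sizes call for different amounts of regularisation: early in training the network is trained with small images and weak regularisation, and then the image size is gradually increased while stronger regularisation is added. This speeds up training without reducing accuracy.
Depthwise convolutions are slow in the early layers, mainly because they often cannot fully utilise modern accelerators. The authors therefore remove the depthwise (DW) convolutions from the shallow stages of the network.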
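To make the trade-off concrete, the sketch below counts the convolution weights of one MBConv block and one Fused-MBConv block of the same width. It is a rough illustration rather than a benchmark: BN and SE parameters are ignored, and the function names and the example width (24 channels, expansion ratio 4) are my own assumptions.

```python
def mbconv_conv_params(in_c: int, out_c: int, expand_ratio: int, kernel: int = 3) -> int:
    """Conv weights of MBConv: 1x1 expand + k x k depthwise + 1x1 project."""
    hidden_c = in_c * expand_ratio
    expand = in_c * hidden_c                 # 1x1 expansion conv
    depthwise = kernel * kernel * hidden_c   # one k x k filter per channel
    project = hidden_c * out_c               # 1x1 projection conv
    return expand + depthwise + project


def fused_mbconv_conv_params(in_c: int, out_c: int, expand_ratio: int, kernel: int = 3) -> int:
    """Conv weights of Fused-MBConv: dense k x k expand + 1x1 project."""
    hidden_c = in_c * expand_ratio
    fused = kernel * kernel * in_c * hidden_c  # dense k x k conv replaces expand + depthwise
    project = hidden_c * out_c                 # 1x1 projection conv
    return fused + project


if __name__ == "__main__":
    print(mbconv_conv_params(24, 24, 4))        # 5472
    print(fused_mbconv_conv_params(24, 24, 4))  # 23040
```

The fused block carries several times more weights, yet it replaces the depthwise convolution with a single dense 3x3 convolution, which is the kind of operation accelerators handle well. That is the trade the authors accept in the shallow stages, where channel counts are still small.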
As for non-uniform scaling, the authors do not explain how the corresponding scaling parameters are obtained.
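Additional optimisations
(1) The maximum inference image size is restricted to 480, because very large images often lead to expensive memory and training-speed overhead;
(2) As a heuristic, more layers are gradually added to the later stages (e.g. stages 5 and 6 in Table 4) to increase network capacity without adding much runtime overhead.
Training starts with a small image size and weak regularisation (epoch=1), and the learning difficulty is then gradually increased with larger image sizes and stronger regularisation: a larger dropout rate, RandAugment magnitude and mixup ratio.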
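The schedule described above can be sketched as a linear interpolation between a start and an end value for the image size and for each regularisation strength. This is only a sketch under stated assumptions: the function name `progressive_schedule`, the number of stages and the example ranges (image size 128 to 300, dropout 0.1 to 0.3, RandAugment magnitude 5 to 15) are illustrative values chosen by me, not settings taken from this article.

```python
from typing import Dict, List, Tuple


def lerp(start: float, end: float, t: float) -> float:
    """Linear interpolation between start (t=0) and end (t=1)."""
    return start + (end - start) * t


def progressive_schedule(
    num_stages: int,
    size_range: Tuple[int, int],
    reg_ranges: Dict[str, Tuple[float, float]],
) -> List[Dict[str, float]]:
    """Return per-stage image size and regularisation strengths.

    Both grow linearly from their start value (first stage) to their
    end value (last stage), so later stages are harder than early ones.
    """
    if num_stages < 2:
        raise ValueError("need at least two stages to interpolate")
    schedule = []
    for i in range(num_stages):
        t = i / (num_stages - 1)
        stage: Dict[str, float] = {"image_size": int(round(lerp(*size_range, t)))}
        for name, (start, end) in reg_ranges.items():
            stage[name] = lerp(start, end, t)
        schedule.append(stage)
    return schedule


if __name__ == "__main__":
    # Illustrative ranges only (assumptions, see the text above).
    for stage in progressive_schedule(
        num_stages=4,
        size_range=(128, 300),
        reg_ranges={"dropout": (0.1, 0.3), "randaug_magnitude": (5, 15)},
    ):
        print(stage)
```

In a real training loop each stage would cover an equal share of the epochs, and the data pipeline and dropout rate would be reconfigured at every stage boundary.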
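1.1.2 Summary and highlights
1. Remove the DW convolutions in the shallow stages and use the Fused-MBConv module instead.
2. Use progressive learning with gradually increasing regularisation to speed up training.
3. Use a non-uniform scaling strategy.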
References
EfficientNetV2: Smaller Models and Faster Training
1.2 Code
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from typing import Callable,List,Optional
from functools import partial
from torch import Tensor
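def _make_divisible(ch, divisor=8, min_ch=None):
    """Round ch to the nearest multiple of divisor, never going below min_ch."""
    if min_ch is None:
        min_ch = divisor
    # Nearest multiple of divisor (may round up or down)
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # Make sure rounding down does not shrink the value by more than 10%
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch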
class ConvBNActivation(nn.Sequential):
    """Conv2d -> normalisation -> activation."""
    def __init__(self, in_planes: int, out_planes: int, kernel_size: int = 3, stride: int = 1,
                 groups: int = 1, norm_layer: Optional[Callable[..., nn.Module]] = None,
                 activation_layer: Optional[Callable[..., nn.Module]] = None
                 ):
        # Padding that keeps the spatial size when stride=1 (odd kernel sizes)
        padding = (kernel_size - 1) // 2
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.ReLU6
        super(ConvBNActivation, self).__init__(nn.Conv2d(in_planes, out_planes,
                                                         kernel_size=kernel_size,
                                                         stride=stride,
                                                         padding=padding,
                                                         groups=groups,
                                                         bias=False),
                                               norm_layer(out_planes),
                                               activation_layer(inplace=True)
                                               )
## SE (Squeeze-and-Excitation) module
class SqueezeExcitaion(nn.Module):
    def __init__(self, input_c: int, squeeze_factor: int = 4):
        super(SqueezeExcitaion, self).__init__()
        squeeze_c = _make_divisible(input_c // squeeze_factor, 8)
        self.fc1 = nn.Conv2d(input_c, squeeze_c, 1)
        self.fc2 = nn.Conv2d(squeeze_c, input_c, 1)

    def forward(self, x: Tensor) -> Tensor:
        # Squeeze: global average pooling to one value per channel
        scale = F.adaptive_avg_pool2d(x, output_size=(1, 1))
        # Excitation: two 1x1 convs acting as fully connected layers
        scale = self.fc1(scale)
        scale = F.relu(scale, inplace=True)
        scale = self.fc2(scale)
        scale = F.hardsigmoid(scale)
        # Re-weight the input channels
        return scale * x
## width_factor: hyperparameter that scales the channel counts
class InvertedResidualConfig:
    # The positional order matches the bneck_conf(...) calls further down:
    # input_c, kernel_size, expsize, output_c, use_se, activation_func, stride
    def __init__(self,
                 input_c: int,
                 kernel_size: int,
                 expsize: int,
                 output_c: int,
                 use_se: bool,
                 activation_func: str,
                 stride: int,
                 width_factor: float
                 ):
        self.input_c = self.changeSize(input_c, width_factor)
        self.output_c = self.changeSize(output_c, width_factor)
        self.kernel_size = kernel_size
        self.use_se = use_se
        self.use_hs = activation_func == "HS"  # "HS" = Hardswish, otherwise ReLU
        self.stride = stride
        self.expsize = self.changeSize(expsize, width_factor)

    @staticmethod
    def changeSize(ch: int, factor: float, divisor: int = 8):
        return _make_divisible(ch * factor, divisor)
class InvertedResidual(nn.Module):
    def __init__(self,
                 cfg: InvertedResidualConfig,
                 norm_layer: Callable[..., nn.Module]
                 ):
        super(InvertedResidual, self).__init__()
        if cfg.stride not in [1, 2]:
            raise ValueError('illegal stride value')
        # The shortcut exists only when the input and output shapes match
        self.use_shortcut = cfg.output_c == cfg.input_c and cfg.stride == 1
        layers: List[nn.Module] = []
        activation_func = nn.Hardswish if cfg.use_hs else nn.ReLU
        # 1x1 expansion conv (skipped when there is no expansion)
        if cfg.input_c != cfg.expsize:
            layers.append(ConvBNActivation(cfg.input_c,
                                           cfg.expsize,
                                           kernel_size=1,
                                           norm_layer=norm_layer,
                                           activation_layer=activation_func
                                           ))
        # Depthwise conv
        layers.append(ConvBNActivation(cfg.expsize,
                                       cfg.expsize,
                                       groups=cfg.expsize,
                                       kernel_size=cfg.kernel_size,
                                       stride=cfg.stride,
                                       norm_layer=norm_layer,
                                       activation_layer=activation_func
                                       ))
        if cfg.use_se:
            layers.append(SqueezeExcitaion(cfg.expsize))
        # 1x1 projection conv without activation (linear bottleneck)
        layers.append(ConvBNActivation(cfg.expsize,
                                       cfg.output_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Identity
                                       ))
        self.block = nn.Sequential(*layers)
        self.out_channel = cfg.output_c
        self.is_strided = cfg.stride > 1

    def forward(self, x: Tensor) -> Tensor:
        result = self.block(x)
        if self.use_shortcut:
            result += x
        return result
class MobileNetV3(nn.Module):
    def __init__(self, inverted_setting: List[InvertedResidualConfig],
                 last_channel: int,
                 num_classes: int = 1000,
                 block: Optional[Callable[..., nn.Module]] = None,
                 norm_layer: Optional[Callable[..., nn.Module]] = None
                 ):
        super(MobileNetV3, self).__init__()
        if not inverted_setting:
            raise ValueError("The Inverted_setting should not be empty")
        elif not (isinstance(inverted_setting, list) and
                  all(isinstance(s, InvertedResidualConfig) for s in inverted_setting)):
            raise TypeError("illegal type of Inverted_setting ")
        if block is None:
            block = InvertedResidual
        if norm_layer is None:
            norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.01)
        layers: List[nn.Module] = []
        # Stem: 3x3 conv with stride 2
        firstconv_output_c = inverted_setting[0].input_c
        layers.append(ConvBNActivation(3, firstconv_output_c,
                                       kernel_size=3, stride=2, norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish
                                       ))
        # Inverted residual blocks
        for cnf in inverted_setting:
            layers.append(block(cnf, norm_layer))
        lastconv_input_c = inverted_setting[-1].output_c
        ## Fixed at 6x the input channels in the paper
        lastconv_output_c = 6 * lastconv_input_c
        layers.append(ConvBNActivation(lastconv_input_c,
                                       lastconv_output_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish
                                       ))
        self.features = nn.Sequential(*layers)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(nn.Linear(lastconv_output_c, last_channel),
                                        nn.Hardswish(inplace=True),
                                        nn.Dropout(0.2, inplace=True),
                                        nn.Linear(last_channel, num_classes)
                                        )

    def forward_impl(self, x: Tensor) -> Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

    def forward(self, x: Tensor) -> Tensor:
        return self.forward_impl(x)
def mobilenet_v3_large(num_classes: int = 100, reduced_tail: bool = False) -> MobileNetV3:
    width_multi = 1.0  ## width multiplier that scales the channel counts
    bneck_conf = partial(InvertedResidualConfig, width_factor=width_multi)
    changeSize = partial(InvertedResidualConfig.changeSize, factor=width_multi)
    ## Option from the official PyTorch implementation: shrinks the channel counts of the last three blocks
    reduce_divider = 2 if reduced_tail else 1
    # Arguments: input_c, kernel_size, expsize, output_c, use_se, activation, stride
    inverted_residual_setting = [
        bneck_conf(16, 3, 16, 16, False, "RE", 1),
        bneck_conf(16, 3, 64, 24, False, "RE", 2),
        bneck_conf(24, 3, 72, 24, False, "RE", 1),
        bneck_conf(24, 5, 72, 40, True, "RE", 2),
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 3, 240, 80, False, "HS", 2),
        bneck_conf(80, 3, 200, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 480, 112, True, "HS", 1),
        bneck_conf(112, 3, 672, 112, True, "HS", 1),
        bneck_conf(112, 5, 672, 160 // reduce_divider, True, "HS", 2),
        bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1),
        bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1)
    ]
    last_channel = changeSize(1280 // reduce_divider)
    return MobileNetV3(inverted_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)
def mobilenet_v3_small(num_classes: int = 100, reduced_tail: bool = False) -> MobileNetV3:
    width_multi = 1.0  ## width multiplier that scales the channel counts
    bneck_conf = partial(InvertedResidualConfig, width_factor=width_multi)
    changeSize = partial(InvertedResidualConfig.changeSize, factor=width_multi)
    ## Option from the official PyTorch implementation: shrinks the channel counts of the last three blocks
    reduce_divider = 2 if reduced_tail else 1
    # Arguments: input_c, kernel_size, expsize, output_c, use_se, activation, stride
    inverted_residual_setting = [
        bneck_conf(16, 3, 16, 16, True, "RE", 2),
        bneck_conf(16, 3, 72, 24, False, "RE", 2),
        bneck_conf(24, 3, 88, 24, False, "RE", 1),
        bneck_conf(24, 5, 96, 40, True, "HS", 2),
        bneck_conf(40, 5, 240, 40, True, "HS", 1),
        bneck_conf(40, 5, 240, 40, True, "HS", 1),
        bneck_conf(40, 5, 120, 48, True, "HS", 1),
        bneck_conf(48, 5, 144, 48, True, "HS", 1),
        bneck_conf(48, 5, 288, 96 // reduce_divider, True, "HS", 2),
        bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1),
        bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1)
    ]
    last_channel = changeSize(1024 // reduce_divider)
    return MobileNetV3(inverted_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)