Week 4 Study: MobileNet V1, V2, V3
Part 1 Video Lectures and Paper Reading
1. MobileNetV1
MobileNetV1 paper: [1704.04861] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (arxiv.org)
MobileNet was proposed by a Google team in 2017. It is a lightweight convolutional neural network that greatly reduces the number of model parameters at the cost of only a slight drop in accuracy.
Model highlights: (1) Depthwise Convolution (greatly reduces computation and the number of parameters)
(2) Two additional hyperparameters: α (the width multiplier, controlling the number of kernels) and β (controlling the input image size; the resolution multiplier, written ρ in the paper)
(1) Depthwise Convolution
Unlike standard convolution, each kernel in a DW (depthwise) convolution handles a single channel, so the output feature map has the same number of channels as the input. (This is the idea of grouped convolution, but on its own it cannot learn relationships between channels, which is why a PW convolution is applied afterwards.)
(2)Depthwise Separable Convolution
A depthwise separable convolution consists of two parts: a DW convolution followed by a PW (Pointwise) convolution.
A PW convolution is just an ordinary convolution whose kernel size is 1×1.
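A minimal PyTorch sketch of a depthwise separable convolution (not part of the original notes): a 3×3 DW convolution with groups equal to the input channel count, followed by a 1×1 PW convolution.
import torch
from torch import nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # DW: one 3x3 kernel per input channel (groups = in_channels)
        self.dw = nn.Conv2d(in_channels, in_channels, 3, stride=stride, padding=1,
                            groups=in_channels, bias=False)
        # PW: 1x1 convolution that mixes information across channels
        self.pw = nn.Conv2d(in_channels, out_channels, 1, bias=False)
    def forward(self, x):
        return self.pw(self.dw(x))

print(DepthwiseSeparableConv(32, 64)(torch.randn(1, 32, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])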
(3) Computation cost: standard convolution vs. DW+PW convolution
Let the convolution kernel be $D_K \times D_K$, the input feature map be $D_F \times D_F$ with $M$ channels, and the output have $N$ channels. For a standard convolution, the computation cost is $D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$.
For the DW+PW convolution, the computation cost is $D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$, which is roughly a fraction $\frac{1}{N} + \frac{1}{D_K^2}$ of the standard cost.
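As a quick numeric check of this ratio (a sketch with made-up layer sizes, not figures from the paper):
# Multiply-add counts for one layer, using illustrative sizes (not from the paper's tables)
D_K, M, N, D_F = 3, 32, 64, 56
standard = D_K * D_K * M * N * D_F * D_F
dw_pw = D_K * D_K * M * D_F * D_F + M * N * D_F * D_F
print(standard, dw_pw, dw_pw / standard)  # the ratio is about 1/N + 1/D_K**2 ≈ 0.127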
(4) Hyperparameters α and β
The table below lists the parameter count, accuracy, and computation cost after changing the number of kernels via α. The parameters can be reduced substantially while accuracy drops only slightly, so a suitable value of α can be chosen according to the application.
The next table lists the accuracy and computation cost for different input image sizes. Shrinking the input image greatly reduces computation at only a small cost in accuracy.
(5) In DW convolution, some kernels easily become "dead", i.e. most of their parameters end up being zero.
2. MobileNetV2
Paper: [1801.04381] MobileNetV2: Inverted Residuals and Linear Bottlenecks
MobileNetV2 was proposed by a Google team in 2018; compared with MobileNetV1 it achieves higher accuracy with a smaller model.
Two highlights: 1. Inverted Residuals
2、Linear Bottlenecks
(1) Inverted residual structure
In contrast to the traditional residual bottleneck, which is wide at both ends and narrow in the middle, the inverted residual block first uses a 1×1 convolution to expand the channel dimension, then applies a 3×3 DW convolution, and finally uses a 1×1 convolution to reduce the dimension again, giving a structure that is narrow at both ends and wide in the middle (see figures a and b below).
The inverted residual block also uses ReLU6 as its activation function.
(2)Linear Bottlenecks
The authors found that applying ReLU at low dimensions loses a lot of information, so the last layer (the 1×1 projection) uses a linear activation instead.
The manifold of interest should lie in a low-dimensional subspace of the higher-dimensional activation space:
1. If the manifold of interest keeps a non-zero volume after the ReLU transformation, then ReLU acts on it as a linear transformation.
2. ReLU can preserve complete information about the input manifold only if the input manifold lies in a low-dimensional subspace of the input space.
Note: the shortcut connection is used only when stride = 1 and the input and output feature maps have the same shape.
(3) MobileNetV2 implemented in PyTorch
import torch
from torch import nn
# V2
def _make_divisible(ch,divisor=8,min_ch=None):
if min_ch is None:
min_ch = divisor
    new_ch = max(min_ch,int(ch+divisor/2)//divisor*divisor)
    # make sure that rounding down does not reduce the channel count by more than 10%
    if new_ch<0.9*ch:
        new_ch+=divisor
return new_ch
class ConvBNReLU(nn.Sequential):
    # groups=1 gives an ordinary convolution; groups=in_channel (the depth of the input feature map) gives a DW convolution
def __init__(self,in_channel,out_channel,kernel_size=3,stride=1,groups=1):
padding = (kernel_size-1)//2
super(ConvBNReLU,self).__init__(
nn.Conv2d(in_channel,out_channel,kernel_size,stride,padding,groups=groups,bias=False),
nn.BatchNorm2d(out_channel),
nn.ReLU6(inplace=True)
)
# Inverted residual block
class InvertedResidual(nn.Module):
    # expand_ratio is the expansion factor t
def __init__(self,in_channel,out_channel,stride,expand_ratio):
super(InvertedResidual,self).__init__()
        # hidden (expanded) channels
hidden_channel = in_channel*expand_ratio
        # whether to use the shortcut connection
self.use_shortcut = stride==1 and in_channel==out_channel
layers = []
if expand_ratio!=1:
# 1*1 PW
layers.append(ConvBNReLU(in_channel,hidden_channel,kernel_size=1))
layers.extend([
# 3*3 DW
ConvBNReLU(hidden_channel,hidden_channel,stride=stride,groups=hidden_channel),
# 1*1 PW(Linear)
nn.Conv2d(hidden_channel,out_channel,kernel_size=1,bias=False),
nn.BatchNorm2d(out_channel)
])
self.conv = nn.Sequential(*layers)
def forward(self,x):
if self.use_shortcut:
return x+self.conv(x)
else:
return self.conv(x)
class MobileNetV2(nn.Module):
def __init__(self,num_classes=1000,alpha=1.0,round_nearest=8):
super(MobileNetV2,self).__init__()
block = InvertedResidual
input_channel = _make_divisible(32*alpha,round_nearest)
last_channel = _make_divisible(1280*alpha,round_nearest)
inverted_residual_setting=[
# t,c,n,s
[1,16,1,1],
[6,24,2,2],
[6,32,3,2],
[6,64,4,2],
[6,96,3,1],
[6,160,3,2],
[6,320,1,1],
]
features = []
# conv1 layer
features.append(ConvBNReLU(3,input_channel,stride=2))
for t,c,n,s in inverted_residual_setting:
output_channel = _make_divisible(c*alpha,round_nearest)
for i in range(n):
stride = s if i==0 else 1
features.append(block(input_channel,output_channel,stride,expand_ratio=t))
input_channel = output_channel
features.append(ConvBNReLU(input_channel,last_channel,1))
self.features = nn.Sequential(*features)
        # classifier
self.avgpool = nn.AdaptiveAvgPool2d((1,1))
self.classifier = nn.Sequential(
nn.Dropout(0.2),
nn.Linear(last_channel,num_classes)
)
        # weight initialization
for m in self.modules():
if isinstance(m,nn.Conv2d):
nn.init.kaiming_normal_(m.weight,mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
elif isinstance(m,nn.BatchNorm2d):
nn.init.ones_(m.weight)
nn.init.zeros_(m.bias)
elif isinstance(m,nn.Linear):
nn.init.normal_(m.weight,0,0.1)
nn.init.zeros_(m.bias)
def forward(self,x):
x = self.features(x)
x = self.avgpool(x)
x = torch.flatten(x,1)
x = self.classifier(x)
return x
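A quick sanity check of the network defined above (a sketch assuming a standard 224×224 input):
model = MobileNetV2(num_classes=1000, alpha=1.0)
img = torch.randn(1, 3, 224, 224)  # batch of one 224x224 RGB image
print(model(img).shape)  # torch.Size([1, 1000])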
3. MobileNetV3
Paper: https://arxiv.org/abs/1905.02244
MobileNetV3 was published in 2019. It combines V1's depthwise separable convolution, V2's inverted residuals and linear bottleneck, and the SE module, and uses NAS (neural architecture search) to search for the network configuration and parameters.
It has three highlights:
(1) Updated block (bneck)
① An SE module is added (see the SE-Net section below for details)
After the 3×3 (depthwise) convolution, each channel of the feature map is pooled into a single value, giving a one-dimensional vector; this vector is then passed through two fully connected layers to produce the output vector. The first FC layer has 1/4 as many nodes as the feature map has channels, and the second FC layer has as many nodes as the feature map has channels.
Taking 2 channels as an example: average pooling yields a vector of length 2, the two fully connected layers turn it into weight parameters, and the original feature map is then multiplied by these weights.
② Updated activation functions
The activation changes from the original ReLU to an NL (a nonlinearity chosen per layer), while the final 1×1 projection (dimension-reduction) layer has no activation function.
The h-swish activation function is introduced: $\text{h-swish}(x) = x \cdot \frac{\text{ReLU6}(x+3)}{6}$,
where $\frac{\text{ReLU6}(x+3)}{6}$ is called the h-sigmoid activation function.
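A minimal sketch of these two activations in PyTorch (recent PyTorch versions also provide them directly as nn.Hardsigmoid and nn.Hardswish):
import torch.nn.functional as F

def h_sigmoid(x):
    # h-sigmoid(x) = ReLU6(x + 3) / 6
    return F.relu6(x + 3.0) / 6.0

def h_swish(x):
    # h-swish(x) = x * h-sigmoid(x)
    return x * h_sigmoid(x)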
(2) Searching the architecture with NAS (Neural Architecture Search)
Definition of NAS: find a network architecture that maximizes accuracy (or some other metric) on the validation set.
The first step of architecture search is to define a search space, i.e. the sets of candidate values for the structural hyperparameters.
The search result produced by NAS is a set of structural hyperparameters, i.e. one particular choice of values from the sets defined above.
For details, see 神经网络结构搜索(NAS) - 知乎 (zhihu.com).
(3) Redesigning the time-consuming layers
① Reduce the number of kernels in the first convolution layer (the channel count of the stem convolution is changed from 32 to 16)
② Streamline the Last Stage
Comparison before and after streamlining:
After the authors removed the 3×3 convolution and the 1×1 convolutions from this stage, no loss of accuracy was observed.
(4) MobileNetV3 implemented in PyTorch
import torch
from torch import nn,Tensor
from torch.nn import functional as F
from typing import Callable,List,Optional
from functools import partial
# V3
def _make_divisible(ch,divisor=8,min_ch=None):
if min_ch is None:
min_ch = divisor
    new_ch = max(min_ch,int(ch+divisor/2)//divisor*divisor)
    # make sure that rounding down does not reduce the channel count by more than 10%
    if new_ch<0.9*ch:
        new_ch+=divisor
return new_ch
class ConvBNActivation(nn.Sequential):
    # groups=1 gives an ordinary convolution; groups=in_channel (the depth of the input feature map) gives a DW convolution
def __init__(self,
in_planes:int,
out_planes:int,
kernel_size:int=3,
stride:int=1,
groups:int=1,
norm_layer:Optional[Callable[...,nn.Module]]=None,
activation_layer:Optional[Callable[...,nn.Module]]=None):
padding = (kernel_size-1)//2
if norm_layer is None:
norm_layer = nn.BatchNorm2d
if activation_layer is None:
activation_layer = nn.ReLU6
        super(ConvBNActivation,self).__init__(nn.Conv2d(in_channels=in_planes,
                                                        out_channels=out_planes,
                                                        kernel_size=kernel_size,
                                                        stride=stride,
                                                        padding=padding,
                                                        groups=groups,
                                                        bias=False),
                                              norm_layer(out_planes),
                                              activation_layer(inplace=True))
# Attention (squeeze-and-excitation) module
class SqueezeExcitation(nn.Module):
def __init__(self,input_c:int,squeeze_factor:int=4):
super(SqueezeExcitation,self).__init__()
squeeze_c = _make_divisible(input_c//squeeze_factor,8)
self.fc1 = nn.Conv2d(input_c,squeeze_c,1)
self.fc2 = nn.Conv2d(squeeze_c,input_c,1)
def forward(self,x:Tensor) -> Tensor:
scale = F.adaptive_avg_pool2d(x,output_size=(1,1))
scale = self.fc1(scale)
scale = F.relu(scale,inplace=True)
scale = self.fc2(scale)
scale = F.hardsigmoid(scale,inplace=True)
return scale*x
# Inverted residual block configuration
class InvertedResidualConfig:
def __init__(self,input_c:int,
kernel:int,
expanded_c:int,
out_c:int,
use_se:bool,
activation:str,
stride:int,
width_multi:float):
self.input_c = self.adjust_channels(input_c,width_multi)
self.kernel = kernel
self.expanded_c = self.adjust_channels(expanded_c,width_multi)
self.out_c = self.adjust_channels(out_c,width_multi)
self.use_se = use_se
self.use_hs = activation=="HS"
self.stride = stride
@staticmethod
def adjust_channels(channels:int,width_multi:float):
return _make_divisible(channels*width_multi,8)
# Inverted residual block
class InvertedResidual(nn.Module):
def __init__(self,cnf:InvertedResidualConfig,
norm_layer:Callable[...,nn.Module]):
super(InvertedResidual,self).__init__()
if cnf.stride not in [1,2]:
raise ValueError("illegal stride value.")
self.use_res_connect = (cnf.stride==1 and cnf.input_c==cnf.out_c)
layers:List[nn.Module] = []
activation_layer = nn.Hardswish if cnf.use_hs else nn.ReLU
if cnf.expanded_c != cnf.input_c:
            layers.append(ConvBNActivation(cnf.input_c,
cnf.expanded_c,
kernel_size=1,
norm_layer=norm_layer,
activation_layer=activation_layer))
        # DW convolution
layers.append(ConvBNActivation(cnf.expanded_c,
cnf.expanded_c,
kernel_size=cnf.kernel,
stride=cnf.stride,
groups=cnf.expanded_c,
norm_layer=norm_layer,
activation_layer=activation_layer))
if cnf.use_se:
layers.append(SqueezeExcitation(cnf.expanded_c))
layers.append(ConvBNActivation(cnf.expanded_c,
cnf.out_c,
kernel_size=1,
norm_layer=norm_layer,
activation_layer=nn.Identity))
self.block = nn.Sequential(*layers)
self.out_channels = cnf.out_c
def forward(self,x:Tensor) -> Tensor:
result = self.block(x)
if self.use_res_connect:
result += x
return result
class MobileNetV3(nn.Module):
def __init__(self,
inverted_residual_setting:List[InvertedResidualConfig],
last_channel:int,
num_classes:int=1000,
block:Optional[Callable[...,nn.Module]]=None,
norm_layer:Optional[Callable[...,nn.Module]]=None):
super(MobileNetV3,self).__init__()
if not inverted_residual_setting:
raise ValueError("The inverted_residual_setting should not be empty.")
elif not (isinstance(inverted_residual_setting,List) and
all([isinstance(s,InvertedResidualConfig) for s in inverted_residual_setting])):
raise TypeError("The inverted_residual_setting should be List[InvertedResidualConfig].")
if block is None:
block = InvertedResidual
if norm_layer is None:
norm_layer = partial(nn.BatchNorm2d,eps=0.001,momentum=0.01)
layers:List[nn.Module] = []
        # first layer
firstconv_output_c = inverted_residual_setting[0].input_c
layers.append(ConvBNActivation(3,firstconv_output_c,
kernel_size=3,
stride=2,
norm_layer=norm_layer,
activation_layer=nn.Hardswish))
        # inverted residual blocks
for cnf in inverted_residual_setting:
layers.append(block(cnf,norm_layer))
        # last few layers
lastconv_input_c = inverted_residual_setting[-1].out_c
lastconv_output_c = 6*lastconv_input_c
layers.append(ConvBNActivation(lastconv_input_c,
lastconv_output_c,
kernel_size=1,
norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
self.features = nn.Sequential(*layers)
        # classifier
self.avgpool = nn.AdaptiveAvgPool2d(1)
self.classifier = nn.Sequential(
nn.Linear(lastconv_output_c,last_channel),
            nn.Hardswish(inplace=True),
nn.Dropout(p=0.2,inplace=True),
nn.Linear(last_channel,num_classes)
)
        # weight initialization
for m in self.modules():
if isinstance(m,nn.Conv2d):
nn.init.kaiming_normal_(m.weight,mode='fan_out')
if m.bias is not None:
nn.init.zeros_(m.bias)
elif isinstance(m,(nn.BatchNorm2d,nn.GroupNorm)):
nn.init.ones_(m.weight)
nn.init.zeros_(m.bias)
elif isinstance(m,nn.Linear):
nn.init.normal_(m.weight,0,0.01)
nn.init.zeros_(m.bias)
    def _forward_impl(self,x:Tensor) -> Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x,1)
        x = self.classifier(x)
        return x
    def forward(self,x:Tensor) -> Tensor:
        return self._forward_impl(x)
def mobilenet_v3_large(num_classes:int=1000,reduced_tail:bool=False) -> MobileNetV3:
width_multi = 1.0
bneck_conf = partial(InvertedResidualConfig,width_multi=width_multi)
adjust_channels = partial(InvertedResidualConfig.adjust_channels,width_multi=width_multi)
reduce_divider = 2 if reduced_tail else 1
inverted_residual_setting = [
# input_c,kernel,expanded_c,out_c,use_se,activation,stride
bneck_conf(16,3,16,16,False,"RE",1),
bneck_conf(16,3,64,24,False,"RE",2), # C1
bneck_conf(24,3,72,24,False,"RE",1),
bneck_conf(24,5,72,40,True,"RE",2), # C2
bneck_conf(40,5,120,40,True,"RE",1),
bneck_conf(40,5,120,40,True,"RE",1),
bneck_conf(40,3,240,80,False,"HS",2), # C3
bneck_conf(80,3,200,80,False,"HS",1),
bneck_conf(80,3,184,80,False,"HS",1),
bneck_conf(80,3,184,80,False,"HS",1),
bneck_conf(80,3,480,112,True,"HS",1),
bneck_conf(112,3,672,112,True,"HS",1),
bneck_conf(112,5,672,160//reduce_divider,True,"HS",2), # C4
bneck_conf(160//reduce_divider,5,960//reduce_divider,160//reduce_divider,True,"HS",1),
bneck_conf(160//reduce_divider,5,960//reduce_divider,160//reduce_divider,True,"HS",1)]
last_channel = adjust_channels(1280//reduce_divider) # C5
return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
last_channel=last_channel,
num_classes=num_classes)
def mobilenet_v3_small(num_classes:int=1000,reduced_tail:bool=False) -> MobileNetV3:
width_multi = 1.0
bneck_conf = partial(InvertedResidualConfig,width_multi=width_multi)
adjust_channels = partial(InvertedResidualConfig.adjust_channels,width_multi=width_multi)
reduce_divider = 2 if reduced_tail else 1
inverted_residual_setting = [
# input_c, kernel, expanded_c, out_c, use_se, activation, stride
bneck_conf(16,3,16,16,True,"RE",2), # C1
bneck_conf(16,3,72,24,False,"RE",2), # C2
bneck_conf(24,3,88,24,False,"RE",1),
bneck_conf(24,5,96,40,True,"HS",2), # C3
bneck_conf(40,5,240,40,True,"HS",1),
bneck_conf(40,5,240,40,True,"HS",1),
bneck_conf(40,5,120,48,True,"HS",1),
bneck_conf(48,5,144,48,True,"HS",1),
bneck_conf(48,5,288,96//reduce_divider,True,"HS",2), # C4
bneck_conf(96//reduce_divider,5,576//reduce_divider,96//reduce_divider,True,"HS",1),
bneck_conf(96//reduce_divider,5,576//reduce_divider,96//reduce_divider,True,"HS",1)]
last_channel = adjust_channels(1024//reduce_divider) # C5
return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
last_channel=last_channel,
num_classes=num_classes)
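A quick sanity check of the two builders above (a sketch assuming a standard 224×224 input):
net_large = mobilenet_v3_large(num_classes=1000)
print(net_large(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
net_small = mobilenet_v3_small(num_classes=1000)
print(net_small(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])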
4. SE-Net
SE-Net was proposed by the WMW team in 2017. Its idea is to improve network performance by modeling the relationships between feature channels; it mainly consists of two operations, Squeeze and Excitation.
An SE module is shown in the figure above.
(1) Squeeze:
Features are first compressed along the spatial dimensions: each 2-D feature channel is turned into a single real number. This number has, in some sense, a global receptive field, and the output dimension matches the number of input feature channels. It characterizes the global distribution of responses on each feature channel, and it also lets layers close to the input obtain a global receptive field.
(2) Excitation:
Similar to the gating mechanism in recurrent neural networks, learnable parameters generate a weight for each feature channel; these parameters are learned to explicitly model the correlations between channels.
(3) Reweight:
The weights output by Excitation are treated as the importance of each feature channel after feature selection; they are multiplied channel-wise onto the original features, recalibrating the original features along the channel dimension.
An SE module can be embedded into almost any existing network architecture. By inserting SE modules into the building blocks of the original network, we obtain different kinds of SENets.
The figure on the left below shows an example of embedding an SE module into an Inception block; the dimensions next to each box denote the output of that layer.
The figure on the right shows an SE module embedded into a block with skip connections.
Summary: an SE module first squeezes each channel's information (usually via global average pooling), then uses two fully connected layers arranged as a bottleneck to capture correlations between channels, and finally multiplies the resulting weights channel-wise onto the original features, recalibrating the original feature map.
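A minimal 2-D SE block written from this summary (a sketch; the reduction ratio of 16 is an assumed default, and the 3-D variant used later in Part 2 follows the same pattern):
from torch import nn

class SEBlock2d(nn.Module):
    def __init__(self, channel, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global average pooling per channel
        self.fc = nn.Sequential(             # excitation: two FC layers forming a bottleneck
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )
    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                          # reweight: channel-wise rescaling of the input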
Part 2 Coding Exercise
HybridSN for Hyperspectral Image Classification
A hyperspectral image is a 3-D data cube with two spatial dimensions and one spectral dimension. The spatial dimensions are similar to an ordinary RGB or grayscale image, while the spectral dimension arises because different materials reflect, absorb, transmit, and radiate electromagnetic waves differently. By illuminating the ground with electromagnetic waves of many different wavelengths, surface materials can be distinguished from the reflected signals; stacking the responses of the many spectral bands along a third image dimension yields the hyperspectral data cube.
Before HybridSN, hyperspectral images were classified with either 2-D convolutions or 3-D convolutions alone.
HybridSN combines the two: it first applies 3-D convolutions, then stacks 2-D convolutions on top, and finally attaches a classifier. This exploits the strength of 3-D convolutions in extracting joint spectral-spatial features while avoiding the model complexity of using 3-D convolutions throughout.
HybridSN network architecture:
The completed network definition is given below.
# define the network structure
class_num = 16
class HybridSN(nn.Module):
def __init__(self,num_classes=16):
super(HybridSN,self).__init__()
        # first the 3-D convolutions; default stride is 1 and padding is 0
self.conv1 = nn.Conv3d(1,8,(7,3,3))
self.bn1=nn.BatchNorm3d(8)#batchnormalization
self.conv2 = nn.Conv3d(8,16,(5,3,3))
self.bn2=nn.BatchNorm3d(16)
self.conv3 = nn.Conv3d(16,32,(3,3,3))
self.bn3=nn.BatchNorm3d(32)
        # then the 2-D convolution; default stride is 1 and padding is 0
self.conv4 = nn.Conv2d(576,64,(3,3))
self.bn4=nn.BatchNorm2d(64)
        self.drop = nn.Dropout(p=0.4)  # Dropout with p=0.4 after the first two FC layers to prevent overfitting
        self.fc1 = nn.Linear(18496,256)  # after flattening, the feature vector has 18496 dimensions (64*17*17)
self.fc2 = nn.Linear(256,128)
self.fc3 = nn.Linear(128,num_classes)
self.relu = nn.ReLU()
def forward(self,x):
out = self.relu(self.bn1(self.conv1(x)))
out = self.relu(self.bn2(self.conv2(out)))
out = self.relu(self.bn3(self.conv3(out)))
out = out.reshape(out.shape[0], 576, 19, 19)
out = self.relu(self.bn4(self.conv4(out)))
out = out.view(out.size(0),-1)
out = self.fc1(out)
out = self.drop(out)
out = self.relu(out)
out = self.fc2(out)
out = self.drop(out)
out = self.relu(out)
out = self.fc3(out)
return out
# random input to check that the network runs end to end
x = torch.randn(1,1,30,25,25)
net = HybridSN()
y = net(x)
print(y.shape)
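The training code itself is not reproduced in these notes; a minimal sketch of a typical loop for this network (cross-entropy loss and Adam are assumed choices, and train_loader is a hypothetical DataLoader yielding 1×30×25×25 patches with integer labels):
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net = HybridSN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
net.train()                                  # enable dropout during training
for epoch in range(100):
    for inputs, labels in train_loader:      # train_loader is assumed, not defined in these notes
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(net(inputs), labels)
        loss.backward()
        optimizer.step()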
Training results:
Test results:
Accuracy: 97.89%
Visualization results:
Accuracy over repeated test runs: 97.84%, 97.82%, 97.73%
Note that the classification results also differ between test runs.
Part 3 Question and Answer
The main reason the classification results differ is the use of dropout during training: it randomly deactivates neurons to avoid overfitting. If dropout stays active at test time, different units are dropped on each run, which leads to different classification results.
When dropout is used, model.train() and model.eval() should be called to enable and disable it for training and testing respectively. model.train() and model.eval() only make a difference when the model contains dropout or batch-norm layers. Since the goal here is to evaluate the trained network rather than train it further, there is no need to apply dropout or to keep updating the BN mean and variance (BN should use the running statistics accumulated during training).
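In code this just means switching the mode around evaluation (test_inputs below stands for whatever test batch is being classified):
net.eval()                       # disable dropout; BN uses its running statistics
with torch.no_grad():
    outputs = net(test_inputs)   # repeated runs now give identical predictions
net.train()                      # switch back only if training continues afterwards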
Classification results after calling model.eval():
Moreover, repeated test runs now give identical results.
Next, consider adding an SE module to combine the spectral-dimension information with the spatial dimensions.
# add an SE module
class SELayer(nn.Module):
def __init__(self, channel, reduction=16):
super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool3d(1)  # global average pooling: input B*C*D*H*W -> output B*C*1*1*1
self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),  # channel should be divisible by reduction, otherwise problems may arise
nn.ReLU(inplace=True),
nn.Linear(channel // reduction, channel, bias=False),
nn.Sigmoid()
)
def forward(self, x):
b, c, _, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)  # pool to B*C*1*1*1, then reshape to B*C before the FC layers
        y = self.fc(y).view(b, c, 1, 1, 1)  # B*C weights, one per channel; reshape to B*C*1*1*1 so they can broadcast over the 5-D x
        return x * y.expand_as(x)  # expand the weights to the size of x (equal values within each channel) and multiply element-wise
class_num = 16
class HybridSN(nn.Module):
def __init__(self,num_classes=16):
super(HybridSN,self).__init__()
        # first the 3-D convolutions; default stride is 1 and padding is 0
self.conv1 = nn.Conv3d(1,8,(7,3,3))
self.bn1=nn.BatchNorm3d(8)#batchnormalization
self.conv2 = nn.Conv3d(8,16,(5,3,3))
self.bn2=nn.BatchNorm3d(16)
self.conv3 = nn.Conv3d(16,32,(3,3,3))
self.bn3=nn.BatchNorm3d(32)
        # then the 2-D convolution; default stride is 1 and padding is 0
self.conv4 = nn.Conv2d(576,64,(3,3))
self.bn4=nn.BatchNorm2d(64)
        self.drop = nn.Dropout(p=0.4)  # Dropout with p=0.4 after the first two FC layers to prevent overfitting
        self.fc1 = nn.Linear(18496,256)  # after flattening, the feature vector has 18496 dimensions (64*17*17)
self.fc2 = nn.Linear(256,128)
self.fc3 = nn.Linear(128,num_classes)
self.relu = nn.ReLU()
self.se = SELayer(32,16)
# self.softmax = nn.Softmax(dim=1)
def forward(self,x):
out = self.relu(self.bn1(self.conv1(x)))
out = self.relu(self.bn2(self.conv2(out)))
out = self.relu(self.bn3(self.conv3(out)))
out = self.se(out)
out = out.reshape(out.shape[0], 576, 19, 19)
out = self.relu(self.bn4(self.conv4(out)))
out = out.view(out.size(0),-1)
out = self.fc1(out)
out = self.drop(out)
out = self.relu(out)
out = self.fc2(out)
out = self.drop(out)
out = self.relu(out)
out = self.fc3(out)
# out = self.softmax(out)
return out
# random input to check that the network runs end to end
p = torch.randn(1,1,30,25,25)
net = HybridSN()
t = net(p)
print(t.shape)
Compared with the original HybridSN, the main change is applying the SE operation after the third 3-D convolution and before the reshape; however, the results do not seem to improve.
Part 4 Miscellaneous
1. Problems encountered
At first I trained HybridSN without batch normalization: training was slightly slower than with it, the final average loss was still above 3, and the test accuracy was only a little over 95%, lower than with batch normalization added, which shows the effect of BN.
When building MobileNet, the source code had block = InvertedResidual mistyped as block = InsertedResidual and mode='fan_out' mistyped as mode='fan out', which causes errors. (I am not entirely sure, because I never got the dataset to run: the ImageNet and COCO datasets used in the paper apparently have to be downloaded locally before they can be loaded, so the data loading never succeeded; these typos do, however, cause errors when moving the network to the GPU.)
When adding the SE module to HybridSN, attention must be paid to the change in tensor dimensions, otherwise errors will occur.