1. What is ResNet?
ResNet (Residual Network) is a deep neural network architecture, also known as a residual network. It was first proposed by Kaiming He et al. in the 2015 paper "Deep Residual Learning for Image Recognition". By introducing residual blocks (Residual Building Blocks), it addresses the vanishing-gradient problem that plagues the training of very deep networks.
In ResNet, the output of a block consists of two parts: an identity mapping and a residual mapping. The identity mapping passes the input directly to the next layer, while the residual mapping applies a nonlinear transformation to the input before passing it on. This design lets the network concentrate on learning the residual information, which in turn allows it to grow much deeper.
ResNet's key innovation is the shortcut connection, a connection that skips one or more layers. These connections let information flow more smoothly and keep gradients from vanishing during backpropagation. As a result, ResNet can train very deep networks without the usual degradation in performance.
2. Why ResNet?
1 Why build deep networks at all?
A: Each layer of a neural network can be thought of as extracting features at a different level: low, middle, and high. The deeper the network, the more levels of features it can extract, and the more combinations of features across levels become possible.
2 Why can ResNet build such deep networks?
A: The main obstacles to depth in deep learning are vanishing and exploding gradients. The traditional remedies are careful weight initialization (normalized initialization) and batch normalization. These do address the gradient problem, but with greater depth another problem appears: degradation — as depth increases, the error rate goes up. Residual connections were designed to solve this degradation problem; at the same time they also alleviate the gradient problem and improve overall network performance.
The figure (from the original paper) shows that the error rate is lowest at 20 layers and actually rises at 56 layers. One might suspect overfitting, but the figure rules that out: the training error on the left also rises at 56 layers, so overfitting is not the cause. The real culprit is that, as the network deepens, the gradients computed through the chain rule during backpropagation gradually shrink toward zero — the vanishing-gradient (gradient dispersion) phenomenon. The weights of the earlier layers can then no longer be adjusted effectively: the deeper the network, the higher the training error, and both training and test performance deteriorate. This is called degradation. So how do we resolve the paradox that a deeper network, which in theory should perform better, actually performs worse in experiments? Many researchers were stumped by this, until ResNet came along.
3. What does ResNet look like?
3.1 The residual structure
A traditional "plain" network is shown below: an ordinary stack of two convolutional layers with activations.
After the two convolutions and an activation, suppose the output is H(x). Compared with this plain structure, ResNet adds a shortcut connection (also called a skip connection), as shown below:
The shortcut is added just before the second activation function, so the input to that activation changes from H(x) = F(x) to H(x) = F(x) + x. In ResNet, an operation whose output equals its input is called an identity mapping; the "identity" branch in the figure performs exactly this identity mapping.
What does this buy us? In a deep network, as backpropagation updates the weights during training, some convolutional layers may already have reached their optimum; at that point their input and output should be identical, and there is nothing left to train. In practice, however, we can never train the weights to exactly zero error. If such a layer is already optimal, its training could effectively be skipped by setting F(x) = 0, so that the output H(x) = x is the optimal output.
In a plain network, without the identity branch, a layer that has reached its optimum may still have its weights pushed toward worse values as training continues. With the identity branch, once the optimum is reached the signal can pass straight through via F(x) = 0, giving H(x) = x; the weights can then do no worse than before, and the network also converges faster.
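This argument can be illustrated with a tiny framework-free sketch (a toy, list-based version of H(x) = F(x) + x; the function names here are made up for illustration):

```python
# A residual "block" over plain lists: output = F(x) + x.
def residual_block(x, f):
    return [fi + xi for fi, xi in zip(f(x), x)]

# If a layer is already optimal there is nothing left to learn, so the
# residual branch only has to produce zeros -- the block then collapses
# to the identity mapping H(x) = x.
f_zero = lambda x: [0.0 for _ in x]
print(residual_block([1.0, 2.0, 3.0], f_zero))  # [1.0, 2.0, 3.0]
```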
How residual connections combat gradient dispersion can be seen from the following recurrence. Writing a single residual block as x_{l+1} = x_l + F(x_l, W_l) and unrolling it from a shallower layer l up to a deeper layer L gives:

x_L = x_l + Σ_{i=l}^{L-1} F(x_i, W_i)

Here x_L denotes the input of some deep layer of the network and x_l the input of a shallower residual block. In a residual network, the output of any deeper residual block can thus be expressed as the output of a shallower layer plus a sum of residuals.
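Backpropagating through this sum makes the anti-vanishing property explicit (this follows the standard gradient analysis of identity mappings in ResNet, writing the loss as L):

```latex
x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i)
\quad\Longrightarrow\quad
\frac{\partial \mathcal{L}}{\partial x_l}
  = \frac{\partial \mathcal{L}}{\partial x_L}\,\frac{\partial x_L}{\partial x_l}
  = \frac{\partial \mathcal{L}}{\partial x_L}
    \left(1 + \frac{\partial}{\partial x_l}\sum_{i=l}^{L-1} F(x_i, W_i)\right)
```

Because of the additive 1, the gradient reaching the shallow layer always contains the term ∂L/∂x_L passed through unchanged: even if the summed derivative term becomes very small, the total gradient cannot vanish entirely.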
3.2 Residual blocks
On the left is the residual block used in the 18- and 34-layer networks. For models of 50 layers or more, Kaiming He's team designed a variant that makes heavy use of 1×1 convolutional layers to reduce the parameter count, as shown below:
3.3 BasicBlock and Bottleneck
BasicBlock
BasicBlock is one of the block structures used in ResNet; resnet18 and resnet34 are built from BasicBlocks:
With 64 input and output channels, the two 3×3 convolutional layers of this basic residual block contain 3×3×64×64×2 = 73,728 parameters.
Bottleneck
The core of ResNet-34 uses 3×3 convolutions throughout, and its total depth is still moderate. For deeper networks the authors proposed another basic residual block. (resnet50, resnet101, and resnet152 are built from Bottleneck blocks.)
The Bottleneck block uses 1×1 convolutional layers. With a 256-channel input, a 1×1 convolution first reduces the channel count to 64; after a 3×3 convolution, another 1×1 convolution raises it back to 256. The advantage of the 1×1 layers is that, in deeper networks, inputs with very large channel counts can be processed with comparatively few parameters.
With 256 input and output channels, the Bottleneck residual block contains 1×1×256×64 + 3×3×64×64 + 1×1×64×256 = 69,632 parameters.
Compared with a BasicBlock, the use of 1×1 convolutional layers reduces the parameter count. This design exists because deeper networks place higher demands on memory and compute; when compute is limited, the residual blocks of a deep network should be made as cheap as possible.
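The parameter counts above can be checked with a few lines of arithmetic (a sketch that ignores BatchNorm parameters, since the convolutions are bias-free; for a fair comparison, the last line also computes what a hypothetical BasicBlock would cost at the Bottleneck's 256-channel width):

```python
# Parameters of a k x k convolution with c_in input and c_out output
# channels (bias-free, as in ResNet).
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

# BasicBlock: two 3x3 convolutions at 64 channels.
basic_64 = 2 * conv_params(3, 64, 64)
print(basic_64)        # 73728

# Bottleneck: 1x1 reduce (256->64), 3x3 (64->64), 1x1 expand (64->256).
bottleneck_256 = (conv_params(1, 256, 64)
                  + conv_params(3, 64, 64)
                  + conv_params(1, 64, 256))
print(bottleneck_256)  # 69632

# A hypothetical BasicBlock kept at 256 channels would be far larger:
basic_256 = 2 * conv_params(3, 256, 256)
print(basic_256)       # 1179648
```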
3.4 ResNet18 and ResNet50
3.5 PyTorch implementations of BasicBlock and Bottleneck
BasicBlock
from typing import Callable, Optional

import torch.nn as nn
from torch import Tensor

# conv3x3 and conv1x1 are the helper functions defined in section 3.6.
class BasicBlock(nn.Module):
    expansion: int = 1

    def __init__(
        self,
        inplanes: int,
        planes: int,
        stride: int = 1,
        downsample: Optional[nn.Module] = None,
        groups: int = 1,
        base_width: int = 64,
        dilation: int = 1,
        norm_layer: Optional[Callable[..., nn.Module]] = None
    ) -> None:
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError('BasicBlock only supports groups=1 and base_width=64')
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x: Tensor) -> Tensor:
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out
Bottleneck
class Bottleneck(nn.Module):
    # Bottleneck in torchvision places the stride for downsampling at 3x3 convolution(self.conv2)
    # while original implementation places the stride at the first 1x1 convolution(self.conv1)
    # according to "Deep residual learning for image recognition" https://arxiv.org/abs/1512.03385.
    # This variant is also known as ResNet V1.5 and improves accuracy according to
    # https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch.
    expansion: int = 4

    def __init__(
        self,
        inplanes: int,
        planes: int,
        stride: int = 1,
        downsample: Optional[nn.Module] = None,
        groups: int = 1,
        base_width: int = 64,
        dilation: int = 1,
        norm_layer: Optional[Callable[..., nn.Module]] = None
    ) -> None:
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        width = int(planes * (base_width / 64.)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, width)
        self.bn1 = norm_layer(width)
        self.conv2 = conv3x3(width, width, stride, groups, dilation)
        self.bn2 = norm_layer(width)
        self.conv3 = conv1x1(width, planes * self.expansion)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x: Tensor) -> Tensor:
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out
3.6 A ResNet50 implementation
import torch
import torch.nn as nn

# --------------------------------#
# The pretrained resnet50 weights can be downloaded from the official torch site
# --------------------------------#
model_urls = {
    'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
}

# -----------------------------------------------#
# 3x3 convolution, i.e. a convolution with a 3x3 kernel
# -----------------------------------------------#
def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, groups=groups, bias=False, dilation=dilation)

# -----------------------------------------------#
# 1x1 convolution, i.e. a convolution with a 1x1 kernel
# -----------------------------------------------#
def conv1x1(in_planes, out_planes, stride=1):
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)
# ----------------------------------#
# The standard residual structure of resnet50;
# conv3x3 and conv1x1 defined above are both used here
# ----------------------------------#
class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1, base_width=64, dilation=1,
                 norm_layer=None):
        super(Bottleneck, self).__init__()
        # --------------------------------------------#
        # When no normalization layer is specified,
        # 2-D batch normalization is used by default
        # --------------------------------------------#
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        # ---------------------------------------------------#
        # width, derived from planes, is the number of output
        # channels of the convolutions and of BatchNorm2d;
        # different planes values are passed in while the
        # resnet stages are built below
        # ---------------------------------------------------#
        width = int(planes * (base_width / 64.)) * groups
        # -----------------------------------------------#
        # When stride != 1, both self.conv2 and self.downsample
        # downsample the input.
        # Below, the block's layers are defined: convolutions,
        # normalization, relu, etc.
        # -----------------------------------------------#
        self.conv1 = conv1x1(inplanes, width)
        self.bn1 = norm_layer(width)
        self.conv2 = conv3x3(width, width, stride, groups, dilation)
        self.bn2 = norm_layer(width)
        self.conv3 = conv1x1(width, planes * self.expansion)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    # --------------------------------------#
    # Forward pass of the standard resnet50 residual block
    # --------------------------------------#
    def forward(self, x):
        identity = x
        # -------------------------------------------------------------------------#
        # conv1x1 -> bn1 -> relu: a 1x1 convolution, then normalization, then relu for nonlinearity
        # conv3x3 -> bn2 -> relu: a 3x3 convolution, then normalization, then relu for nonlinearity
        # conv1x1 -> bn3: a 1x1 convolution followed by normalization
        # -------------------------------------------------------------------------#
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        # -----------------------------#
        # Apply the downsampling branch if there is one
        # -----------------------------#
        if self.downsample is not None:
            identity = self.downsample(identity)
        # ---------------------------------------------#
        # Add the two branches, then apply relu for nonlinearity.
        # concat (stacking) increases the channel count;
        # add (summation) adds the feature maps element-wise, channels unchanged.
        # add can be seen as a special case of concat with lower compute cost.
        # ---------------------------------------------#
        out += identity
        out = self.relu(out)
        return out
# --------------------------------#
# Definition of the resnet50 network.
# The standard input size is 224x224.
# block in the initializer is the standard residual
# structure defined above -- Bottleneck
# --------------------------------#
class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=8, zero_init_residual=False,
                 groups=1, width_per_group=64, replace_stride_with_dilation=None,
                 norm_layer=None):
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer
        self.inplanes = 64
        self.dilation = 1
        # ---------------------------------------------------------#
        # Dilation can be used to replace stride; when
        # replace_stride_with_dilation is None, all three
        # entries of the list default to False
        # ---------------------------------------------------------#
        if replace_stride_with_dilation is None:
            replace_stride_with_dilation = [False, False, False]
        # ----------------------------------------------#
        # If replace_stride_with_dilation does not have
        # exactly 3 elements, raise a ValueError
        # ----------------------------------------------#
        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
        self.block = block
        self.groups = groups
        self.base_width = width_per_group
        # -----------------------------------#
        # conv7x7 -> bn1 -> relu
        # For the standard RGB stem: 224,224,3 -> 112,112,64.
        # Note: this version has been modified to take a 64-channel
        # input; the standard 3-channel stem is kept commented below.
        # -----------------------------------#
        self.conv1 = nn.Conv2d(64, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)
        # self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        # ------------------------------------#
        # Max pooling only changes the height and width of the
        # feature map; the channel count stays the same
        # 112,112,64 -> 56,56,64
        # ------------------------------------#
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # 56,56,64 -> 56,56,256
        self.layer1 = self._make_layer(block, 64, layers[0])
        # 56,56,256 -> 28,28,512
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2, dilate=replace_stride_with_dilation[0])
        # 28,28,512 -> 14,14,1024
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2, dilate=replace_stride_with_dilation[1])
        # 14,14,1024 -> 7,7,2048
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2, dilate=replace_stride_with_dilation[2])
        # --------------------------------------------#
        # Adaptive 2-D average pooling: the feature map's height
        # and width both become 1, the channel count is unchanged
        # 7,7,2048 -> 1,1,2048
        # --------------------------------------------#
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        # ----------------------------------------#
        # Map the current feature channels to the required
        # number of output classes: 2048 -> num_classes
        # ----------------------------------------#
        self.fc = nn.Linear(512 * block.expansion, num_classes)
        # -------------------------------#
        # Weight initialization
        # -------------------------------#
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
        # -------------------------------#
        # Optionally zero-initialize the last BN of each residual block
        # -------------------------------#
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)

    # --------------------------------------#
    # _make_layer builds one stage; it is called from the
    # initializer above. block is the standard residual
    # structure defined above -- Bottleneck
    # --------------------------------------#
    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        # -----------------------------------#
        # dilate defaults to False, in which case
        # the statement below is skipped
        # -----------------------------------#
        if dilate:
            self.dilation *= stride
            stride = 1
        # -----------------------------------------------------------#
        # If stride != 1 or self.inplanes != planes * block.expansion,
        # downsample is a 1x1 conv followed by BatchNorm2d
        # -----------------------------------------------------------#
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                norm_layer(planes * block.expansion),
            )
        # -----------------------------------------------#
        # layers is a list of the stage's blocks;
        # each entry is one use of Bottleneck
        # -----------------------------------------------#
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                            self.base_width, previous_dilation, norm_layer))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            # identity_block
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width, dilation=self.dilation,
                                norm_layer=norm_layer))
        return nn.Sequential(*layers)
    # ------------------------------#
    # Forward pass of resnet50.
    # Note: this version is modified into a feature-pyramid style
    # backbone -- maxpool, avgpool, flatten and fc are commented out,
    # and the four stage feature maps are returned instead of logits.
    # ------------------------------#
    def forward(self, x):
        x = self.conv1(x)
        # print("conv1", x.shape)
        x = self.bn1(x)
        # print("bn1", x.shape)
        x = self.relu(x)
        # print("relu1", x.shape)
        # x = self.maxpool(x)
        # print("maxpool", x.shape)
        x1 = self.layer1(x)
        x2 = self.layer2(x1)
        x3 = self.layer3(x2)
        x4 = self.layer4(x3)
        # x = self.avgpool(x)
        # --------------------------------------#
        # Flatten all dimensions after the batch dimension;
        # the resulting shape is (batch_size, 2048)
        # --------------------------------------#
        # x = torch.flatten(x, 1)
        # --------------------------------------#
        # The fully connected layer maps the feature channels
        # to the desired number of classes:
        # (batch_size, 2048) -> (batch_size, num_classes)
        # --------------------------------------#
        # x = self.fc(x)
        x = [x1, x2, x3, x4]
        return x

# Quick shape check (uncomment to run):
# F = torch.randn(5, 64, 56, 56)
# print("As begin, shape:", format(F.shape))
# resnet = ResNet(Bottleneck, [3, 4, 6, 3])
# F = resnet(F)
# print(F[0].shape)
# print(F[1].shape)
# print(F[2].shape)
# print(F[3].shape)
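As a sanity check on the stage layout above, the output channels of each stage, the total depth, and which stages need a downsample branch can all be derived with simple arithmetic (a framework-free sketch; the numbers follow from layers=[3, 4, 6, 3] and Bottleneck.expansion = 4):

```python
# ResNet-50 configuration: Bottleneck blocks per stage and the `planes`
# value that _make_layer receives for each stage.
expansion = 4
layers = [3, 4, 6, 3]
planes = [64, 128, 256, 512]

# Each stage outputs planes * expansion channels.
out_channels = [p * expansion for p in planes]
print(out_channels)  # [256, 512, 1024, 2048]

# Depth: the 7x7 stem conv + 3 convs per Bottleneck + the fc layer = 50.
depth = 1 + 3 * sum(layers) + 1
print(depth)  # 50

# A downsample branch is attached whenever the shortcut's shape would
# not match the block output (stride != 1 or a channel mismatch).
def needs_downsample(stride, inplanes, planes):
    return stride != 1 or inplanes != planes * expansion

# (stride, inplanes at stage entry, planes) for the four stages;
# each stage's first block needs the downsample branch:
stages = [(1, 64, 64), (2, 256, 128), (2, 512, 256), (2, 1024, 512)]
print([needs_downsample(s, i, p) for s, i, p in stages])  # [True, True, True, True]
```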