References:
【1】https://www.bilibili.com/video/BV1rX4y1N7tE/?spm_id_from=333.788&vd_source=9e9b4b6471a6e98c3e756ce7f41eb134
【2】https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/blob/master/pytorch_classification/Test5_resnet/model.py
1 ResNet basic structure and points to note
1.1 Model structure diagram
The basic building block is conv -> bn -> relu. Assuming the input image is (224, 224, 3), the flow is as follows:
1) Layer1: the convolution parameters are (kernel_size=7, stride=2, padding=3), so the output size is $\frac{224-7+6}{2}+1 = 112$ (rounded down). Then MaxPool2d halves the height and width again, giving (56, 56, 64). (See the shape-check sketch after this list.)
2) Layer2 (for resnet18/34): here the residual block parameters are (kernel_size=3, stride=1, padding=1), so height, width, and channel count are all unchanged and the output feature map stays (56, 56, 64).
3) Layer2 (for resnet50/101/152): here the output channel count becomes 256, so to form the residual connection the first block must include a 1x1 convolution to raise the dimension. The height and width stay the same, so the 1x1 conv on the residual path uses stride 1 and padding 0.
4) Layers 3/4/5: the first block of each must change both the channel count and the height/width. For resnet18/34, set stride=2 on the first 3x3 conv; for resnet50/101/152, set stride=2 on the 3x3 conv; and also set stride=2 on the 1x1 conv on the residual path, as shown in the figure.
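To double-check these shapes, here is a minimal sketch (my own snippet, not from the referenced repo) that pushes a (224, 224, 3) input through the stem from step 1:

import torch
import torch.nn as nn

# Shape check for step 1 above (illustrative only).
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),  # (3,224,224) -> (64,112,112)
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),                  # (64,112,112) -> (64,56,56)
)
x = torch.rand(1, 3, 224, 224)
print(stem(x).shape)  # torch.Size([1, 64, 56, 56])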
1.2 Parameter count
1.3 Why the residual structure helps
Here is just a simple mathematical argument. Suppose a simple residual block is fn+bn -> relu -> fn+bn -> residual, and denote the whole block as a function $f(x, w)$. Then the output is:
$y = x + f(x, w)$
Then
$\frac{\partial y}{\partial x} = I + \frac{\partial f(x,w)}{\partial x},$
and by the chain rule
$\frac{\partial l}{\partial x} = \frac{\partial y}{\partial x}\frac{\partial l}{\partial y} = \left(I + \frac{\partial f(x,w)}{\partial x}\right)\frac{\partial l}{\partial y}.$
So each time the gradient is propagated backward, the identity term keeps it from becoming vanishingly small, and since the SGD optimizer's updates are driven directly by the gradient, training stays effective even for deep networks.
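As a quick numerical illustration (a toy sketch of my own with scalar x and w, not the full network):

import torch

# Toy check of dy/dx = 1 + df/dx for a scalar "block" f(x, w) = w * x.
# Even when df/dx is tiny, the skip connection keeps the gradient near 1.
x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(0.01)
f = w * x                                                # stand-in for f(x, w)
(g_plain,) = torch.autograd.grad(f, x, retain_graph=True)
(g_res,) = torch.autograd.grad(x + f, x)
print(g_plain.item())  # 0.01 -> gradient nearly vanishes without the skip
print(g_res.item())    # 1.01 -> identity term keeps it alive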
2 Improvements of ResNeXt over ResNet
2.1 Group Convolution
Reference: https://blog.csdn.net/caip12999203000/article/details/126693895
As shown there, a group convolution has $\frac{1}{g}$ the parameters of an ordinary convolution, which acts somewhat like regularization;
the drawback is that the groups do not communicate with each other.
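The ratio is easy to verify in PyTorch (a sketch; the channel counts 64 and 128 are arbitrary choices of mine):

import torch.nn as nn

# A grouped conv has 1/g the parameters of an ordinary conv.
def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

plain = nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False)
grouped = nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False, groups=4)
print(n_params(plain))    # 64 * 128 * 3 * 3 = 73728
print(n_params(grouped))  # 73728 / 4       = 18432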
2.2 The Block
- The three structures above are equivalent, so the final form (c) can replace the original residual block
- C is the number of groups (cardinality) of the group convolution, and 4 is the number of channels in each group
- Overall, ResNeXt is much the same as ResNet; the differences are that the output channels of the first two convs in the residual block are doubled, and the middle 3x3 conv is replaced by a group convolution with 32 groups of 4 channels each
2.3 Note
Group convolution is only meaningful when the block depth is >= 3, so it is generally only ResNet50 and deeper that get converted to ResNeXt.
3 Hand-written ResNet and ResNeXt code
3.1 ResNet
3.1.1 BasicBlock
The rules for this block are as follows:
- Both the first and second convolutions have kernel_size 3 and padding 1, and there are only these two convolutions
- If there is no 1x1 conv on the residual path, the first conv has stride 1; if there is one, the first conv has stride 2. The second conv always has stride 1
- Both convs output out_channel channels, so set the variable expansion = 1 here
- A downsample argument decides whether the identity branch needs downsampling
- The convs take no bias, because BN layers follow them
Hand-written code:
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel,
                               out_channels=out_channel,
                               kernel_size=3,
                               stride=stride, padding=1,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.conv2 = nn.Conv2d(in_channels=out_channel,
                               out_channels=out_channel,
                               kernel_size=3,
                               stride=1, padding=1,
                               bias=False)  # stride is fixed to 1 here
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.bn2(self.conv2(x))
        x += identity  # add the (possibly downsampled) identity before the final ReLU
        out = self.relu(x)
        return out
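A quick sanity check (my own snippet, assuming the BasicBlock class above): without a downsample path the shape is preserved; with stride=2 plus a 1x1 downsample conv, height/width halve and channels grow:

# Sanity check for BasicBlock (illustrative only).
blk = BasicBlock(64, 64)                      # stride=1, no downsample
x = torch.rand(1, 64, 56, 56)
print(blk(x).shape)                           # torch.Size([1, 64, 56, 56])

down = nn.Sequential(nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
                     nn.BatchNorm2d(128))
blk2 = BasicBlock(64, 128, stride=2, downsample=down)
print(blk2(x).shape)                          # torch.Size([1, 128, 28, 28])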
3.1.2 BottleNeck
The rules for this block are as follows:
- There are three convolutions, with kernel_size 1, 3, 1 respectively
- The third conv's out_channel is 4x that of the first two convs, so set expansion = 4
- When there is a 1x1 conv on the residual path, the second conv has stride 2
Hand-written code:
class BottleNeck(nn.Module):
    expansion = 4

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):
        super().__init__()
        self.downsample = downsample
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=1, stride=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.conv3 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel * self.expansion,
                               kernel_size=1, stride=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.relu(self.bn2(self.conv2(x)))
        x = self.bn3(self.conv3(x))
        x += identity
        return self.relu(x)
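A quick sanity check (my own snippet, assuming the BottleNeck class above), reproducing the layer2 case where channels go 64 -> 256 with unchanged height/width:

# Sanity check for BottleNeck (illustrative only): output channels are 4x out_channel.
down = nn.Sequential(nn.Conv2d(64, 64 * BottleNeck.expansion, kernel_size=1, stride=1, bias=False),
                     nn.BatchNorm2d(64 * BottleNeck.expansion))
blk = BottleNeck(64, 64, stride=1, downsample=down)
x = torch.rand(1, 64, 56, 56)
print(blk(x).shape)  # torch.Size([1, 256, 56, 56])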
Mistakes I made when writing this from memory:
- The first and third convs have no padding; only the second conv has padding=1!!
- nn.Conv2d only accepts a groups argument; it does not accept width_per_group, which is only used to compute width
3.1.3 ResNet
A short summary of how the blocks are created:
- layer1 is the same for every variant: a 7x7 conv (k=7, s=2, p=3) followed by a maxpool layer
- For layer2, resnet18/34 and resnet50/101/152 differ; comparing the first conv's out_channel with the last conv's out_channel tells you which case you are in (e.g. for resnet18 both are 64, while for resnet50 they are 64 and 64*4=256), so this is one condition for whether a downsample is needed
- For layer2 the stride passed in is 1, since the stride is 1 whether or not there is a residual 1x1 conv
- For layers 3/4/5 the stride passed in is 2; the second condition for needing a downsample is whether the stride is 1, and if not, the first block needs a downsample
- For the 1x1 conv on the residual path, out_channel is the first conv's out_channel times expansion
Parameters of __init__:
- block: BasicBlock for resnet18/34, or BottleNeck for resnet50/101/152
- blocks_num: a list giving how many times the block is repeated in each layer, e.g. [3, 4, 6, 3] for resnet50
- num_classes: number of classes, used by the classification head
- include_top: whether to include the classification head, i.e. everything after layer5
Parameters of make_layer and notes:
- block: note that the block takes in_c, out_c, stride, and downsample
- channel: the out_channel of the first conv in the block
- block_num: as above
- stride: the stride
- Initialize downsample to None, then check two conditions: 1) whether the stride differs from 1; 2) whether the first conv's out_channel (i.e. channel) differs from the last conv's out_channel (i.e. channel * block.expansion). If either holds, a downsample is needed. The first condition covers layers 3/4/5; the second covers layer2.
- downsample is a 1x1 conv (k=1, s=stride, padding=0, out_channel = channel * block.expansion) followed by BN
- After the first block is done, loop over the remaining blocks; these need no downsample, so just append the block directly
Hand-written code:
class ResNet(nn.Module):
    def __init__(self,
                 block,
                 blocks_num,
                 num_classes=1000,
                 include_top=True):
        super().__init__()
        self.include_top = include_top
        self.in_channel = 64
        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self.make_layer(block, 64, blocks_num[0])
        self.layer2 = self.make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self.make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self.make_layer(block, 512, blocks_num[3], stride=2)
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
            self.fc = nn.Linear(512 * block.expansion, num_classes)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion))
        layers = []
        # the block expands the channels itself, so pass channel (not channel * expansion)
        layers.append(block(self.in_channel, channel, stride=stride, downsample=downsample))
        self.in_channel = channel * block.expansion
        for _ in range(1, block_num):
            layers.append(block(self.in_channel, channel))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.maxpool(self.relu(self.bn1(self.conv1(x))))
        x = self.layer4(self.layer3(self.layer2(self.layer1(x))))
        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)
        return x
Test code:
def resnet34(num_classes=1000, include_top=True):
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)

model = resnet34(3)  # num_classes=3
x = torch.rand((2, 3, 224, 224))
out = model(x)
print(out)  # out has shape (2, 3)
3.2 ResNeXt
Changes needed on top of ResNet50:
- The output channels of the first two convolutions are doubled, using the formula below
- The second convolution becomes a group conv, with groups=32 and width_per_group=4
- width = int(out_channel * (width_per_group / 64.)) * groups; if groups=1 and width_per_group=64 the factor is 1, while for groups=32 and width_per_group=4 the factor is 2, so width is twice out_channel (see the check after this list)
- Only BottleNeck and the make_layer function need changes!
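The width formula can be checked directly (a small sketch of my own):

# Width factor for the first two convs of the ResNeXt bottleneck.
def width(out_channel, groups, width_per_group):
    return int(out_channel * (width_per_group / 64.)) * groups

print(width(64, 1, 64))   # 64  -> plain ResNet50, factor 1
print(width(64, 32, 4))   # 128 -> ResNeXt50 32x4d, factor 2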
Modified code:
- The BottleNeck part
class BottleNeck(nn.Module):
    expansion = 4

    def __init__(self, in_channel, out_channel, stride=1, downsample=None,
                 groups=1, width_per_group=64, **kwargs):
        # ResNeXt: add width; widen the first two convs; give the second conv groups
        super().__init__()
        width = int(out_channel * (width_per_group / 64.)) * groups
        self.downsample = downsample
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=width,
                               kernel_size=1, stride=1, bias=False)
        self.bn1 = nn.BatchNorm2d(width)
        self.conv2 = nn.Conv2d(in_channels=width, out_channels=width,
                               kernel_size=3, padding=1, stride=stride,
                               bias=False, groups=groups)
        self.bn2 = nn.BatchNorm2d(width)
        self.conv3 = nn.Conv2d(in_channels=width, out_channels=out_channel * self.expansion,
                               kernel_size=1, stride=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.relu(self.bn2(self.conv2(x)))
        x = self.bn3(self.conv3(x))
        x += identity
        return self.relu(x)
- The ResNet part (the two new arguments must be stored on self, since make_layer reads them)

class ResNet(nn.Module):
    def __init__(self, block, blocks_num, num_classes=1000, include_top=True,
                 groups=1, width_per_group=64):
        # ... same as before, plus:
        self.groups = groups
        self.width_per_group = width_per_group
- The make_layer part
def make_layer(self, block, channel, block_num, stride=1):
    downsample = None
    if stride != 1 or self.in_channel != channel * block.expansion:
        downsample = nn.Sequential(
            nn.Conv2d(in_channels=self.in_channel, out_channels=block.expansion * channel,
                      kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(block.expansion * channel)
        )
    layers = []
    layers.append(block(self.in_channel,
                        channel,
                        downsample=downsample,
                        stride=stride,
                        groups=self.groups,
                        width_per_group=self.width_per_group))  # pass both group parameters
    self.in_channel = channel * block.expansion
    for _ in range(1, block_num):
        layers.append(block(self.in_channel,
                            channel,
                            groups=self.groups,
                            width_per_group=self.width_per_group))
    return nn.Sequential(*layers)
- Test
def resnext50_32x4d(num_classes=1000, include_top=True):
    # 32x4d: groups=32, width_per_group=4
    return ResNet(BottleNeck, [3, 4, 6, 3], num_classes, include_top, groups=32, width_per_group=4)

#model = resnet34(num_classes=3)
#model = resnet50(3)
model = resnext50_32x4d(3)
x = torch.rand((64, 3, 224, 224))
out = model(x)
print(out)  # out has shape (64, 3)