This post records some of the convolution structures I have encountered while studying, for easy review later.
Bottleneck (a common implementation of the residual block)
The residual block is a classic and highly useful convolutional structure in visual feature extraction networks. It was first proposed in ResNet and has been widely used ever since. Its principle is as follows:
A residual block (built on shortcut connections, also called skip connections) consists of an identity-mapping part x_l and a residual part F(x_l, W_l), and can be written as: x_{l+1} = x_l + F(x_l, W_l).
Principle
Let F be the network mapping before the summation, and H the mapping from the input to the output after the summation. Suppose we want to map 5 to 5.1: without the residual connection, the network must learn F'(5) = 5.1; with it, H(5) = F(5) + 5 = 5.1, so F(5) = 0.1. Both F' and F denote mappings parameterized by the network, but the residual form is far more sensitive to changes in the output. If the target output moves from 5.1 to 5.2, the plain mapping F' changes by only 0.1/5.1 ≈ 2%, whereas the residual mapping F goes from 0.1 to 0.2, a 100% change. The latter clearly gives the weight updates a much stronger signal, which is why it works better. The idea behind residuals is to strip away the shared bulk of the signal so that small variations stand out.
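The arithmetic in this example can be checked in a few lines (5, 5.1, and 5.2 are just the illustrative values used above):

```python
x = 5.0
h_before, h_after = 5.1, 5.2  # target output before and after the small shift

# Plain mapping: F'(x) itself must move from 5.1 to 5.2
plain_rel_change = (h_after - h_before) / h_before

# Residual mapping: F(x) = H(x) - x moves from 0.1 to 0.2
residual_rel_change = ((h_after - x) - (h_before - x)) / (h_before - x)

print(f"plain: {plain_rel_change:.1%}, residual: {residual_rel_change:.1%}")
# plain: 2.0%, residual: 100.0%
```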
Implementation
The figure above shows the 2-layer and 3-layer versions of the residual block; the 3-layer version is the one typically used.
In the 3-layer residual block, a 1x1 convolution first reduces the number of channels of the feature map, a 3x3 convolution then extracts features, and a final 1x1 convolution expands the channels back up (by the expansion factor).
```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    expansion = 4  # the last 1x1 conv expands channels by this factor

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        # 1x1 conv: reduce channels
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        # 3x3 conv: extract features
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # 1x1 conv: expand channels
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=1, stride=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = self.relu(out)
        return out

# Example usage:
# Define a downsample layer for when in_channels != out_channels * expansion or stride != 1
downsample = nn.Sequential(
    nn.Conv2d(64, 256, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(256),
)
# Instantiate a Bottleneck block
bottleneck_block = Bottleneck(in_channels=64, out_channels=64, stride=2, downsample=downsample)
# Test with a random tensor input
x = torch.randn(1, 64, 56, 56)  # example input with batch size 1 and 64 channels
output = bottleneck_block(x)
print(output.shape)  # [1, 256, 28, 28] due to stride 2 and expansion
```
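One reason for the 1x1 reduce/expand design is parameter count. As a rough check (the channel sizes 256/64 are chosen to match the expansion-4 case above, purely for illustration), compare the three convolutions of the bottleneck with a single 3x3 convolution operating directly at full width:

```python
import torch.nn as nn

def n_params(m):
    # total number of learnable parameters in a module
    return sum(p.numel() for p in m.parameters())

# Bottleneck path for in=256, mid=64, out=256 (the expansion-4 case)
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, 1, bias=False),            # 1x1 reduce
    nn.Conv2d(64, 64, 3, padding=1, bias=False),  # 3x3 feature extraction
    nn.Conv2d(64, 256, 1, bias=False),            # 1x1 expand
)
# A single 3x3 conv at full width, for comparison
wide = nn.Conv2d(256, 256, 3, padding=1, bias=False)

print(n_params(bottleneck), n_params(wide))  # 69632 vs 589824
```

The bottleneck does its 3x3 work at the reduced width, which is where almost all of the ~8x saving comes from.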
Inverted Residual Block
The inverted residual block is a building block for lightweight convolutional neural networks, first introduced in MobileNetV2. It is "inverted" relative to the ordinary residual block: it expands the feature dimension rather than reducing it. The block consists of three steps:
- First, a 1x1 pointwise convolution expands the number of input channels (expand).
- The features are then processed in the expanded high-dimensional space by a 3x3 depthwise convolution.
- Finally, a 1x1 pointwise convolution compresses the channel count back down to the output size (compress).
As these three steps show, although the inverted residual block expands the feature dimension, the whole block is essentially implemented as a depthwise separable convolution, so its parameter count and computational cost stay modest while it can still process feature maps that carry a lot of information.
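The claim that the depthwise-separable structure keeps cost low can be sanity-checked by counting parameters (the channel count 192 is an arbitrary illustrative value standing in for the expanded width):

```python
import torch.nn as nn

def n_params(m):
    # total number of learnable parameters in a module
    return sum(p.numel() for p in m.parameters())

c = 192  # a hypothetical expanded channel count
standard = nn.Conv2d(c, c, 3, padding=1, bias=False)
separable = nn.Sequential(
    nn.Conv2d(c, c, 3, padding=1, groups=c, bias=False),  # depthwise: one 3x3 filter per channel
    nn.Conv2d(c, c, 1, bias=False),                       # pointwise: mixes channels
)
print(n_params(standard), n_params(separable))  # 331776 vs 38592
```

Even at the expanded width, the separable pair uses roughly 1/9 of the parameters of a standard 3x3 convolution, which is why expanding first remains affordable.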
Implementation
```python
import torch
import torch.nn as nn

class InvertedResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride, expand_ratio):
        super(InvertedResidualBlock, self).__init__()
        self.stride = stride
        self.use_res_connect = self.stride == 1 and in_channels == out_channels
        hidden_dim = in_channels * expand_ratio
        self.conv = nn.Sequential(
            # 1x1 pointwise convolution (expand)
            nn.Conv2d(in_channels, hidden_dim, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(hidden_dim),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution
            nn.Conv2d(hidden_dim, hidden_dim, kernel_size=3, stride=stride, padding=1, groups=hidden_dim, bias=False),
            nn.BatchNorm2d(hidden_dim),
            nn.ReLU6(inplace=True),
            # 1x1 pointwise convolution (compress)
            nn.Conv2d(hidden_dim, out_channels, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(out_channels),
        )

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)

# Example usage
if __name__ == "__main__":
    # Create an example tensor with batch size 1, 32 input channels, and 224x224 spatial dimensions
    input_tensor = torch.randn(1, 32, 224, 224)
    # Define an inverted residual block
    inverted_res_block = InvertedResidualBlock(in_channels=32, out_channels=64, stride=1, expand_ratio=6)
    # Forward pass
    output_tensor = inverted_res_block(input_tensor)
    print(output_tensor.shape)
```
ShuffleResidual_1 (the main unit of ShuffleNetV1)
The ShuffleNet family aims to reduce model parameters while preserving accuracy, targeting real-time mobile deployment. ShuffleNetV1 uses group convolution together with channel shuffle, greatly reducing the required computational resources while maintaining accuracy. Its main contribution is the channel shuffle operation, which largely compensates for the lack of information exchange between groups that group convolution introduces.
The ShuffleNetUnit_1 structure chains the 1x1 group convolution and channel shuffle together; its structure is shown below.
(a) is the depthwise-separable block used in the MobileNet series (DWConv, i.e. depthwise convolution). (b) and (c) are the shuffle units proposed here. They largely match the earlier structure, except that the 1x1 convolutions become 1x1 group convolutions to further cut parameters, and channel shuffle is added so that different groups can exchange information. (c) is the stride-2 case: the output feature map is halved spatially and the channel dimension doubles. To make the final concat possible, the two branches must produce feature maps of the same spatial size, so a 3x3 average pooling with stride 2 is added on the shortcut branch.
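Before the full unit, here is what channel shuffle does on a toy 6-channel tensor with 2 groups (each channel's value is its own index, so the permutation is visible):

```python
import torch

x = torch.arange(6).view(1, 6, 1, 1)  # channels carry their own index 0..5
groups = 2
b, c, h, w = x.shape
# reshape to (batch, groups, channels_per_group, H, W), swap the group axes, flatten back
y = x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)
print(y.flatten().tolist())  # [0, 3, 1, 4, 2, 5]: channels from the two groups are interleaved
```

After the shuffle, every group seen by the next group convolution contains channels from every previous group, which is exactly the cross-group mixing the unit needs.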
Implementation
```python
import torch
import torch.nn as nn

class ChannelShuffle(nn.Module):
    def __init__(self, groups):
        super(ChannelShuffle, self).__init__()
        self.groups = groups

    def forward(self, x):
        batch_size, num_channels, height, width = x.size()
        channels_per_group = num_channels // self.groups
        # reshape into (batch, groups, channels_per_group, H, W)
        x = x.view(batch_size, self.groups, channels_per_group, height, width)
        x = torch.transpose(x, 1, 2).contiguous()
        # flatten back to (batch, channels, H, W)
        x = x.view(batch_size, -1, height, width)
        return x

class ShuffleResidual(nn.Module):
    def __init__(self, in_channels, out_channels, stride, groups):
        super(ShuffleResidual, self).__init__()
        self.stride = stride
        mid_channels = out_channels // 4
        if stride == 2:
            # the shortcut branch is concatenated in this case, so the conv
            # branch only needs to produce the remaining channels
            out_channels -= in_channels
        self.groups = groups
        self.bottleneck = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, 1, 1, 0, bias=False, groups=self.groups),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            ChannelShuffle(self.groups),
            nn.Conv2d(mid_channels, mid_channels, 3, stride, 1, bias=False, groups=mid_channels),
            nn.BatchNorm2d(mid_channels),
            nn.Conv2d(mid_channels, out_channels, 1, 1, 0, bias=False, groups=self.groups),
            nn.BatchNorm2d(out_channels),
        )
        self.relu = nn.ReLU(inplace=True)
        if stride == 2:
            self.avgpool = nn.AvgPool2d(3, stride, 1)

    def forward(self, x):
        residual = x
        out = self.bottleneck(x)
        if self.stride == 2:
            residual = self.avgpool(residual)
            out = torch.cat((out, residual), 1)
        else:
            out = out + residual
        return self.relu(out)
```
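A quick sanity check of the stride-2 channel bookkeeping above: the conv branch only produces out_channels - in_channels channels, and concatenating with the average-pooled shortcut restores the full count. Here a plain conv stands in for the bottleneck branch, and the sizes 24/48 are illustrative:

```python
import torch
import torch.nn as nn

in_c, out_c = 24, 48
x = torch.randn(1, in_c, 56, 56)

branch = nn.Conv2d(in_c, out_c - in_c, 3, stride=2, padding=1)  # stand-in for the bottleneck branch
shortcut = nn.AvgPool2d(3, 2, 1)                                # stride-2 shortcut pooling

y = torch.cat((branch(x), shortcut(x)), dim=1)
print(y.shape)  # torch.Size([1, 48, 28, 28])
```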
ShuffleResidual_2 (the main structure of ShuffleNetV2)
ShuffleNetV2 improves on the v1 design. Through experiments, its authors identified four guidelines:
The first guideline states that memory access cost is minimized when a convolution's input and output channel counts are equal, so more images can be processed in the same amount of time.
The second states that group convolution increases memory access cost compared with ordinary convolution.
The third states that the more fragmented a network's design, the slower its inference. Fragmentation here can be understood as the degree of branching, whether the branches run in series or in parallel. Fragmented structures can improve accuracy, but they reduce the model's efficiency, as shown below:
Structures (b), (c), (d), and (e) all slow down inference.
The fourth states that the impact of element-wise operations cannot be ignored. Element-wise operations include activation functions, element-wise addition (as in residual structures), and the bias in convolutions.
Guidelines 2-4 may seem obvious, but the authors point out that their impact is larger than one might expect.
Based on these guidelines, the authors propose four improvements:
- Use "balanced" convolutions, i.e. keep the ratio of input to output channels as close to 1 as possible.
- Mind the computational cost of group convolution; the number of groups cannot be increased blindly.
- Reduce the network's degree of fragmentation.
- Minimize element-wise operations.
Accordingly, the ShuffleNetV1 structure is modified as shown below:
Here (a) and (b) are the v1 structures, and (c) and (d) are the v2 structures. Comparing (a) with (c) and (b) with (d), the improvements are that channel shuffle is moved to the end, and the 1x1 group convolutions are replaced with ordinary 1x1 convolutions.
Implementation
```python
import torch
import torch.nn as nn

class ShuffleResidual_2(nn.Module):
    def __init__(self, in_channels, out_channels, stride):
        super(ShuffleResidual_2, self).__init__()
        self.stride = stride
        assert stride in [1, 2]
        mid_channels = out_channels // 2
        if stride == 1:
            # the stride-1 unit splits the input in half, so channels must match
            assert in_channels == out_channels
        # Branch 1: shortcut branch (only does work in the stride-2 unit)
        if stride == 2:
            self.branch1 = nn.Sequential(
                nn.Conv2d(in_channels, in_channels, 3, stride=2, padding=1, groups=in_channels, bias=False),
                nn.BatchNorm2d(in_channels),
                nn.Conv2d(in_channels, mid_channels, 1, stride=1, padding=0, bias=False),
                nn.BatchNorm2d(mid_channels),
                nn.ReLU(inplace=True),
            )
        else:
            self.branch1 = nn.Sequential()
        # Branch 2: 1x1 conv -> 3x3 depthwise conv -> 1x1 conv (ordinary, not grouped)
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_channels if stride == 2 else mid_channels, mid_channels, 1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, 3, stride=stride, padding=1, groups=mid_channels, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.Conv2d(mid_channels, mid_channels, 1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        if self.stride == 1:
            x1, x2 = self.channel_split(x)
            out = torch.cat((x1, self.branch2(x2)), 1)
        elif self.stride == 2:
            out = torch.cat((self.branch1(x), self.branch2(x)), 1)
        # in v2, the channel shuffle comes last
        out = self.channel_shuffle(out, 2)
        return out

    @staticmethod
    def channel_split(x):
        batch_size, num_channels, height, width = x.size()
        split_channels = num_channels // 2
        return x[:, :split_channels, :, :], x[:, split_channels:, :, :]

    @staticmethod
    def channel_shuffle(x, groups):
        batch_size, num_channels, height, width = x.size()
        channels_per_group = num_channels // groups
        x = x.view(batch_size, groups, channels_per_group, height, width)
        x = torch.transpose(x, 1, 2).contiguous()
        x = x.view(batch_size, -1, height, width)
        return x
```
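As a self-contained illustration of the stride-1 data flow, the sketch below performs channel split, concat, and the final 2-group shuffle on a toy 8-channel tensor. An identity mapping stands in for branch2 so the channel positions stay visible:

```python
import torch

x = torch.arange(8).view(1, 8, 1, 1)   # channels carry their own index 0..7
x1, x2 = x[:, :4], x[:, 4:]            # channel split into two halves
y = torch.cat((x1, x2), dim=1)         # identity stands in for branch2 here
b, c, h, w = y.shape
# final channel shuffle with groups=2
y = y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
print(y.flatten().tolist())  # [0, 4, 1, 5, 2, 6, 3, 7]: the two halves are interleaved
```

Because the shuffle interleaves the pass-through half with the processed half, the split in the next unit sees a mixture of both, so information still flows across the whole channel dimension without any group convolution.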