First, here is the complete code for the network:
import torch
import torch.nn as nn

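class conv_block(nn.Module):
    """Bottleneck layer: BN -> ReLU -> 1x1 conv (4k channels) -> 3x3 conv (k channels)."""

    def __init__(self, in_channel, growth_rate):
        super(conv_block, self).__init__()
        # Note: the paper's bottleneck layer (DenseNet-B) applies a second
        # BN-ReLU before the 3x3 conv; this implementation omits it.
        self.conv = nn.Sequential(
            nn.BatchNorm2d(in_channel),
            nn.ReLU(),
            nn.Conv2d(in_channel, 4*growth_rate, kernel_size=(1, 1), bias=False),
            nn.Conv2d(4*growth_rate, growth_rate, kernel_size=(3, 3), padding=1, bias=False)
        )

    def forward(self, x):
        out = self.conv(x)
        # Concatenate the input with the new features along the channel dimension
        x = torch.cat([x, out], dim=1)
        return x
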
class transition(nn.Module):
    """Transition layer: BN -> ReLU -> 1x1 conv (channel compression) -> 2x2 average pooling."""

    def __init__(self, in_channel, theta=0.5):
        super(transition, self).__init__()
        self.conv = nn.Sequential(
            nn.BatchNorm2d(in_channel),
            nn.ReLU(),
            # Channel arguments must be integers, hence int()
            nn.Conv2d(in_channel, int(theta*in_channel), kernel_size=(1, 1)),
            nn.AvgPool2d(2, 2)
        )

    def forward(self, x):
        return self.conv(x)

class densenet(nn.Module):
    def __init__(self, in_channel, classes_num, block_layers, growth_rate=32, theta=0.5):
        super(densenet, self).__init__()
        channels = 64
        self.growth_rate = growth_rate
        # Stem: 7x7 conv with stride 2, then 3x3 max pooling with stride 2
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channel, channels, kernel_size=(7, 7), stride=2, padding=3),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )
        # Four dense blocks, with a transition layer after each of the first three
        self.DB1, channels = self._make_dense_block(channels, num=block_layers[0])
        self.TL1 = transition(channels, theta)
        channels = int(channels * theta)
        self.DB2, channels = self._make_dense_block(channels, num=block_layers[1])
        self.TL2 = transition(channels, theta)
        channels = int(channels * theta)
        self.DB3, channels = self._make_dense_block(channels, num=block_layers[2])
        self.TL3 = transition(channels, theta)
        channels = int(channels * theta)
        self.DB4, channels = self._make_dense_block(channels, num=block_layers[3])
        # Classifier head: BN -> ReLU -> global average pooling -> linear layer
        self.global_average_pool = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1))
        )
        self.fc = nn.Sequential(
            nn.Flatten(1, -1),
            nn.Linear(channels, classes_num)
        )

    def forward(self, x):
        x = self.conv1(x)
        x = self.DB1(x)
        x = self.TL1(x)
        x = self.DB2(x)
        x = self.TL2(x)
        x = self.DB3(x)
        x = self.TL3(x)
        x = self.DB4(x)
        x = self.global_average_pool(x)
        x = self.fc(x)
        return x

    def _make_dense_block(self, in_channel, num):
        """Stack num conv_blocks; return the block and its output channel count."""
        layers = []
        channels = in_channel
        for i in range(num):
            block = conv_block(channels, self.growth_rate)
            channels += self.growth_rate  # each conv_block appends growth_rate channels
            layers.append(block)
        return nn.Sequential(*layers), channels

The following code creates an instance of the network:
net = densenet(in_channel=3, classes_num=10, block_layers=[6, 12, 24, 16],
               growth_rate=32, theta=0.5)
This creates a DenseNet-121 network for a dataset of RGB images (3 channels) divided into 10 classes.
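To see why block_layers=[6, 12, 24, 16] gives DenseNet-121, it can help to trace the channel counts by hand. The sketch below is plain Python that mirrors the bookkeeping in the densenet constructor; the helper names trace_channels and count_weighted_layers are made up for illustration and are not part of the code above.

```python
def trace_channels(block_layers, growth_rate=32, theta=0.5, stem_channels=64):
    """Return the channel count after each dense block and transition."""
    trace = []
    channels = stem_channels
    for idx, num_layers in enumerate(block_layers):
        # Each conv_block appends growth_rate channels to its input.
        channels += num_layers * growth_rate
        trace.append((f"DB{idx + 1}", channels))
        # A transition follows every dense block except the last one.
        if idx < len(block_layers) - 1:
            channels = int(channels * theta)
            trace.append((f"TL{idx + 1}", channels))
    return trace


def count_weighted_layers(block_layers):
    """Stem conv + 2 convs per conv_block + 1 conv per transition + final linear layer."""
    return 1 + 2 * sum(block_layers) + (len(block_layers) - 1) + 1


trace = trace_channels([6, 12, 24, 16])
for name, channels in trace:
    print(name, channels)
# DB1 256, TL1 128, DB2 512, TL2 256, DB3 1024, TL3 512, DB4 1024

print(count_weighted_layers([6, 12, 24, 16]))  # 121
```

The final count of 1024 channels is what nn.Linear receives as its input size, and the 121 conv and linear layers are where the name DenseNet-121 comes from.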
The network structure is as follows:
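Dense block implementation
The conv_block in the code corresponds to the green box in the figure above. It first applies a 1*1 convolution to reduce the number of channels and therefore the parameter count; the paper uses 4*k output channels for this convolution (k is the number of channels each conv_block outputs, i.e. the growth rate in the paper, and is a fixed value). A 3*3 convolution with k output channels follows.
The defining feature of DenseNet is feature reuse. In the code it shows up in the forward method of conv_block: torch.cat([x, out], dim=1) concatenates the block's output (k channels) with its input along the channel dimension (dim=1). For the i-th conv_block, the input has in_channel + (i-1)*k channels, where in_channel is the channel count of the dense block's original input.
Note the difference from ResNet: ResNet adds the output to the input, whereas DenseNet concatenates the two along the channel dimension.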
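The difference between the two merge styles is easiest to see from tensor shapes. The following is a minimal sketch with random tensors standing in for real feature maps; the batch size, spatial size and k = 32 are arbitrary values chosen for illustration.

```python
import torch

k = 32                                # growth rate (assumed value)
x = torch.randn(2, 64, 8, 8)          # block input: (batch, channels, H, W)
out = torch.randn(2, k, 8, 8)         # stand-in for the k new feature maps

# DenseNet style: concatenation keeps the input and appends the new features.
dense = torch.cat([x, out], dim=1)
print(dense.shape)                    # torch.Size([2, 96, 8, 8])

# ResNet style: addition needs matching shapes and leaves the channel count unchanged.
residual = torch.randn(2, 64, 8, 8)   # stand-in for a residual branch output
res = x + residual
print(res.shape)                      # torch.Size([2, 64, 8, 8])

# The input survives unchanged inside the concatenated tensor.
print(torch.equal(dense[:, :64], x))  # True
```

Because the input is preserved as-is, every later conv_block in the same dense block can read the features of all earlier ones directly.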
Stacking conv_block several times gives a dense block, the part boxed in blue in the figure above. In the code this is done by the _make_dense_block method of the densenet class.
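Transition implementation
This is the part boxed in red in the figure above; in the code it is the transition class.
After batch normalization and ReLU, it first uses a 1*1 convolution as a bottleneck to reduce the number of channels and therefore the parameter count, then a 2*2 average pooling with stride=2 to halve the feature map size.
Here theta is a hyperparameter from the paper (the compression factor), and theta*in_channel gives the number of output channels. To reduce the parameter count, theta is chosen between 0 and 1; the paper sets it to 0.5.
One thing to watch out for: the channel arguments of nn.Conv2d() must be integers, but theta*in_channel is a float, so it has to be converted with int().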
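The shape change through a transition can be worked out without running the network. The sketch below uses the standard output-size formula for convolution and pooling layers (with the default floor rounding); the helper names and the 224x224 input size are assumptions made for illustration only.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Output size of a conv or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def transition_out_shape(channels, height, width, theta=0.5):
    """Shape after the transition: 1x1 conv, then 2x2 average pooling with stride 2."""
    out_channels = int(theta * channels)   # int() truncates, e.g. int(0.5 * 65) == 32
    return (out_channels,
            conv_out_size(height, kernel=2, stride=2),
            conv_out_size(width, kernel=2, stride=2))


# Stem of the network, assuming a 224x224 input (hypothetical size).
size = conv_out_size(224, kernel=7, stride=2, padding=3)   # 7x7 conv, stride 2
size = conv_out_size(size, kernel=3, stride=2, padding=1)  # 3x3 max pool, stride 2
print(size)  # 56

# First transition of DenseNet-121: DB1 outputs 256 channels.
print(transition_out_shape(256, size, size))  # (128, 28, 28)
```

Note that int() truncates rather than rounds, so an odd channel count loses the fractional half.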
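Note
Although DenseNet has fewer parameters than other networks of comparable depth, it uses more GPU memory. CUDA sometimes raises an out-of-memory error, in which case reducing batch_size usually helps.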
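A rough way to see where the memory goes: every conv_block returns a new, wider tensor from torch.cat, and during training those tensors are typically kept for the backward pass. The sketch below adds up their channel counts for one dense block. It is only a back-of-the-envelope estimate under that assumption; it ignores the 4*k bottleneck outputs and the BN/ReLU buffers, and the function name is made up for illustration.

```python
def dense_block_cat_channels(in_channel, num_layers, growth_rate=32):
    """Total channels across all tensors created by torch.cat inside one dense block."""
    total = 0
    channels = in_channel
    for _ in range(num_layers):
        channels += growth_rate   # each conv_block returns a new, wider tensor
        total += channels
    return total


# Third dense block of DenseNet-121: 256 input channels, 24 layers.
total = dense_block_cat_channels(256, 24)
print(total)          # 15744 channels held across the concatenated tensors
print(total / 1024)   # 15.375 times the block's final 1024-channel output
```

The total grows quadratically with the number of layers in a block, and every one of those channels is multiplied by batch_size * H * W elements, which is why a smaller batch_size relieves the pressure.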
Thanks to @视觉盛宴 for the approach to building this network!