PyTorch Classic CNNs (9): DenseNet (CIFAR-10)

Proposed in 2017; CVPR 2017 best paper and oral.

 

DenseNet: Dense Convolutional Network

DenseNet is usually compared with ResNet and the Inception networks. It borrows ideas from both, yet the structure is entirely new; the architecture is not complicated, but it is very effective.

As is well known, recent work on improving convolutional networks has gone in two directions: deeper (e.g. ResNet, which addresses the vanishing-gradient problem in very deep networks) or wider (e.g. the Inception modules of GoogLeNet). DenseNet instead starts from the features themselves: by reusing features as aggressively as possible, it reaches better accuracy with fewer parameters.

DenseNet differs from ResNet in that ResNet combines features across layers by element-wise addition, whereas DenseNet concatenates features across layers along the channel dimension.
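A minimal sketch of the difference (with made-up tensors x and fx standing in for a layer's input and output; not code from the model below):

import torch

x = torch.randn(1, 64, 8, 8)    # input feature map
fx = torch.randn(1, 64, 8, 8)   # output of some conv layers applied to x

res_out = x + fx                       # ResNet: element-wise sum, still 64 channels
dense_out = torch.cat((fx, x), dim=1)  # DenseNet: channel concat, now 128 channels
print(res_out.shape)    # torch.Size([1, 64, 8, 8])
print(dense_out.shape)  # torch.Size([1, 128, 8, 8])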

In the paper's experiments, DenseNet outperforms ResNet.

 

DenseNet's main advantages:

  1. Alleviates the vanishing-gradient problem
  2. Strengthens feature propagation
  3. Makes more effective use of features (feature reuse)
  4. Reduces the number of parameters to some extent

 

In a traditional convolutional network with L layers there are L connections, but in DenseNet there are L(L+1)/2. Simply put, the input of each layer comes from the outputs of all preceding layers. As in the figure: x0 is the input; the input of H1 is x0; the input of H2 is x0 and x1 (x1 is the output of H1); the input of H3 is x0, x1 and x2; and so on.
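In the notation of the paper, the l-th layer therefore computes

x_l = H_l([x_0, x_1, ..., x_(l-1)])

where [x_0, x_1, ..., x_(l-1)] denotes the channel-wise concatenation of all preceding feature maps and H_l is a composite function of BN, ReLU and convolution.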

One reason DenseNet can be narrower and use fewer parameters is precisely this dense block design: each convolutional layer inside a dense block produces only a small number of output feature maps (fewer than 100), instead of the hundreds or thousands of channels common in other networks. At the same time, this connectivity pattern makes the flow of features and gradients more effective, so the network is easier to train. One sentence from the paper sums it up nicely: "Each layer has direct access to the gradients from the loss function and the original input signal, leading to an implicit deep supervision." This directly explains why the network works so well. As mentioned earlier, vanishing gradients become more likely as networks get deeper, because the input signal and the gradients have to pass through many layers. With dense connections, every layer is effectively connected directly to the input and to the loss, which alleviates vanishing gradients and makes very deep networks much less of a problem.

 

The structure diagram in the paper contains 3 dense blocks. The authors split DenseNet into multiple dense blocks so that the feature maps within each dense block share the same spatial size; concatenation then never runs into a size mismatch.
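A quick illustration with made-up tensors: torch.cat along the channel dimension requires every other dimension to match, which is why the feature maps inside one dense block must keep the same height and width.

import torch

a = torch.randn(1, 16, 8, 8)
b = torch.randn(1, 32, 8, 8)
print(torch.cat((a, b), dim=1).shape)  # torch.Size([1, 48, 8, 8])

c = torch.randn(1, 32, 4, 4)
# torch.cat((a, c), dim=1) would raise a RuntimeError because 8 != 4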

 

 


 

Network Model

The main building blocks of DenseNet are the dense block and the transition layer. The former defines how inputs and outputs are concatenated; the latter controls the number of channels so that it does not grow too large.

A dense block is made up of several conv_blocks, each using the same number of output channels (every conv_block has the same number of output channels, while the number of input channels grows layer by layer). In the forward pass, the input of each block is concatenated with the outputs of the preceding layers along the channel dimension.

The outputs of the earlier layers are stacked, one after another, onto the channels of the current layer's input. The number of output channels of each convolution in a dense block is called the growth_rate: if the input has in_channel channels and there are n layers, the output has in_channel + n * growth_rate channels.

import torch
from torch import nn

# BN can be placed either before or after the conv layer; it is usually placed
# after, but here it comes first (the BN-ReLU-Conv order used by DenseNet).
def conv_block(in_channel, out_channel):
    layer = nn.Sequential(
        nn.BatchNorm2d(in_channel),
        nn.ReLU(True),
        nn.Conv2d(in_channel, out_channel, kernel_size=3, padding=1, bias=False)
    )
    return layer


# A dense block is made up of several conv_blocks, each with the same number of
# output channels. In the forward pass, the output of every conv_block is
# concatenated with its input along the channel dimension.
class dense_block(nn.Module):
    # growth_rate is the number of output channels of each conv_block
    def __init__(self, in_channel, growth_rate, num_layers):
        super(dense_block, self).__init__()
        block = []
        channel = in_channel
        for i in range(num_layers):
            block.append(
                conv_block(in_channel=channel, out_channel=growth_rate)
            )
            channel += growth_rate
        self.net = nn.Sequential(*block)  # assembled once, after the loop

    def forward(self, x):
        for layer in self.net:
            out = layer(x)
            x = torch.cat((out, x), dim=1)  # concatenate along the channel dimension
        return x


blk = dense_block(in_channel=3, growth_rate=10, num_layers=4)
X = torch.rand(4, 3, 8, 8)
Y = blk(X)
print(Y.shape) # torch.Size([4, 43, 8, 8])

In this example we define a dense block with 4 conv_blocks, each with 10 output channels. With a 3-channel input we get an output with 3 + 4*10 = 43 channels. The output channel count of each conv block controls how fast the total number of channels grows relative to the input, which is why it is called the growth rate.

 

Transition block: every dense block increases the number of channels, and stacking too many of them would produce an overly complex model. The transition layer is used to control model complexity: a 1*1 convolution reduces the number of channels, and an average pooling layer with stride 2 halves the height and width, further reducing complexity.


def transition_block(in_channel, out_channel):
    trans_layer = nn.Sequential(
        nn.BatchNorm2d(in_channel),
        nn.ReLU(True),
        nn.Conv2d(in_channel, out_channel, 1),
        nn.AvgPool2d(2, 2)
    )
    return trans_layer


blk = transition_block(in_channel=43, out_channel=10)
print(blk(Y).shape) # torch.Size([4, 10, 4, 4])

Applying a transition layer with 10 output channels to the dense block output of the previous example reduces the channel count to 10 and halves both the height and the width.

 

 

Building the DenseNet Model

DenseNet starts with the same single convolutional layer and max pooling layer as ResNet.

The input size is 96*96*3: the CIFAR-10 images are resized to 96.
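Before the full code, a quick standalone check of the stem's spatial sizes (a sketch using the same kernel/stride/padding as the model below):

import torch
from torch import nn

stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),  # 96 -> 48
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1)        # 48 -> 24
)
print(stem(torch.randn(1, 3, 96, 96)).shape)  # torch.Size([1, 64, 24, 24])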

import torch
from torch import nn, optim
import torchvision
from datetime import datetime

def conv_block(in_channel, out_channel):
    layer = nn.Sequential(
        nn.BatchNorm2d(in_channel),
        nn.ReLU(True),
        nn.Conv2d(in_channel, out_channel, kernel_size=3, padding=1, bias=False)
    )
    return layer


# A dense block is made up of several conv_blocks, each with the same number of
# output channels. In the forward pass, the output of every conv_block is
# concatenated with its input along the channel dimension.
class dense_block(nn.Module):
    # growth_rate is the number of output channels of each conv_block
    def __init__(self, in_channel, growth_rate, num_layers):
        super(dense_block, self).__init__()
        block = []
        channel = in_channel
        for i in range(num_layers):
            block.append(
                conv_block(in_channel=channel, out_channel=growth_rate)
            )
            channel += growth_rate
        self.net = nn.Sequential(*block)  # assembled once, after the loop

    def forward(self, x):
        for layer in self.net:
            out = layer(x)
            x = torch.cat((out, x), dim=1)  # concatenate along the channel dimension
        return x


def transition_block(in_channel, out_channel):
    trans_layer = nn.Sequential(
        nn.BatchNorm2d(in_channel),
        nn.ReLU(True),
        nn.Conv2d(in_channel, out_channel, 1),
        nn.AvgPool2d(2, 2)
    )
    return trans_layer


class DenseNet(nn.Module):
    def __init__(self, in_channel, num_classes=10, growth_rate=32, block_layers=[6, 12, 24, 16]):
        super(DenseNet, self).__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(in_channels=in_channel, out_channels=64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )

        channels = 64
        block = []
        for i, layers in enumerate(block_layers):
            block.append(dense_block(channels, growth_rate, layers))
            channels += layers * growth_rate
            if i != len(block_layers) - 1:
                block.append(transition_block(channels, channels // 2))  # the transition layer halves both the spatial size and the channel count
                channels = channels // 2
        self.block2 = nn.Sequential(*block)
        self.block2.add_module('bn', nn.BatchNorm2d(channels))
        self.block2.add_module('relu', nn.ReLU(True))
        self.block2.add_module('avg_pool', nn.AvgPool2d(3))
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = x.view(x.shape[0], -1)
        x = self.classifier(x)
        return x


def get_acc(output, label):
    total = output.shape[0]
    # output holds the raw class scores; the prediction is the argmax of each row
    _, pred_label = output.max(1)
    num_correct = (pred_label == label).sum().item()
    return num_correct / total


batch_size = 32
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(size=96),
    torchvision.transforms.ToTensor()
])

train_set = torchvision.datasets.CIFAR10(
    root='dataset/',
    train=True,
    download=True,
    transform=transform
)

# hold-out split into training and validation sets
train_set, val_set = torch.utils.data.random_split(train_set, [40000, 10000])

test_set = torchvision.datasets.CIFAR10(
    root='dataset/',
    train=False,
    download=True,
    transform=transform
)

train_loader = torch.utils.data.DataLoader(
    dataset=train_set,
    batch_size=batch_size,
    shuffle=True
)
val_loader = torch.utils.data.DataLoader(
    dataset=val_set,
    batch_size=batch_size,
    shuffle=True
)
test_loader = torch.utils.data.DataLoader(
    dataset=test_set,
    batch_size=batch_size,
    shuffle=False
)

net = DenseNet(in_channel=3, num_classes=10)

lr = 1e-2
optimizer = optim.SGD(net.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()
net = net.to(device)
prev_time = datetime.now()
valid_data = val_loader

for epoch in range(3):
    train_loss = 0
    train_acc = 0
    net.train()

    for inputs, labels in train_loader:
        inputs = inputs.to(device)
        labels = labels.to(device)

        # forward
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        # backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        train_acc += get_acc(outputs, labels)
        # train_loss and train_acc are averaged over the number of batches at the end of the epoch

    # elapsed time for this epoch
    cur_time = datetime.now()
    h, remainder = divmod((cur_time - prev_time).seconds, 3600)
    m, s = divmod(remainder, 60)
    # time_str = 'Time %02d:%02d:%02d'%(h,m,s)
    time_str = 'Time %02d:%02d:%02d(from %02d/%02d/%02d %02d:%02d:%02d to %02d/%02d/%02d %02d:%02d:%02d)' % (
        h, m, s, prev_time.year, prev_time.month, prev_time.day, prev_time.hour, prev_time.minute, prev_time.second,
        cur_time.year, cur_time.month, cur_time.day, cur_time.hour, cur_time.minute, cur_time.second)
    prev_time = cur_time

    # validation
    with torch.no_grad():
        net.eval()
        valid_loss = 0
        valid_acc = 0
        for inputs, labels in valid_data:
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = net(inputs)
            loss = criterion(outputs, labels)
            valid_loss += loss.item()
            valid_acc += get_acc(outputs, labels)

    print("Epoch %d. Train Loss: %f, Train Acc: %f, Valid Loss: %f, Valid Acc: %f,"
          % (epoch, train_loss / len(train_loader), train_acc / len(train_loader), valid_loss / len(valid_data),
             valid_acc / len(valid_data))
          + time_str)

    torch.save(net.state_dict(), 'checkpoints/params.pkl')

# evaluation on the test set
with torch.no_grad():
    net.eval()
    correct = 0
    total = 0
    for (images, labels) in test_loader:
        images, labels = images.to(device), labels.to(device)
        output = net(images)
        _, predicted = torch.max(output.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print("The accuracy of total {} val images: {}%".format(total, 100 * correct / total))


 

 

 


DenseNet-121, DenseNet-169, DenseNet-201, DenseNet-161

Note that what is implemented here is essentially DenseNet-121. My parameters differ slightly from the reference implementation: in_channel is not the same thing as num_init_features there; the num_init_features value of 64 is written directly into the network.

In DenseNet the first block is the feature block (the stem), which converts the input image (usually 3 channels) into 64 channels.
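A minimal sketch of the channel bookkeeping, reproducing the arithmetic done in the constructor above for the DenseNet-121 configuration (growth_rate=32, blocks [6, 12, 24, 16], 64 stem channels):

channels = 64                        # output channels of the feature block (stem)
for i, layers in enumerate([6, 12, 24, 16]):
    channels += layers * 32          # each conv_block adds growth_rate channels
    if i != 3:
        channels //= 2               # the transition layer halves the channel count
print(channels)                      # 1024, the classifier width of DenseNet-121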

 

The paper describes three structures, DenseNet, DenseNet-B and DenseNet-BC, which differ as follows:

DenseNet:

Dense Block module: BN + ReLU + Conv(3*3) + dropout

Transition layer module: BN + ReLU + Conv(1*1) (filter num: m) + dropout + Pooling(2*2)

DenseNet-B:

Dense Block module: BN + ReLU + Conv(1*1) (filter num: 4k) + dropout + BN + ReLU + Conv(3*3) + dropout

Transition layer module: BN + ReLU + Conv(1*1) (filter num: m) + dropout + Pooling(2*2)

DenseNet-BC:

Dense Block module: BN + ReLU + Conv(1*1) (filter num: 4k) + dropout + BN + ReLU + Conv(3*3) + dropout

Transition layer module: BN + ReLU + Conv(1*1) (filter num: θm, where 0 < θ < 1; the paper uses θ = 0.5) + dropout + Pooling(2*2)

DenseNet-B adds bottleneck layers to the original DenseNet: each layer in a Dense Block gets a 1*1 convolution that reduces its input feature maps to 4k channels before the 3*3 convolution. Since the number of input feature maps grows with depth inside a Dense Block, this bottleneck greatly reduces the amount of computation.

DenseNet-BC builds on DenseNet-B by adding the compression factor θ in the transition layer; the paper sets θ = 0.5, so the 1*1 convolution halves the number of feature maps coming out of the previous Dense Block.
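A minimal sketch of these two variants, reusing the conv/transition helpers defined earlier in this post (bottleneck_block and compressed_transition are hypothetical names, not from the code above; dropout is omitted):

def bottleneck_block(in_channel, growth_rate):
    # DenseNet-B layer: BN-ReLU-Conv(1*1) first reduces the input to
    # 4*growth_rate channels, then BN-ReLU-Conv(3*3) produces the
    # growth_rate new feature maps.
    return nn.Sequential(
        nn.BatchNorm2d(in_channel),
        nn.ReLU(True),
        nn.Conv2d(in_channel, 4 * growth_rate, kernel_size=1, bias=False),
        nn.BatchNorm2d(4 * growth_rate),
        nn.ReLU(True),
        nn.Conv2d(4 * growth_rate, growth_rate, kernel_size=3, padding=1, bias=False)
    )

def compressed_transition(in_channel, theta=0.5):
    # DenseNet-BC transition: the 1*1 convolution keeps only theta * in_channel channels
    return transition_block(in_channel, int(theta * in_channel))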


 

 
