Implementing several convolutional neural networks (AlexNet, VGG, GoogLeNet, ResNet, DenseNet) in PyTorch, with classification experiments on the CIFAR-10 dataset

1. Loading CIFAR-10 data and batching


CIFAR-10 is an image dataset widely used in machine learning and deep learning research. It contains 60,000 color images of size 32x32 pixels, divided into 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. 50,000 of the images are used for training and 10,000 for testing.

The data can be loaded directly through PyTorch via torchvision.datasets. Loading and batching look like this:

from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10

train_set = CIFAR10("./C_data", download=True, train=True, transform=data_tf)
test_set = CIFAR10("./C_data", train=False, download=True, transform=data_tf)
train_data = DataLoader(train_set, shuffle=True, batch_size=64)
test_data = DataLoader(test_set, shuffle=False, batch_size=128)

Here data_tf is the data-transform function; a typical implementation looks like this:

import numpy as np
import torch

def data_tf(x):
    x = np.array(x, dtype="float32") / 255  # scale pixel values to [0, 1]
    x = (x - 0.5) / 0.5                     # normalize to [-1, 1]
    x = x.transpose((2, 0, 1))              # HWC -> CHW
    x = torch.from_numpy(x)
    return x

The line x = x.transpose((2, 0, 1)) rearranges each image from height x width x channel order into the channel x height x width order that torch.nn expects; once the DataLoader stacks a batch, the data becomes a four-dimensional tensor whose dimensions correspond, in order, to batch size, input channels, image height, and image width. Different networks need different preprocessing, so later sections define their own variants of this transform. The shuffle parameter controls whether the data is shuffled: shuffling helps during training, while for testing we keep a fixed order so that runs can be compared.
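
As a quick sanity check, one batch drawn from the loader above should have exactly that layout (an illustrative snippet, not part of the original code):

im, label = next(iter(train_data))
print(im.shape)     # torch.Size([64, 3, 32, 32]): batch, channels, height, width
print(label.shape)  # torch.Size([64]): one class index per image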

2. Building the basic framework


Before implementing the classic networks, we need to build the shared components: a function for training a model, functions for plotting, and so on.

First, a function that computes accuracy:

def get_acc(out, label):
    total = out.shape[0]
    _, pred_label = out.max(1)  # index of the largest score = predicted class
    num_correct = (pred_label == label).sum().item()
    return num_correct / total

Here out is the network output for a batch (the predicted scores) and label holds the images' ground-truth labels. Since the DataLoader iterates over the data in batches, out and label each cover one batch. The accuracy is the fraction of predictions that match the ground truth.
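
A tiny illustrative check with dummy tensors (not part of the original code):

out = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])  # 3 samples, 2 classes
label = torch.tensor([1, 0, 0])
print(get_acc(out, label))  # 0.6666...: rows 0 and 1 are predicted correctly, row 2 is not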

Next, the training function:

import time

import torch


def train_model(net, train_data, epochs, criterion, optimizer, test_data=None):
    if torch.cuda.is_available():
        net = net.cuda()

    losses = []
    acces = []
    test_losses = []
    test_acces = []
    for e in range(epochs):
        start_time = time.time()
        train_loss = 0
        train_acc = 0
        net = net.train()
        for idx, (im, label) in enumerate(train_data):
            if torch.cuda.is_available():
                im = im.cuda()
                label = label.cuda()

            out = net(im)
            loss = criterion(out, label)

            optimizer.zero_grad()  # clear the gradients from the previous step
            loss.backward()        # backpropagate
            optimizer.step()       # update the parameters

            train_loss += loss.item()
            acc = get_acc(out, label)
            train_acc += acc
            if idx % 100 == 0:     # record every 100th batch for plotting
                acces.append(acc)
                losses.append(loss.item())
        train_time = time.time()
        h, remain = divmod(int(train_time - start_time), 3600)
        m, s = divmod(remain, 60)
        elapsed = "%02d:%02d:%02d" % (h, m, s)
        if test_data is not None:
            test_loss = 0
            test_acc = 0
            net = net.eval()
            for idx, (im, label) in enumerate(test_data):
                if torch.cuda.is_available():
                    im = im.cuda()
                    label = label.cuda()
                out = net(im)
                loss = criterion(out, label)

                test_loss += loss.item()
                ac = get_acc(out, label)
                test_acc += ac
                if idx % 20 == 0:  # record every 20th batch for plotting
                    test_losses.append(loss.item())
                    test_acces.append(ac)
            print(
                "epoch:{},Train Loss:{:.5f},Train Acc:{:.5f}, Test Loss{:.5f},Test Acc{:.5f},train time {}".format(
                    e + 1, train_loss / len(train_data), train_acc / len(train_data),
                    test_loss / len(test_data), test_acc / len(test_data), elapsed))
        else:
            print("epoch:{},Train Loss:{:.5f},Train Acc:{:.5f},train time {}".format(
                e + 1, train_loss / len(train_data), train_acc / len(train_data), elapsed))
    return losses, acces, test_losses, test_acces

In this function, net is the network to train, train_data and test_data are the training and test loaders, epochs is the number of training epochs, and criterion and optimizer are the loss function and the optimizer.

Inside the function, torch.cuda.is_available() checks whether a GPU is available for acceleration; if so, the model and each batch of tensors are moved onto the GPU.

Several lists are created to hold intermediate values for plotting, and a for loop runs the training epochs. Two accumulators hold each epoch's loss and accuracy. A nested loop iterates over all training batches; each step performs a forward pass, computes the loss, clears the gradients, backpropagates, and updates the parameters. The per-batch loss and accuracy are accumulated into train_loss and train_acc, and a subset of them is appended to losses and acces for plotting.

If test data is provided, an evaluation pass mirrors the training pass, except that it neither backpropagates gradients nor updates parameters; it only computes the loss and accuracy.
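
Because nothing is backpropagated during evaluation, the test pass can also be wrapped in torch.no_grad() so that no computation graph is built at all, which saves memory and time. A minimal sketch of that variant (assuming net, criterion, test_data, and get_acc from above):

net.eval()
test_loss, test_acc = 0, 0
with torch.no_grad():  # disable gradient tracking for the whole block
    for im, label in test_data:
        if torch.cuda.is_available():
            im, label = im.cuda(), label.cuda()
        out = net(im)
        test_loss += criterion(out, label).item()
        test_acc += get_acc(out, label)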

Next, matplotlib is used to visualize the recorded losses and accuracies. First, a plotting helper:

import matplotlib.pyplot as plt

def drawing(y_axis_list, label):
    # spread the recorded values evenly over an arbitrary 0-10 x-axis
    x_axis = np.linspace(0, 10, len(y_axis_list), endpoint=True)
    plt.plot(x_axis, y_axis_list, label=label)
    plt.legend(loc="best")
    plt.show()

Here y_axis_list is the list of recorded values to display.

Then a wrapper function trains the model and plots the recorded values with the helper above:

def drawing_all(net, train_data, epochs, criterion, optimizer, test_data=None):
    losses, acces, test_losses, test_acces = train_model(net, train_data, epochs, criterion, optimizer, test_data)
    drawing(losses, label="train loss")
    drawing(acces, label="train acc")
    if test_data is not None:
        drawing(test_losses, label="test loss")
        drawing(test_acces, label="test acc")

3. AlexNet


This convolutional network is fairly simple, containing two convolutional layers, two pooling layers, and three fully connected layers.

The implementation:

import torch.nn as nn

class net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, 5),   # 3 -> 64 channels, 5x5 kernel
            nn.ReLU(True)
        )
        self.pool1 = nn.MaxPool2d(3, 2)  # 3x3 window, stride 2
        self.conv2 = nn.Sequential(
            nn.Conv2d(64, 64, 5, 1),
            nn.ReLU(True)
        )
        self.pool2 = nn.MaxPool2d(3, 2)
        self.fc1 = nn.Sequential(
            nn.Linear(1024, 384),
            nn.ReLU(True)
        )
        self.fc2 = nn.Sequential(
            nn.Linear(384, 192),
            nn.ReLU(True)
        )
        self.fc3 = nn.Sequential(
            nn.Linear(192, 10)
        )

    def forward(self, x):
        x = self.conv1(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = self.pool2(x)
        x = x.view(x.shape[0], -1)  # flatten to (batch, features)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x

As the loaded data shows, each image is 32x32. The first convolutional layer, conv1, takes 3 input channels, produces 64 output channels, and uses a 5x5 kernel. By the convolution output-size formulas

H_{out}=\lfloor\frac{H_{in}+2p-k}{s}\rfloor+1

W_{out}=\lfloor\frac{W_{in}+2p-k}{s}\rfloor+1

we get (here H_{out}/W_{out} are the output height/width, H_{in}/W_{in} the input height/width, s the stride, p the padding, and k the kernel side length, i.e. a k*k kernel) an output size of 28 after the convolution. The image then passes through the max-pooling layer pool1, whose output size is computed by:

H_{out}=\lfloor\frac{H_{in}-k_{h}}{s_{h}}\rfloor+1

W_{out}=\lfloor\frac{W_{in}-k_{w}}{s_{w}}\rfloor+1

which gives a size of 13 (rounded down), with the channel count unchanged at 64. Continuing the same way, the feature map is 9x9 after conv2 and 4x4 after pool2, still with 64 channels.
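
These sizes are easy to verify with a small helper that applies the formulas above (illustrative only, not part of the original code):

def conv_out(size, k, s=1, p=0):
    # floor((size + 2p - k) / s) + 1; pooling uses the same formula with p = 0
    return (size + 2 * p - k) // s + 1

size = conv_out(32, k=5)         # conv1: 28
size = conv_out(size, k=3, s=2)  # pool1: 13
size = conv_out(size, k=5)       # conv2: 9
size = conv_out(size, k=3, s=2)  # pool2: 4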

Before the fully connected layers, the data must be reshaped into the two-dimensional tensor they expect: the first dimension is the batch size and the second is the feature dimension. After pool2 the tensor has shape [64, 64, 4, 4]: the first 64 is the batch size set in the DataLoader, the second 64 is the number of output channels, and the two 4s are the feature map's height and width. Since a fully connected layer operates on two-dimensional input, we keep the first dimension and merge dimensions two through four into one. x = x.view(x.shape[0], -1) does exactly this, multiplying out the remaining dimensions: [64, 64*4*4] = [64, 1024]. Three fully connected layers then produce the classification output of shape [64, 10], where each row corresponds to one image and its ten entries are the scores (logits) for the ten classes.
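
A one-line illustration of the reshape (not part of the original code):

x = torch.zeros(64, 64, 4, 4)
print(x.view(x.shape[0], -1).shape)  # torch.Size([64, 1024])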

Build the loss function and the optimizer, then start training:

alexnet = net().cuda()
optimizer = torch.optim.SGD(alexnet.parameters(), 0.01)
criterion = nn.CrossEntropyLoss()
drawing_all(alexnet, train_data, 20, criterion, optimizer, test_data)

The output (only the later epochs are shown):

epoch:11,Train Loss:0.32864,Train Acc:0.88215, Test Loss1.07557,Test Acc0.70611,train time 00:00:03
epoch:12,Train Loss:0.27597,Train Acc:0.90259, Test Loss1.56170,Test Acc0.63893,train time 00:00:03
epoch:13,Train Loss:0.23948,Train Acc:0.91514, Test Loss1.60136,Test Acc0.67506,train time 00:00:03
epoch:14,Train Loss:0.21529,Train Acc:0.92509, Test Loss1.11191,Test Acc0.73298,train time 00:00:03
epoch:15,Train Loss:0.18631,Train Acc:0.93404, Test Loss1.35655,Test Acc0.71019,train time 00:00:03
epoch:16,Train Loss:0.15712,Train Acc:0.94421, Test Loss1.30687,Test Acc0.72641,train time 00:00:03
epoch:17,Train Loss:0.14719,Train Acc:0.94747, Test Loss1.45539,Test Acc0.72014,train time 00:00:03
epoch:18,Train Loss:0.13866,Train Acc:0.95115, Test Loss1.50308,Test Acc0.70770,train time 00:00:03
epoch:19,Train Loss:0.11978,Train Acc:0.95888, Test Loss1.58350,Test Acc0.71736,train time 00:00:03
epoch:20,Train Loss:0.10570,Train Acc:0.96409, Test Loss1.91907,Test Acc0.69188,train time 00:00:03

4. VGG


VGG (Visual Geometry Group) is a classic deep convolutional network architecture. It uses small kernels (typically 3x3) and stacks many of them to enlarge the receptive field while going considerably deeper. The stacking is organized into repeated blocks.

First, the code for one block:

def vgg_block(num_convs, in_channels, out_channels):
    net = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1), nn.ReLU(True)]  # first layer

    for i in range(num_convs - 1):  # remaining convolutional layers
        net.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))
        net.append(nn.ReLU(True))

    net.append(nn.MaxPool2d(2, 2))  # pooling layer
    return nn.Sequential(*net)

Here num_convs is the number of convolutional layers in the block, and in_channels and out_channels are the input and output channel counts. From the output-size formula above, a 3x3 kernel with padding 1 and stride 1 leaves the spatial size unchanged, so the convolutions inside a block only change the channel count; the final max pooling then halves the side length.

Next, a helper that assembles all the blocks, to keep the code compact:

def vgg_stack(num_convs, channels):
    net = []
    for n, c in zip(num_convs, channels):
        in_c = c[0]
        out_c = c[1]
        net.append(vgg_block(n, in_c, out_c))
    return nn.Sequential(*net)

vgg_net = vgg_stack((1, 1, 2, 2, 2), ((3, 64), (64, 128), (128, 256), (256, 512), (512, 512)))

Here num_convs and channels are both sequences holding the parameters for building each block. Five blocks are created, so the side length shrinks to 1/2^{5} of the original, i.e. the 32x32 input becomes 1x1, and the channel count after the blocks is 512.
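
A dummy forward pass confirms this (an illustrative check, assuming vgg_net from above):

x = torch.zeros(1, 3, 32, 32)
print(vgg_net(x).shape)  # torch.Size([1, 512, 1, 1])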

Next, the VGG network itself:

class vgg(nn.Module):
    def __init__(self):
        super(vgg, self).__init__()
        self.feature = vgg_net
        self.fc = nn.Sequential(
            nn.Linear(512, 100),
            nn.ReLU(True),
            nn.Linear(100, 10)
        )

    def forward(self, x):
        x = self.feature(x)
        x = x.view(x.shape[0], -1)
        x = self.fc(x)
        return x

The five blocks form the feature extractor, and a small fully connected head maps the output to the 10 classes.

Finally, train the model and plot the curves.

net = vgg().cuda()
optimizer = torch.optim.SGD(net.parameters(), lr=1e-1)
criterion = nn.CrossEntropyLoss()
drawing_all(net, train_data, 20, criterion, optimizer, test_data)

The training results:

epoch:10,Train Loss:0.98666,Train Acc:0.64894, Test Loss1.28605,Test Acc0.55647,train time 00:00:12
epoch:11,Train Loss:0.82546,Train Acc:0.71028, Test Loss1.03002,Test Acc0.64300,train time 00:00:11
epoch:12,Train Loss:0.69107,Train Acc:0.76053, Test Loss1.00695,Test Acc0.66297,train time 00:00:11
epoch:13,Train Loss:0.57084,Train Acc:0.80279, Test Loss1.09545,Test Acc0.67514,train time 00:00:11
epoch:14,Train Loss:0.46120,Train Acc:0.83985, Test Loss0.97725,Test Acc0.70134,train time 00:00:12
epoch:15,Train Loss:0.36916,Train Acc:0.87372, Test Loss2.38160,Test Acc0.50277,train time 00:00:11
epoch:16,Train Loss:0.29689,Train Acc:0.89786, Test Loss0.91189,Test Acc0.73764,train time 00:00:11
epoch:17,Train Loss:0.22981,Train Acc:0.92072, Test Loss1.19755,Test Acc0.72231,train time 00:00:12
epoch:18,Train Loss:0.18595,Train Acc:0.93638, Test Loss1.02736,Test Acc0.74763,train time 00:00:11
epoch:19,Train Loss:0.15042,Train Acc:0.94823, Test Loss0.88659,Test Acc0.77640,train time 00:00:11
epoch:20,Train Loss:0.11829,Train Acc:0.96014, Test Loss1.01647,Test Acc0.75653,train time 00:00:11

5. GoogLeNet


GoogLeNet is a convolutional network that runs multiple paths in parallel. It introduces the Inception module, which convolves at several scales (1x1, 3x3, 5x5, etc.) and merges the results to extract multi-scale features, and it uses global average pooling in place of most fully connected layers to cut down the parameter count. An Inception module generally contains four parallel branches.

To build convolutions quickly, we first write a helper that creates a convolutional layer:

def conv_relu(in_channel, out_channel, kernel, stride=1, padding=0):
    layer = nn.Sequential(
        nn.Conv2d(in_channel, out_channel, kernel, stride, padding),
        nn.BatchNorm2d(out_channel, eps=1e-3),
        nn.ReLU(True)
    )
    return layer

Passing the input channels, output channels, kernel size, stride, and padding to this helper builds a convolution followed by batch normalization and a ReLU activation.

Next, the Inception module:

class inception(nn.Module):
    def __init__(self, in_channel, out1_1, out2_1, out2_3, out3_1, out3_5, out4_1):
        super(inception, self).__init__()
        # branch 1: 1x1 convolution
        self.branch1x1 = conv_relu(in_channel, out1_1, 1)
        # branch 2: 1x1 reduction followed by a 3x3 convolution
        self.branch3x3 = nn.Sequential(
            conv_relu(in_channel, out2_1, 1),
            conv_relu(out2_1, out2_3, 3, padding=1)
        )
        # branch 3: 1x1 reduction followed by a 5x5 convolution
        self.branch5x5 = nn.Sequential(
            conv_relu(in_channel, out3_1, 1),
            conv_relu(out3_1, out3_5, 5, padding=2)
        )
        # branch 4: 3x3 max pooling followed by a 1x1 convolution
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            conv_relu(in_channel, out4_1, 1)
        )

    def forward(self, x):
        f1 = self.branch1x1(x)
        f2 = self.branch3x3(x)
        f3 = self.branch5x5(x)
        f4 = self.branch_pool(x)
        output = torch.cat((f1, f2, f3, f4), dim=1)  # concatenate along the channel dimension
        return output

This module merges the outputs of the four branches. Since none of the branches changes the spatial size, the branch outputs differ from the input only in channel count, so torch.cat() can concatenate them along the channel dimension; the final number of output channels is out1_1 + out2_3 + out3_5 + out4_1.
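
A quick check with illustrative parameters (the values below are chosen for demonstration, not taken from the original text):

test_net = inception(3, 64, 48, 64, 64, 96, 32)
x = torch.zeros(1, 3, 96, 96)
print(test_net(x).shape)  # torch.Size([1, 256, 96, 96]): 64 + 64 + 96 + 32 = 256 channels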

GoogLeNet is structurally similar to VGG: the model is again built by stacking blocks, the difference being that GoogLeNet's blocks contain Inception modules. The GoogLeNet code:

class googlenet(nn.Module):
    def __init__(self, in_channel, num_classes):
        super(googlenet, self).__init__()
        self.block1 = nn.Sequential(
            conv_relu(in_channel, out_channel=64, kernel=7, stride=2, padding=3),
            nn.MaxPool2d(3, 2)
        )

        self.block2 = nn.Sequential(
            conv_relu(64, 64, kernel=1),
            conv_relu(64, 192, kernel=3, padding=1),
            nn.MaxPool2d(3, 2)
        )

        self.block3 = nn.Sequential(
            inception(192, 64, 96, 128, 16, 32, 32),
            inception(256, 128, 128, 192, 32, 96, 64),
            nn.MaxPool2d(3, 2)
        )

        self.block4 = nn.Sequential(
            inception(480, 192, 96, 208, 16, 48, 64),
            inception(512, 160, 112, 224, 24, 64, 64),
            inception(512, 128, 128, 256, 24, 64, 64),
            inception(512, 112, 144, 288, 32, 64, 64),
            inception(528, 256, 160, 320, 32, 128, 128),
            nn.MaxPool2d(3, 2)
        )

        self.block5 = nn.Sequential(
            inception(832, 256, 160, 320, 32, 128, 128),
            inception(832, 384, 182, 384, 48, 128, 128),
            nn.AvgPool2d(2)
        )

        self.classifier = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = self.block5(x)    
        x = x.view(x.shape[0], -1)
        x = self.classifier(x)
        return x

Using the output-size formula mentioned earlier, it is easy to work out from this structure that the final feature map has 1/96 the side length of the input. Since CIFAR-10 images are 32x32, they must be upscaled to 96x96 beforehand so that the feature map entering the linear classifier is 1x1, which makes the reshape to a two-dimensional tensor straightforward.

At data-loading time, defining data_tf as follows upscales the images:

def data_tf(x):
    x = x.resize((96, 96), 2)  # PIL resize; resample=2 selects bilinear interpolation
    x = np.array(x, dtype="float32") / 255
    x = (x - 0.5) / 0.5
    x = x.transpose((2, 0, 1))
    x = torch.from_numpy(x)
    return x
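
A dummy forward pass at this resolution should yield one score per class (an illustrative check, not part of the original code):

net = googlenet(3, 10)
x = torch.zeros(1, 3, 96, 96)
print(net(x).shape)  # torch.Size([1, 10])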

Train the model and plot the curves:

net = googlenet(3, 10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), 0.01)
drawing_all(net, train_data, 20, criterion, optimizer, test_data)

The results:

epoch:10,Train Loss:0.17114,Train Acc:0.93964, Test Loss1.55773,Test Acc0.64043,train time 00:00:34
epoch:11,Train Loss:0.14504,Train Acc:0.95031, Test Loss1.11524,Test Acc0.72379,train time 00:00:34
epoch:12,Train Loss:0.12048,Train Acc:0.95898, Test Loss0.91314,Test Acc0.76642,train time 00:00:34
epoch:13,Train Loss:0.08977,Train Acc:0.96863, Test Loss0.79915,Test Acc0.79697,train time 00:00:34
epoch:14,Train Loss:0.07395,Train Acc:0.97504, Test Loss1.88055,Test Acc0.65516,train time 00:00:34
epoch:15,Train Loss:0.07005,Train Acc:0.97584, Test Loss1.73377,Test Acc0.66367,train time 00:00:34
epoch:16,Train Loss:0.07610,Train Acc:0.97321, Test Loss1.84687,Test Acc0.66189,train time 00:00:34
epoch:17,Train Loss:0.05354,Train Acc:0.98188, Test Loss0.95662,Test Acc0.78352,train time 00:00:34
epoch:18,Train Loss:0.04173,Train Acc:0.98687, Test Loss1.96565,Test Acc0.64181,train time 00:00:33
epoch:19,Train Loss:0.03448,Train Acc:0.98909, Test Loss0.85955,Test Acc0.81102,train time 00:00:32
epoch:20,Train Loss:0.03177,Train Acc:0.98921, Test Loss0.88027,Test Acc0.81112,train time 00:00:33

 

6. ResNet


As the number of layers grows, more and more signal is lost during backpropagation, so by the time the gradient reaches the shallow layers it is tiny and the parameter updates are very limited. To reduce the effect of depth on gradient flow, ResNet introduces skip connections across layers to address the vanishing gradient in backpropagation (this is also known as residual learning).

First, a 3x3 convolution helper to speed up model building:

def conv3x3(in_channel, out_channel, stride=1):
    return nn.Conv2d(in_channel, out_channel, 3, stride=stride, padding=1, bias=False)

Like VGG, this architecture is built from blocks; next we define the residual block.

import torch.nn.functional as F

class residual_block(nn.Module):
    def __init__(self, in_channel, out_channel, same_shape=True):
        super(residual_block, self).__init__()
        self.same_shape = same_shape
        stride = 1 if self.same_shape else 2

        self.conv1 = conv3x3(in_channel, out_channel, stride=stride)
        self.bn1 = nn.BatchNorm2d(out_channel)

        self.conv2 = conv3x3(out_channel, out_channel)
        self.bn2 = nn.BatchNorm2d(out_channel)
        if not self.same_shape:
            self.conv3 = nn.Conv2d(in_channel, out_channel, 1, stride=stride)

    def forward(self, x):
        out = self.conv1(x)
        out = F.relu(self.bn1(out), True)
        out = self.conv2(out)
        out = F.relu(self.bn2(out), True)

        if not self.same_shape:
            x = self.conv3(x)
        return F.relu(x + out, True)

As the code shows, with same_shape=True the residual block leaves the spatial size unchanged, while with same_shape=False the output is half the input size; in that case a strided 1x1 convolution also halves the input x so that its shape matches out for the final addition.
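
An illustrative shape check (not part of the original code):

blk = residual_block(32, 64, same_shape=False)
x = torch.zeros(1, 32, 96, 96)
print(blk(x).shape)  # torch.Size([1, 64, 48, 48]): side length halved, channels changed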

Next, the ResNet network:

class resnet(nn.Module):
    def __init__(self, in_channel, num_classes):
        super(resnet, self).__init__()

        self.block1 = nn.Conv2d(in_channel, 64, 7, 2)

        self.block2 = nn.Sequential(
            nn.MaxPool2d(3, 2),
            residual_block(64, 64),
            residual_block(64, 64)
        )

        self.block3 = nn.Sequential(
            residual_block(64, 128, False),
            residual_block(128, 128)
        )

        self.block4 = nn.Sequential(
            residual_block(128, 256, False),
            residual_block(256, 256)
        )

        self.block5 = nn.Sequential(
            residual_block(256, 512, False),
            residual_block(512, 512),
            nn.AvgPool2d(3)
        )

        self.classifier = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = self.block5(x)
        x = x.view(x.shape[0], -1)
        x = self.classifier(x)
        return x

From the ResNet structure, the feature map reaching the linear classifier again has 1/96 the side length of the input, so the handling is much like GoogLeNet's: data_tf must upscale the images to 96x96.
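
As before, a dummy forward pass verifies the wiring (illustrative only):

net = resnet(3, 10)
x = torch.zeros(1, 3, 96, 96)
print(net(x).shape)  # torch.Size([1, 10])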

Train the model and plot the curves:

criterion = nn.CrossEntropyLoss()
net = resnet(3, 10)
optimizer = torch.optim.SGD(net.parameters(), 0.01)
drawing_all(net, train_data, 20, criterion, optimizer, test_data)

The results:

epoch:10,Train Loss:0.06246,Train Acc:0.98092, Test Loss1.17767,Test Acc0.72350,train time 00:00:24
epoch:11,Train Loss:0.04207,Train Acc:0.98715, Test Loss1.08031,Test Acc0.74486,train time 00:00:24
epoch:12,Train Loss:0.03135,Train Acc:0.99137, Test Loss1.72004,Test Acc0.66861,train time 00:00:24
epoch:13,Train Loss:0.02061,Train Acc:0.99397, Test Loss1.07342,Test Acc0.75494,train time 00:00:24
epoch:14,Train Loss:0.01553,Train Acc:0.99576, Test Loss1.06124,Test Acc0.76424,train time 00:00:24
epoch:15,Train Loss:0.01281,Train Acc:0.99646, Test Loss1.04019,Test Acc0.76978,train time 00:00:24
epoch:16,Train Loss:0.00737,Train Acc:0.99842, Test Loss1.23965,Test Acc0.74891,train time 00:00:24
epoch:17,Train Loss:0.00589,Train Acc:0.99886, Test Loss1.04384,Test Acc0.77710,train time 00:00:24
epoch:18,Train Loss:0.00259,Train Acc:0.99972, Test Loss1.02076,Test Acc0.78382,train time 00:00:24
epoch:19,Train Loss:0.00222,Train Acc:0.99976, Test Loss1.01478,Test Acc0.78580,train time 00:00:24
epoch:20,Train Loss:0.00125,Train Acc:0.99992, Test Loss1.01377,Test Acc0.78649,train time 00:00:24

7. DenseNet


DenseNet is structurally similar to ResNet. The difference is that ResNet only connects adjacent layers' feature maps, whereas DenseNet connects the feature channels of every layer to all subsequent layers. This strengthens feature propagation, encourages feature reuse, alleviates the vanishing-gradient problem (making training easier and more stable), and achieves better performance with fewer parameters.

As before, we first wrap the 3x3 convolution to simplify the code:

def conv_block(in_channel, out_channel):
    layer = nn.Sequential(
        nn.BatchNorm2d(in_channel),
        nn.ReLU(True),
        nn.Conv2d(in_channel, out_channel, 3, padding=1, bias=False)
    )
    return layer

Then, the dense block:

class dense_block(nn.Module):
    def __init__(self, in_channel, growth_rate, num_layers):  # e.g. 64, 32, 6
        super(dense_block, self).__init__()
        block = []
        channel = in_channel
        for i in range(num_layers):
            block.append(conv_block(channel, growth_rate))
            channel += growth_rate  # each layer adds growth_rate channels
        self.net = nn.Sequential(*block)

    def forward(self, x):
        for layer in self.net:
            out = layer(x)
            x = torch.cat((out, x), dim=1)  # concatenate new features with all previous ones
        return x

As the code shows, a dense block leaves the spatial size unchanged and ends with in_channel + growth_rate * num_layers channels. Clearly, the more blocks we stack, the more channels accumulate, and the parameter count and compute cost grow with them. To avoid this, we also introduce a transition layer that reduces both the spatial size and the channel count.
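
A quick check of the channel arithmetic (illustrative, not part of the original code):

blk = dense_block(64, growth_rate=32, num_layers=6)
x = torch.zeros(1, 64, 24, 24)
print(blk(x).shape)  # torch.Size([1, 256, 24, 24]): 64 + 32 * 6 = 256 channels, same spatial size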

def transition(in_channel, out_channel):
    trans_layer = nn.Sequential(
        nn.BatchNorm2d(in_channel),
        nn.ReLU(True),
        nn.Conv2d(in_channel, out_channel, 1),
        nn.AvgPool2d(2, 2)
    )
    return trans_layer

Next, the DenseNet model built from these blocks:

class densenet(nn.Module):
    def __init__(self, in_channel, num_classes, growth_rate=32, block_layers=[6, 12, 24, 16]):
        super(densenet, self).__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(in_channel, 64, 7, 2, 3),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.MaxPool2d(3, 2, padding=1)
        )

        channels = 64
        block = []
        for i, layers in enumerate(block_layers):
            block.append(dense_block(channels, growth_rate, layers))
            channels += layers * growth_rate
            if i != len(block_layers) - 1:
                block.append(transition(channels, channels // 2))  # transition layer: halve the spatial size and the channel count
                channels = channels // 2

        self.block2 = nn.Sequential(*block)
        self.block2.add_module('bn', nn.BatchNorm2d(channels))
        self.block2.add_module('relu', nn.ReLU(True))
        self.block2.add_module('avg_pool', nn.AvgPool2d(3))

        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)

        x = x.view(x.shape[0], -1)
        x = self.classifier(x)
        return x

From the model it follows that, before the linear classification layer, the image is again reduced to 1/96 of its original side length, so data_tf() is modified to upscale the images just as in the GoogLeNet section.

Start training and plot the curves:

net = densenet(3, 10)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
drawing_all(net, train_data, 20, criterion, optimizer, test_data)

The final results:

epoch:10,Train Loss:0.12572,Train Acc:0.95760, Test Loss1.16819,Test Acc0.71460,train time 00:01:28
epoch:11,Train Loss:0.10405,Train Acc:0.96573, Test Loss1.23051,Test Acc0.70817,train time 00:01:28
epoch:12,Train Loss:0.08072,Train Acc:0.97422, Test Loss1.81234,Test Acc0.66218,train time 00:01:30
epoch:13,Train Loss:0.06225,Train Acc:0.97936, Test Loss1.19495,Test Acc0.73823,train time 00:01:30
epoch:14,Train Loss:0.04632,Train Acc:0.98579, Test Loss2.49423,Test Acc0.57545,train time 00:01:30
epoch:15,Train Loss:0.05898,Train Acc:0.98096, Test Loss1.02799,Test Acc0.75138,train time 00:01:30
epoch:16,Train Loss:0.04276,Train Acc:0.98679, Test Loss4.28158,Test Acc0.42979,train time 00:01:28
epoch:17,Train Loss:0.03461,Train Acc:0.98935, Test Loss1.93665,Test Acc0.69828,train time 00:01:30
epoch:18,Train Loss:0.02786,Train Acc:0.99153, Test Loss0.65253,Test Acc0.84059,train time 00:01:29
epoch:19,Train Loss:0.02029,Train Acc:0.99405, Test Loss0.84650,Test Acc0.80934,train time 00:01:29
epoch:20,Train Loss:0.02141,Train Acc:0.99359, Test Loss0.65974,Test Acc0.84415,train time 00:01:27

8. Summary

For these experiments I used an NVIDIA GeForce RTX 2060 GPU and an 11th Gen Intel(R) Core(TM) i5-11400 CPU @ 2.60GHz, with GPU acceleration enabled throughout. The CUDA handling is not ideal and the code is not very robust; it would need changes to run without a GPU. There is still plenty of room for improvement, and feedback is welcome.

This code is for learning purposes only; it was pieced together from scattered material found online, so no reference list is given.
