Theory
AlexNet is very similar to LeNet, but with some notable differences:
- AlexNet is much deeper: it consists of 8 layers, namely five convolutional layers, two fully connected hidden layers, and one fully connected output layer
- It uses ReLU as the activation function
Model design
- The first layer uses an 11x11 convolution window, because images in ImageNet are more than ten times larger than those in MNIST, so a larger window is needed to capture the objects. The second layer is a 5x5 convolution, followed by three 3x3 convolutions. A 3x3 max-pooling layer with stride 2 is placed after the first, second, and fifth convolutional layers, and the number of channels is ten times that of LeNet (the resulting feature-map sizes are traced in the sketch after this list)
- The last convolutional layer is followed by two fully connected layers, each with 4096 outputs. These two huge fully connected layers account for nearly 1 GB of model parameters, so the original AlexNet used a dual data-stream design in which each GPU stores and computes only half of the model's parameters
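To make these shapes concrete, here is a minimal sanity-check sketch (not part of the original post) that traces the spatial size of a 224x224 input through the conv/pool stack used in the implementation below; the helper conv_out simply applies the standard output-size formula (size + 2*padding - kernel) / stride + 1.
def conv_out(size, kernel, stride=1, padding=0):
    # standard output-size formula for a convolution or pooling layer
    return (size + 2 * padding - kernel) // stride + 1

s = 224
s = conv_out(s, 11, 4, 1)  # conv1: 11x11, stride 4, padding 1 -> 54
s = conv_out(s, 3, 2)      # maxpool 3x3, stride 2             -> 26
s = conv_out(s, 5, 1, 2)   # conv2: 5x5, padding 2             -> 26
s = conv_out(s, 3, 2)      # maxpool                           -> 12
s = conv_out(s, 3, 1, 1)   # conv3/conv4/conv5: 3x3, padding 1 -> 12 (size unchanged)
s = conv_out(s, 3, 2)      # maxpool                           -> 5
print(s, 256 * s * s)      # 5 6400: the input size of the first fully connected layer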
Activation function
The ReLU activation function is simple to compute and makes training easier. The sigmoid's gradient is nearly zero where its output is close to 0 or 1, which leads to vanishing gradients.
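As a quick illustration (this snippet is not part of the original code), the gradients below show how the sigmoid saturates for large inputs while ReLU keeps a constant gradient of 1 for any positive input:
import torch

x = torch.tensor([-8.0, 0.0, 8.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
print(x.grad)   # roughly [3.4e-04, 0.25, 3.4e-04]: the gradient vanishes at both ends

x = torch.tensor([-8.0, 0.0, 8.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)   # [0., 0., 1.]: constant gradient for positive inputs, so training is easier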
Parameter control and preprocessing
AlexNet uses dropout to control the complexity of the fully connected layers, whereas LeNet only uses weight decay (regularization). AlexNet also relies heavily on image augmentation, such as flipping, cropping, and color changes, which makes the model more robust; the larger effective sample size reduces overfitting (an illustrative augmentation pipeline is sketched below).
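The training code below only resizes the images and does not use augmentation. As an illustration of the kind of pipeline this refers to, a hypothetical torchvision setup could look like the following (the specific transforms and parameters are examples, not the ones used in the original AlexNet):
import torchvision.transforms as T

# hypothetical example, not used in the training code below
augment = T.Compose([
    T.RandomHorizontalFlip(),                    # random left-right flip
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),  # random crop, resized back to 224x224
    T.ColorJitter(brightness=0.2, contrast=0.2), # mild brightness/contrast jitter
    T.ToTensor(),
])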
Code implementation
The dataset used here is FashionMNIST, with one small tweak: the original 28x28 images are upscaled to 224x224. This is because AlexNet was designed for ImageNet; since this is just a simple reproduction, there is no need to use the ImageNet dataset itself.
import torch
import torchvision
import torch.nn as nn
from torch.utils.data import DataLoader
from tqdm import tqdm
from torchinfo import summary
import matplotlib.pyplot as plt
epochs = 10
batch_size = 256
lr = 0.001
device = 'cuda:0' if torch.cuda.is_available() else "cpu"
data_trans = torchvision.transforms.Compose([torchvision.transforms.ToTensor(),torchvision.transforms.Resize((224, 224))])
train_dataset = torchvision.datasets.FashionMNIST("../00data", True, data_trans, download=True)
test_dataset = torchvision.datasets.FashionMNIST("../00data", False, data_trans, download=True)
train_dataloader = DataLoader(train_dataset, batch_size, True)
test_dataloader = DataLoader(test_dataset, batch_size, True)
class AlexNet(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        # Five convolutional layers; channel counts follow the original AlexNet
        self.conv1 = nn.Conv2d(1, 96, 11, 4, 1)
        self.maxpool1 = nn.MaxPool2d(3, 2)
        self.conv2 = nn.Conv2d(96, 256, 5, padding=2)
        self.maxpool2 = nn.MaxPool2d(3, 2)
        self.conv3 = nn.Conv2d(256, 384, 3, padding=1)
        self.conv4 = nn.Conv2d(384, 384, 3, padding=1)
        self.conv5 = nn.Conv2d(384, 256, 3, padding=1)
        # Classifier: flatten the 256x5x5 feature map, then three fully connected layers
        self.flatten = nn.Flatten(1)
        self.linear1 = nn.Linear(6400, 4096)
        self.linear2 = nn.Linear(4096, 4096)
        self.linear3 = nn.Linear(4096, 10)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout()

    def forward(self, input):
        h1 = self.relu(self.conv1(input))
        h1 = self.maxpool1(h1)
        h2 = self.relu(self.conv2(h1))
        h2 = self.maxpool2(h2)
        h3 = self.relu(self.conv3(h2))
        h4 = self.relu(self.conv4(h3))
        h5 = self.relu(self.conv5(h4))
        h5 = self.maxpool1(h5)   # pooling layers are stateless, so the same module can be reused
        h5 = self.flatten(h5)
        h6 = self.relu(self.linear1(h5))
        h6 = self.dropout(h6)
        h7 = self.relu(self.linear2(h6))
        h7 = self.dropout(h7)
        h8 = self.linear3(h7)
        return h8
alexnet = AlexNet()
alexnet = alexnet.to(device)
celoss = torch.nn.CrossEntropyLoss()
optimer = torch.optim.Adam(alexnet.parameters(), lr=lr)
train_loss_all = []
test_loss_all = []
train_acc = []
test_acc = []
for epoch in range(epochs):
    test_loss = 0.0
    train_loss = 0.0
    right = 0.0
    right_num = 0.0
    alexnet.train()   # enable dropout during training
    for inputs, labels in tqdm(train_dataloader):
        inputs = inputs.to(device)
        labels = labels.to(device)
        outputs = alexnet(inputs)
        loss = celoss(outputs, labels)
        train_loss += loss.detach().cpu().numpy()
        optimer.zero_grad()
        loss.backward()
        optimer.step()
        right = outputs.argmax(dim=1) == labels
        right_num += right.sum().detach().cpu().numpy()
    train_loss_all.append(train_loss / float(len(train_dataloader)))
    train_acc.append(right_num / len(train_dataset))
    alexnet.eval()    # disable dropout for evaluation
    with torch.no_grad():
        right = 0.0
        right_num = 0.0
        for inputs, labels in tqdm(test_dataloader):
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = alexnet(inputs)
            loss = celoss(outputs, labels)
            test_loss += loss.detach().cpu().numpy()
            right = outputs.argmax(dim=1) == labels
            right_num += right.sum().detach().cpu().numpy()
    test_loss_all.append(test_loss / float(len(test_dataloader)))
    test_acc.append(right_num / len(test_dataset))
    print(f'epoch: {epoch + 1}, train_loss: {train_loss / len(train_dataloader)}, test_loss: {test_loss / len(test_dataloader)}, acc: {right_num / len(test_dataset) * 100}%')
x = range(1, epochs + 1)
plt.plot(x, train_loss_all, label = 'train_loss', linestyle='--')
plt.plot(x, test_loss_all, label = 'test_loss', linestyle='--')
plt.plot(x, train_acc, label = 'train_acc', linestyle='--')
plt.plot(x, test_acc, label = 'test_acc', linestyle='--')
plt.legend()
plt.show()
net = AlexNet()
print(summary(net, (1, 1, 224, 224)))
==========================================================================================
Layer (type:depth-idx) Output Shape Param #
==========================================================================================
AlexNet [1, 10] --
├─Conv2d: 1-1 [1, 96, 54, 54] 11,712
├─ReLU: 1-2 [1, 96, 54, 54] --
├─MaxPool2d: 1-3 [1, 96, 26, 26] --
├─Conv2d: 1-4 [1, 256, 26, 26] 614,656
├─ReLU: 1-5 [1, 256, 26, 26] --
├─MaxPool2d: 1-6 [1, 256, 12, 12] --
├─Conv2d: 1-7 [1, 384, 12, 12] 885,120
├─ReLU: 1-8 [1, 384, 12, 12] --
├─Conv2d: 1-9 [1, 384, 12, 12] 1,327,488
├─ReLU: 1-10 [1, 384, 12, 12] --
├─Conv2d: 1-11 [1, 256, 12, 12] 884,992
├─ReLU: 1-12 [1, 256, 12, 12] --
├─MaxPool2d: 1-13 [1, 256, 5, 5] --
├─Flatten: 1-14 [1, 6400] --
├─Linear: 1-15 [1, 4096] 26,218,496
├─ReLU: 1-16 [1, 4096] --
├─Linear: 1-17 [1, 4096] 16,781,312
├─ReLU: 1-18 [1, 4096] --
├─Dropout: 1-19 [1, 4096] --
├─Linear: 1-20 [1, 10] 40,970
==========================================================================================
Total params: 46,764,746
Trainable params: 46,764,746
Non-trainable params: 0
Total mult-adds (M): 938.75
==========================================================================================
Input size (MB): 0.20
Forward/backward pass size (MB): 4.87
Params size (MB): 187.06
Estimated Total Size (MB): 192.13
==========================================================================================
Training results
Trained for 10 epochs on a T4 GPU on Colab; from the results below you can see: brute force works wonders!
epoch: 1, train_loss: 0.7214684476243689, test_loss: 0.3966940574347973, acc: 85.71%
epoch: 2, train_loss: 0.3377066216570266, test_loss: 0.33194345571100714, acc: 88.34%
epoch: 3, train_loss: 0.2881323180934216, test_loss: 0.295409569516778, acc: 89.06%
epoch: 4, train_loss: 0.25911708556591195, test_loss: 0.2705956816673279, acc: 90.23%
epoch: 5, train_loss: 0.23344570303216894, test_loss: 0.2671656012535095, acc: 90.08%
epoch: 6, train_loss: 0.21801440338505076, test_loss: 0.27004530634731055, acc: 90.44%
epoch: 7, train_loss: 0.20689800283376206, test_loss: 0.25210654716938735, acc: 91.04%
epoch: 8, train_loss: 0.19044599720138183, test_loss: 0.2566268537193537, acc: 91.26%
epoch: 9, train_loss: 0.17452049604121675, test_loss: 0.26449268609285354, acc: 91.26%
epoch: 10, train_loss: 0.16033657827275866, test_loss: 0.2467486930079758, acc: 91.75%
Result visualization: