Theory
AlexNet is very similar to LeNet, but with some notable differences:
- AlexNet is much deeper: it consists of 8 layers, namely five convolutional layers, two fully connected hidden layers, and one fully connected output layer
- It uses ReLU as the activation function
Model design
- The first layer uses an 11x11 convolution window, because images in ImageNet are more than ten times larger than those in MNIST, so a larger window is needed to capture the objects. The second layer is a 5x5 convolution, followed by three 3x3 convolutions. A 3x3 max-pooling layer with stride 2 is placed after the first, second, and fifth convolutional layers, and the number of channels is ten times that of LeNet (the resulting feature-map sizes are traced in the sketch after this list)
- The last convolutional layer is followed by two fully connected layers, each with 4096 outputs. These two huge fully connected layers account for nearly 1 GB of model parameters, so the original AlexNet used a dual data-stream design in which each GPU stores and computes only half of the model's parameters
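To make these shapes concrete, here is a minimal sanity-check sketch (not part of the original post) that traces the spatial size of a 224x224 input through the conv/pool stack used in the implementation below; the helper conv_out simply applies the standard output-size formula (size + 2*padding - kernel) / stride + 1.
def conv_out(size, kernel, stride=1, padding=0):
    # standard output-size formula for a convolution or pooling layer
    return (size + 2 * padding - kernel) // stride + 1

s = 224
s = conv_out(s, 11, 4, 1)  # conv1: 11x11, stride 4, padding 1 -> 54
s = conv_out(s, 3, 2)      # maxpool 3x3, stride 2             -> 26
s = conv_out(s, 5, 1, 2)   # conv2: 5x5, padding 2             -> 26
s = conv_out(s, 3, 2)      # maxpool                           -> 12
s = conv_out(s, 3, 1, 1)   # conv3/conv4/conv5: 3x3, padding 1 -> 12 (size unchanged)
s = conv_out(s, 3, 2)      # maxpool                           -> 5
print(s, 256 * s * s)      # 5 6400: the input size of the first fully connected layer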
Activation function
The ReLU activation function is simple to compute and makes training easier. The sigmoid's gradient is nearly zero where its output is close to 0 or 1, which leads to vanishing gradients.
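As a quick illustration (this snippet is not part of the original code), the gradients below show how the sigmoid saturates for large inputs while ReLU keeps a constant gradient of 1 for any positive input:
import torch

x = torch.tensor([-8.0, 0.0, 8.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
print(x.grad)   # roughly [3.4e-04, 0.25, 3.4e-04]: the gradient vanishes at both ends

x = torch.tensor([-8.0, 0.0, 8.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)   # [0., 0., 1.]: constant gradient for positive inputs, so training is easier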
Parameter control and preprocessing
AlexNet uses dropout to control the complexity of the fully connected layers, whereas LeNet only uses weight decay (regularization). AlexNet also relies heavily on image augmentation, such as flipping, cropping, and color changes, which makes the model more robust; the larger effective sample size reduces overfitting (an illustrative augmentation pipeline is sketched below).
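The training code below only resizes the images and does not use augmentation. As an illustration of the kind of pipeline this refers to, a hypothetical torchvision setup could look like the following (the specific transforms and parameters are examples, not the ones used in the original AlexNet):
import torchvision.transforms as T

# hypothetical example, not used in the training code below
augment = T.Compose([
    T.RandomHorizontalFlip(),                    # random left-right flip
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),  # random crop, resized back to 224x224
    T.ColorJitter(brightness=0.2, contrast=0.2), # mild brightness/contrast jitter
    T.ToTensor(),
])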
Code implementation
The dataset used here is FashionMNIST, with one small tweak: the original 28x28 images are upscaled to 224x224. This is because AlexNet was designed for ImageNet; since this is just a simple reproduction, there is no need to use the ImageNet dataset itself.
import torch
import torchvision
import torch.nn as nn
from torch.utils.data import DataLoader
from tqdm import tqdm
from torchinfo import summary
import matplotlib.pyplot as plt
epochs = 10
batch_size = 256
lr = 0.001
device = 'cuda:0' if torch.cuda.is_available() else "cpu"
data_trans = torchvision.transforms.Compose([torchvision.transforms.ToTensor(),torchvision.transforms.Resize((224, 224))])
train_dataset = torchvision.datasets.FashionMNIST("../00data", True, data_trans, download=True)
test_dataset = torchvision.datasets.FashionMNIST("../00data", False, data_trans, download=True)
train_dataloader = DataLoader(train_dataset, batch_size, True)
test_dataloader = DataLoader(test_dataset, batch_size, True)
class AlexNet(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        # Five convolutional layers; channel counts follow the original AlexNet
        self.conv1 = nn.Conv2d(1, 96, 11, 4, 1)
        self.maxpool1 = nn.MaxPool2d(3, 2)
        self.conv2 = nn.Conv2d(96, 256, 5, padding=2)
        self.maxpool2 = nn.MaxPool2d(3, 2)
        self.conv3 = nn.Conv2d(256, 384, 3, padding=1)
        self.conv4 = nn.Conv2d(384, 384, 3, padding=1)
        self.conv5 = nn.Conv2d(384, 256, 3, padding=1)
        # Classifier: flatten the 256x5x5 feature map, then three fully connected layers
        self.flatten = nn.Flatten(1)
        self.linear1 = nn.Linear(6400, 4096)
        self.linear2 = nn.Linear(4096, 4096)
        self.linear3 = nn.Linear(4096, 10)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout()

    def forward(self, input):
        h1 = self.relu(self.conv1(input))
        h1 = self.maxpool1(h1)
        h2 = self.relu(self.conv2(h1))
        h2 = self.maxpool2(h2)
        h3 = self.relu(self.conv3(h2))
        h4 = self.relu(self.conv4(h3))
        h5 = self.relu(self.conv5(h4))
        h5 = self.maxpool1(h5)   # pooling layers are stateless, so the same module can be reused
        h5 = self.flatten(h5)
        h6 = self.relu(self.linear1(h5))
        h6 = self.dropout(h6)
        h7 = self.relu(self.linear2(h6))
        h7 = self.dropout(h7)
        h8 = self.linear3(h7)
        return h8
alexnet = AlexNet()
alexnet = alexnet.to(device)
celoss = torch.nn.CrossEntropyLoss()
optimer = torch.optim.Adam(alexnet.parameters(), lr=lr)
train_loss_all = []
test_loss_all = []
train_acc = []
test_acc = []
for epoch in range(epochs):
    test_loss = 0.0
    train_loss = 0.0
    right = 0.0
    right_num = 0.0
    alexnet.train()   # enable dropout during training
    for inputs, labels in tqdm(train_dataloader):
        inputs = inputs.to(device)
        labels = labels.to(device)
        outputs = alexnet(inputs)
        loss = celoss(outputs, labels)
        train_loss += loss.detach().cpu().numpy()
        optimer.zero_grad()
        loss.backward()
        optimer.step()
        right = outputs.argmax(dim=1) == labels
        right_num += right.sum().detach().cpu().numpy()
    train_loss_all.append(train_loss / float(len(train_dataloader)))
    train_acc.append(right_num / len(train_dataset))
    alexnet.eval()    # disable dropout for evaluation
    with torch.no_grad():
        right = 0.0
        right_num = 0.0
        for inputs, labels in tqdm(test_dataloader):
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = alexnet(inputs)
            loss = celoss(outputs, labels)
            test_loss += loss.detach().cpu().numpy()
            right = outputs.argmax(dim=1) == labels
            right_num += right.sum().detach().cpu().numpy()
    test_loss_all.append(test_loss / float(len(test_dataloader)))
    test_acc.append(right_num / len(test_dataset))
    print(f'epoch: {epoch + 1}, train_loss: {train_loss / len(train_dataloader)}, test_loss: {test_loss / len(test_dataloader)}, acc: {right_num / len(test_dataset) * 100}%')
x = range(1, epochs + 1)
plt.plot(x, train_loss_all, label = 'train_loss', linestyle='--')
plt.plot(x, test_loss_all, label = 'test_loss', linestyle='--')
plt.plot(x, train_acc, label = 'train_acc', linestyle='--')
plt.plot(x, test_acc, label = 'test_acc', linestyle='--')
plt.legend()
plt.show()
net = AlexNet()
print(summary(net, (1, 1, 224, 224)))
==========================================================================================
Layer (type:depth-idx) Output Shape Param #
==========================================================================================
AlexNet [1, 10] --
├─Conv2d: 1-1 [1, 96, 54, 54] 11,712
├─ReLU: 1-2 [1, 96, 54, 54] --
├─MaxPool2d: 1-3 [1, 96, 26, 26] --
├─Conv2d: 1-4 [1, 256, 26, 26] 614,656
├─ReLU: 1-5 [1, 256, 26, 26] --
├─MaxPool2d: 1-6 [1, 256, 12, 12] --
├─Conv2d: 1-7 [1, 384, 12, 12] 885,120
├─ReLU: 1-8 [1, 384, 12, 12] --
├─Conv2d: 1-9 [1, 384, 12, 12] 1,327,488
├─ReLU: 1-10 [1, 384, 12, 12] --
├─Conv2d: 1-11 [1, 256, 12, 12] 884,992
├─ReLU: 1-12 [1, 256, 12, 12] --
├─MaxPool2d: 1-13 [1, 256, 5, 5] --
├─Flatten: 1-14 [1, 6400] --
├─Linear: 1-15 [1, 4096] 26,218,496
├─ReLU: 1-16 [1, 4096] --
├─Linear: 1-17 [1, 4096] 16,781,312
├─ReLU: 1-18 [1, 4096] --
├─Dropout: 1-19 [1, 4096] --
├─Linear: 1-20 [1, 10] 40,970
==========================================================================================
Total params: 46,764,746
Trainable params: 46,764,746
Non-trainable params: 0
Total mult-adds (M): 938.75
==========================================================================================
Input size (MB): 0.20
Forward/backward pass size (MB): 4.87
Params size (MB): 187.06
Estimated Total Size (MB): 192.13
==========================================================================================
Training results
Trained for 10 epochs on a T4 GPU on Colab; from the results below you can see: brute force works wonders!
epoch: 1, train_loss: 0.7214684476243689, test_loss: 0.3966940574347973, acc: 85.71%
epoch: 2, train_loss: 0.3377066216570266, test_loss: 0.33194345571100714, acc: 88.34%
epoch: 3, train_loss: 0.2881323180934216, test_loss: 0.295409569516778, acc: 89.06%
epoch: 4, train_loss: 0.25911708556591195, test_loss: 0.2705956816673279, acc: 90.23%
epoch: 5, train_loss: 0.23344570303216894, test_loss: 0.2671656012535095, acc: 90.08%
epoch: 6, train_loss: 0.21801440338505076, test_loss: 0.27004530634731055, acc: 90.44%
epoch: 7, train_loss: 0.20689800283376206, test_loss: 0.25210654716938735, acc: 91.04%
epoch: 8, train_loss: 0.19044599720138183, test_loss: 0.2566268537193537, acc: 91.26%
epoch: 9, train_loss: 0.17452049604121675, test_loss: 0.26449268609285354, acc: 91.26%
epoch: 10, train_loss: 0.16033657827275866, test_loss: 0.2467486930079758, acc: 91.75%
Result visualization: