1 Introduction
In 2012, AlexNet burst onto the scene.
AlexNet used an 8-layer convolutional neural network and won the ImageNet 2012 image recognition challenge by a wide margin. It demonstrated for the first time that learned features can surpass hand-designed features, breaking with the then-prevailing state of computer vision research in one stroke.
AlexNet's design philosophy is very similar to LeNet's, but there are also notable differences:
First, in contrast to the relatively small LeNet, AlexNet consists of 8 layers of transformations: 5 convolutional layers, 2 fully connected hidden layers, and 1 fully connected output layer.
Second, AlexNet replaced the sigmoid activation function with the simpler ReLU activation function. This is because when the output of the sigmoid function is very close to 0 or 1, the gradient in those regions is almost 0, so backpropagation can no longer update some of the model parameters; the gradient of ReLU in the positive interval, by contrast, is always 1. Consequently, if the model parameters are poorly initialized, the sigmoid function may yield near-zero gradients, leaving the model unable to train effectively (a gradient comparison is sketched just after this list).
Third, AlexNet controls the model complexity of the fully connected layers with dropout, whereas LeNet does not use dropout.
Fourth, AlexNet introduced extensive image augmentation, such as flipping, cropping, and color changes, enlarging the dataset to mitigate overfitting.
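The gradient difference behind the second point is easy to verify directly. Here is a minimal sketch (an illustrative addition, not from the original post), assuming only PyTorch: for a strongly positive input, sigmoid saturates and its gradient nearly vanishes, while ReLU passes the gradient through unchanged.

import torch

# Sigmoid saturates for large inputs: its gradient sigmoid(x)*(1-sigmoid(x))
# is nearly 0 at x = 8, so almost no signal flows back through this unit.
x = torch.tensor([8.0], requires_grad=True)
torch.sigmoid(x).backward()
print(x.grad)  # approximately tensor([0.0003])

# ReLU has gradient 1 everywhere on the positive interval.
x = torch.tensor([8.0], requires_grad=True)
torch.relu(x).backward()
print(x.grad)  # tensor([1.])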
2 AlexNet Network Structure
import time
import torch
from torch import nn, optim
import torchvision
import sys
sys.path.append("..")
import d2lzh_pytorch as d2l
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 96, 11, 4),  # in_channels, out_channels, kernel_size, stride
            nn.ReLU(),
            nn.MaxPool2d(3, padding=0, stride=2),
            nn.Conv2d(96, 256, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(3, padding=0, stride=2),
            # Three consecutive convolutional layers with an even smaller window.
            # Except for the last one, the number of output channels is increased further.
            # No pooling layers follow the first two, so the height and width are preserved.
            nn.Conv2d(256, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 256, 3, 1, 1),
            nn.ReLU(),
            nn.MaxPool2d(3, padding=0, stride=2)
        )
        self.fc = nn.Sequential(
            nn.Linear(256*5*5, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),  # dropout curbs overfitting in the huge fully connected layers
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            # Output layer: 10 classes, since we train on Fashion-MNIST rather than ImageNet
            nn.Linear(4096, 10),
        )

    def forward(self, img):
        feature = self.conv(img)
        output = self.fc(feature.view(img.shape[0], -1))  # flatten before the fc layers
        return output
net = AlexNet()
print(net)
AlexNet(
(conv): Sequential(
(0): Conv2d(1, 96, kernel_size=(11, 11), stride=(4, 4))
(1): ReLU()
(2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
(3): Conv2d(96, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
(4): ReLU()
(5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
(6): Conv2d(256, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): ReLU()
(8): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): ReLU()
(10): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU()
(12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(fc): Sequential(
(0): Linear(in_features=6400, out_features=4096, bias=True)
(1): ReLU()
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=4096, out_features=4096, bias=True)
(4): ReLU()
(5): Dropout(p=0.5, inplace=False)
(6): Linear(in_features=4096, out_features=10, bias=True)
)
)
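To see where the 256*5*5 = 6400 input size of the first Linear layer comes from, we can trace a dummy single-channel 224x224 image through the convolutional block. This sanity check is our own addition, not part of the original post:

X = torch.rand(1, 1, 224, 224)
for layer in net.conv:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:', X.shape)
# The final MaxPool2d emits shape [1, 256, 5, 5], i.e. 256*5*5 = 6400 features.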
3 Preprocessing the Data
def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())
    transform = torchvision.transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)
    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=4)
    # No need to shuffle the test set; evaluation order does not matter
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=4)
    return train_iter, test_iter
batch_size = 128
train_iter, test_iter = load_data_fashion_mnist(batch_size, resize=224, root='~/Datasets/FashionMNIST')
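As a quick check (an optional addition, not in the original), one batch from the resized loader should match AlexNet's expected input shape:

X, y = next(iter(train_iter))
print(X.shape, y.shape)  # expected: torch.Size([128, 1, 224, 224]) torch.Size([128])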
4 Training the Model
lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)
training on cuda
epoch 1, loss 0.5860, train acc 0.778, test acc 0.865, time 360.2 sec
epoch 2, loss 0.1674, train acc 0.877, test acc 0.887, time 359.3 sec
epoch 3, loss 0.0953, train acc 0.894, test acc 0.896, time 372.7 sec
epoch 4, loss 0.0637, train acc 0.907, test acc 0.904, time 381.4 sec
epoch 5, loss 0.0464, train acc 0.914, test acc 0.901, time 380.4 sec
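The helper d2l.train_ch5 comes from the book's d2lzh_pytorch package. For readers without it, here is a minimal sketch of a training loop in the same spirit (our own simplified version: accuracy bookkeeping and timing are omitted, and the function name train is hypothetical):

def train(net, train_iter, optimizer, device, num_epochs):
    # A stripped-down loop, not the actual d2l.train_ch5 implementation
    net = net.to(device)
    loss = nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        total_loss, n = 0.0, 0
        for X, y in train_iter:
            X, y = X.to(device), y.to(device)
            l = loss(net(X), y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
            total_loss += l.item() * y.shape[0]
            n += y.shape[0]
        print('epoch %d, loss %.4f' % (epoch + 1, total_loss / n))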
Hints:
AlexNet has a structure similar to LeNet's, but it uses more convolutional layers and a much larger parameter space to fit the large-scale ImageNet dataset.
It marks the dividing line between shallow and deep neural networks; the conceptual shift AlexNet represents, together with genuinely excellent experimental results, took the research community many years to achieve.