AlexNet: Network Architecture Explained and Code Reproduction

放风筝的猪

Last modified: 2023-01-13 15:32:25

First published: 2023-01-08 11:42:22

Article link: https://blog.csdn.net/weixin_45897172/article/details/128592493

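Reference material: the bilibili video "3.1 AlexNet网络结构详解与花分类数据集下载" (AlexNet network architecture explained and flower classification dataset download)

The uploader's CSDN blog: 太阳花的小绿豆 (topics: deep learning, software installation, TensorFlow)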

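I. Introduction

AlexNet is the network that won the ILSVRC 2012 (ImageNet Large Scale Visual Recognition Challenge) competition, raising classification accuracy from the 70%+ of traditional methods to 80%+. It was designed by Hinton and his student Alex Krizhevsky. Deep learning began to develop rapidly after that year.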

Dataset: ILSVRC 2012, a subset of ImageNet

Training set: 1,281,167 labeled images

Validation set: 50,000 labeled images

Test set: 100,000 unlabeled images

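The highlights of the network are:

(1) It was the first to use GPUs to accelerate network training.

(2) It uses the ReLU activation function instead of the traditional Sigmoid and Tanh activation functions.

(3) It uses LRN (Local Response Normalization).

(4) It applies Dropout, which randomly deactivates neurons, in the first two fully connected layers to reduce overfitting.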

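II. Details

1. Overfitting:

The root causes are too many feature dimensions, an overly complex model hypothesis, too many parameters, too little training data, and too much noise. The fitted function then predicts the training set almost perfectly but performs poorly on new test data: it has fit the training data too closely at the expense of generalization.

2. Dropout as a remedy for overfitting

Dropout randomly deactivates a portion of the neurons during the network's forward pass. It is generally placed between fully connected layers.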
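As a small illustration of the behaviour described above, the sketch below applies `nn.Dropout` to a tensor of ones in training mode and in evaluation mode. The input tensor and the seed are arbitrary choices made for this example, not values from the original post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)          # arbitrary seed, only so the run is repeatable
drop = nn.Dropout(p=0.5)      # each element is zeroed with probability 0.5
x = torch.ones(1, 8)          # toy input standing in for a layer's activations

drop.train()                  # training mode: dropout is active
train_out = drop(x)           # surviving elements are scaled by 1 / (1 - p) = 2

drop.eval()                   # evaluation mode: dropout is a no-op
eval_out = drop(x)

print(train_out)
print(eval_out)
```

In training mode every element is either 0 or 2; in evaluation mode the input passes through unchanged, which is why train.py switches between `net.train()` and `net.eval()`.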

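3. Formula for the feature-map size:

N = (W − F + 2P) / S + 1, where W is the input size, F the kernel size, P the padding on each side, and S the stride.

Because the authors ran the computation on two GPUs, the upper and lower halves of the network are identical, so we only look at the lower half here: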

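Conv1:

kernels: 48*2=96
kernel_size: 11
padding: [1, 2]
stride: 4
input_size: [224, 224, 3]
output_size: [55, 55, 96]

Calculation: N = (W − F + 2P) / S + 1 = [224 − 11 + (1 + 2)] / 4 + 1 = 55 (the padding is asymmetric, so the total padding 1 + 2 takes the place of 2P)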

Maxpool1:

kernel_size: 3
padding: 0
stride: 2
input_size: [55, 55, 96]
output_size: [27, 27, 96]

Calculation: N = (W − F + 2P) / S + 1 = (55 − 3) / 2 + 1 = 27

Conv2:

kernels: 128*2=256
kernel_size: 5
padding: [2, 2]
stride: 1
input_size: [27, 27, 96]
output_size: [27, 27, 256]

Calculation: N = (W − F + 2P) / S + 1 = (27 − 5 + 4) / 1 + 1 = 27

Maxpool2:

kernel_size: 3
padding: 0
stride: 2
input_size: [27, 27, 256]
output_size: [13, 13, 256]

Calculation: N = (W − F + 2P) / S + 1 = (27 − 3) / 2 + 1 = 13

Conv3:

kernels: 192*2=384
kernel_size: 3
padding: [1, 1]
stride: 1
input_size: [13, 13, 256]
output_size: [13, 13, 384]

Calculation: N = (W − F + 2P) / S + 1 = (13 − 3 + 2) / 1 + 1 = 13

Conv4:

kernels: 192*2=384
kernel_size: 3
padding: [1, 1]
stride: 1
input_size: [13, 13, 384]
output_size: [13, 13, 384]

Calculation: N = (W − F + 2P) / S + 1 = (13 − 3 + 2) / 1 + 1 = 13

Conv5:

kernels: 128*2=256
kernel_size: 3
padding: [1, 1]
stride: 1
input_size: [13, 13, 384]
output_size: [13, 13, 256]

Calculation: N = (W − F + 2P) / S + 1 = (13 − 3 + 2) / 1 + 1 = 13

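Maxpool3:

kernel_size: 3
padding: 0
stride: 2
input_size: [13, 13, 256]
output_size: [6, 6, 256]

Calculation: N = (W − F + 2P) / S + 1 = (13 − 3) / 2 + 1 = 6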
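The per-layer calculations above can be checked with a few lines of plain Python. This is only a sketch: the function names and the `LAYERS` table are made up for this example, and the division is floored, which matches how PyTorch discards any fractional part.

```python
def output_size(w, f, s, total_padding):
    """N = (W - F + total padding) / S + 1, with the division floored."""
    return (w - f + total_padding) // s + 1


# (name, kernel_size F, stride S, total padding), taken from the tables above
LAYERS = [
    ("Conv1", 11, 4, 1 + 2),
    ("Maxpool1", 3, 2, 0),
    ("Conv2", 5, 1, 2 + 2),
    ("Maxpool2", 3, 2, 0),
    ("Conv3", 3, 1, 1 + 1),
    ("Conv4", 3, 1, 1 + 1),
    ("Conv5", 3, 1, 1 + 1),
    ("Maxpool3", 3, 2, 0),
]


def trace_sizes(input_size=224):
    """Return the spatial size after each layer for a square input."""
    sizes = []
    size = input_size
    for name, f, s, pad in LAYERS:
        size = output_size(size, f, s, pad)
        sizes.append((name, size))
    return sizes


for name, size in trace_sizes():
    print(f"{name}: {size} x {size}")
```

Starting from a 224 x 224 input, the sizes come out as 55, 27, 27, 13, 13, 13, 13 and 6, in agreement with the output_size entries listed for each layer.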

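The last fully connected layer has 1000 outputs because the dataset has one thousand classes. To apply the network to your own dataset, change this number to the number of classes you need.

4. Dataset download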

https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz

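Split the dataset into a training set and a test set (train.py reads them from the train and val folders).

Open PowerShell in the current directory and run the command shown (it appeared as a screenshot in the original post).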
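The split command itself is not reproduced in the post, so the following is only a hypothetical sketch of how such a split could be done, not the script the author ran. The function name `split_dataset`, the 10% validation ratio, and the fixed seed are all assumptions made for illustration.

```python
import os
import random
import shutil


def split_dataset(src_root, dst_root, val_rate=0.1, seed=0):
    """Copy every class folder under src_root into dst_root/train and dst_root/val.

    Returns {class_name: (num_train, num_val)}. All names here are illustrative.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    classes = sorted(d for d in os.listdir(src_root)
                     if os.path.isdir(os.path.join(src_root, d)))
    counts = {}
    for cla in classes:
        images = sorted(os.listdir(os.path.join(src_root, cla)))
        # Randomly pick val_rate of the images in this class for validation.
        val_images = set(rng.sample(images, k=int(len(images) * val_rate)))
        for name in images:
            subset = "val" if name in val_images else "train"
            out_dir = os.path.join(dst_root, subset, cla)
            os.makedirs(out_dir, exist_ok=True)
            shutil.copy(os.path.join(src_root, cla, name),
                        os.path.join(out_dir, name))
        counts[cla] = (len(images) - len(val_images), len(val_images))
    return counts
```

Assuming the archive was extracted to `data_set/flower_data/flower_photos`, a call such as `split_dataset("data_set/flower_data/flower_photos", "data_set/flower_data")` would produce the `train` and `val` folders that train.py expects under `flower_data`.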

III. Code Reproduction

Build AlexNet with PyTorch and train it on the flower classification dataset

1. model.py

import torch.nn as nn
import torch


class AlexNet(nn.Module):
    def __init__(self, num_classes=1000, init_weights=False):
        super(AlexNet, self).__init__()
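        self.features = nn.Sequential(                              # packs a series of layers into one module named features, used to extract image features; more concise than the earlier LeNet code
            nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224]  output[48, 55, 55]; 48 kernels are used here, with accuracy close to the paper's 96 kernels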
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[48, 27, 27]
            nn.Conv2d(48, 128, kernel_size=5, padding=2),           # output[128, 27, 27]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[128, 13, 13]
            nn.Conv2d(128, 192, kernel_size=3, padding=1),          # output[192, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, padding=1),          # output[192, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, padding=1),          # output[128, 13, 13]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[128, 6, 6]
        )

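        self.classifier = nn.Sequential(                            # the last three fully connected layers, packed into a classifier
            nn.Dropout(p=0.5),                                      # p is the fraction of neurons dropped; the default is 0.5
            nn.Linear(128 * 6 * 6, 2048),                           # 128 channels and 2048 nodes, because this build uses half of the paper's parameters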
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
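            nn.Linear(2048, num_classes),                           # output size is the number of classes, passed in at construction
        )

        if init_weights:                                            # initialize the weights when init_weights is True
            self._initialize_weights()                              # current PyTorch versions already apply Kaiming initialization to conv and linear layers by default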

    
    def forward(self, x):                                           # forward pass
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)                           # flatten, starting from the channel dimension
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():                                    # self.modules(), inherited from nn.Module, iterates over every layer we defined
            if isinstance(m, nn.Conv2d):                            # isinstance checks the layer type; conv layers get Kaiming initialization
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):                          # fully connected layer
                nn.init.normal_(m.weight, 0, 0.01)                  # normal distribution with mean 0 and standard deviation 0.01
                nn.init.constant_(m.bias, 0)                        # bias set to 0

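Notes:

Sequential packs a series of layers into a new module, here named features, that is dedicated to extracting image features; compared with the earlier LeNet code, this makes the code more concise;

The first convolutional layer uses 48 kernels. Accuracy is about the same as with the paper's 96 kernels, and training is faster;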

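In nn.Conv2d, padding=1 pads one row or column of zeros on all four sides, and padding=(1, 2) pads one row of zeros at the top and bottom and two columns of zeros on the left and right;

nn.ZeroPad2d((1, 2, 1, 2)) pads one column on the left, two columns on the right, one row at the top, and two rows at the bottom;

The code in this article uses padding=2 instead. The formula then gives (224 − 11 + 4) / 4 + 1 = 55.25 for Conv1, but PyTorch discards the fractional part, so the output size becomes 55 and the rows and columns corresponding to the remainder are dropped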

Reference: "pytorch中的卷积操作详解" (convolution operations in PyTorch explained), on 太阳花的小绿豆's CSDN blog

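inplace=True in the activation function lets PyTorch overwrite the input tensor with the result instead of allocating a new tensor, which lowers memory usage;

For nn.Conv2d, stride defaults to 1, so it can be omitted when the stride is 1;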
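To make the padding notes concrete, the sketch below compares the asymmetric padding from the layer table, done with `nn.ZeroPad2d`, against the `padding=2` used in model.py. The dummy input tensor is an assumption for this example; only the output shapes matter.

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 3, 224, 224)          # dummy batch: [N, C, H, W]

# Asymmetric padding: (left, right, top, bottom) = (1, 2, 1, 2) -> 227 x 227
padded = nn.ZeroPad2d((1, 2, 1, 2))(x)
asym_out = nn.Conv2d(3, 48, kernel_size=11, stride=4)(padded)

# Symmetric padding=2, as in model.py: (224 - 11 + 4) / 4 + 1 = 55.25, floored to 55
sym_out = nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2)(x)

print(padded.shape)     # torch.Size([1, 3, 227, 227])
print(asym_out.shape)   # torch.Size([1, 48, 55, 55])
print(sym_out.shape)    # torch.Size([1, 48, 55, 55])
```

Both variants give a 55 x 55 feature map, which is why the simpler `padding=2` is enough for this reproduction.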

2. train.py

Imports

import os
import sys
import json
import torch
import torch.nn as nn
from torchvision import transforms, datasets, utils
import matplotlib.pyplot as plt
import numpy as np
import torch.optim as optim
from tqdm import tqdm
from model import AlexNet

Code for training on the GPU

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("using {} device.".format(device))

Training procedure

def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))

    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),        # preprocessing used for the "train" key: random crop resized to 224*224
                                     transforms.RandomHorizontalFlip(),        # random horizontal flip
                                     transforms.ToTensor(),                    # convert to a tensor
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),     # normalization
        "val": transforms.Compose([transforms.Resize((224, 224)),  # cannot 224, must (224, 224)
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}

    data_root = os.path.abspath(os.path.join(os.getcwd(), "../"))  # os.getcwd() returns the current working directory; "../" goes up one level, and os.path.join joins the two paths
    image_path = os.path.join(data_root, "data_set", "flower_data")  # flower data set path
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),      # load the training set
                                         transform=data_transform["train"])           # training-set preprocessing
    train_num = len(train_dataset)                                                    # number of training images

    # {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    flower_list = train_dataset.class_to_idx        # index assigned to each class
    cla_dict = dict((val, key) for key, val in flower_list.items())     # invert the dict: index -> class name
    # write dict into json file
    json_str = json.dumps(cla_dict, indent=4)      # encode cla_dict as JSON
    with open('class_indices.json', 'w') as json_file:      # saved so that predict.py can read it
        json_file.write(json_str)

    batch_size = 4
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print('Using {} dataloader workers every process'.format(nw))

    train_loader = torch.utils.data.DataLoader(train_dataset,       # load the training data
                                               batch_size=batch_size,
                                               shuffle=True,    # shuffle the data
                                               num_workers=nw)      # number of worker processes; set to 0 on Windows

    validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),       # load the validation set
                                            transform=data_transform["val"])        # validation preprocessing
    val_num = len(validate_dataset)     # number of validation images
    validate_loader = torch.utils.data.DataLoader(validate_dataset,     # load the validation data
                                                  batch_size=4, shuffle=False,
                                                  num_workers=nw)

    print("using {} images for training, {} images for validation.".format(train_num,
                                                                           val_num))
    
    
    net = AlexNet(num_classes=5, init_weights=True)

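    net.to(device)      # move the model to the selected device (the GPU if available)
    loss_function = nn.CrossEntropyLoss()       # loss function
    # pata = list(net.parameters())
    optimizer = optim.Adam(net.parameters(), lr=0.0002)     # optimizer over all trainable parameters of the network

    epochs = 10
    save_path = './AlexNet.pth'
    best_acc = 0.0          # best validation accuracy so far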
    train_steps = len(train_loader)
    for epoch in range(epochs):
        # train
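        net.train()     # dropout should only be active during training; net.train() and net.eval() toggle it (and BN layers as well)
        running_loss = 0.0      # accumulated loss, used for the average training loss
        train_bar = tqdm(train_loader, file=sys.stdout)
        for step, data in enumerate(train_bar):     # iterate over the training set
            images, labels = data       # split into images and labels
            optimizer.zero_grad()       # clear the gradients from the previous step
            outputs = net(images.to(device))        # forward pass on the selected device
            loss = loss_function(outputs, labels.to(device))        # loss between predictions and ground truth
            loss.backward()     # backpropagate the loss
            optimizer.step()        # update the parameters

            # print statistics
            running_loss += loss.item()     # accumulate the loss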

            train_bar.desc = "train epoch[{}/{}] loss:{:.3f}".format(epoch + 1,
                                                                     epochs,
                                                                     loss)

        # validate
        net.eval()
        acc = 0.0  # accumulate accurate number / epoch
        with torch.no_grad():       # stop PyTorch from tracking gradients; none are computed during validation
            val_bar = tqdm(validate_loader, file=sys.stdout)
            for val_data in val_bar:        # iterate over the validation set
                val_images, val_labels = val_data
                outputs = net(val_images.to(device))
                predict_y = torch.max(outputs, dim=1)[1]        # the index of the largest output is the predicted class
                acc += torch.eq(predict_y, val_labels.to(device)).sum().item()      # compare predicted and true labels, count the correct ones

        val_accurate = acc / val_num        # validation accuracy
        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / train_steps, val_accurate))

        if val_accurate > best_acc:     # better than the best so far
            best_acc = val_accurate     # update the best accuracy
            torch.save(net.state_dict(), save_path)     # save the weights

    print('Finished Training')


if __name__ == '__main__':
    main()
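The accuracy bookkeeping in the validation loop can be followed on a tiny hand-made batch. The logits and labels below are invented for illustration; the calls are the same `torch.max` and `torch.eq` used in train.py.

```python
import torch

# Hypothetical network outputs for a batch of 4 images and 5 classes.
outputs = torch.tensor([[2.0, 0.1, 0.3, 0.0, 0.5],
                        [0.2, 1.5, 0.1, 0.0, 0.3],
                        [0.1, 0.2, 0.3, 3.0, 0.4],
                        [0.9, 0.1, 0.2, 0.3, 0.4]])
labels = torch.tensor([0, 1, 3, 2])         # hypothetical ground-truth classes

# torch.max over dim=1 returns (values, indices); [1] keeps the indices.
predict_y = torch.max(outputs, dim=1)[1]
correct = torch.eq(predict_y, labels).sum().item()   # number of matching labels
accuracy = correct / labels.size(0)

print(predict_y.tolist(), correct, accuracy)   # [0, 1, 3, 0] 3 0.75
```

In train.py the same count is accumulated over all validation batches and divided by `val_num` at the end of the epoch.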

Results

Only ten images per class were used, to speed things up

3. predict.py

import os
import json

import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

from model import AlexNet


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    data_transform = transforms.Compose(
        [transforms.Resize((224, 224)),
         transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    # load image
    img_path = "../tulip.jpg"
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    img = Image.open(img_path)

    plt.imshow(img)
    # [N, C, H, W]
    img = data_transform(img)       # preprocessing moves the channel dimension to the front
    # expand batch dimension
    img = torch.unsqueeze(img, dim=0)       # add the batch dimension

    # read class_indict
    json_path = './class_indices.json'      # class index file written by train.py
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)

    with open(json_path, "r") as f:
        class_indict = json.load(f)     # decode into the dictionary we need

    # create model
    model = AlexNet(num_classes=5).to(device)       # instantiate the network

    # load model weights
    weights_path = "./AlexNet.pth"
    assert os.path.exists(weights_path), "file: '{}' does not exist.".format(weights_path)
    model.load_state_dict(torch.load(weights_path, map_location=device))     # load the weights; map_location lets GPU-trained weights load on a CPU-only machine

    model.eval()        # eval mode turns dropout off
    with torch.no_grad():       # do not track gradients
        # predict class
        output = torch.squeeze(model(img.to(device))).cpu()     # forward pass; squeeze removes the batch dimension
        predict = torch.softmax(output, dim=0)      # softmax turns the outputs into a probability distribution
        predict_cla = torch.argmax(predict).numpy()     # index of the highest probability

    print_res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)],
                                                 predict[predict_cla].numpy())
    plt.title(print_res)
    for i in range(len(predict)):
        print("class: {:10}   prob: {:.3}".format(class_indict[str(i)],
                                                  predict[i].numpy()))
    plt.show()


if __name__ == '__main__':
    main()
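The last steps of predict.py, turning raw outputs into a class name and a probability, can be tried without a trained model. The logits below are invented, and the `class_indict` dictionary only mimics the shape of class_indices.json; both are assumptions for this sketch.

```python
import torch

# Hypothetical stand-in for the contents of class_indices.json.
class_indict = {"0": "daisy", "1": "dandelion", "2": "roses",
                "3": "sunflower", "4": "tulips"}

output = torch.tensor([0.0, 0.0, 0.0, 0.0, 2.0])   # invented logits for one image

predict = torch.softmax(output, dim=0)        # probabilities that sum to 1
predict_cla = int(torch.argmax(predict))      # index of the most likely class
label = class_indict[str(predict_cla)]        # JSON keys are strings

print("class: {}   prob: {:.3f}".format(label, predict[predict_cla].item()))
```

The `str(...)` conversion is needed because JSON object keys are always strings, even though train.py built the dictionary with integer keys.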

Prediction result:

Because so little data was used, the accuracy is very low.