PyTorch Image Classification, Part 1: AlexNet

This article reviews the progress of deep learning in image recognition, particularly how AlexNet revolutionized feature learning. Using the Kaggle cats-vs-dogs dataset, a simplified AlexNet is implemented and trained in PyTorch, reaching roughly 86% accuracy on the validation set. The article covers the model architecture, a walkthrough of the source code, and the training process.

Abstract: parts of this article are excerpted from the linked source. Training uses the Kaggle cats-vs-dogs dataset: 20,000 training images (10,000 cats and 10,000 dogs) and 5,000 validation images (2,500 of each).

Dataset link
Link: https://pan.baidu.com/s/1uTl_ErqP_KxYH4M5feZOaQ
Extraction code: 6666

1. Learning Representations

Before 2012, image features were computed mechanically. In fact, designing a new set of feature functions, improving the results, and writing up the paper was the fashion of the day. SIFT [Lowe, 2004], SURF [Bay et al., 2006], HOG (histograms of oriented gradients) [Dalal & Triggs, 2005], bags of visual words, and similar feature-extraction methods dominated the field.

Another group of researchers, including Yann LeCun, Geoff Hinton, Yoshua Bengio, Andrew Ng, Shun-ichi Amari, and Juergen Schmidhuber, had a different idea: they believed that the features themselves should be learned. Moreover, they believed that, at a reasonable level of complexity, the features should be composed of multiple jointly learned neural network layers, each with learnable parameters. In computer vision, the lowest layers might detect edges, colors, and textures. Indeed, Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton proposed a new convolutional neural network variant, AlexNet, which achieved a sensational result in the 2012 ImageNet challenge. AlexNet is named after Alex Krizhevsky, the first author of [Krizhevsky et al., 2012].

Interestingly, in the lowest layers of the network, the model learned feature extractors that resemble traditional filters. Figure 1, reproduced from the AlexNet paper [Krizhevsky et al., 2012], depicts these low-level image features.
Feature extractors learned by the first layer of AlexNet
The higher layers of AlexNet build on these low-level representations to represent larger structures, such as eyes, noses, and blades of grass, and still higher layers can detect whole objects such as people, airplanes, dogs, or frisbees. The final hidden neurons can learn a comprehensive representation of the image that makes data belonging to different categories easy to separate. Although a dedicated group of researchers kept trying to learn hierarchical representations of visual data, for a long time these attempts went without a breakthrough. The breakthrough for deep convolutional neural networks came in 2012, when AlexNet appeared. It was the first to demonstrate that learned features can surpass hand-designed ones, upending the state of computer vision research. AlexNet used an eight-layer convolutional neural network and won the 2012 ImageNet image-recognition challenge by a wide margin.

2. Network Model

AlexNet and LeNet share a very similar design philosophy, but there are also significant differences. First, AlexNet is much deeper than the comparatively small LeNet-5: it consists of eight layers, namely five convolutional layers, two fully connected hidden layers, and one fully connected output layer. Second, AlexNet uses ReLU rather than sigmoid as its activation function.
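The feature-map sizes flowing through these layers follow standard convolution arithmetic: output = floor((n + 2p − k) / s) + 1. A quick sketch (the helper name `conv_out` is just for illustration) traces the spatial sizes through the simplified model used below, ending at the 256 × 5 × 5 = 6400 features that feed the first fully connected layer:

```python
def conv_out(n, k, s=1, p=0):
    """Output spatial size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

n = conv_out(224, k=11, s=4, p=1)   # 11x11 conv, stride 4 -> 54
n = conv_out(n, k=3, s=2)           # 3x3 max-pool         -> 26
n = conv_out(n, k=5, p=2)           # 5x5 conv, pad 2      -> 26
n = conv_out(n, k=3, s=2)           # 3x3 max-pool         -> 12
for _ in range(3):
    n = conv_out(n, k=3, p=1)       # three 3x3 convs      -> 12
n = conv_out(n, k=3, s=2)           # 3x3 max-pool         -> 5
print(256 * n * n)                  # flattened feature count: 6400
```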

3. Source Code

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset,DataLoader
import math
import numpy as np
from PIL import Image
import os
import torchvision
import matplotlib.pyplot as plt
from torch.utils.tensorboard import SummaryWriter


def AlexNet():
    net = nn.Sequential(
        # 11x11 convolution layer
        nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        # 5x5 convolution layer
        nn.Conv2d(in_channels=96, out_channels=256, kernel_size=5, padding=2),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        # three consecutive 3x3 convolution layers
        nn.Conv2d(in_channels=256, out_channels=384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(in_channels=384, out_channels=384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(in_channels=384, out_channels=256, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        # flatten
        nn.Flatten(),
        # fully connected layers; the sizes here are smaller than in the
        # original paper to reduce the parameter count and ease training
        nn.Linear(6400, 512), nn.ReLU(), nn.Dropout(p=0.5),
        nn.Linear(512, 64), nn.ReLU(), nn.Dropout(p=0.5),
        nn.Linear(64, 2)
    )

    return net

class CatsAndDogs(Dataset):
    def __init__(self, root, transforms=None, size=(224, 224)):
        # collect the paths of all images under root
        self.images = [os.path.join(root, item) for item in os.listdir(root)]
        self.transforms = transforms
        self.size = size

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # resizing (via the transforms) is required because all images in a
        # batch assembled by DataLoader must have the same size
        image = Image.open(self.images[idx])
        if self.transforms is not None:
            image = self.transforms(image)
        # paths look like "K:\\imageData\\dogAndCat\\train\\dog.9983.jpg";
        # os.path.basename works on any OS, unlike splitting on "\\"
        label = os.path.basename(self.images[idx]).split(".")[0]
        if label == "cat":
            label = 0
        if label == "dog":
            label = 1
        return image, label
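The filename-to-label convention can be sanity-checked without the real dataset. The sketch below (a throwaway temp directory and empty placeholder files, names chosen only for illustration) creates files following the `<class>.<id>.jpg` naming scheme and parses them the same way the dataset does:

```python
import os
import tempfile

# throwaway directory with files that follow the "<class>.<id>.jpg" scheme
root = tempfile.mkdtemp()
for name in ("cat.0.jpg", "dog.9983.jpg"):
    open(os.path.join(root, name), "w").close()

labels = {}
for path in os.listdir(root):
    # basename(...).split(".")[0] recovers "cat" or "dog" on any OS
    cls = os.path.basename(path).split(".")[0]
    labels[path] = 0 if cls == "cat" else 1
print(labels)
```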



def train(model,optimizer,loss_fn,train_loader,validLoader,epoches=30,device=torch.device("cpu"),logdir="./log"):
    train_batches = 0
    train_loss_list = []
    valid_loss_list = []
    valid_accuracy_list = []
    epoch_list = []
    writer = SummaryWriter(logdir)
    for epoch in range(epoches):
        training_loss = 0.0
        valid_loss = 0.0
        model.train()
        for batch in train_loader:
            train_batches += 1
            optimizer.zero_grad()
            inputs, targets = batch
            inputs = inputs.to(device)
            targets = targets.to(device)
            outputs = model(inputs)
            loss = loss_fn(outputs, targets)
            loss.backward()
            optimizer.step()
            training_loss += loss.item() * inputs.size(0)
            writer.add_scalar("loss/batch_loss", loss.item(), train_batches)
        training_loss /= len(train_loader.dataset)


        model.eval()
        num_correct = 0
        num_examples = 0
        with torch.no_grad():
            for batch in validLoader:
                inputs, targets = batch
                inputs = inputs.to(device)
                outputs = model(inputs)
                targets = targets.to(device)
                loss = loss_fn(outputs, targets)
                valid_loss += loss.item() * inputs.size(0)
                correct = torch.eq(torch.max(F.softmax(outputs, dim=1), dim=1)[1], targets)
                num_correct += torch.sum(correct).item()
                num_examples += correct.shape[0]
            valid_loss /= len(validLoader.dataset)
            valid_accuracy = num_correct / num_examples
            print(
                'Epoch: {}/{}, Training Loss: {:.5f}, Validation Loss: {:.5f}, accuracy = {:.5f}'\
                    .format(epoch, epoches, training_loss, valid_loss, valid_accuracy))
            writer.add_scalar("loss/epoches_loss", training_loss, epoch)
            writer.add_scalar("loss/accuracy", valid_accuracy, epoch)
            writer.add_scalars("loss/train_valid", {"trainLoss": training_loss, "accuracy": valid_accuracy}, epoch)
        train_loss_list.append(training_loss)
        valid_loss_list.append(valid_loss)
        valid_accuracy_list.append(valid_accuracy)
        epoch_list.append(epoch)
    return train_loss_list, valid_loss_list, valid_accuracy_list, epoch_list


def get_parameter_number(net):
    total_num = sum(p.numel() for p in net.parameters())
    trainable_num = sum(p.numel() for p in net.parameters() if p.requires_grad)
    return {'Total parameters': total_num, 'Trainable parameters': trainable_num}


def visualize(train_loss,val_loss,val_acc,path="./train_valid.png"):
    train_loss = np.array(train_loss)
    val_loss = np.array(val_loss)
    val_acc = np.array(val_acc)
    plt.grid(True)
    plt.xlabel("epoch")
    plt.ylabel("value")
    plt.title("train_loss and valid_acc")
    plt.plot(np.arange(len(val_acc)),val_acc, label=r"valid_acc",c="g")
    plt.plot(np.arange(len(train_loss)),train_loss,label=r"train_loss",c="r")
    plt.legend()  # show the curve legend
    plt.savefig(path)
    plt.show()


Check that the model is built correctly:

net = AlexNet()
x = torch.randn(1,3,224,224)
for layer in net:
    x = layer(x)
    print(layer.__class__.__name__, 'Output shape:\t', x.shape)

Conv2d Output shape:	 torch.Size([1, 96, 54, 54])
ReLU Output shape:	 torch.Size([1, 96, 54, 54])
MaxPool2d Output shape:	 torch.Size([1, 96, 26, 26])
Conv2d Output shape:	 torch.Size([1, 256, 26, 26])
ReLU Output shape:	 torch.Size([1, 256, 26, 26])
MaxPool2d Output shape:	 torch.Size([1, 256, 12, 12])
Conv2d Output shape:	 torch.Size([1, 384, 12, 12])
ReLU Output shape:	 torch.Size([1, 384, 12, 12])
Conv2d Output shape:	 torch.Size([1, 384, 12, 12])
ReLU Output shape:	 torch.Size([1, 384, 12, 12])
Conv2d Output shape:	 torch.Size([1, 256, 12, 12])
ReLU Output shape:	 torch.Size([1, 256, 12, 12])
MaxPool2d Output shape:	 torch.Size([1, 256, 5, 5])
Flatten Output shape:	 torch.Size([1, 6400])
Linear Output shape:	 torch.Size([1, 512])
ReLU Output shape:	 torch.Size([1, 512])
Dropout Output shape:	 torch.Size([1, 512])
Linear Output shape:	 torch.Size([1, 64])
ReLU Output shape:	 torch.Size([1, 64])
Dropout Output shape:	 torch.Size([1, 64])
Linear Output shape:	 torch.Size([1, 2])
net
Sequential(
  (0): Conv2d(3, 96, kernel_size=(11, 11), stride=(4, 4), padding=(1, 1))
  (1): ReLU()
  (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  (3): Conv2d(96, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (4): ReLU()
  (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  (6): Conv2d(256, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (7): ReLU()
  (8): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (9): ReLU()
  (10): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU()
  (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  (13): Flatten()
  (14): Linear(in_features=6400, out_features=512, bias=True)
  (15): ReLU()
  (16): Dropout(p=0.5, inplace=False)
  (17): Linear(in_features=512, out_features=64, bias=True)
  (18): ReLU()
  (19): Dropout(p=0.5, inplace=False)
  (20): Linear(in_features=64, out_features=2, bias=True)
)
if __name__ == "__main__":
    epoches = 25
    modelPath = "D:\\classifier\\model\\dogsAndCats_AlexNet.pt"
    trainingResultPath = "D:\\classifier\\model\\dogsAndCats_AlexNet.png"
    logdir = "./catsAndDogs_AlexNet/log"#tensorboard logdir
   
    model = AlexNet()
    
    img_transforms = torchvision.transforms.Compose([
        torchvision.transforms.Resize((224, 224)),
        torchvision.transforms.ToTensor(),
    ])

    trainset = CatsAndDogs(r"D:\classifier\imageData\catsAndDogs\train",transforms=img_transforms)
    validset = CatsAndDogs(r"D:\classifier\imageData\catsAndDogs\val",transforms=img_transforms)
    trainLoader = DataLoader(trainset, batch_size=128, shuffle=True,num_workers=0)
    validLoader = DataLoader(validset, batch_size=128, shuffle=True,num_workers=0)

    if torch.cuda.is_available():
        device = torch.device("cuda")
        print("run in cuda")
    else:
        device = torch.device("cpu")
        print("run in cpu")

    model.to(device)
    
    optimizer = torch.optim.AdamW(model.parameters(),lr=0.0005)
    loss_fn = torch.nn.CrossEntropyLoss()
    
    print(get_parameter_number(model))

    train_loss_list,valid_loss_list,valid_accuracy_list ,epoch_list = \
        train(model,optimizer,loss_fn,trainLoader,validLoader,epoches,device,logdir)
    torch.save(model,modelPath)
    visualize(train_loss_list,valid_loss_list,valid_accuracy_list,trainingResultPath)
run in cuda
{'Total parameters': 7057474, 'Trainable parameters': 7057474}
Epoch: 0/25, Training Loss: 0.68914, Validation Loss: 0.67966, accuracy = 0.61220
Epoch: 1/25, Training Loss: 0.69001, Validation Loss: 0.68665, accuracy = 0.50000
Epoch: 2/25, Training Loss: 0.68481, Validation Loss: 0.67704, accuracy = 0.55940
Epoch: 3/25, Training Loss: 0.67508, Validation Loss: 0.66865, accuracy = 0.57240
Epoch: 4/25, Training Loss: 0.64222, Validation Loss: 0.60939, accuracy = 0.67520
Epoch: 5/25, Training Loss: 0.59012, Validation Loss: 0.53298, accuracy = 0.73220
Epoch: 6/25, Training Loss: 0.51666, Validation Loss: 0.47126, accuracy = 0.77320
Epoch: 7/25, Training Loss: 0.44834, Validation Loss: 0.41834, accuracy = 0.80760
Epoch: 8/25, Training Loss: 0.40024, Validation Loss: 0.38949, accuracy = 0.82140
Epoch: 9/25, Training Loss: 0.35366, Validation Loss: 0.46253, accuracy = 0.78220
Epoch: 10/25, Training Loss: 0.31696, Validation Loss: 0.37098, accuracy = 0.83320
Epoch: 11/25, Training Loss: 0.27756, Validation Loss: 0.32717, accuracy = 0.85800
Epoch: 12/25, Training Loss: 0.24526, Validation Loss: 0.35002, accuracy = 0.85100
Epoch: 13/25, Training Loss: 0.20707, Validation Loss: 0.39739, accuracy = 0.83860
Epoch: 14/25, Training Loss: 0.17929, Validation Loss: 0.37975, accuracy = 0.85600
Epoch: 15/25, Training Loss: 0.14151, Validation Loss: 0.39280, accuracy = 0.86200
Epoch: 16/25, Training Loss: 0.12865, Validation Loss: 0.51913, accuracy = 0.85640
Epoch: 17/25, Training Loss: 0.12045, Validation Loss: 0.44457, accuracy = 0.86560
Epoch: 18/25, Training Loss: 0.08484, Validation Loss: 0.46240, accuracy = 0.86580
Epoch: 19/25, Training Loss: 0.05874, Validation Loss: 0.50794, accuracy = 0.86540
Epoch: 20/25, Training Loss: 0.05012, Validation Loss: 0.58512, accuracy = 0.86980
Epoch: 21/25, Training Loss: 0.06486, Validation Loss: 0.55290, accuracy = 0.87020
Epoch: 22/25, Training Loss: 0.04856, Validation Loss: 0.62414, accuracy = 0.86820
Epoch: 23/25, Training Loss: 0.04241, Validation Loss: 0.57700, accuracy = 0.86040
Epoch: 24/25, Training Loss: 0.03803, Validation Loss: 0.73861, accuracy = 0.86560

Training loss and validation accuracy

  • Visualizing the training process with TensorBoard
  • Summary

The number of neurons in the final fully connected layers of the original AlexNet was reduced to make training easier. The final validation accuracy is roughly 86%, with 7,057,474 trainable parameters.
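With the model saved via torch.save, a single image can be classified afterwards. The sketch below is self-contained: it rebuilds the same architecture and feeds a random tensor so it runs on its own; in practice you would instead load the trained checkpoint with torch.load(modelPath) and push a real image through the same Resize/ToTensor transforms before unsqueezing a batch dimension.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def AlexNet():
    # same architecture as defined earlier in the post
    return nn.Sequential(
        nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Flatten(),
        nn.Linear(6400, 512), nn.ReLU(), nn.Dropout(p=0.5),
        nn.Linear(512, 64), nn.ReLU(), nn.Dropout(p=0.5),
        nn.Linear(64, 2),
    )

model = AlexNet()      # in practice: model = torch.load(modelPath)
model.eval()           # disable dropout for inference
x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
with torch.no_grad():
    probs = F.softmax(model(x), dim=1)  # class probabilities: [cat, dog]
pred = probs.argmax(dim=1).item()
print("predicted class:", "cat" if pred == 0 else "dog")
```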
