深度学习100例 | 第1例:猫狗识别 - PyTorch实现

猫狗识别

阅读本文你将学会如何使用 PyTorch 进行图像识别

我的环境:

  • 语言环境:Python3.8
  • 编译器:Jupyter lab
  • 深度学习环境:
    • torch==1.10.0+cu113
    • torchvision==0.11.1+cu113
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose
import matplotlib.pyplot as plt
import torchvision.transforms as transforms
import numpy as np

数据读取与预处理

在对数据进行增强时注意是否合理哦!原本我使用了下面的代码对数据进行增强,试图解决数据不足的问题,经过测试我发现并非所有的增强操作都会产生正面影响。

对此,我做过几个对比小实验,保持其他参数不变,仅改变数据增强方式,最后实验结果如下:

  • 不进行数据增强:79.2%
  • 随机旋转:80.8%
  • 随机旋转+高斯模糊模糊:83.3%
  • 随机垂直翻转:73.3%

在《深度学习100例》后期的文章中我再进行更加详细的对比,这次算是先让大家了解一下。

关于transforms.Compose的详细介绍可以参考文章 https://blog.csdn.net/qq_38251616/article/details/124878863

train_datadir = './1-cat-dog/train/'
test_datadir  = './1-cat-dog/val/'

train_transforms = transforms.Compose([
    transforms.Resize([224, 224]),  # 将输入图片resize成统一尺寸
    # transforms.RandomRotation(degrees=(-10, 10)),  #随机旋转,-10到10度之间随机选
    # transforms.RandomHorizontalFlip(p=0.5),  #随机水平翻转 选择一个概率概率
    # transforms.RandomVerticalFlip(p=0.5),  #随机垂直翻转
    # transforms.RandomPerspective(distortion_scale=0.6, p=1.0), # 随机视角
    # transforms.GaussianBlur(kernel_size=(5, 9), sigma=(0.1, 5)),  #随机选择的高斯模糊模糊图像
    transforms.ToTensor(),          # 将PIL Image或numpy.ndarray转换为tensor,并归一化到[0,1]之间
    transforms.Normalize(           # 标准化处理-->转换为标准正太分布(高斯分布),使模型更容易收敛
        mean=[0.485, 0.456, 0.406], 
        std=[0.229, 0.224, 0.225])  # 其中 mean=[0.485,0.456,0.406]与std=[0.229,0.224,0.225] 从数据集中随机抽样计算得到的。
])

test_transforms = transforms.Compose([
    transforms.Resize([224, 224]),  # 将输入图片resize成统一尺寸
    transforms.ToTensor(),          # 将PIL Image或numpy.ndarray转换为tensor,并归一化到[0,1]之间
    transforms.Normalize(           # 标准化处理-->转换为标准正太分布(高斯分布),使模型更容易收敛
        mean=[0.485, 0.456, 0.406], 
        std=[0.229, 0.224, 0.225])  # 其中 mean=[0.485,0.456,0.406]与std=[0.229,0.224,0.225] 从数据集中随机抽样计算得到的。
])

train_data = datasets.ImageFolder(train_datadir,transform=train_transforms)

test_data  = datasets.ImageFolder(test_datadir,transform=test_transforms)

train_loader = torch.utils.data.DataLoader(train_data,
                                          batch_size=4,
                                          shuffle=True,
                                          num_workers=1)
test_loader  = torch.utils.data.DataLoader(test_data,
                                          batch_size=4,
                                          shuffle=True,
                                          num_workers=1)

关于 transforms.Compose 这部分更多的信息可以参考 https://pytorch-cn.readthedocs.io/zh/latest/torchvision/torchvision-transform/

如果你想知道还有哪些数据增强手段,可以看看这里:https://pytorch.org/vision/stable/auto_examples/plot_transforms.html#sphx-glr-auto-examples-plot-transforms-py 对应的API,你可以在这里找到 https://pytorch.org/vision/stable/transforms.html

for X, y in test_loader:
    print("Shape of X [N, C, H, W]: ", X.shape)
    print("Shape of y: ", y.shape, y.dtype)
    break
Shape of X [N, C, H, W]:  torch.Size([4, 3, 224, 224])
Shape of y:  torch.Size([4]) torch.int64

定义模型

import torch.nn.functional as F

# 找到可以用于训练的 GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))

# 定义模型
class LeNet(nn.Module):
    # 一般在__init__中定义网络需要的操作算子,比如卷积、全连接算子等等
    def __init__(self):
        super(LeNet, self).__init__()
        # Conv2d的第一个参数是输入的channel数量,第二个是输出的channel数量,第三个是kernel size
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # 由于上一层有16个channel输出,每个feature map大小为53*53,所以全连接层的输入是16*53*53
        self.fc1 = nn.Linear(16*53*53, 120)
        self.fc2 = nn.Linear(120, 84)
        # 最终有10类,所以最后一个全连接层输出数量是10
        self.fc3 = nn.Linear(84, 2)
        self.pool = nn.MaxPool2d(2, 2)
    # forward这个函数定义了前向传播的运算,只需要像写普通的python算数运算那样就可以了
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        # 下面这步把二维特征图变为一维,这样全连接层才能处理
        x = x.view(-1, 16*53*53)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = LeNet().to(device)
print(model)
Using cuda device
LeNet(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=44944, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=2, bias=True)
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

损失函数与优化器

定义一个损失函数和一个优化器。

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

定义训练函数

在单个训练循环中,模型对训练数据集进行预测(分批提供给它),并反向传播预测误差从而调整模型的参数。

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # 计算预测误差
        pred = model(X)
        loss = loss_fn(pred, y)

        # 反向传播
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

定义测试函数

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

进行训练

epochs = 20
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_loader, model, loss_fn, optimizer)
    test(test_loader, model, loss_fn)
print("Done!")
Epoch 1
-------------------------------
loss: 0.697082  [    0/  480]
loss: 0.686452  [  400/  480]
Test Error: 
 Accuracy: 50.8%, Avg loss: 0.692428 

Epoch 2
-------------------------------
loss: 0.696046  [    0/  480]
loss: 0.674288  [  400/  480]
Test Error: 
 Accuracy: 50.0%, Avg loss: 0.690799 

Epoch 3
-------------------------------
loss: 0.682432  [    0/  480]
loss: 0.677850  [  400/  480]
Test Error: 
 Accuracy: 58.3%, Avg loss: 0.686088 

Epoch 4
-------------------------------
loss: 0.707287  [    0/  480]
loss: 0.681919  [  400/  480]
Test Error: 
 Accuracy: 60.8%, Avg loss: 0.681735 

Epoch 5
-------------------------------
loss: 0.662526  [    0/  480]
loss: 0.686361  [  400/  480]
Test Error: 
 Accuracy: 59.2%, Avg loss: 0.678261 

Epoch 6
-------------------------------
loss: 0.643308  [    0/  480]
loss: 0.588915  [  400/  480]
Test Error: 
 Accuracy: 63.3%, Avg loss: 0.661859 

Epoch 7
-------------------------------
loss: 0.456625  [    0/  480]
loss: 0.446218  [  400/  480]
Test Error: 
 Accuracy: 64.2%, Avg loss: 0.660168 

Epoch 8
-------------------------------
loss: 0.416538  [    0/  480]
loss: 0.779305  [  400/  480]
Test Error: 
 Accuracy: 61.7%, Avg loss: 0.647555 

Epoch 9
-------------------------------
loss: 0.622066  [    0/  480]
loss: 0.547348  [  400/  480]
Test Error: 
 Accuracy: 66.7%, Avg loss: 0.647476 

Epoch 10
-------------------------------
loss: 0.690601  [    0/  480]
loss: 0.458835  [  400/  480]
Test Error: 
 Accuracy: 65.0%, Avg loss: 0.637805 

Epoch 11
-------------------------------
loss: 0.441014  [    0/  480]
loss: 0.798121  [  400/  480]
Test Error: 
 Accuracy: 68.3%, Avg loss: 0.644360 

Epoch 12
-------------------------------
loss: 0.340511  [    0/  480]
loss: 0.479057  [  400/  480]
Test Error: 
 Accuracy: 67.5%, Avg loss: 0.608323 

Epoch 13
-------------------------------
loss: 0.435809  [    0/  480]
loss: 0.755974  [  400/  480]
Test Error: 
 Accuracy: 65.0%, Avg loss: 0.621828 

Epoch 14
-------------------------------
loss: 0.403148  [    0/  480]
loss: 0.312620  [  400/  480]
Test Error: 
 Accuracy: 66.7%, Avg loss: 0.646973 

Epoch 15
-------------------------------
loss: 0.165473  [    0/  480]
loss: 0.518625  [  400/  480]
Test Error: 
 Accuracy: 70.0%, Avg loss: 0.600993 

Epoch 16
-------------------------------
loss: 0.328379  [    0/  480]
loss: 0.196470  [  400/  480]
Test Error: 
 Accuracy: 72.5%, Avg loss: 0.526722 

Epoch 17
-------------------------------
loss: 1.021464  [    0/  480]
loss: 0.422744  [  400/  480]
Test Error: 
 Accuracy: 75.8%, Avg loss: 0.539513 

Epoch 18
-------------------------------
loss: 0.140470  [    0/  480]
loss: 0.335353  [  400/  480]
Test Error: 
 Accuracy: 71.7%, Avg loss: 0.538070 

Epoch 19
-------------------------------
loss: 0.265230  [    0/  480]
loss: 0.180824  [  400/  480]
Test Error: 
 Accuracy: 75.0%, Avg loss: 0.485590 

Epoch 20
-------------------------------
loss: 0.277113  [    0/  480]
loss: 0.548571  [  400/  480]
Test Error: 
 Accuracy: 73.3%, Avg loss: 0.498096 

Done!
评论 4
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

K同学啊

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值