PyTorch Model Building, Training, Testing, and Prediction


andyL_05


Article link: https://blog.csdn.net/andyL_05/article/details/103363603


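Using AlexNet for handwritten digit recognition as an example, this article briefly shows how to build a network model in PyTorch and then train it, test it, and run prediction with it.
Environment: Ubuntu 19.04, Python 3.7, PyTorch 1.1.0, torchvision 0.3.0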

1· Building a PyTorch model

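Defining a custom network in PyTorch is straightforward: write a class that inherits from torch.nn.Module and define its __init__ and forward methods. __init__ defines the network structure and the parameters of each layer, and receives any arguments the model needs. When the network ends in fully connected layers, make sure that the input size, the convolution and pooling parameters (kernel size, stride, padding, etc.), and the number of input features of the first fully connected layer all agree; otherwise the forward pass fails with a shape mismatch.
The goal here is handwritten digit recognition on the MNIST dataset, whose images are 28x28x1 (single channel), whereas the original AlexNet was built for ImageNet classification with 227x227x3 input. The input channels, the number of channels in each layer, and other layer parameters are therefore modified.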
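AlexNet itself consists of 5 convolutional layers and 3 fully connected layers. Its architecture is shown below:
(Figure: AlexNet architecture, from Baidu Baike)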
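The custom AlexNet in this article takes single-channel 28x28 images. Max pooling follows convolutional layers 1, 2, and 5, and a local response normalization (LRN) layer follows layers 1 and 2. The output of layer 5 has size 4x4x256; view() flattens it to 1x4096, and three fully connected layers then reduce it to a vector of length 10 whose entries are the response strengths for the digits 0-9. During training, the cross-entropy loss is computed between this vector and the label (nn.CrossEntropyLoss takes the class index as its target, not a one-hot vector); during testing and prediction, the class with the largest response is the output.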
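The 4x4x256 figure can be checked by hand with the usual output-size formula, floor((n + 2p - k) / s) + 1, applied to each convolution and pooling step. The following is a minimal sketch in plain Python; the helper names conv_out and trace_sizes are made up for this illustration, and the kernel, stride, and padding values are the ones used in the model definition in this article.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling window (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_sizes(size=28):
    """Return the feature-map side length after each of the five conv stages."""
    sizes = []
    # stage 1: conv k=3, p=1, then 2x2 max pooling with stride 2
    size = conv_out(conv_out(size, 3, 1, 1), 2, 2)
    sizes.append(size)
    # stage 2: conv k=3, p=2, then 2x2 max pooling with stride 2
    size = conv_out(conv_out(size, 3, 1, 2), 2, 2)
    sizes.append(size)
    # stages 3 and 4: conv k=3, p=1, no pooling
    for _ in range(2):
        size = conv_out(size, 3, 1, 1)
        sizes.append(size)
    # stage 5: conv k=3, p=1, then 2x2 max pooling with stride 2
    size = conv_out(conv_out(size, 3, 1, 1), 2, 2)
    sizes.append(size)
    return sizes


sizes = trace_sizes(28)
flat_features = 256 * sizes[-1] * sizes[-1]  # 256 channels after conv5
print(sizes, flat_features)  # [14, 8, 8, 8, 4] 4096
```

The last value is the number of input features that the first fully connected layer must accept, which is why it is declared with 256 * 4 * 4 inputs.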
The model is defined as follows:

import torch.nn as nn

# A modified AlexNet with 5 convolutional layers and 3 fully connected layers.
# Some of AlexNet's parameters are changed.
# Input images are 28 x 28, grayscale (1 channel).
# Output is a vector of length 10 (digits 0-9).
class myAlexNet(nn.Module):
  def __init__(self, imgChannel):
    super(myAlexNet, self).__init__()
    # conv1
    self.conv1 = nn.Sequential(
                                nn.Conv2d(in_channels=imgChannel, out_channels=32, kernel_size=3, padding=1),
                                nn.ReLU(),
                                nn.MaxPool2d(kernel_size=2, stride=2),
                                nn.LocalResponseNorm(size = 5)
                                )
    # conv2
    self.conv2 = nn.Sequential(
                                nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1,padding=2),
                                nn.ReLU(),
                                nn.MaxPool2d(kernel_size=2, stride=2),
                                nn.LocalResponseNorm(size = 5)
                                )
    # conv3
    self.conv3 = nn.Sequential(
                                nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride =1, padding=1),
                                nn.ReLU()
                                )
    # conv4
    self.conv4 = nn.Sequential(
                                nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride =1, padding=1),
                                nn.ReLU()
                                )
    # conv5
    self.conv5 = nn.Sequential(
                                nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride =1, padding=1),
                                nn.ReLU(),
                                nn.MaxPool2d(kernel_size=2, stride=2)
                                )
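    # fully connected head; note that forward() applies no activation between
    # these layers, so together they compose into a single affine map
    self.fc1 = nn.Linear(256 * 4 * 4, 1024)
    self.fc2 = nn.Linear(1024, 512)
    self.fc3 = nn.Linear(512, 10)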
  
  def forward(self, input):
    #print(input.size())
    out = self.conv1(input)
    #print(out.size())
    out = self.conv2(out)
    #print(out.size())
    out = self.conv3(out)
    #print(out.size())
    out = self.conv4(out)
    #print(out.size())
    out = self.conv5(out)
    #print(out.size())
    out = out.view(-1, 256 * 4 * 4)
    #print(out.size())
    out = self.fc1(out)
    #print(out.size())
    out = self.fc2(out)
    #print(out.size())
    out = self.fc3(out)
    #print(out.size())
    return out

nn.Sequential chains the convolution, activation, and other operations of each stage, so the network is defined stage by stage.
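In practice you can also define each part separately, or append all layers to a list (say, a list named layers) and build the network at the end with self.net = nn.Sequential(*layers); note the *, since nn.Sequential takes the modules as separate arguments rather than as a list. If the model needs specific parameters, pass them to __init__ and store them there. forward() defines the forward pass and normally takes self and the input tensor input. In computer vision the input is usually a 4-D tensor of shape BATCH*CHANNELS*HEIGHT*WIDTH, i.e. one batch of images. The forward pass usually just sends the input through the layers in order and returns the final output. Before the fully connected layers the tensor must be reshaped with view(), whose arguments give the size of each dimension; -1 means that dimension is inferred from the others.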
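As a rough illustration of the list-based construction and of view(), here is a minimal sketch. TinyNet and its layer sizes are hypothetical and much smaller than the network in this article; only nn.Sequential, nn.Conv2d, nn.ReLU, nn.MaxPool2d, nn.Linear, and Tensor.view are real PyTorch APIs.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):  # hypothetical example model, not the article's network
    def __init__(self, img_channel=1, num_classes=10):
        super(TinyNet, self).__init__()
        layers = []
        layers.append(nn.Conv2d(img_channel, 8, kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # 28x28 -> 14x14
        # unpack the list: nn.Sequential expects modules as separate arguments
        self.net = nn.Sequential(*layers)
        self.fc = nn.Linear(8 * 14 * 14, num_classes)

    def forward(self, x):
        out = self.net(x)                # (batch, 8, 14, 14)
        out = out.view(-1, 8 * 14 * 14)  # -1 lets view() infer the batch size
        return self.fc(out)


batch = torch.zeros(4, 1, 28, 28)  # BATCH x CHANNELS x HEIGHT x WIDTH
logits = TinyNet()(batch)
print(logits.shape)
```

Passing the list itself, nn.Sequential(layers), raises a TypeError because a list is not a Module.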
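Printing the size of each layer's output during training gives:
(Figure: output size of each layer of the AlexNet defined in this article)
64 is the batch_size, followed by the number of channels, the height, and the width (note that the channel dimension comes first).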

2· Training the model

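Training a PyTorch model mainly involves loading the training set, the forward pass, computing the loss, backpropagation, and updating the parameters.
The code is as follows: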

def train(epochs, trainLoader, model, device,Lr,momen):
  criterion = nn.CrossEntropyLoss()
  optimizer = torch.optim.SGD(model.parameters(), lr=Lr, momentum=momen)
  model.to(device)
  for e in range(epochs):
    for i, (imgs, labels) in enumerate(trainLoader):
      imgs = imgs.to(device)
      labels = labels.to(device)
      out = model(imgs)
      loss = criterion(out, labels)
      optimizer.zero_grad() # without zero_grad(), gradients from successive batches accumulate
      loss.backward()
      optimizer.step()
      if i%20==0:
        print('epoch: {}, batch: {}, loss: {}'.format(e + 1, i + 1, loss.item()))
  torch.save(model, 'myAlexMnistDemo.pth') # save the whole model (structure and parameters)

This function takes the model, trainLoader, and other arguments. trainLoader is the training-set loader; for DataLoader usage, see my other article: https://blog.csdn.net/andyL_05/article/details/103297450
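In the training stage, first define the loss function; this article uses cross-entropy loss. Next define the optimizer; this article uses SGD (stochastic gradient descent). The optimizer's first argument is the set of parameters to optimize, usually the network's parameters(); the remaining arguments (learning rate, etc.) differ between optimizers.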
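For intuition about the loss, nn.CrossEntropyLoss applies log-softmax to the raw network outputs and then takes the negative log-likelihood of the target class, averaged over the batch by default. The sketch below reproduces the per-sample quantity in plain Python; the function name cross_entropy and the example scores are made up for this illustration.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one sample: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# ten equal scores: the model is undecided, so the loss is log(10)
uniform = cross_entropy([0.0] * 10, label=3)
# a large score on the correct class gives a loss close to zero
confident = cross_entropy([0.0, 0.0, 8.0] + [0.0] * 7, label=2)
print(uniform, confident)
```

The target here is an integer class index, which matches the labels that the MNIST dataset in torchvision provides.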
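Training usually has two loops: the outer one over epochs and the inner one over batches. The batch loop can use enumerate, which yields the index of the current batch together with one batch of data from the dataloader. Each batch is passed through the network to get the output, the loss is computed, and the loss is backpropagated. Call the optimizer's zero_grad() before backpropagation: gradients accumulate across backward() calls, so the previous batch's gradients must be cleared first. After backpropagation, call step() to update the parameters.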
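The accumulation behavior is easy to observe on a single parameter. This is a small sketch, not part of the training code; the tensor w and the toy loss 2*w are chosen only to make the gradient values obvious.

```python
import torch

w = torch.ones(1, requires_grad=True)

(2 * w).sum().backward()
first = w.grad.clone()        # gradient of 2*w with respect to w

(2 * w).sum().backward()      # second backward() without clearing
accumulated = w.grad.clone()  # the new gradient was added to the old one

w.grad.zero_()                # clear the stored gradient, as optimizer.zero_grad()
                              # does for every parameter it manages
(2 * w).sum().backward()
after_reset = w.grad.clone()

print(first.item(), accumulated.item(), after_reset.item())  # 2.0 4.0 2.0
```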
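Also make sure the input data and the model are on the same device; otherwise the tensor types differ (CPU tensor vs. CUDA tensor) and the forward pass raises an error.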

3· Testing and prediction

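The code for this part is in the complete listing at the end of the article and is not repeated here. Run inference inside with torch.no_grad(): pass the input through the network and take the class with the largest output response as the classification result. For testing, compute the classification accuracy on the test data by counting the total number of samples and the number classified correctly. For prediction, add a batch dimension to the single image so that it matches the network's input.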
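The three points above (no_grad, picking the largest response, and adding the batch dimension) can be sketched in isolation. StandIn is a hypothetical untrained placeholder for the trained network, and the image and label are made-up values, so the predicted digit itself is meaningless here.

```python
import torch
import torch.nn as nn


class StandIn(nn.Module):  # hypothetical placeholder for a trained classifier
    def __init__(self):
        super(StandIn, self).__init__()
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.fc(x.view(x.size(0), -1))


model = StandIn()
model.eval()

img = torch.rand(1, 28, 28)  # one image: CHANNELS x HEIGHT x WIDTH
batch = img.unsqueeze(0)     # add the batch dimension -> 1 x 1 x 28 x 28

with torch.no_grad():        # no computation graph is recorded during inference
    out = model(batch)
    _, pred = torch.max(out, 1)  # index of the largest response per sample

# accuracy bookkeeping as in a test loop, on a made-up label
labels = torch.tensor([7])
total = labels.size(0)
correct = (pred == labels).sum().item()
print(batch.shape, pred.item(), correct / total)
```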

The complete code:

import argparse

import torch
import torch.nn as nn
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision import datasets
from PIL import Image

# A modified AlexNet with 5 convolutional layers and 3 fully connected layers.
# Some of AlexNet's parameters are changed.
# Input images are 28 x 28, grayscale (1 channel).
# Output is a vector of length 10 (digits 0-9).
class myAlexNet(nn.Module):
  def __init__(self, imgChannel):
    super(myAlexNet, self).__init__()
    # conv1
    self.conv1 = nn.Sequential(
                                nn.Conv2d(in_channels=imgChannel, out_channels=32, kernel_size=3, padding=1),
                                nn.ReLU(),
                                nn.MaxPool2d(kernel_size=2, stride=2),
                                nn.LocalResponseNorm(size = 5)
                                )
    # conv2
    self.conv2 = nn.Sequential(
                                nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1,padding=2),
                                nn.ReLU(),
                                nn.MaxPool2d(kernel_size=2, stride=2),
                                nn.LocalResponseNorm(size = 5)
                                )
    # conv3
    self.conv3 = nn.Sequential(
                                nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride =1, padding=1),
                                nn.ReLU()
                                )
    # conv4
    self.conv4 = nn.Sequential(
                                nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride =1, padding=1),
                                nn.ReLU()
                                )
    # conv5
    self.conv5 = nn.Sequential(
                                nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride =1, padding=1),
                                nn.ReLU(),
                                nn.MaxPool2d(kernel_size=2, stride=2)
                                )
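    # fully connected head; note that forward() applies no activation between
    # these layers, so together they compose into a single affine map
    self.fc1 = nn.Linear(256 * 4 * 4, 1024)
    self.fc2 = nn.Linear(1024, 512)
    self.fc3 = nn.Linear(512, 10)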
  
  def forward(self, input):
    #print(input.size())
    out = self.conv1(input)
    #print(out.size())
    out = self.conv2(out)
    #print(out.size())
    out = self.conv3(out)
    #print(out.size())
    out = self.conv4(out)
    #print(out.size())
    out = self.conv5(out)
    #print(out.size())
    out = out.view(-1, 256 * 4 * 4)
    #print(out.size())
    out = self.fc1(out)
    #print(out.size())
    out = self.fc2(out)
    #print(out.size())
    out = self.fc3(out)
    #print(out.size())
    return out

# train function
def train(epochs, trainLoader, model, device,Lr,momen):
  criterion = nn.CrossEntropyLoss()
  optimizer = torch.optim.SGD(model.parameters(), lr=Lr, momentum=momen)
  model.to(device)
  for e in range(epochs):
    for i, (imgs, labels) in enumerate(trainLoader):
      imgs = imgs.to(device)
      labels = labels.to(device)
      out = model(imgs)
      loss = criterion(out, labels)
      optimizer.zero_grad() # without zero_grad(), gradients from successive batches accumulate
      loss.backward()
      optimizer.step()
      if i%20==0:
        print('epoch: {}, batch: {}, loss: {}'.format(e + 1, i + 1, loss.item()))
  torch.save(model, 'myAlexMnistDemo.pth') # save the whole model (structure and parameters)

# test function
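def test(testLoader, model, device):
  model.to(device)
  model.eval() # switch layers such as dropout to inference behavior (no effect on this model)
  with torch.no_grad(): # no gradients are needed at test time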
    correct = 0
    total = 0
    for (imgs, labels) in testLoader:
      imgs = imgs.to(device)
      labels = labels.to(device)
      out = model(imgs)
      _, pre = torch.max(out.data, 1)
      total += labels.size(0)
      correct += (pre == labels).sum().item()
    print('Accuracy: {}'.format(correct / total))

# predict function
def predict(input, model, device):
  model.to(device)
  with torch.no_grad():
    input=input.to(device)
    out = model(input)
    _, pre = torch.max(out.data, 1)
    return pre.item()


def main():
  parser = argparse.ArgumentParser()
  parser.add_argument("--stage", type=str, default='train', choices=['train', 'test', 'predict'], help="stage to run: train, test or predict")
  parser.add_argument("--epochs", type=int, default=30, help="number of epochs of training")
  parser.add_argument("--batch_size", type=int, default=128, help="size of the batches")
  parser.add_argument("--lr", type=float, default=0.001, help="SGD: learning rate")
  parser.add_argument("--momentum", type=float, default=0.9, help="SGD: momentum")
  # type=tuple would split a command-line string into characters; read two ints instead
  parser.add_argument("--img_size", type=int, nargs=2, default=[28, 28], help="height and width of each image")
  parser.add_argument("--channels", type=int, default=1, help="number of image channels")
  parser.add_argument("--predictImg", type=str, default='', help="path of the image to predict")
  opt = parser.parse_args()
  device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
  if opt.stage=='train': # in train stage
    dataloader = torch.utils.data.DataLoader(
      datasets.MNIST(
        "/home/liming/Project/MNIST",
        train=True,
        download=False,
        transform=transforms.Compose(
            [transforms.Resize(opt.img_size), transforms.ToTensor(), transforms.Normalize([0.5], [0.5])]
        ),
      ),
      batch_size=opt.batch_size,
      shuffle=True,
      num_workers=8,
    )
    model = myAlexNet(opt.channels)
    train(opt.epochs, dataloader, model, device, opt.lr, opt.momentum)

  elif opt.stage == 'test':
    testLoader = torch.utils.data.DataLoader(
      datasets.MNIST(
        "your dataset path",
        train=False,
        download=False,
        transform=transforms.Compose(
            [transforms.Resize(opt.img_size), transforms.ToTensor(), transforms.Normalize([0.5], [0.5])]
        ),
      ),
      batch_size=opt.batch_size,
      shuffle=True,
      num_workers=8,
    )
    model = torch.load('myAlexMnistDemo.pth', map_location=device) # map_location lets a GPU-trained model load on CPU
    test(testLoader, model, device)
    
  elif opt.stage == 'predict':
    model = torch.load('myAlexMnistDemo.pth', map_location=device)
    # same preprocessing as in training; Normalize works on tensors, so it must come after ToTensor
    transform=transforms.Compose(
            [transforms.Grayscale(),
            transforms.Resize(opt.img_size),
            transforms.ToTensor(),
            transforms.Normalize([0.5], [0.5]),]
        )
    img = Image.open(opt.predictImg).convert('RGB')
    img = transform(img)
    img = img.unsqueeze(0)
    ans = predict(img, model, device)
    print('predicted handwritten digit: {}'.format(ans))



if __name__ == '__main__':
  main()