Pytorch学习 day11（优化器、使用和修改现有网络模型）

丿罗小黑

已于 2024-03-26 20:51:11 修改

阅读量653

点赞数 18

分类专栏： Pytorch 文章标签： pytorch 学习人工智能

于 2024-03-11 20:27:35 首次发布

本文链接：https://blog.csdn.net/u011453680/article/details/136633810

版权

Pytorch 专栏收录该内容

17 篇文章 1 订阅

订阅专栏

优化器

在模型的训练过程中，我们根据反向传播来获取每个网络节点的对应梯度，之后通过某种算法（优化器）根据对应梯度来不断更新节点的参数，最终达到最小loss的结果。
优化器有很多，但有两个参数是所有优化器都有的：
- 模型的节点参数、学习率，其中：学习率是我们手动指定的，一般都为较小的数，如下：
以下为使用优化器的步骤：
- 注意，在每次反向传播（backward）之前，需要进行梯度清零（zero_grad）。
  - 因为在反向传播过程中，计算出的梯度会累积在模型的参数中。如果不进行梯度清零，那么在下一次反向传播时，当前的梯度会与上一次的梯度累加，导致更新参数时的梯度方向不准确，从而影响模型的收敛性和性能。所以为了确保每次的反向传播都是计算的当前参数下的梯度，需要进行梯度清零。
- 下图中的data为网络节点参数，grad为梯度：
以下为采用SGD优化器，并结合上一节的交叉熵损失函数，对CIFAR10数据集进行分类的模型代码：
- 这里我们将每轮中每个预测的loss加和，作为该轮模型的整体loss进行输出

import torch
import torchvision
from torch import nn


test_dataset = torchvision.datasets.CIFAR10(root='Dataset', train=False, download=True, transform=torchvision.transforms.ToTensor())
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=1, shuffle=False)

class Tudui(nn.Module):
    def __init__(self):
        super().__init__()
        self.module1 = nn.Sequential(
            nn.Conv2d(3, 32, 5, 1, 2),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 32, 5, 1, 2),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, 5, 1, 2),
            nn.MaxPool2d(2, 2),
            nn.Flatten(),
            nn.Linear(1024, 64),
            nn.Linear(64, 10)
        )
    def forward(self, input):
        output = self.module1(input)
        return output

loss = nn.CrossEntropyLoss()    # 交叉熵损失函数
tudui = Tudui()
optim = torch.optim.SGD(tudui.parameters(), lr=0.01)    # SGD优化器 随机梯度下降

for epoch in range(20):
    runnning_loss = 0.0
    for data in test_loader:
        inputs, targets = data
        outputs = tudui(inputs)
        result_loss = loss(outputs, targets)    # 计算loss
        optim.zero_grad()    # 梯度清零，注意需要在反向传播之前
        result_loss.backward()  # 反向传播，计算梯度，注意需要使用计算后的loss
        optim.step()    # 更新参数
        runnning_loss += result_loss
    print(f'epoch: {epoch}, loss: {runnning_loss}')

使用和修改现有网络模型

在我们不使用预训练的网络模型时，模型各节点的参数都是默认设置。如果我们更换预训练的模型，则节点参数是已经在某个数据集上训练好的，可以达到一个比较好的输出结果
VGG16模型的用法：
- 注意：当weights = None时，表示不使用预训练的网络模型，节点参数使用默认设置
- 当weights = 'DEFAULT’时，表示使用预训练的网络模型
以下是VGG16在ImageNet数据集上，使用预训练的节点参数：
以下为VGG16的默认节点参数：
可以打印VGG16的模型结构，代码如下：
- 由最后的输出结果可以看出VGG16的预测类别为1000类，符合ImageNet数据集的类别

import torch
import torchvision
import torch.nn as nn

# train_dataset = torchvision.datasets.ImageNet(root = 'Dataset', split='train', download=True, transform=torchvision.transforms.ToTensor())
# 由于数据集太大，现在已经不提供下载，报错如下：
# RuntimeError: The archive ILSVRC2012_devkit_t12.tar.gz is not present in the root directory or is corrupted. You need to download it externally and place it in Dataset.
vgg16_false = torchvision.models.vgg16(weights=None)
vgg16_true = torchvision.models.vgg16(weights="DEFAULT")

print(vgg16_false)

# 输出结果：
# VGG(
#   (features): Sequential(
#     (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (1): ReLU(inplace=True)
#     (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (3): ReLU(inplace=True)
#     (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (6): ReLU(inplace=True)
#     (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (8): ReLU(inplace=True)
#     (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (11): ReLU(inplace=True)
#     (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (13): ReLU(inplace=True)
#     (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (15): ReLU(inplace=True)
#     (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#      (18): ReLU(inplace=True)
#     (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (20): ReLU(inplace=True)
#     (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (22): ReLU(inplace=True)
#     (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (25): ReLU(inplace=True)
#     (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (27): ReLU(inplace=True)
#     (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (29): ReLU(inplace=True)
#     (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#   )
#   (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
#   (classifier): Sequential(
#     (0): Linear(in_features=25088, out_features=4096, bias=True)
#     (1): ReLU(inplace=True)
#     (2): Dropout(p=0.5, inplace=False)
#     (3): Linear(in_features=4096, out_features=4096, bias=True)
#     (4): ReLU(inplace=True)
#     (5): Dropout(p=0.5, inplace=False)
#     (6): Linear(in_features=4096, out_features=1000, bias=True)
#   )
# )

在这里插入图片描述

由于CIFAR10数据集只有10各类别，所以我们可以修改VGG16的最后一层的输出，将1000改为10，或者再添加一层将1000转为10。
- 可以使用 .add_module(name, module)函数。
  - name：表示添加层的名字
  - module：表示添加的层，如nn.Linear()、nn.Conv2d()、nn.Sequential()均可
添加一层的代码如下：
- 如果我们直接用模型名.add_module()，那么我们添加的层是在模型的最末端
- 我们也可以使用模型名.sequential名.add_module()的形式，那么我们添加的层就是在该sequential的最末端

import torch
import torchvision
import torch.nn as nn

# train_dataset = torchvision.datasets.ImageNet(root = 'Dataset', split='train', download=True, transform=torchvision.transforms.ToTensor())
# 由于数据集太大，现在已经不提供下载，报错如下：
# RuntimeError: The archive ILSVRC2012_devkit_t12.tar.gz is not present in the root directory or is corrupted. You need to download it externally and place it in Dataset.
vgg16_false = torchvision.models.vgg16(weights=None)
vgg16_true = torchvision.models.vgg16(weights="DEFAULT")

# print(vgg16_false)
# print(vgg16_true)

test_dataset = torchvision.datasets.CIFAR10(root = 'Dataset', train=False, download=True, transform=torchvision.transforms.ToTensor())
vgg16_true.add_module('add_linear', nn.Linear(1000, 10))
print(vgg16_true)

# 输出结果：
# VGG(
#   (features): Sequential(
#     (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (1): ReLU(inplace=True)
#     (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (3): ReLU(inplace=True)
#     (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (6): ReLU(inplace=True)
#     (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (8): ReLU(inplace=True)
#     (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (11): ReLU(inplace=True)
#     (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (13): ReLU(inplace=True)
#     (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (15): ReLU(inplace=True)
#     (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#      (18): ReLU(inplace=True)
#     (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (20): ReLU(inplace=True)
#     (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (22): ReLU(inplace=True)
#     (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (25): ReLU(inplace=True)
#     (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (27): ReLU(inplace=True)
#     (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (29): ReLU(inplace=True)
#     (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#   )
#   (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
#   (classifier): Sequential(
#     (0): Linear(in_features=25088, out_features=4096, bias=True)
#     (1): ReLU(inplace=True)
#     (2): Dropout(p=0.5, inplace=False)
#     (3): Linear(in_features=4096, out_features=4096, bias=True)
#     (4): ReLU(inplace=True)
#     (5): Dropout(p=0.5, inplace=False)
#     (6): Linear(in_features=4096, out_features=1000, bias=True)
#   )
#   (add_linear): Linear(in_features=1000, out_features=10, bias=True)
# )

修改一层的代码如下：
- 可以通过模型实例.层名的方法来修改某一层，如果为该层为sequential，那么层名后加 [ ] 可以指定sequential中的第几层

import torch
import torchvision
import torch.nn as nn

# train_dataset = torchvision.datasets.ImageNet(root = 'Dataset', split='train', download=True, transform=torchvision.transforms.ToTensor())
# 由于数据集太大，现在已经不提供下载，报错如下：
# RuntimeError: The archive ILSVRC2012_devkit_t12.tar.gz is not present in the root directory or is corrupted. You need to download it externally and place it in Dataset.
vgg16_false = torchvision.models.vgg16(weights=None)
vgg16_true = torchvision.models.vgg16(weights="DEFAULT")

# print(vgg16_false)
# print(vgg16_true)

test_dataset = torchvision.datasets.CIFAR10(root = 'Dataset', train=False, download=True, transform=torchvision.transforms.ToTensor())
vgg16_true.classifier[6]=nn.Linear(1000, 10)
print(vgg16_true)

# 输出结果：
# VGG(
#   (features): Sequential(
#     (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (1): ReLU(inplace=True)
#     (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (3): ReLU(inplace=True)
#     (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (6): ReLU(inplace=True)
#     (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (8): ReLU(inplace=True)
#     (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (11): ReLU(inplace=True)
#     (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (13): ReLU(inplace=True)
#     (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (15): ReLU(inplace=True)
#     (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#      (18): ReLU(inplace=True)
#     (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (20): ReLU(inplace=True)
#     (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (22): ReLU(inplace=True)
#     (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (25): ReLU(inplace=True)
#     (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (27): ReLU(inplace=True)
#     (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (29): ReLU(inplace=True)
#     (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#   )
#   (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
#   (classifier): Sequential(
#     (0): Linear(in_features=25088, out_features=4096, bias=True)
#     (1): ReLU(inplace=True)
#     (2): Dropout(p=0.5, inplace=False)
#     (3): Linear(in_features=4096, out_features=4096, bias=True)
#     (4): ReLU(inplace=True)
#     (5): Dropout(p=0.5, inplace=False)
#     (6): Linear(in_features=4096, out_features=10, bias=True)
#   )
# )

丿罗小黑

关注

18
点赞
踩
7

收藏

觉得还不错? 一键收藏
打赏
0
评论
Pytorch学习 day11（优化器、使用和修改现有网络模型）

优化器在模型的训练过程中，我们根据反向传播来获取每个网络节点的对应梯度，之后通过某种算法（优化器）根据对应梯度来不断更新节点的参数，最终达到最小loss的结果。优化器有很多，但有两个参数是所有优化器都有的：模型的节点参数、学习率，其中：学习率是我们手动指定的，一般都为较小的数，如下：以下为使用优化器的步骤：注意，在每次反向传播（backward）之前，需要进行梯度清零（zero_grad）。因为在反向传播过程中，计算出的梯度会累积在模型的参数中。如果不进行梯度清零，那么在下一次反向
复制链接

扫一扫