15. Getting Started with PyTorch: Backpropagation (Backward) and the Optimizer (SGD)

In previous chapters we covered how to use the data, how to build the network, and how to use the loss function. Now we turn to backpropagation and the optimizer.
  • Why do we use backpropagation?
We compare the output of the network architecture with the Ground Truth to get an error (the loss), and then send that error back through the network during training so that the error decreases and we end up with a better network.
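As a minimal hedged sketch of what backward() computes (a standalone toy example, not part of the LeNet-5 code that follows):

import torch

# A tiny toy "network": y = w * x, with a squared-error loss against a target.
x = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)  # trainable parameter
target = torch.tensor(10.0)

y = w * x                      # forward pass
loss = (y - target) ** 2       # compare the prediction with the ground truth

loss.backward()                # backpropagation: fills w.grad with dloss/dw
print(w.grad)                  # tensor(-16.) since dloss/dw = 2*(w*x - target)*x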

Since the use of the data has been covered many times before, this chapter skips that part, starts from the network structure and the loss function, and then moves on to backpropagation.
 
  • Choosing the network architecture: LeNet-5
The network architecture used here is the [LeNet-5](https://blog.csdn.net/XiaoyYidiaodiao/article/details/122278602) described in earlier chapters, except that the input channels are changed from 1 (grayscale image) to 3 (RGB color image).
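As a quick sanity check of this change (a hedged sketch assuming torch is imported and the LeNet_5 class defined in the code below), a dummy RGB input of shape (1, 3, 32, 32) should produce 10 class scores:

import torch

model = LeNet_5()                    # the 3-channel LeNet-5 defined below
dummy = torch.randn(1, 3, 32, 32)    # one fake RGB 32x32 image
print(model(dummy).shape)            # expected: torch.Size([1, 10])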

  • Choosing the loss function: CrossEntropyLoss
We use the [CrossEntropyLoss](https://blog.csdn.net/XiaoyYidiaodiao/article/details/122596286) introduced in the previous chapter.
After computing the loss, we add one line, result.backward(), which runs backpropagation.
  • Code:
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms

tran_tensor = transforms.ToTensor()
dataset = torchvision.datasets.CIFAR10(root="../dataset", train=False, transform=tran_tensor, download=True)
dataloader = DataLoader(dataset=dataset, batch_size=1, shuffle=True, num_workers=0, drop_last=False)


class LeNet_5(nn.Module):
    def __init__(self):
        super(LeNet_5, self).__init__()
        self.model = nn.Sequential(
            # input: 32x32x3 (RGB image)
            # 6@28x28
            nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5, padding=0, stride=1),
            # 6@14x14
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            # 16@10x10
            nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, padding=0, stride=1),
            # 16@5x5
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            # in:16x5x5=400 -> out:120
            nn.Flatten(),
            nn.Linear(400, 120),
            # in:120 -> out:84
            nn.Linear(120, 84),
            # in:84 -> out:10
            nn.Linear(84, 10),
        )

    def forward(self, x):
        x = self.model(x)
        return x


if __name__ == "__main__":
    lenet_5 = LeNet_5()
    cross_entropy_loss = nn.CrossEntropyLoss()
    for data in dataloader:
        img, target = data
        output = lenet_5(img)
        result = cross_entropy_loss(output, target)
        print("result: ", result)
        # backpropagation: fills in the gradient of every parameter
        result.backward()
        print("hello")  # placeholder line for setting a breakpoint after backward()

Result: (screenshot)

Breakpoint: (screenshot)

When a breakpoint is set at line 47 (the line numbers refer to the editor shown in the screenshots), the backpropagated gradients (grad) are still None.

After clicking the step-over button to run the next line, the grad values exist.

(screenshot)
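The same thing can be checked in code rather than in the debugger. A minimal hedged sketch, reusing dataloader and the LeNet_5 class from the script above:

lenet_5 = LeNet_5()
cross_entropy_loss = nn.CrossEntropyLoss()
img, target = next(iter(dataloader))
output = lenet_5(img)
result = cross_entropy_loss(output, target)

# before backward(): no gradient has been computed yet
print(lenet_5.model[0].weight.grad)          # None

result.backward()

# after backward(): each parameter now holds a gradient tensor
print(lenet_5.model[0].weight.grad.shape)    # torch.Size([6, 3, 5, 5])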

  • Optimizer: stochastic gradient descent (SGD)
    Mathematical background: (figures with the derivation and a worked example in the original post)
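The figures are not reproduced here; for reference, the plain SGD update rule that torch.optim.SGD applies when momentum and weight decay are left at their defaults is

$$\theta_{t+1} = \theta_t - \eta \,\nabla_{\theta} L(\theta_t)$$

where $\theta$ are the network parameters, $\eta$ is the learning rate (lr=0.01 in the code below), and $\nabla_{\theta} L$ is the gradient computed by backward().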

Why call zero_grad() first and optimizer.step() last?

Because zero_grad() first clears the gradients left over from the previous iteration (PyTorch accumulates gradients by default), loss.backward() then computes fresh gradients for every parameter, and finally step() uses those gradients to update the parameter values.
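A small hedged sketch (a toy example, unrelated to LeNet-5) of why the order matters: if zero_grad() is skipped, the gradients from successive backward() calls add up instead of being replaced.

import torch

w = torch.tensor(1.0, requires_grad=True)

for step in range(2):
    loss = (w * 3.0) ** 2    # dloss/dw = 18 * w = 18 here
    loss.backward()
    print(w.grad)            # 18., then 36.: gradients accumulate without zero_grad()

w.grad.zero_()               # what optimizer.zero_grad() does for each parameter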

Code:

import torch
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms

tran_tensor = transforms.ToTensor()
dataset = torchvision.datasets.CIFAR10(root="../dataset", train=False, transform=tran_tensor, download=True)
dataloader = DataLoader(dataset=dataset, batch_size=1, shuffle=True, num_workers=0, drop_last=False)


class LeNet_5(nn.Module):
    def __init__(self):
        super(LeNet_5, self).__init__()
        self.model = nn.Sequential(
            # input: 32x32x3 (RGB image)
            # 6@28x28
            nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5, padding=0, stride=1),
            # 6@14x14
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            # 16@10x10
            nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, padding=0, stride=1),
            # 16@5x5
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            # in:16x5x5=400 -> out:120
            nn.Flatten(),
            nn.Linear(400, 120),
            # in:120 -> out:84
            nn.Linear(120, 84),
            # in:84 -> out:10
            nn.Linear(84, 10),
        )

    def forward(self, x):
        x = self.model(x)
        return x


if __name__ == "__main__":
    lenet_5 = LeNet_5()
    cross_entropy_loss = nn.CrossEntropyLoss()
    optim = torch.optim.SGD(lenet_5.parameters(), lr=0.01)
    for data in dataloader:
        img, target = data
        output = lenet_5(img)
        result = cross_entropy_loss(output, target)
        # zero out the previous gradient
        # 将之前的参数清零
        optim.zero_grad()
        # Backpropagation of the loss function
        # 损失函数的反向传播
        result.backward()
        # After getting the new value, modify each parameter 
        # 得到新的值后,对每个参数进行修改
        optim.step()
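To see that step() really changes the weights, here is a minimal hedged sketch (reusing dataloader and the LeNet_5 class from the script above):

lenet_5 = LeNet_5()
optim = torch.optim.SGD(lenet_5.parameters(), lr=0.01)
img, target = next(iter(dataloader))

before = lenet_5.model[0].weight.detach().clone()    # snapshot of the first conv layer's weights
loss = nn.CrossEntropyLoss()(lenet_5(img), target)
optim.zero_grad()
loss.backward()
optim.step()                                         # apply w <- w - lr * grad
print(torch.equal(before, lenet_5.model[0].weight))  # expected False: the weights were updated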

However, whether training is actually improving the network cannot be seen directly from a single pass, so we add an epoch loop and watch the loss value to check that backpropagation and the optimizer are updating the network.

Code:

import torch
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms

tran_tensor = transforms.ToTensor()
dataset = torchvision.datasets.CIFAR10(root="../dataset", train=False, transform=tran_tensor, download=True)
dataloader = DataLoader(dataset=dataset, batch_size=64, shuffle=True, num_workers=0, drop_last=False)


class LeNet_5(nn.Module):
    def __init__(self):
        super(LeNet_5, self).__init__()
        self.model = nn.Sequential(
            # input: 32x32x3 (RGB image)
            # 6@28x28
            nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5, padding=0, stride=1),
            # 6@14x14
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            # 16@10x10
            nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, padding=0, stride=1),
            # 16@5x5
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            # in:16x5x5=400 -> out:120
            nn.Flatten(),
            nn.Linear(400, 120),
            # in:120 -> out:84
            nn.Linear(120, 84),
            # in:84 -> out:10
            nn.Linear(84, 10),
        )

    def forward(self, x):
        x = self.model(x)
        return x


if __name__ == "__main__":
    lenet_5 = LeNet_5()
    cross_entropy_loss = nn.CrossEntropyLoss()
    optim = torch.optim.SGD(lenet_5.parameters(), lr=0.01)
    for epoch in range(20):
        running_loss = 0
        for data in dataloader:
            img, target = data
            output = lenet_5(img)
            result_loss = cross_entropy_loss(output, target)
            # zero out the previous gradient
            # 将之前的参数清零
            optim.zero_grad()
            # Backpropagation of the loss function
            # 损失函数的反向传播
            result_loss.backward()
            # After getting the new value, modify each parameter
            # 得到新的值后,对每个参数进行修改
            optim.step()
            running_loss = running_loss + result_loss.item()  # .item() takes the scalar value so the graph is not kept around
        print("running_loss: ", running_loss)

Result: (screenshot of running_loss over the 20 epochs)

The loss value drops from epoch to epoch, which shows that our optimizer is working.

Breakpoint: (screenshot)

When the program stops at line 51 (again, the line numbers in the screenshots), the grad values are still None.

(screenshot)

After running to line 54, the grad values have been filled in.

(screenshot)

After running to line 57, the grad values are as shown.

(screenshot)

Then, in a new iteration of the loop, after line 51 has run, grad looks as shown.

(screenshot)

After line 54 runs again, the grad values change.

(screenshot)

Repeating this over and over, the parameters keep getting updated.

Previous chapter: 14. Getting Started with PyTorch: Loss Functions
Next chapter: 16. Getting Started with PyTorch: Using and Modifying Existing Networks
