PyTorch Review Notes 5

PyTorch review notes, for the author's own use. Reference textbook:

This article covers convolutional neural networks in PyTorch.

It first introduces the basic structure of convolutional neural networks, then presents three classic architectures, LeNet, VGG, and ResNet, and applies them to classification on the Fashion-MNIST dataset.


PyTorch syntax summary:


一. Convolutional Neural Networks

Convolutional neural networks (CNNs) are a class of deep learning models for image processing. They introduce the convolutional layer and the pooling layer: the former extracts feature representations from the input, while the latter downsamples the resulting feature maps. A pooling layer usually follows a convolutional layer; it reduces the spatial size of the feature maps, cutting computation and parameter count while further distilling the main features of the input. This improves the network's computational efficiency and, to some extent, the model's robustness.

Convolution is the core operation of a CNN: a filter (the convolution kernel) slides over the input and computes local weighted sums, extracting features such as edges and textures from the image:
[Figure: a convolution kernel sliding over the input]

The output of a convolutional layer is called a feature map, and the region of the input image that influences a given pixel of the output feature map is called that pixel's receptive field:
[Figure: feature map and receptive field]

The main properties of convolutional neural networks are as follows:

  • Translation invariance: the earliest layers of the network should respond similarly to the same image patch no matter where it appears in the image;
  • Locality: the earliest layers should only explore local regions of the input image, without paying too much attention to relationships between distant regions;

1. Convolutional Layer

Image processing generally uses two-dimensional convolutional layers, defined with torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros'). The parameters are as follows:

  • in_channels: number of channels of the input image. RGB images usually have 3 channels; grayscale images have 1;
  • out_channels: number of channels of the output feature map, which is also the number of kernels, since each kernel produces one output channel;
  • kernel_size: size of the convolution kernel. An int means the same height and width; a tuple specifies them separately;
  • stride: stride of the kernel. An int means the same horizontal and vertical stride; a tuple specifies them separately;
  • padding: amount of zero padding on the edges of the input; an int or a tuple;
  • dilation: spacing between adjacent elements of the kernel; an int or a tuple;
  • groups: number of groups controlling the connections between inputs and outputs;
  • bias: whether to add a bias term;
  • padding_mode: the padding mode, 'zeros' or 'circular' (recent PyTorch versions also accept 'reflect' and 'replicate');

In general, a convolutional layer has multiple inputs and outputs, with several distinct kernels, and the number of input channels need not equal the number of output channels¹. Each kernel has the same number of channels as the input, and each kernel corresponds to one output channel: a kernel convolves all input channels and sums the results into a single output channel, so there are as many output channels as there are kernels:
[Figure: multi-channel convolution, where each kernel spans all input channels and yields one output channel]
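
For example, a minimal sketch (assumed shapes; it assumes import torch and from torch import nn): a layer with 3 input channels and 8 kernels stores a weight tensor of shape [8, 3, 3, 3] and produces 8 output channels:

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
X = torch.rand(1, 3, 32, 32)    # (batch, channels, height, width)
print(conv.weight.shape)        # torch.Size([8, 3, 3, 3]): each kernel spans all 3 input channels
print(conv(X).shape)            # torch.Size([1, 8, 32, 32]): one output channel per kernel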

2. Pooling Layer

Pooling layers generally come in two kinds, max pooling and average pooling, and are used to downsample the input feature map and shrink its spatial size. They are defined with torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1) and torch.nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False); note that AvgPool2d has no dilation parameter. The parameters are as follows, with a short example after the list:

  • kernel_size: size of the pooling window. An int means the same height and width; a tuple specifies them separately;
  • stride: stride of the pooling operation; an int or a tuple; defaults to kernel_size;
  • padding: amount of zero padding on the edges of the input;
  • dilation: spacing between elements of the pooling window (MaxPool2d only); an int or a tuple;
  • ceil_mode: whether to use ceil instead of floor when computing the output size; defaults to False;
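
A minimal sketch (assumed shapes; imports as above): 2x2 max pooling with stride 2 halves the spatial size while leaving the channel count unchanged:

pool = nn.MaxPool2d(kernel_size=2, stride=2)
X = torch.rand(1, 6, 28, 28)
print(pool(X).shape)            # torch.Size([1, 6, 14, 14])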

3. Batch Normalization Layer

A batch normalization layer is typically placed after a convolutional layer and before the activation function. Batch normalization speeds up training and mitigates vanishing and exploding gradients. It normalizes each channel of each feature map so that the channel's mean is close to 0 and its standard deviation close to 1; this keeps the input distribution of each layer relatively stable, which makes training more stable.

A batch normalization layer is defined with torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True). The parameters are as follows, with a short example after the list:

  • num_features: number of input features, usually the number of output channels of the preceding convolutional layer;
  • eps: small constant added to the denominator for numerical stability;
  • momentum: momentum used for the running mean and variance; defaults to 0.1;
  • affine: whether to learn affine parameters (scale and shift); defaults to True;
  • track_running_stats: whether to update the running statistics (mean and variance) during training; defaults to True;
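
A minimal sketch (assumed shapes; imports as above): in training mode, BatchNorm2d normalizes each channel over the batch, so every output channel ends up with mean close to 0 and standard deviation close to 1:

bn = nn.BatchNorm2d(num_features=8)     # num_features = output channels of the preceding conv layer
X = torch.rand(4, 8, 16, 16) * 5 + 3    # deliberately shifted and scaled input
Y = bn(X)
print(Y.mean(dim=(0, 2, 3)))            # per-channel means, close to 0
print(Y.std(dim=(0, 2, 3)))             # per-channel standard deviations, close to 1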

二. LeNet

LeNet, proposed by Yann LeCun et al. in 1998, is an early convolutional neural network architecture and one of the milestones of deep learning. It was among the first CNNs used for handwritten digit recognition and is widely regarded as a starting point of deep learning.
[Figure: LeNet architecture]

Below, LeNet is used to test the effect of a convolutional neural network on the Fashion-MNIST dataset. The training procedure is similar to that of the multilayer perceptron in PyTorch Review Notes 3, except that variables must be moved to the GPU and the training procedure is wrapped in a function:

import torch
from torch import nn
from torch.utils import data
import torchvision
from torchvision import transforms

"""下载Fashion-MNIST数据集并将其加载到内存中"""
def load_data_fashion_mnist(batch_size, resize=None):
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(
        root="./data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(
        root="./data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True),
            data.DataLoader(mnist_test, batch_size, shuffle=False))
batch_size = 256
train_iter, test_iter = load_data_fashion_mnist(batch_size)


'''Build the convolutional neural network (LeNet)'''
net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10)
)
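# Optional sanity check: trace a dummy input through each layer to verify shapes
X = torch.rand(size=(1, 1, 28, 28), dtype=torch.float32)
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)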
def init_weights(m):
    if type(m) == nn.Linear or type(m) == nn.Conv2d:
        nn.init.xavier_uniform_(m.weight)
net.apply(init_weights)

optimizer = torch.optim.SGD(net.parameters(), lr=0.9)
loss = nn.CrossEntropyLoss()


'''Training'''
def accuracy(y_hat, y):
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())
def evaluate_accuracy_gpu(net, data_iter, device):
    if isinstance(net, nn.Module):
        net.eval()          # set evaluation mode
    test_acc_sum = 0.0      # number of correct predictions
    test_sample_num = 0     # total number of predictions
    with torch.no_grad():
        for X, y in data_iter:
            if isinstance(X, list):
                X = [x.to(device) for x in X]
            else:
                X = X.to(device)
            y = y.to(device)
            test_acc_sum += accuracy(net(X), y)
            test_sample_num += y.numel()
    return test_acc_sum / test_sample_num
def train_net_gpu(net, train_iter, test_iter, loss, num_epochs, optimizer, device):
    net.to(device)
    for epoch in range(num_epochs):
        train_loss_sum = 0.0            # running training loss
        train_acc_sum = 0.0             # running count of correct predictions
        sample_num = 0                  # number of samples seen
        net.train()
        for i, (X, y) in enumerate(train_iter):
            optimizer.zero_grad()
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            l.sum().backward()
            optimizer.step()
            with torch.no_grad():
                train_loss_sum += l.sum()
                train_acc_sum += accuracy(y_hat, y)
                sample_num += y.numel()
            train_loss = train_loss_sum / sample_num
            train_acc = train_acc_sum / sample_num
        test_acc = evaluate_accuracy_gpu(net, test_iter, device)
        print(f'loss {train_loss:.3f}, train acc {train_acc:.3f}, test acc {test_acc:.3f}')

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('---------------- training on', device, '----------------')
num_epochs = 10
train_net_gpu(net, train_iter, test_iter, loss, num_epochs, optimizer, device)
'''
---------------- training on cuda:0 ----------------
loss 0.009, train acc 0.101, test acc 0.100
loss 0.007, train acc 0.345, test acc 0.607
loss 0.004, train acc 0.640, test acc 0.588
loss 0.003, train acc 0.718, test acc 0.698
loss 0.003, train acc 0.750, test acc 0.745
loss 0.002, train acc 0.774, test acc 0.772
loss 0.002, train acc 0.790, test acc 0.794
loss 0.002, train acc 0.805, test acc 0.786
loss 0.002, train acc 0.816, test acc 0.801
loss 0.002, train acc 0.828, test acc 0.822
'''

三. VGG

The VGG network is a deep convolutional neural network developed by the Visual Geometry Group (VGG) at the University of Oxford. VGG was the first to introduce the notion of the block, moving network design from individual neurons and whole layers toward repeated blocks.

VGG uses relatively small 3x3 kernels and a deep structure. Its core idea is to stack many small convolutional layers and pooling layers to build a deep network that progressively extracts and combines image features:
[Figure: VGG architecture]

1. VGG Block

def vgg_block(num_convs, in_channels, out_channels):
    layers = []
    for _ in range(num_convs):
        # 3x3 convolution with padding 1 keeps the spatial size unchanged
        layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        in_channels = out_channels
    # 2x2 max pooling halves the height and width
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

2. VGG Network

The VGG network here has 5 convolutional blocks: the first two blocks contain one convolutional layer each, and the last three contain two each. The first block has 64 output channels, and each subsequent block doubles that number until it reaches 512. Since the network has 8 convolutional layers and 3 fully connected layers, it is known as VGG-11. With a 224x224 input, the five pooling layers halve the spatial size five times down to 7x7, which is why the first fully connected layer takes out_channels * 7 * 7 inputs:

conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
def vgg(conv_arch):
    conv_blks = []
    in_channels = 1
    # convolutional part
    for (num_convs, out_channels) in conv_arch:
        conv_blks.append(vgg_block(num_convs, in_channels, out_channels))
        in_channels = out_channels

    return nn.Sequential(
        *conv_blks, nn.Flatten(),
        # fully connected part
        nn.Linear(out_channels * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 10))
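
A quick way to sanity-check the architecture is to instantiate the network and trace a dummy 224x224 input through each top-level block, printing the output shape after each (a minimal sketch; it assumes the imports from the training script below):

net = vgg(conv_arch)
X = torch.randn(size=(1, 1, 224, 224))
for blk in net:
    X = blk(X)
    print(blk.__class__.__name__, 'output shape:\t', X.shape)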

3. Training

Training proceeds as with LeNet, using the wrapped train_net_gpu(net, train_iter, test_iter, loss, num_epochs, optimizer, device) function to solve the Fashion-MNIST classification problem:

import torch
from torch import nn
from torch.utils import data
import torchvision
from torchvision import transforms
from torch.nn import functional as F

"""下载Fashion-MNIST数据集并将其加载到内存中"""
def load_data_fashion_mnist(batch_size, resize=None):
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root="./data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(root="./data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True), data.DataLoader(mnist_test, batch_size, shuffle=False))
batch_size = 128
train_iter, test_iter = load_data_fashion_mnist(batch_size, resize=224)


'''Define the VGG network'''
def vgg_block(num_convs, in_channels, out_channels):
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
def vgg(conv_arch):
    conv_blks = []
    in_channels = 1
    # convolutional part
    for (num_convs, out_channels) in conv_arch:
        conv_blks.append(vgg_block(num_convs, in_channels, out_channels))
        in_channels = out_channels

    return nn.Sequential(
        *conv_blks, nn.Flatten(),
        # fully connected part
        nn.Linear(out_channels * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 10))

net = vgg(conv_arch)

optimizer = torch.optim.SGD(net.parameters(), lr=0.05)
loss = nn.CrossEntropyLoss()


'''Training'''
def accuracy(y_hat, y):
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())
def evaluate_accuracy_gpu(net, data_iter, device):
    if isinstance(net, nn.Module):
        net.eval()          # set evaluation mode
    test_acc_sum = 0.0      # number of correct predictions
    test_sample_num = 0     # total number of predictions
    with torch.no_grad():
        for X, y in data_iter:
            if isinstance(X, list):
                X = [x.to(device) for x in X]
            else:
                X = X.to(device)
            y = y.to(device)
            test_acc_sum += accuracy(net(X), y)
            test_sample_num += y.numel()
    return test_acc_sum / test_sample_num
def train_net_gpu(net, train_iter, test_iter, loss, num_epochs, optimizer, device):
    net.to(device)
    for epoch in range(num_epochs):
        train_loss_sum = 0.0            # running training loss
        train_acc_sum = 0.0             # running count of correct predictions
        sample_num = 0                  # number of samples seen
        net.train()
        for i, (X, y) in enumerate(train_iter):
            optimizer.zero_grad()
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            l.sum().backward()
            optimizer.step()
            with torch.no_grad():
                train_loss_sum += l.sum()
                train_acc_sum += accuracy(y_hat, y)
                sample_num += y.numel()
            train_loss = train_loss_sum / sample_num
            train_acc = train_acc_sum / sample_num
        test_acc = evaluate_accuracy_gpu(net, test_iter, device)
        print(f'loss {train_loss:.3f}, train acc {train_acc:.3f}, test acc {test_acc:.3f}')

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('---------------- training on', device, '----------------')
num_epochs = 10
train_net_gpu(net, train_iter, test_iter, loss, num_epochs, optimizer, device)
'''
---------------- training on cuda:0 ----------------
loss 0.018, train acc 0.098, test acc 0.100
loss 0.018, train acc 0.106, test acc 0.100
loss 0.018, train acc 0.134, test acc 0.100
loss 0.018, train acc 0.100, test acc 0.100
loss 0.016, train acc 0.199, test acc 0.595
loss 0.006, train acc 0.699, test acc 0.776
loss 0.004, train acc 0.832, test acc 0.842
loss 0.003, train acc 0.870, test acc 0.877
loss 0.002, train acc 0.887, test acc 0.889
loss 0.002, train acc 0.899, test acc 0.895
'''

四. ResNet

Residual Networks (ResNet), proposed by Kaiming He et al. in 2015, are a deep convolutional neural network architecture. The core idea of ResNet is to introduce residual connections to address vanishing and exploding gradients during the training of deep networks, making it possible to train very deep models.

In traditional deep networks, training runs into the degradation problem as layers are added: a deeper network can end up with lower accuracy. ResNet addresses this with the residual block: instead of learning the complete mapping directly, the block learns the residual relative to an identity mapping, i.e. $\mathcal{F}(x)$ in the figure, so the network only has to map the input activations to the correct residual. Such residual connections let information propagate forward more easily, alleviating vanishing and exploding gradients and helping train very deep networks.
[Figure: a residual block learning the residual $\mathcal{F}(x)$ on top of the identity mapping]

A typical ResNet is a deep network composed of many residual blocks, with pooling or convolutional layers inserted in between to adjust the spatial size and channel count of the feature maps. ResNet-18 serves as the example below:

[Figure: ResNet-18 architecture]

1. Residual Block

A residual block consists of several convolutional layers, batch normalization layers, and so on. There are two variants, with and without a 1x1 convolution on the shortcut:
[Figure: residual blocks with and without the 1x1 convolution]

class Residual(nn.Module):
    def __init__(self, input_channels, num_channels, use_1x1conv=False, strides=1):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channels, num_channels, kernel_size=3, padding=1, stride=strides)
        self.conv2 = nn.Conv2d(num_channels, num_channels, kernel_size=3, padding=1)
        if use_1x1conv:
            # 1x1 convolution on the shortcut so its shape matches the main path
            self.conv3 = nn.Conv2d(input_channels, num_channels, kernel_size=1, stride=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3:
            X = self.conv3(X)
        Y += X              # add the shortcut to the residual
        return F.relu(Y)
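
The two variants can be checked with dummy inputs: without the 1x1 convolution the block preserves the input shape, while with use_1x1conv=True and strides=2 it changes the channel count and halves the spatial size (a minimal sketch with assumed shapes; it assumes torch, nn, and F are imported as in the training script below):

blk = Residual(3, 3)
X = torch.rand(4, 3, 6, 6)
print(blk(X).shape)     # torch.Size([4, 3, 6, 6])
blk = Residual(3, 6, use_1x1conv=True, strides=2)
print(blk(X).shape)     # torch.Size([4, 6, 3, 3])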

2. Grouping Residual Blocks into Modules

ResNet groups residual blocks, two at a time here, into larger modules. In every module except the first, the first residual block halves the spatial size and doubles the channel count (see the quick check after the code):

def resnet_block(input_channels, num_channels, num_residuals, first_block=False):
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            # first block of a non-first module: halve the spatial size, change channels
            blk.append(Residual(input_channels, num_channels, use_1x1conv=True, strides=2))
        else:
            blk.append(Residual(num_channels, num_channels))
    return blk
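
As a quick check (a minimal sketch with assumed shapes), the first residual block of a non-first module halves the spatial size while doubling the channel count:

blks = nn.Sequential(*resnet_block(64, 128, 2))
X = torch.rand(1, 64, 56, 56)
print(blks(X).shape)    # torch.Size([1, 128, 28, 28])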

3. Instantiation

Following the ResNet-18 structure, the residual modules are instantiated as follows:

[Figure: ResNet-18 block structure]

b1 = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
)
b2 = nn.Sequential(*resnet_block(64, 64, 2, first_block=True))
b3 = nn.Sequential(*resnet_block(64, 128, 2))
b4 = nn.Sequential(*resnet_block(128, 256, 2))
b5 = nn.Sequential(*resnet_block(256, 512, 2))

net = nn.Sequential(
    b1, b2, b3, b4, b5,
    nn.AdaptiveAvgPool2d((1,1)),
    nn.Flatten(), nn.Linear(512, 10)
)
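
As with VGG, the assembled network can be verified by tracing a dummy input through each stage (a minimal sketch; it assumes the imports from the training script below):

X = torch.rand(size=(1, 1, 224, 224))
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)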

4. Training

Training proceeds as with LeNet, using the wrapped train_net_gpu(net, train_iter, test_iter, loss, num_epochs, optimizer, device) function to solve the Fashion-MNIST classification problem:

import torch
from torch import nn
from torch.utils import data
import torchvision
from torchvision import transforms
from torch.nn import functional as F

"""下载Fashion-MNIST数据集并将其加载到内存中"""
def load_data_fashion_mnist(batch_size, resize=None):
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root="./data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(root="./data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True), data.DataLoader(mnist_test, batch_size, shuffle=False))
batch_size = 256
train_iter, test_iter = load_data_fashion_mnist(batch_size, resize=96)


'''Define the residual network'''
class Residual(nn.Module):
    def __init__(self, input_channels, num_channels, use_1x1conv=False, strides=1):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channels, num_channels, kernel_size=3, padding=1, stride=strides)
        self.conv2 = nn.Conv2d(num_channels, num_channels, kernel_size=3, padding=1)
        if use_1x1conv:
            self.conv3 = nn.Conv2d(input_channels, num_channels, kernel_size=1, stride=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)
    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3:
            X = self.conv3(X)
        Y += X
        return F.relu(Y)

def resnet_block(input_channels, num_channels, num_residuals, first_block=False):
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            blk.append(Residual(input_channels, num_channels, use_1x1conv=True, strides=2))
        else:
            blk.append(Residual(num_channels, num_channels))
    return blk


'''Initialize the residual network'''
b1 = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
)
b2 = nn.Sequential(*resnet_block(64, 64, 2, first_block=True))
b3 = nn.Sequential(*resnet_block(64, 128, 2))
b4 = nn.Sequential(*resnet_block(128, 256, 2))
b5 = nn.Sequential(*resnet_block(256, 512, 2))

net = nn.Sequential(
    b1, b2, b3, b4, b5,
    nn.AdaptiveAvgPool2d((1,1)),
    nn.Flatten(), nn.Linear(512, 10)
)
def init_weights(m):
    if type(m) == nn.Linear or type(m) == nn.Conv2d:
        nn.init.xavier_uniform_(m.weight)
net.apply(init_weights)

optimizer = torch.optim.SGD(net.parameters(), lr=0.05)
loss = nn.CrossEntropyLoss()
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')


'''Training'''
def accuracy(y_hat, y):
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())
def evaluate_accuracy_gpu(net, data_iter, device):
    if isinstance(net, nn.Module):
        net.eval()          # set evaluation mode
    test_acc_sum = 0.0      # number of correct predictions
    test_sample_num = 0     # total number of predictions
    with torch.no_grad():
        for X, y in data_iter:
            if isinstance(X, list):
                X = [x.to(device) for x in X]
            else:
                X = X.to(device)
            y = y.to(device)
            test_acc_sum += accuracy(net(X), y)
            test_sample_num += y.numel()
    return test_acc_sum / test_sample_num
def train_net_gpu(net, train_iter, test_iter, loss, num_epochs, optimizer, device):
    net.to(device)
    for epoch in range(num_epochs):
        train_loss_sum = 0.0            # running training loss
        train_acc_sum = 0.0             # running count of correct predictions
        sample_num = 0                  # number of samples seen
        net.train()
        for i, (X, y) in enumerate(train_iter):
            optimizer.zero_grad()
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            l.sum().backward()
            optimizer.step()
            with torch.no_grad():
                train_loss_sum += l.sum()
                train_acc_sum += accuracy(y_hat, y)
                sample_num += y.numel()
            train_loss = train_loss_sum / sample_num
            train_acc = train_acc_sum / sample_num
        test_acc = evaluate_accuracy_gpu(net, test_iter, device)
        print(f'loss {train_loss:.3f}, train acc {train_acc:.3f}, test acc {test_acc:.3f}')

print('---------------- training on', device, '----------------')
num_epochs = 10
train_net_gpu(net, train_iter, test_iter, loss, num_epochs, optimizer, device)
'''
---------------- training on cuda:0 ----------------
loss 0.002, train acc 0.830, test acc 0.807
loss 0.001, train acc 0.906, test acc 0.830
loss 0.001, train acc 0.930, test acc 0.847
loss 0.001, train acc 0.945, test acc 0.886
loss 0.000, train acc 0.962, test acc 0.896
loss 0.000, train acc 0.971, test acc 0.831
loss 0.000, train acc 0.982, test acc 0.872
loss 0.000, train acc 0.990, test acc 0.905
loss 0.000, train acc 0.993, test acc 0.894
loss 0.000, train acc 0.996, test acc 0.915
'''

  1. CNN convolution kernels and channels, explained ↩︎
