PyTorch Study Notes

Yungang_Young

Last modified: 2022-06-18 00:58:43

First published: 2022-06-15 02:18:10

Article link: https://blog.csdn.net/Yungang_Young/article/details/125288767

These notes mainly follow the Datawhale team-study course 深入浅出PyTorch.

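I. PyTorch Basics

1.1 Tensors

1.1.1 Introduction

A tensor generalizes vectors and matrices to an arbitrary number of dimensions. From low to high dimension, tensors can represent:

0-D tensor -> scalar (a single number)
1-D tensor -> vector
2-D tensor -> matrix
3-D tensor -> e.g. a color (RGB) image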
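
As a small illustration of the list above, the sketch below builds one tensor of each dimensionality and reads the number of dimensions back with dim(). The concrete values and the 3x32x32 image size are made up for the example.

```python
import torch

scalar = torch.tensor(3.14)               # 0-D: a single number
vector = torch.tensor([1.0, 2.0, 3.0])    # 1-D: a vector
matrix = torch.zeros(2, 3)                # 2-D: a matrix
image = torch.zeros(3, 32, 32)            # 3-D: channels x height x width (hypothetical size)

# dim() returns the number of dimensions of each tensor
dims = [t.dim() for t in (scalar, vector, matrix, image)]
print(dims)
```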

1.1.2 Creating a Tensor

import torch
# Create a randomly initialized 4x3 matrix
x1 = torch.rand(4, 3)
# Create an all-zero matrix with dtype long
x2 = torch.zeros(4, 3, dtype=torch.long)
# Create an all-one matrix
x3 = torch.ones(4, 3)

print(x1)
# Get the shape
print(x1.size())
print(x1.shape)
print(x2)
print(x3)

Output:

tensor([[0.8275, 0.9382, 0.4360],
        [0.6193, 0.2424, 0.1184],
        [0.7378, 0.5127, 0.0595],
        [0.0323, 0.3910, 0.2825]])
torch.Size([4, 3])
torch.Size([4, 3])
tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
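
Tensors can also be built directly from existing Python data or shaped after another tensor. The following is a minimal sketch of those options; the variable names and values are chosen only for illustration.

```python
import torch

# Build a tensor directly from nested Python lists; the dtype is inferred (float32 here)
a = torch.tensor([[5.5, 3.0], [1.0, 2.0]])

# The *_like helpers reuse the shape of an existing tensor
b = torch.ones_like(a)                       # same shape and dtype as a
c = torch.zeros_like(a, dtype=torch.long)    # same shape, dtype overridden

print(a.dtype, b.shape, c.dtype)
```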

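1.1.3 Tensor Operations

1. Addition

# Addition, form 1: the + operator
print(x1 + x3)
# Addition, form 2: torch.add
print(torch.add(x1, x3))
# Addition, form 3: write the result into a preallocated tensor
result = torch.empty(4, 3)
torch.add(x1, x3, out=result)
print(result)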

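2. Indexing
An indexed result shares memory with the original tensor: modifying one also modifies the other. To avoid this, make a copy first, for example with clone().

# Take the second column
print(result[:, 1])
# Change the view (shape) of the data
y1 = torch.rand(4, 4)
y2 = y1.view(16)
y3 = y1.view(-1, 8)  # -1 means this dimension's size is inferred from the others
print(y1.size(), y2.size(), y3.size())
print(y1)
print(y2)
print(y3)

Output: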

tensor([1.2737, 1.0897, 1.6206, 1.8135])
torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
tensor([[0.8934, 0.6314, 0.7211, 0.8276],
        [0.6205, 0.9730, 0.6983, 0.2075],
        [0.7568, 0.2464, 0.0478, 0.7884],
        [0.8389, 0.7170, 0.5309, 0.5620]])
tensor([0.8934, 0.6314, 0.7211, 0.8276, 0.6205, 0.9730, 0.6983, 0.2075, 0.7568,
        0.2464, 0.0478, 0.7884, 0.8389, 0.7170, 0.5309, 0.5620])
tensor([[0.8934, 0.6314, 0.7211, 0.8276, 0.6205, 0.9730, 0.6983, 0.2075],
        [0.7568, 0.2464, 0.0478, 0.7884, 0.8389, 0.7170, 0.5309, 0.5620]])

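y2 and y3, obtained through view, still share memory with y1; as the name suggests, view only changes how the same data is looked at. PyTorch also provides reshape(), which changes the shape as well, but it does not guarantee that the result is a copy, so it is not recommended when an independent tensor is needed. In that case, first create a copy with clone and then call view on it.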
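
The memory sharing described for indexing and view can be checked directly. The sketch below, using a small all-zero tensor chosen for readability, modifies data through an index and through a view, then shows that a clone is unaffected.

```python
import torch

y1 = torch.zeros(2, 2)

row = y1[0, :]        # indexing returns a view onto y1's storage
row += 1              # the in-place change is visible through y1

flat = y1.view(4)     # view: same storage, different shape
flat[3] = 5.0         # also changes y1[1, 1]

independent = y1.clone().view(4)   # clone first, then view: separate storage
independent[0] = -1.0              # does not touch y1

print(y1)
print(independent)
```

Comparing data_ptr() of two tensors is one way to confirm whether they start at the same memory address.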

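1.1.4 Broadcasting

When an element-wise operation is applied to two tensors of different shapes, broadcasting may be triggered: elements are replicated as needed so that both tensors have the same shape, and the operation is then applied element-wise.

# Broadcasting
# arange yields the integers in [1, 10)
z1 = torch.arange(1, 10).view(3, 3)
z2 = torch.ones(3, 1)
print(z1)
print(z2)
print(z1 + z2)

Output: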

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
tensor([[1.],
        [1.],
        [1.]])
tensor([[ 2.,  3.,  4.],
        [ 5.,  6.,  7.],
        [ 8.,  9., 10.]])

As shown, z2 is automatically expanded to shape (3, 3) before being added to z1.
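
Broadcasting can also stretch both operands at once. As a rough sketch with made-up shapes, adding a (1, 2) tensor to a (3, 1) tensor gives a (3, 2) result, which matches what an explicit expand() on each operand produces.

```python
import torch

a = torch.arange(1, 3).view(1, 2)   # shape (1, 2): [[1, 2]]
b = torch.arange(1, 4).view(3, 1)   # shape (3, 1): [[1], [2], [3]]

# Both size-1 dimensions are stretched, giving a (3, 2) result
c = a + b

# The same computation with the expansion written out explicitly
c_explicit = a.expand(3, 2) + b.expand(3, 2)

print(c)
print(torch.equal(c, c_explicit))
```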

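1.2 Automatic Differentiation

Setting requires_grad=True on a tensor makes autograd track its computation history, that is, every operation applied to that tensor. Each tensor has a .grad_fn attribute that references the Function that created it (unless the tensor was created directly by the user, in which case its grad_fn is None).

x = torch.ones(2, 2, requires_grad=True)
print(x)
print(x.grad_fn)  # None, because x was created by the user
y = x**2
print(y)
print(y.grad_fn)  # references the Function that produced y

Output: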

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
None
tensor([[1., 1.],
        [1., 1.]], grad_fn=<PowBackward0>)
<PowBackward0 object at 0x000001F24223AE48>

Apply more operations to y:

z = y * y * 3
out = z.mean()
print(z)
print(out)

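To compute derivatives, call .backward() on a tensor. If the tensor is a scalar (i.e. it holds a single element), backward() needs no arguments; if it has more elements, a gradient argument must be supplied, which is a tensor of matching shape.

# Backpropagation
out.backward()
# d(out)/dx
print(x.grad)

Output: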

tensor([[3., 3.],
        [3., 3.]])
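
For the non-scalar case mentioned above, a minimal sketch might look like the following. The input values are arbitrary, and passing a tensor of ones as gradient is just one common choice; it gives the same result as calling y.sum().backward().

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2                       # y is a vector, not a scalar

# gradient must have the same shape as y; ones weight every element equally
y.backward(gradient=torch.ones_like(y))

print(x.grad)   # element-wise derivative 2 * x
```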

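Gradients accumulate in .grad across backward passes, so the gradient is usually zeroed before each backward pass.

# Backpropagate once more; note that grad accumulates
out2 = x.sum()
out2.backward()
print(x.grad)

out3 = x.sum()
# Zero the gradient in place before the next backward pass
x.grad.zero_()
out3.backward()
print(x.grad)

Output: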

tensor([[4., 4.],
        [4., 4.]])
tensor([[1., 1.],
        [1., 1.]])
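
The reason for zeroing becomes clearer in a loop. The sketch below is a hand-written gradient descent on the toy loss (w - 3)^2; the loss, the step size, and the iteration count are all assumptions made for this illustration, not part of the original notes.

```python
import torch

w = torch.tensor(0.0, requires_grad=True)
step_size = 0.1   # hypothetical value for this sketch

for _ in range(100):
    loss = (w - 3.0) ** 2
    loss.backward()                 # adds d(loss)/dw into w.grad
    with torch.no_grad():
        w -= step_size * w.grad     # update w without recording it in the graph
    w.grad.zero_()                  # reset; otherwise the next backward() would accumulate

print(w.item())   # approaches the minimizer 3.0
```

Without the zero_() call each step would use the sum of all earlier gradients, which is what the accumulated output in the example above demonstrates.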

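II. Main Building Blocks of PyTorch

2.1 Basic Configuration

The following hyperparameters can be set in one place:

batch size: the number of samples processed per batch
lr: the learning rate
epochs: the number of training epochs
GPU configuration

There are usually two ways to configure the GPU:

import os
import torch

# Option 1: set os.environ to choose the visible GPUs; GPU code then needs no further device setup
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

# Option 2: define a "device"; later call .to(device) on anything that should live on the GPU
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")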
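
As a rough sketch of how the second option is used afterwards, both the model and its inputs are moved with .to(device). The layer sizes and batch shape here are hypothetical, and the device string is plain "cuda" so that the example also runs on a single-GPU or CPU-only machine.

```python
import torch
from torch import nn

# Fall back to the CPU when no GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)     # moves the layer's parameters
batch = torch.rand(8, 4).to(device)    # inputs must be on the same device as the model

out = model(batch)
print(out.device, out.shape)
```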

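2.2 Data Loading

PyTorch reads data through the combination of Dataset and DataLoader: the Dataset defines the data format and the transforms applied to it, and the DataLoader iteratively reads the data in batches.
Flexible custom loading can be implemented by subclassing Dataset, which mainly involves three methods:

__init__: receives external arguments and defines the sample set
__getitem__: reads one sample at a time, optionally applies transforms, and returns the data needed for training/validation
__len__: returns the number of samples in the dataset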
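
The three methods can be sketched on a toy dataset. SquaresDataset below is a hypothetical class invented for this illustration (sample i is the pair (i, i squared)); it is not taken from the course or from any paper.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i**2)."""

    def __init__(self, n):
        # Receive external arguments and define the sample set
        self.inputs = torch.arange(n, dtype=torch.float)
        self.targets = self.inputs ** 2

    def __getitem__(self, idx):
        # Return one (input, target) pair
        return self.inputs[idx], self.targets[idx]

    def __len__(self):
        # Number of samples in the dataset
        return len(self.inputs)

dataset = SquaresDataset(10)
loader = DataLoader(dataset, batch_size=4, shuffle=False)

# The last batch is smaller because 10 is not a multiple of 4
batch_sizes = [len(x) for x, _ in loader]
print(len(dataset), batch_sizes)
```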

Taking the code of a real paper as an example (excerpted from FGNN):

# Define the datasets
train_dataset = MultiSessionsGraph(cur_dir + '/datasets/' + opt.dataset, phrase='train')
test_dataset = MultiSessionsGraph(cur_dir + '/../datasets/' + opt.dataset, phrase='test')
# Load the data
train_loader = DataLoader(train_dataset, batch_size=opt.batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=opt.batch_size, shuffle=False)

MultiSessionsGraph is a user-defined class that inherits from Dataset.

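2.3 Model Construction

Module is the model-construction class provided by the nn module and the base class of all neural network modules; we can subclass it to define the model we want.
1. A layer without parameters
The MyLayer class below subclasses Module to define a custom layer that subtracts the mean from its input, with the layer's computation defined in the forward function.

import torch
from torch import nn

class MyLayer(nn.Module):
    def __init__(self, **kwargs):
        # Call the constructor of the parent class nn.Module for the necessary
        # initialization; any keyword arguments are forwarded to it
        super(MyLayer, self).__init__(**kwargs)

    def forward(self, x):
        return x - x.mean()

Test: feed [1, 2, 3, 4, 5] into the layer and run the forward pass

layer = MyLayer()
res = layer(torch.tensor([1, 2, 3, 4, 5], dtype=torch.float))
print(res)

Output: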

tensor([-2., -1.,  0.,  1.,  2.])

2. A layer with model parameters
Here we build a simple multilayer perceptron whose hidden layer and output layer carry model parameters, then run a forward pass and print the output.

class MLP(nn.Module):
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        self.hidden = nn.Linear(784, 256)
        self.act = nn.ReLU()
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        o = self.act(self.hidden(x))
        return self.output(o)

Test:

X = torch.rand(2, 784)
net = MLP()
print(net)
res = net(X)
print(res.size())
print(res)

Output:

MLP(
  (hidden): Linear(in_features=784, out_features=256, bias=True)
  (act): ReLU()
  (output): Linear(in_features=256, out_features=10, bias=True)
)
torch.Size([2, 10])
tensor([[-0.1206, -0.2996,  0.2516, -0.0525,  0.2212,  0.0093,  0.0931, -0.1848,
         -0.3433, -0.1671],
        [-0.1767, -0.4206,  0.4729, -0.0081,  0.2569, -0.2523,  0.1119, -0.1536,
         -0.1935,  0.0160]], grad_fn=<AddmmBackward0>)
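
To see which model parameters such an MLP actually holds, one option is to iterate over named_parameters(). The sketch below redefines the same three-layer structure so that it runs on its own and counts the parameters; the counting is added here for illustration.

```python
import torch
from torch import nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(784, 256)
        self.act = nn.ReLU()
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        return self.output(self.act(self.hidden(x)))

net = MLP()

# Each Linear layer registers a weight and a bias; ReLU has no parameters
shapes = {name: tuple(p.shape) for name, p in net.named_parameters()}
total = sum(p.numel() for p in net.parameters())

print(shapes)
print(total)   # 784*256 + 256 + 256*10 + 10 = 203530
```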

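2.4 Model Initialization

III. PyTorch Basics in Practice

Using FashionMNIST clothing classification as an example, this section explores how to build a simple CNN to make predictions on the data.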

import os
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

# Configure the GPU
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
# device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")

# Set hyperparameters
batch_size = 256
num_workers = 0
lr = 1e-4
epochs = 20

# Data reading and loading
image_size = 28
# Each transform must be instantiated (note the parentheses)
data_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(image_size),
    transforms.ToTensor()
])

class FMDataset(Dataset):
    def __init__(self, df, transform=None):
        self.df = df
        self.transform = transform
        self.images = df.iloc[:, 1:].values.astype(np.uint8)
        self.labels = df.iloc[:, 0].values
    def __len__(self):
        return len(self.images)
    def __getitem__(self, idx):
        image = self.images[idx].reshape(28, 28, 1)
        label = int(self.labels[idx])
        if self.transform is not None:
            image = self.transform(image)
        else:
            image = torch.tensor(image/255, dtype=torch.float)
        label = torch.tensor(label, dtype=torch.long)
        return image, label

train_df = pd.read_csv("./FashionMNIST/fashion-mnist_train.csv")
test_df = pd.read_csv("./FashionMNIST/fashion-mnist_test.csv")
train_data = FMDataset(train_df, data_transform)
test_data = FMDataset(test_df, data_transform)

train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True, num_workers=num_workers, drop_last=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False, num_workers=num_workers)

# CNN model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv = nn.Sequential(
            # 2-D convolution: 1 input channel, 32 output channels, 5x5 kernel
            nn.Conv2d(1, 32, 5),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2),
            nn.Dropout(0.3),
            nn.Conv2d(32, 64, 5),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2),
            nn.Dropout(0.3)
        )
        self.fc = nn.Sequential(
            nn.Linear(64*4*4, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.conv(x)
        # Flatten the feature maps (change the view)
        x = x.view(-1, 64*4*4)
        x = self.fc(x)
        return x

model = Net()
model = model.cuda()

# Loss function
criterion = nn.CrossEntropyLoss()

# Optimizer (note: uses a hard-coded learning rate, not the lr defined above)
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training
def train(epoch):
    model.train()
    train_loss = 0
    for data, label in train_loader:
        data, label = data.cuda(), label.cuda()
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, label)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()*data.size(0)
    train_loss = train_loss/len(train_loader.dataset)
    print('Epoch: {} \tTraining Loss: {:.6f}'.format(epoch, train_loss))

# Validation
def val(epoch):
    model.eval()
    val_loss = 0
    # Ground-truth labels
    gt_labels = []
    # Predicted labels
    pred_labels = []
    with torch.no_grad():
        for data, label in test_loader:
            data, label = data.cuda(), label.cuda()
            output = model(data)
            preds = torch.argmax(output, 1)
            gt_labels.append(label.cpu().data.numpy())
            pred_labels.append(preds.cpu().data.numpy())
            loss = criterion(output, label)
            val_loss += loss.item()*data.size(0)
    val_loss = val_loss/len(test_loader.dataset)
    gt_labels, pred_labels = np.concatenate(gt_labels), np.concatenate(pred_labels)
    acc = np.sum(gt_labels==pred_labels)/len(pred_labels)
    print('Epoch: {} \tValidation Loss: {:.6f}, Accuracy: {:.6f}'.format(epoch, val_loss, acc))

for epoch in range(1, epochs+1):
    train(epoch)
    val(epoch)
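
The 64*4*4 input size of the first fully connected layer follows from the convolution and pooling sizes. As a quick check, the sketch below passes a dummy batch through the same convolutional layout as Net (dropout is omitted because it does not change shapes); the batch size of 2 is arbitrary.

```python
import torch
from torch import nn

# Same layer layout as the conv part of Net, without dropout
conv = nn.Sequential(
    nn.Conv2d(1, 32, 5),         # 28x28 -> 24x24
    nn.ReLU(),
    nn.MaxPool2d(2, stride=2),   # 24x24 -> 12x12
    nn.Conv2d(32, 64, 5),        # 12x12 -> 8x8
    nn.ReLU(),
    nn.MaxPool2d(2, stride=2),   # 8x8 -> 4x4
)

dummy = torch.zeros(2, 1, 28, 28)        # two blank 28x28 grayscale images
features = conv(dummy)
flat = features.view(-1, 64 * 4 * 4)     # the flattening done in Net.forward

print(features.shape)   # torch.Size([2, 64, 4, 4])
print(flat.shape)       # torch.Size([2, 1024])
```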