PyTorch Study Notes (2): Basic Concepts

Latest recommended article published on 2024-08-13 10:17:37

重生之光头强下海当程序猿


Category: Machine Learning / Deep Learning. Tags: pytorch, study notes

Article link: https://blog.csdn.net/m0_50758340/article/details/140038654


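1.1 PyTorch Overview and Installation

The birth of PyTorch

In January 2017, FAIR (Facebook AI Research) released PyTorch, a deep learning framework rebuilt in Python on top of Torch. Torch is a machine learning framework whose interface language is Lua. Because Lua is a niche language, Torch was costly to learn and never became widely known.

The development of PyTorch

January 2017: PyTorch is officially released.

April 2018: version 0.4.0 adds Windows support, and Caffe2 is merged into PyTorch.

December 2018: the stable 1.0 release ships. By then PyTorch had become the second-fastest-growing open source project on GitHub.

May 2019: version 1.1.0 adds TensorBoard support, improving visualization.

August 2019: version 1.2.0 updates torchvision, torchaudio and torchtext with more features.

At the time of writing, PyTorch looked set to overtake TensorFlow.

…

Advantages of PyTorch

Quick to pick up: knowing NumPy and the basic concepts of deep learning is enough to get started.

Concise, flexible code: wrapping layers in nn.Module makes building networks convenient, and the dynamic graph mechanism adds further flexibility.

Plenty of resources: most algorithms in new arXiv papers have a PyTorch implementation.

Many developers: at the time of writing, the project had more than 1,100 contributors on GitHub.

…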

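The 5 elements of model training in PyTorch

Data: reading data, cleaning it, splitting it and preprocessing it, for example how images are preprocessed and augmented after loading.

Model: building model modules, organizing complex networks, initializing network parameters and defining network layers.

Loss function: creating the loss function, setting its hyperparameters and choosing a loss that suits the task.

Optimizer: updating parameters from their gradients with a chosen optimizer, managing model parameters, managing several parameter groups with different learning rates, and adjusting the learning rate.

Iterative training: combining the 4 modules above in a training loop, monitoring training, plotting Loss/Accuracy curves and visualizing with TensorBoard.

The whole series is organized around these 5 elements.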

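Installation

Several projects are often developed at the same time, and they may need different versions of Python and of some libraries, which leads to conflicts. These notes therefore use Anaconda to manage multiple Python virtual environments. Anaconda is a distribution built to make Python easier to use: it bundles more than 250 common packages, several versions of the Python interpreter and a powerful virtual environment manager. Environments are independent of each other and can be switched freely.

Installing Anaconda

Go to the official site https://www.anaconda.com/products/individual and pick the 64-bit installer for your system, making sure it is a Python 3 version.

During installation, remember to tick "Add Anaconda to my PATH environment variable".

After installation, open cmd, type conda and press Enter. If conda prints its usage information, the installation succeeded.

Next, add the USTC mirror or the Tsinghua mirror to speed up package downloads.

Installing PyTorch

Check whether the machine has a GPU that supports CUDA. If it does, CUDA and cuDNN need to be installed.

Go to the PyTorch website https://pytorch.org/get-started/locally/ , select the options you need, and run the generated command in a local cmd window. On my machine I installed the CPU-only build of version 1.5 with conda: conda install pytorch torchvision cpuonly -c pytorch.

If conda or pip is slow, you can also download a whl package directly from https://download.pytorch.org/whl/torch_stable.html and install it locally. That page lists every historical version of PyTorch for every platform, so pick the file you need by its name. The naming rule is as follows:

The first part is the CUDA version or cpu, the second part is the PyTorch version, the third part is the Python version, and the fourth part is the operating system.

cu92/torch-1.5.0%2Bcu92-cp37-cp37m-linux_x86_64.whl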
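The naming rule above can be checked with a small sketch. The helper `parse_wheel_path` is hypothetical (it is not part of PyTorch or pip) and assumes the path has exactly the five dash-separated fields seen in the example entry.

```python
from urllib.parse import unquote


def parse_wheel_path(path):
    """Split an entry such as 'cu92/torch-1.5.0%2Bcu92-cp37-cp37m-linux_x86_64.whl'."""
    build_dir, filename = path.split("/")
    name = unquote(filename)          # '%2B' is the URL-encoded '+'
    stem = name[: -len(".whl")]
    package, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "build": build_dir,           # CUDA version or 'cpu'
        "package": package,
        "version": version,           # PyTorch version, with the build suffix
        "python": python_tag,         # cp37 means CPython 3.7
        "abi": abi_tag,
        "platform": platform_tag,     # operating system and architecture
    }


info = parse_wheel_path("cu92/torch-1.5.0%2Bcu92-cp37-cp37m-linux_x86_64.whl")
print(info)
```

Reading the fields this way makes it easier to pick the file that matches the local Python version and platform.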

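Using an Anaconda environment in PyCharm

After creating a new project in PyCharm, select the Anaconda environment under File -> Settings -> Project -> Python Interpreter.

First click the gear icon, then click Add in the menu that appears.

In the dialog, select Conda Environment, and for the Conda executable choose Scripts\conda.exe inside your Anaconda installation folder.

Finally, in Python Interpreter, select the Python environment you just created for the current project.

If you installed the GPU build, you can run print(torch.cuda.is_available()) afterwards to check whether the installed PyTorch can use the GPU. I am using the CPU build here.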
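As a minimal sketch of that check, the snippet below picks a device based on `torch.cuda.is_available()` and creates a tensor on it. The variable names are illustrative only.

```python
import torch

# Use the GPU when the installed build can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t = torch.ones((2, 2), device=device)
print(torch.__version__, torch.cuda.is_available(), t.device)
```

Writing code against a `device` variable like this lets the same script run on both the CPU and GPU builds.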

1.2 Introduction to Tensors

Code for this chapter:

https://github.com/zhangxiann/PyTorch_Practice/blob/master/lesson1/tensor_introduce1.py

The concept of a Tensor

A tensor is a multi-dimensional array: the generalization of scalars, vectors and matrices to higher dimensions.

A scalar is a 0-dimensional tensor, a vector is a 1-dimensional tensor, a matrix is a 2-dimensional tensor, and an RGB image can be represented as a 3-dimensional tensor.

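Tensor and Variable

Before PyTorch 0.4.0, the torch.autograd package had a data type called Variable, whose main purpose was to wrap a Tensor for automatic differentiation. A Variable has the following attributes.

data: the wrapped Tensor.

grad: the gradient of data.

grad_fn: the Function that created the Tensor. It is the key to automatic differentiation, because the derivative can only be computed from the recorded function.

requires_grad: whether a gradient is needed. Not every tensor needs one.

is_leaf: whether the tensor is a leaf node. Leaf nodes matter in the computational graph and are covered in detail later.

Since PyTorch 0.4.0, Variable has been merged into Tensor. Besides the 5 attributes above, a Tensor has 3 more.

dtype: the data type of the tensor, such as torch.float32 or torch.int64 (the corresponding legacy tensor types are named torch.FloatTensor, torch.cuda.FloatTensor and so on).

shape: the shape of the tensor, such as (64, 3, 224, 224).

device: the device the tensor lives on (CPU/GPU). The GPU is the key to fast computation.

Regarding dtype, at the time of writing PyTorch provided 9 data types in 3 groups: float (16-bit, 32-bit, 64-bit), integer (unsigned 8-bit, 8-bit, 16-bit, 32-bit, 64-bit) and Boolean. Model parameters and data most often use 32-bit float, and labels usually use 64-bit integer.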
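The attributes listed above can be inspected directly on any tensor. The following is a small sketch. The shapes and names are chosen only for illustration.

```python
import torch

# A small batch shaped like images: (batch, channels, height, width)
t = torch.ones((8, 3, 32, 32))
print(t.ndim, t.dtype, t.shape, t.device)
print(t.requires_grad, t.is_leaf, t.grad, t.grad_fn)

# Integer labels created from Python ints default to 64-bit integers
labels = torch.tensor([0, 1, 2])
print(labels.dtype)
```

A freshly created tensor is a leaf with no gradient and no grad_fn, because no operation produced it.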

Ways to create a Tensor

Creating a Tensor directly

torch.tensor()

torch.tensor(data, dtype=None, device=None, requires_grad=False, pin_memory=False)

data: the data, which can be a list or a NumPy array

dtype: the data type, inferred from data by default

device: the device, cuda/cpu

requires_grad: whether a gradient is needed

pin_memory: whether to store the tensor in pinned (page-locked) memory

Code example:

import numpy as np
import torch

arr = np.ones((3, 3))
print("ndarray dtype:", arr.dtype)
# To create the tensor on the GPU instead:
# t = torch.tensor(arr, device='cuda')
t = torch.tensor(arr)
print(t)

Output:

ndarray dtype: float64
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

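torch.from_numpy(ndarray)

Creates a tensor from a NumPy array. The resulting tensor shares memory with the original ndarray, so modifying either one also changes the other.

Code example:

arr = np.array([[1, 2, 3], [4, 5, 6]])
t = torch.from_numpy(arr)

# Modifying the array also modifies the tensor
# print("\nmodify arr")
# arr[0, 0] = 0
# print("numpy array: ", arr)
# print("tensor : ", t)

# Modifying the tensor also modifies the array
print("\nmodify tensor")
t[0, 0] = -1
print("numpy array: ", arr)
print("tensor : ", t)

Output (the tensor takes its dtype from the array; this run shows int32, while on many platforms NumPy's default integer type gives int64):

modify tensor
numpy array: [[-1 2 3]
 [ 4 5 6]]
tensor : tensor([[-1, 2, 3],
        [ 4, 5, 6]], dtype=torch.int32)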
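To contrast the two creation paths, here is a minimal sketch: `torch.tensor()` copies the data, while `torch.from_numpy()` shares the array's buffer. The variable names are illustrative.

```python
import numpy as np
import torch

arr = np.array([[1, 2, 3], [4, 5, 6]])

copied = torch.tensor(arr)      # independent copy of the data
shared = torch.from_numpy(arr)  # shares memory with arr

arr[0, 0] = 100                 # change the NumPy array afterwards
print(copied[0, 0].item(), shared[0, 0].item())
```

Only the tensor created with `torch.from_numpy()` sees the later change to the array.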

Creating a Tensor from values

torch.zeros()

torch.zeros(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

Purpose: creates a tensor of the given size filled with 0

size: the shape of the tensor

out: the output tensor. If out is given, the tensor returned by torch.zeros() and out are the same object

layout: the memory layout, such as strided or sparse_coo. For a sparse matrix, sparse_coo can reduce memory usage.

device: the device, cuda/cpu

requires_grad: whether a gradient is needed

Code example:

out_t = torch.tensor([1])
# out is specified here
t = torch.zeros((3, 3), out=out_t)
print(t, '\n', out_t)
# id() returns the identity of the Python object; t and out_t are the same object
print(id(t), id(out_t), id(t) == id(out_t))

Output:

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
 tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
2984903203072 2984903203072 True

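torch.zeros_like()

torch.zeros_like(input, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format)

Purpose: creates a tensor filled with 0 that has the same shape as input

input: the tensor whose shape is copied

dtype: the data type

layout: the memory layout, such as strided or sparse_coo. For a sparse matrix, sparse_coo can reduce memory usage.

Tensors filled with 1 are created the same way with torch.ones() and torch.ones_like().

torch.full(), torch.full_like()

torch.full(size, fill_value, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

Purpose: creates a tensor filled with a chosen value

size: the shape of the tensor, such as (3, 3)

fill_value: the value of every element

Code example:

# A float fill value gives a float tensor; in recent versions an integer fill value gives an integer tensor
t = torch.full((3, 3), 1.)
print(t)

Output:

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])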

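torch.arange()

torch.arange(start=0, end, step=1, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

Purpose: creates a 1-dimensional tensor of evenly spaced values with a fixed step. The interval is [start, end).

start: the first value

end: the end of the interval, exclusive, so it is never included

step: the difference between consecutive values, 1 by default

Code example:

t = torch.arange(2, 10, 2)
print(t)

Output:

tensor([2, 4, 6, 8])

torch.linspace()

torch.linspace(start, end, steps, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

Purpose: creates a 1-dimensional tensor of evenly spaced values over the closed interval [start, end]

start: the first value

end: the last value

steps: the number of elements (older versions defaulted to 100; recent versions require it to be given)

Code example:

# t = torch.linspace(2, 10, 5)
t = torch.linspace(2, 10, 6)
print(t)

Output:

tensor([ 2.0000, 3.6000, 5.2000, 6.8000, 8.4000, 10.0000])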

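torch.logspace()

torch.logspace(start, end, steps, base=10.0, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

Purpose: creates a 1-dimensional tensor whose values are evenly spaced on a logarithmic scale, from base^start to base^end. In other words, the exponents are evenly spaced over [start, end].

start: the first exponent

end: the last exponent

steps: the number of elements

base: the base of the logarithm, 10 by default

Code example:

# t = torch.logspace(2, 10, 5)
t = torch.logspace(2, 10, 6)
print(t)

Output (the exponents are 2, 3.6, 5.2, 6.8, 8.4 and 10):

tensor([1.0000e+02, 3.9811e+03, 1.5849e+05, 6.3096e+06, 2.5119e+08, 1.0000e+10])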

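torch.eye()

torch.eye(n, m=None, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

Purpose: creates a 2-dimensional tensor with ones on the diagonal and zeros elsewhere (an identity matrix), square by default

n: the number of rows. Usually only n is set, which gives a square matrix.

m: the number of columns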
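The value-based creation functions that have no example above can be tried with a short sketch like this one. The shapes and fill values are arbitrary.

```python
import torch

eye_square = torch.eye(3)      # 3 x 3 identity matrix
eye_rect = torch.eye(2, 3)     # 2 rows, 3 columns, ones on the main diagonal

template = torch.rand((2, 3))
zeros = torch.zeros_like(template)       # same shape as template, filled with 0
ones = torch.ones_like(template)         # same shape, filled with 1
sevens = torch.full_like(template, 7.)   # same shape, filled with 7

print(eye_square, eye_rect, zeros, ones, sevens, sep="\n")
```

The `*_like` variants are convenient when the new tensor must match an existing one in shape, dtype and device.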

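Creating a Tensor from a probability distribution

torch.normal()

torch.normal(mean, std, *, generator=None, out=None)

Purpose: samples from a normal (Gaussian) distribution

mean: the mean

std: the standard deviation

There are 4 modes:

Mode 1: mean is a scalar and std is a scalar. In this case size must be set.

Code example:

# mean: scalar, std: scalar
# size must be set here
t_normal = torch.normal(0., 1., size=(4,))
print(t_normal)

Output:

tensor([0.6614, 0.2669, 0.0617, 0.6213])

Mode 2: mean is a scalar and std is a tensor.

Mode 3: mean is a tensor and std is a scalar.

Code example:

# mean: tensor, std: scalar
mean = torch.arange(1, 5, dtype=torch.float)
std = 1
t_normal = torch.normal(mean, std)
print("mean:{}\nstd:{}".format(mean, std))
print(t_normal)

Output:

mean:tensor([1., 2., 3., 4.])
std:1
tensor([1.6614, 2.2669, 3.0617, 4.6213])

The 4 numbers are sampled from distributions with different means, but the standard deviation is 1 for all of them.

Mode 4: mean is a tensor and std is a tensor.

Code example:

# mean: tensor, std: tensor
mean = torch.arange(1, 5, dtype=torch.float)
std = torch.arange(1, 5, dtype=torch.float)
t_normal = torch.normal(mean, std)
print("mean:{}\nstd:{}".format(mean, std))
print(t_normal)

Output:

mean:tensor([1., 2., 3., 4.])
std:tensor([1., 2., 3., 4.])
tensor([1.6614, 2.5338, 3.1850, 6.4853])

Here 1.6614 is sampled from a normal distribution with mean 1 and standard deviation 1, and the other numbers follow the same pattern with their own mean and standard deviation.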

torch.randn() and torch.randn_like()

torch.randn(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

Purpose: samples from the standard normal distribution.

size: the shape of the tensor

torch.rand() and torch.rand_like()

torch.rand(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

Purpose: samples from the uniform distribution on [0, 1).

torch.randint() and torch.randint_like()

torch.randint(low=0, high, size, *, generator=None, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

Purpose: samples integers uniformly from [low, high).

size: the shape of the tensor

torch.randperm()

torch.randperm(n, out=None, dtype=torch.int64, layout=torch.strided, device=None, requires_grad=False)

Purpose: generates a random permutation of the integers from 0 to n-1. Often used to generate indices.

n: the length of the tensor

torch.bernoulli()

torch.bernoulli(input, *, generator=None, out=None)

Purpose: samples from the Bernoulli distribution (0-1 distribution, two-point distribution), using input as the probabilities

input: the probability values
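A short sketch of the sampling functions above follows. The seed, shapes and probability are arbitrary choices, and the sampled values will differ between PyTorch versions.

```python
import torch

torch.manual_seed(0)  # fix the seed so repeated runs give the same samples

z = torch.randn(2, 3)                # standard normal samples
u = torch.rand(2, 3)                 # uniform samples in [0, 1)
k = torch.randint(0, 10, (2, 3))     # integers in [0, 10)
perm = torch.randperm(5)             # random permutation of 0..4
coin = torch.bernoulli(torch.full((2, 3), 0.5))  # 0/1 draws with probability 0.5

print(z, u, k, perm, coin, sep="\n")
```

`torch.randperm()` is handy for shuffling: indexing a dataset tensor with `perm` reorders its rows randomly.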

1.3 Tensor Operations and Linear Regression

Code for this chapter: https://github.com/zhangxiann/PyTorch_Practice/blob/master/lesson1/linear_regression.py

Tensor operations

Joining

torch.cat()

torch.cat(tensors, dim=0, out=None)

Purpose: concatenates tensors along the existing dimension dim

tensors: a sequence of tensors

dim: the dimension to concatenate along

Code example:

t = torch.ones((2, 3))
t_0 = torch.cat([t, t], dim=0)
t_1 = torch.cat([t, t], dim=1)
print("t_0:{} shape:{}\nt_1:{} shape:{}".format(t_0, t_0.shape, t_1, t_1.shape))

Output:

t_0:tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]) shape:torch.Size([4, 3])
t_1:tensor([[1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.]]) shape:torch.Size([2, 6])

torch.stack()

torch.stack(tensors, dim=0, out=None)

Purpose: joins tensors along a newly created dimension dim

tensors: a sequence of tensors

dim: the position of the new dimension

Code example:

t = torch.ones((2, 3))
# dim=2
t_stack = torch.stack([t, t, t], dim=2)
print("\nt_stack.shape:{}".format(t_stack.shape))
# dim=0
t_stack = torch.stack([t, t, t], dim=0)
print("\nt_stack.shape:{}".format(t_stack.shape))

Output:

t_stack.shape:torch.Size([2, 3, 3])

t_stack.shape:torch.Size([3, 2, 3])

The first call stacks along a new dimension dim=2, so the result has shape [2, 3, 3]. The second call uses dim=0. The original tensor already has a dimension 0, so its existing dimensions are shifted back by one, as if it had shape [1, 2, 3], and stacking three such tensors gives [3, 2, 3].

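Splitting

torch.chunk()

torch.chunk(input, chunks, dim=0)

Purpose: splits a tensor into equal parts along dimension dim. If the length is not evenly divisible, the last part is smaller than the others.

input: the tensor to split

chunks: the number of parts

dim: the dimension to split along

Code example:

a = torch.ones((2, 7))  # 7 columns
list_of_tensors = torch.chunk(a, dim=1, chunks=3)  # 3 parts
for idx, t in enumerate(list_of_tensors):
    print("tensor {}: {}, shape is {}".format(idx+1, t, t.shape))

Output:

tensor 1: tensor([[1., 1., 1.],
        [1., 1., 1.]]), shape is torch.Size([2, 3])
tensor 2: tensor([[1., 1., 1.],
        [1., 1., 1.]]), shape is torch.Size([2, 3])
tensor 3: tensor([[1.],
        [1.]]), shape is torch.Size([2, 1])

Since 7 is not divisible by 3, each part gets 7/3 rounded up, that is 3 columns. The first two parts therefore have shape [2, 3], and the last part takes the remaining column and has shape [2, 1].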

torch.split()

torch.split(tensor, split_size_or_sections, dim=0)

Purpose: splits a tensor along dimension dim. The length of each part can be specified.

tensor: the tensor to split

split_size_or_sections: if an int, the length of each part, and if the dimension is not evenly divisible the last part is smaller than the others. If a list, its elements are the lengths of the parts, and an error is raised if they do not sum to the size of dimension dim.

dim: the dimension to split along

Code example:

t = torch.ones((2, 5))
list_of_tensors = torch.split(t, [2, 1, 2], dim=1)
for idx, t in enumerate(list_of_tensors):
    print("tensor {}: {}, shape is {}".format(idx+1, t, t.shape))

Output:

tensor 1: tensor([[1., 1.],
        [1., 1.]]), shape is torch.Size([2, 2])
tensor 2: tensor([[1.],
        [1.]]), shape is torch.Size([2, 1])
tensor 3: tensor([[1., 1.],
        [1., 1.]]), shape is torch.Size([2, 2])

Indexing

torch.index_select()

torch.index_select(input, dim, index, out=None)

Purpose: along dimension dim, picks the entries listed in index and returns them joined into a tensor.

input: the tensor to index

dim: the dimension to index along

index: the indices of the entries to pick

Code example:

# Sample integers uniformly
t = torch.randint(0, 9, size=(3, 3))
# Note that the dtype of idx cannot be torch.float
idx = torch.tensor([0, 2], dtype=torch.long)
# Pick row 0 and row 2
t_select = torch.index_select(t, dim=0, index=idx)
print("t:\n{}\nt_select:\n{}".format(t, t_select))

Output:

t:
tensor([[4, 5, 0],
        [5, 7, 1],
        [2, 5, 8]])
t_select:
tensor([[4, 5, 0],
        [2, 5, 8]])

torch.masked_select()

torch.masked_select(input, mask, out=None)

Purpose: picks the elements where mask is True and returns them as a 1-dimensional tensor.

input: the tensor to index

mask: a boolean tensor with the same shape as input

Code example:

t = torch.randint(0, 9, size=(3, 3))
# le: less than or equal; lt: less than; ge: greater than or equal; gt: greater than
mask = t.le(5)
# Pick the elements that are less than or equal to 5
t_select = torch.masked_select(t, mask)
print("t:\n{}\nmask:\n{}\nt_select:\n{} ".format(t, mask, t_select))

Output:

t:
tensor([[4, 5, 0],
        [5, 7, 1],
        [2, 5, 8]])
mask:
tensor([[ True, True, True],
        [ True, False, True],
        [ True, True, False]])
t_select:
tensor([4, 5, 0, 5, 1, 2, 5])

The result is a 1-dimensional tensor.

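Transforming

torch.reshape()

torch.reshape(input, shape)

Purpose: changes the shape of a tensor. When the tensor is contiguous in memory, the returned tensor shares its data with the original one, so changing one also changes the other.

input: the tensor to reshape

shape: the shape of the new tensor

Code example:

# Generate a random permutation of 0 to 7
t = torch.randperm(8)
# -1 means this dimension is inferred from the others
t_reshape = torch.reshape(t, (-1, 2, 2))
print("t:{}\nt_reshape:\n{}".format(t, t_reshape))

Output:

t:tensor([5, 4, 2, 6, 7, 3, 1, 0])
t_reshape:
tensor([[[5, 4],
         [2, 6]],

        [[7, 3],
         [1, 0]]])

Continuing from the code above, modifying an element of the original tensor also changes the new tensor.

Code example:

# Modify element 0 of t; t_reshape changes as well
t[0] = 1024
print("t:{}\nt_reshape:\n{}".format(t, t_reshape))
print("id of t.data: {}".format(id(t.data)))
print("id of t_reshape.data: {}".format(id(t_reshape.data)))

Output:

t:tensor([1024, 4, 2, 6, 7, 3, 1, 0])
t_reshape:
tensor([[[1024, 4],
         [ 2, 6]],

        [[ 7, 3],
         [ 1, 0]]])
id of t.data: 2636803119936
id of t_reshape.data: 2636803119792

The two id() values differ because each access to .data creates a new Python tensor object. The two tensors still share the same underlying storage, which is why the change to t shows up in t_reshape. Comparing t.data_ptr() with t_reshape.data_ptr() checks the storage address directly.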

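torch.transpose()

torch.transpose(input, dim0, dim1)

Purpose: swaps two dimensions of a tensor. It is often used when rearranging image layouts. A single call swaps only two dimensions, for example c*h*w into c*w*h, so converting c*h*w into h*w*c takes two transposes (or one call to permute).

input: the tensor to transpose

dim0: the first dimension to swap

dim1: the second dimension to swap

Code example:

# Swap dimensions 1 and 2: c * h * w becomes c * w * h
t = torch.rand((2, 3, 4))
t_transpose = torch.transpose(t, dim0=1, dim1=2)
print("t shape:{}\nt_transpose shape: {}".format(t.shape, t_transpose.shape))

Output:

t shape:torch.Size([2, 3, 4])
t_transpose shape: torch.Size([2, 4, 3])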

torch.t()

Purpose: transposes a 2-dimensional tensor. For a 2-dimensional matrix it is equivalent to torch.transpose(input, 0, 1).

torch.squeeze()

torch.squeeze(input, dim=None, out=None)

Purpose: removes dimensions of length 1.

dim: if None, all dimensions of length 1 are removed. If a dimension is given, it is removed only if its length is 1.

Code example:

# Dimensions 0 and 3 have length 1
t = torch.rand((1, 2, 3, 1))
# Dimensions 0 and 3 are removed
t_sq = torch.squeeze(t)
# Dimension 0 is removed
t_0 = torch.squeeze(t, dim=0)
# Dimension 1 cannot be removed because its length is 2
t_1 = torch.squeeze(t, dim=1)
print("t.shape: {}".format(t.shape))
print("t_sq.shape: {}".format(t_sq.shape))
print("t_0.shape: {}".format(t_0.shape))
print("t_1.shape: {}".format(t_1.shape))

Output:

t.shape: torch.Size([1, 2, 3, 1])
t_sq.shape: torch.Size([2, 3])
t_0.shape: torch.Size([2, 3, 1])
t_1.shape: torch.Size([1, 2, 3, 1])

torch.unsqueeze()

torch.unsqueeze(input, dim)

Purpose: inserts a new dimension of length 1 at position dim.
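Since torch.unsqueeze() has no example above, here is a minimal sketch of how it mirrors torch.squeeze(). The shapes are arbitrary.

```python
import torch

t = torch.rand((2, 3))

t_front = torch.unsqueeze(t, dim=0)    # new leading dimension: (1, 2, 3)
t_back = torch.unsqueeze(t, dim=-1)    # new trailing dimension: (2, 3, 1)
restored = torch.squeeze(t_front, dim=0)  # back to (2, 3)

print(t_front.shape, t_back.shape, restored.shape)
```

A common use is adding a batch dimension to a single sample before passing it to a model that expects batched input.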

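Mathematical operations on tensors

They fall into 3 groups: arithmetic (addition, subtraction, multiplication, division); logarithms, exponentials and powers; and trigonometric functions.

A few commonly used functions are introduced here.

torch.add()

torch.add(input, other, out=None)

torch.add(input, other, *, alpha=1, out=None)

Purpose: computes input + alpha * other element by element. It exists because multiplying and then adding is very common in deep learning.

input: the first tensor

alpha: the multiplier applied to other

other: the second tensor

torch.addcdiv()

torch.addcdiv(input, tensor1, tensor2, *, value=1, out=None)

The formula is: $\text{out}_{i}=\text{input}_{i}+\text{value} \times \frac{\text{tensor1}_{i}}{\text{tensor2}_{i}}$

torch.addcmul()

torch.addcmul(input, tensor1, tensor2, *, value=1, out=None)

The formula is: $\text{out}_{i}=\text{input}_{i}+\text{value} \times \text{tensor1}_{i} \times \text{tensor2}_{i}$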
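The three formulas can be checked on small tensors. This is a sketch with arbitrary values, and the expected numbers in the comments follow directly from the formulas.

```python
import torch

inp = torch.tensor([1., 2., 3.])
t1 = torch.tensor([2., 4., 6.])
t2 = torch.tensor([2., 2., 2.])

# input + alpha * other = [1 + 10*2, 2 + 10*4, 3 + 10*6]
added = torch.add(inp, t1, alpha=10)

# input + value * tensor1 / tensor2 = [1 + 0.5*1, 2 + 0.5*2, 3 + 0.5*3]
divided = torch.addcdiv(inp, t1, t2, value=0.5)

# input + value * tensor1 * tensor2 = [1 + 0.5*4, 2 + 0.5*8, 3 + 0.5*12]
multiplied = torch.addcmul(inp, t1, t2, value=0.5)

print(added, divided, multiplied)
```

Fusing the multiply and the add into one call avoids creating an intermediate tensor for the product or quotient.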

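Linear regression

Linear regression analyzes the relationship between one variable ( $y$ ) and one or more other variables ( $x$ ). It is usually written as $y = w x + b$ . The goal of linear regression is to find the parameters $w, b$ .

Solving a linear regression takes 3 steps:

Choose the model: $y = w x + b$

Choose the loss function, usually the mean squared error (MSE): $\frac{1}{m} \sum_{i=1}^{m}\left(y_{i}-\hat{y}_{i}\right)^{2}$ , where $\hat{y}_{i}$ is the predicted value and $y_{i}$ is the true value.

Compute the gradients and update the parameters with gradient descent (where $lr$ is the learning rate):
$w = w - lr \times w.grad$
$b = b - lr \times b.grad$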

The code is as follows:

import torch
import matplotlib.pyplot as plt

torch.manual_seed(10)

lr = 0.05  # learning rate

# Create the training data
x = torch.rand(20, 1) * 10  # x data (tensor), shape=(20, 1)
# torch.randn(20, 1) adds noise
y = 2*x + (5 + torch.randn(20, 1))  # y data (tensor), shape=(20, 1)

# Build the linear regression parameters, with gradient tracking enabled
w = torch.randn((1), requires_grad=True)
b = torch.zeros((1), requires_grad=True)

# Train for up to 1000 iterations
for iteration in range(1000):
    # Forward pass: compute the prediction
    wx = torch.mul(w, x)
    y_pred = torch.add(wx, b)

    # Compute the loss (the MSE scaled by 0.5)
    loss = (0.5 * (y - y_pred) ** 2).mean()

    # Backward pass
    loss.backward()

    # Update the parameters
    b.data.sub_(lr * b.grad)
    w.data.sub_(lr * w.grad)

    # Zero the gradients after every parameter update
    w.grad.zero_()
    b.grad.zero_()

    # Redraw the fitted line every 20 iterations
    if iteration % 20 == 0:
        plt.scatter(x.data.numpy(), y.data.numpy())
        plt.plot(x.data.numpy(), y_pred.data.numpy(), 'r-', lw=5)
        plt.text(2, 20, 'Loss=%.4f' % loss.data.numpy(), fontdict={'size': 20, 'color': 'red'})
        plt.xlim(1.5, 10)
        plt.ylim(8, 28)
        plt.title("Iteration: {}\nw: {} b: {}".format(iteration, w.data.numpy(), b.data.numpy()))
        plt.pause(0.5)

        # Stop training once the loss is below 1
        if loss.data.numpy() < 1:
            break

The plots show the fitted line as training progresses.

At iteration 80 the loss was already below 1, so training stopped.
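The loop above updates the parameters through `.data`. As an alternative sketch (not the author's code), the same update can be written inside `torch.no_grad()`, and the result compared with the closed-form least squares solution. The smaller learning rate and the iteration count here are assumptions chosen for a plot-free run.

```python
import torch

torch.manual_seed(10)
x = torch.rand(20, 1) * 10
y = 2 * x + (5 + torch.randn(20, 1))

w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.01  # assumed step size, smaller than in the example above

for iteration in range(10000):
    loss = (0.5 * (y - (w * x + b)) ** 2).mean()
    loss.backward()
    with torch.no_grad():      # update the leaves without recording the operation
        w -= lr * w.grad
        b -= lr * b.grad
    w.grad.zero_()
    b.grad.zero_()

# Closed-form least squares via the normal equations, for comparison
design = torch.cat([x, torch.ones_like(x)], dim=1)
solution = torch.inverse(design.t() @ design) @ design.t() @ y
print(w.item(), b.item(), solution.squeeze().tolist())
```

Gradient descent and the closed form should land on nearly the same slope and intercept, both close to the values 2 and 5 used to generate the data.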

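1.4 Computational Graphs and the Dynamic Graph Mechanism

Code for this chapter: https://github.com/zhangxiann/PyTorch_Practice/blob/master/lesson1/computational_graph.py

Computational graphs

Deep learning is a series of operations on tensors. As the operations grow in variety and number, questions arise: can several operations run in parallel, how should different underlying devices be coordinated, and how can redundant operations be avoided, so that computation is as efficient as possible and some bugs are avoided? The computational graph was introduced to address these questions.

A computational graph is a directed acyclic graph that describes a computation. It has two main elements: nodes and edges. Nodes represent data, such as vectors, matrices and tensors. Edges represent operations, such as addition, subtraction, multiplication, division and convolution.

The expression $y = (x + w) * (w + 1)$ can be represented as a computational graph.

It can be viewed as $y = a \times b$ , where $a = x + w$ and $b = w + 1$ .

Computational graphs and gradients

Here we compute the derivative of $y$ with respect to $w$ , with $w = 1$ and $x = 2$ . By the chain rule for composite functions:

$\begin{aligned} \frac{\partial y}{\partial w} &=\frac{\partial y}{\partial a} \frac{\partial a}{\partial w}+\frac{\partial y}{\partial b} \frac{\partial b}{\partial w} \\ &=b \times 1+a \times 1 \\ &=b+a \\ &=(w+1)+(x+w) \\ &=2 w+x+1 \\ &=2 \times 1+2+1=5\end{aligned}$

In the computational graph, this means there are two paths from the root node $y$ to the leaf node $w$ : y -> a -> w and y -> b -> w. Along each path, the derivative is taken node by node from the root down to the leaf w, and the derivatives of all paths are then summed.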

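The code is as follows:

import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

# y = (x + w) * (w + 1)
a = torch.add(w, x)  # call a.retain_grad() here to keep the gradient of a
b = torch.add(w, 1)
y = torch.mul(a, b)

# Differentiate y
y.backward()

# Print the gradient of w, which is the derivative of y with respect to w
print(w.grad)

The result is tensor([5.]).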

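Recall that a Tensor has an is_leaf attribute that marks whether it is a leaf node.

In the example above, $x$ and $w$ are leaf nodes, and all other nodes depend on them. The concept of a leaf node exists mainly to save memory: after one round of backpropagation through the graph, the gradients of non-leaf nodes are released.

Code example:

# Check which nodes are leaves
print("is_leaf:\n", w.is_leaf, x.is_leaf, a.is_leaf, b.is_leaf, y.is_leaf)

# Check the gradients
print("gradient:\n", w.grad, x.grad, a.grad, b.grad, y.grad)

Output:

is_leaf:
 True True False False False
gradient:
 tensor([5.]) tensor([2.]) None None None

The gradients of non-leaf nodes are None. To keep the gradient of a non-leaf node after backpropagation, call retain_grad() on that node.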
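A minimal sketch of retain_grad() on the same graph: the call is made on the non-leaf node a before backward(), so its gradient is kept.

```python
import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

# y = (x + w) * (w + 1)
a = torch.add(w, x)
a.retain_grad()          # keep the gradient of this non-leaf node
b = torch.add(w, 1)
y = torch.mul(a, b)

y.backward()
print(a.grad, w.grad)    # dy/da = b = w + 1, and dy/dw as before
```

Here the gradient of a equals b, that is w + 1 = 2, while b itself (without retain_grad()) would still report None.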

The grad_fn attribute of a Tensor records the function that created the tensor. It is needed when gradients are computed during backpropagation.

Example code:

# Check grad_fn
print("w.grad_fn = ", w.grad_fn)
print("x.grad_fn = ", x.grad_fn)
print("a.grad_fn = ", a.grad_fn)
print("b.grad_fn = ", b.grad_fn)
print("y.grad_fn = ", y.grad_fn)

Output:

w.grad_fn = None
x.grad_fn = None
a.grad_fn = <AddBackward0 object at 0x000001D8DDD20588>
b.grad_fn = <AddBackward0 object at 0x000001D8DDD20588>
y.grad_fn = <MulBackward0 object at 0x000001D8DDD20588>

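PyTorch's dynamic graph mechanism

PyTorch uses a dynamic computational graph, while TensorFlow 1.x uses a static computational graph.

With a dynamic graph, computing and building the graph happen at the same time: the values of earlier nodes can be computed first, and the rest of the graph can then be built from those values. The advantages are flexibility and ease of tuning and debugging. Much PyTorch code is written just like code for other Python libraries, with little extra learning cost.

With a static graph, the graph is built first and data is fed in afterwards. The advantage is efficiency: because the graph is defined once and then run, it does not need to be rebuilt on later runs, so it can be faster than a dynamic graph. The drawback is inflexibility. In TensorFlow 1.x the graph is the same on every run and cannot change, so Python's while loop cannot be used directly. The helper function tf.while_loop must be used to express the loop in TensorFlow's own form.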
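As a sketch of what "building while computing" allows, the function below uses an ordinary Python while loop whose length depends on the tensor's value. The function name, starting value and threshold are hypothetical.

```python
import torch


def double_until_large(x, limit=10.0):
    # Plain Python control flow: the number of multiplications is decided
    # at run time from the current value of the tensor.
    y = x
    steps = 0
    while y.norm() < limit:
        y = y * 2
        steps += 1
    return y, steps


x = torch.tensor([1.5], requires_grad=True)
y, steps = double_until_large(x)
y.backward()
print(steps, y, x.grad)
```

Starting from 1.5, the value doubles three times (3, 6, 12) before passing the limit, so y = 8x and autograd reports a gradient of 8. A different input would build a graph of a different depth.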

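1.5 autograd and Logistic Regression

Code for this chapter:

https://github.com/zhangxiann/PyTorch_Practice/blob/master/lesson1/autograd.py

https://github.com/zhangxiann/PyTorch_Practice/blob/master/lesson1/logistic-regression.py

Automatic differentiation (autograd)

In deep learning, weight updates depend on gradients, so computing gradients is essential. In PyTorch you only need to build the forward computational graph and then use torch.autograd to obtain the gradients of all tensors that require them.

torch.autograd.backward()

torch.autograd.backward(tensors, grad_tensors=None, retain_graph=None, create_graph=False, grad_variables=None)

Purpose: computes gradients automatically

tensors: the tensors to differentiate, such as the loss

retain_graph: keeps the computational graph. PyTorch uses a dynamic graph and by default frees it after every backward pass. Setting this to True keeps the graph.

create_graph: builds a graph of the derivative computation, used for higher-order derivatives

grad_tensors: weights for multiple gradients. When several losses are combined, this sets the weight of each loss.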

The retain_graph parameter

Code example:

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

# y = (x + w) * (w + 1)
a = torch.add(w, x)
b = torch.add(w, 1)
y = torch.mul(a, b)

# First backward pass
y.backward()
print(w.grad)

# Second backward pass: raises an error
y.backward()

Here y.backward() calls torch.autograd.backward(self, gradient, retain_graph, create_graph). The second call to y.backward() fails because, by default, PyTorch does not keep the computational graph after computing gradients, so the graph no longer exists for the second pass. Using y.backward(retain_graph=True) for the first pass fixes this, as shown below:

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

# y = (x + w) * (w + 1)
a = torch.add(w, x)
b = torch.add(w, 1)
y = torch.mul(a, b)

# First backward pass with retain_graph=True, which keeps the graph
y.backward(retain_graph=True)
print(w.grad)

# The second backward pass now succeeds
y.backward()

The grad_tensors parameter

Code example:

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

a = torch.add(w, x)
b = torch.add(w, 1)

y0 = torch.mul(a, b)  # y0 = (x+w) * (w+1)
y1 = torch.add(a, b)  # y1 = (x+w) + (w+1), dy1/dw = 2

# Concatenate the two losses
loss = torch.cat([y0, y1], dim=0)  # [y0, y1]

# Set the weights of the two losses: 1 for y0 and 2 for y1
grad_tensors = torch.tensor([1., 2.])

# gradient is passed on as grad_tensors of torch.autograd.backward()
loss.backward(gradient=grad_tensors)

# The gradient of w has two parts: ∂y0/∂w * 1 + ∂y1/∂w * 2
print(w.grad)

Output:

tensor([9.])

The loss has two parts, $y_{0}$ and $y_{1}$ , with $\frac{\partial y_{0}}{\partial w}=5$ and $\frac{\partial y_{1}}{\partial w}=2$ . grad_tensors weights the two gradients with respect to w by 1 and 2, so the final gradient of w is $\frac{\partial y_{0}}{\partial w} \times 1+ \frac{\partial y_{1}}{\partial w} \times 2=9$ .
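One way to read the weights, sketched below under the same setup: passing gradient=weights gives the same result as calling backward() on the weighted sum of the losses. The helper `make_losses` is introduced only for this illustration.

```python
import torch


def make_losses(w, x):
    # Returns [y0, y1] with y0 = (x+w)*(w+1) and y1 = (x+w)+(w+1)
    a = torch.add(w, x)
    b = torch.add(w, 1)
    return torch.cat([torch.mul(a, b), torch.add(a, b)], dim=0)


weights = torch.tensor([1., 2.])

# Variant 1: pass the weights through the gradient argument
w1 = torch.tensor([1.], requires_grad=True)
x1 = torch.tensor([2.], requires_grad=True)
make_losses(w1, x1).backward(gradient=weights)

# Variant 2: build the weighted sum explicitly and call backward() on the scalar
w2 = torch.tensor([1.], requires_grad=True)
x2 = torch.tensor([2.], requires_grad=True)
(make_losses(w2, x2) * weights).sum().backward()

print(w1.grad, w2.grad)
```

Both variants give 5 * 1 + 2 * 2 = 9 for the gradient of w.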

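torch.autograd.grad()

torch.autograd.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False)

Purpose: computes gradients.

outputs: the tensors to differentiate, such as the loss

inputs: the tensors whose gradients are wanted

create_graph: builds a graph of the derivative computation, used for higher-order derivatives

retain_graph: keeps the computational graph

grad_outputs: weights for multiple gradients

torch.autograd.grad() returns a tuple, and the gradient itself is its element 0.

The following uses torch.autograd.grad() to compute a second derivative. When computing the first derivative, create_graph=True must be set so that the first derivative grad_1 also has a computational graph. The second derivative is then computed from the first:

x = torch.tensor([3.], requires_grad=True)
y = torch.pow(x, 2)  # y = x**2

# For a second derivative, set create_graph=True so that grad_1 also has a computational graph
grad_1 = torch.autograd.grad(y, x, create_graph=True)  # grad_1 = dy/dx = 2x = 2 * 3 = 6
print(grad_1)

# Compute the second derivative
grad_2 = torch.autograd.grad(grad_1[0], x)  # grad_2 = d(dy/dx)/dx = d(2x)/dx = 2
print(grad_2)

Output:

(tensor([6.], grad_fn=<MulBackward0>),)
(tensor([2.]),)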

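3 points to note:

First, gradients are not zeroed automatically between backward passes. If gradients are computed over several iterations without being zeroed, each new gradient is added to the previous one.

Code example:

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

# Run 4 backward passes without zeroing the gradient
for i in range(4):
    a = torch.add(w, x)
    b = torch.add(w, 1)
    y = torch.mul(a, b)
    y.backward()
    print(w.grad)

Output:

tensor([5.])
tensor([10.])
tensor([15.])
tensor([20.])

Each gradient is 5 larger than the previous one because gradients are not zeroed automatically. Use w.grad.zero_() to zero the gradient:

for i in range(4):
    a = torch.add(w, x)
    b = torch.add(w, 1)
    y = torch.mul(a, b)
    y.backward()
    print(w.grad)
    # Zero the gradient after every pass
    w.grad.zero_()

Second, a node that depends on a leaf node requiring gradients has requires_grad set to True by default.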

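Third, leaf nodes that require gradients cannot be modified in place.

Taking addition as an example, in-place operations include a += x and a.add_(x): the result is stored at the same memory address as the original value. Out-of-place operations include a = a + x and a.add(x): the result is stored at a different address.

Code example:

print("out-of-place operation")
a = torch.ones((1, ))
print(id(a), a)
# Out-of-place: the id changes
a = a + torch.ones((1, ))
print(id(a), a)

print("in-place operation")
a = torch.ones((1, ))
print(id(a), a)
# In-place: the id stays the same
a += torch.ones((1, ))
print(id(a), a)

Output:

out-of-place operation
2404827089512 tensor([1.])
2404893170712 tensor([2.])
in-place operation
2404827089512 tensor([1.])
2404827089512 tensor([2.])

If a leaf that requires gradients is changed in place before backpropagation, PyTorch raises a RuntimeError at the in-place call itself, so backward() is never reached:

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

# y = (x + w) * (w + 1)
a = torch.add(w, x)
b = torch.add(w, 1)
y = torch.mul(a, b)

# Changing w in place before backpropagation raises a RuntimeError on this line
w.add_(1)
y.backward()

The reason is that during the forward pass, the nodes that depend on a leaf record the leaf's address, and during the backward pass the value stored at that address is used to compute gradients. For example, in $y = a \times b$ , where $a = x + w$ , $b = w + 1$ , and $x$ and $w$ are leaf nodes, computing $\frac{\partial y}{\partial a} = b = w+1$ needs the value of the leaf $w$ .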
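The second and third points have no runnable check above, so here is a small sketch of both. The constant tensor c and the try/except wrapper are additions for illustration only.

```python
import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
c = torch.tensor([3.])        # plain constant: requires_grad defaults to False

a = torch.add(w, x)           # depends on leaves that require gradients
d = torch.mul(c, 2)           # depends only on the constant
print(a.requires_grad, d.requires_grad)

try:
    w.add_(1)                 # in-place change of a leaf that requires grad
    error_message = None
except RuntimeError as err:
    error_message = str(err)
print(error_message)

with torch.no_grad():         # in-place updates are allowed under no_grad
    w.add_(1)
print(w)
```

This is why parameter updates in a hand-written training loop are done either through `.data` or inside a `torch.no_grad()` block.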

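Logistic Regression

Logistic regression is a linear binary classification model. Its expression is $y=f(z)=\frac{1}{1+e^{-z}}$ , where $z = W X + b$ . $f(z)$ is called the sigmoid function, also known as the logistic function. In its curve, the horizontal axis is $z$ (with $z = W X + b$ ) and the vertical axis is $y$ .

The classification rule is: $\text{class}=\left\{\begin{array}{ll}0, & y<0.5 \\ 1, & 0.5 \leq y\end{array}\right.$ . When $y < 0.5$ the class is 0, and when $0.5 \leq y$ the class is 1.

Here $z = W X + b$ is the linear regression model from before. Looking at the horizontal axis, when $z < 0$ the class is 0 and when $0 \leq z$ the class is 1, so linear regression alone could also be used to classify. Logistic regression adds a sigmoid function on top of linear regression in order to describe confidence better: it maps the input into the interval (0, 1), which matches the range of a probability.

Logistic regression is also called log-odds regression: $\ln \frac{y}{1-y}=W X+b$ . The odds are $\frac{y}{1-y}$ , where $y$ is the probability of the positive class and $1 - y$ is the probability of the other class. The logistic regression expression can be derived from the log-odds form:

$\begin{aligned} \ln \frac{y}{1-y} &=W X+b \\ \frac{y}{1-y} &=e^{W X+b} \\ y &=e^{W X+b}-y \times e^{W X+b} \\ y\left(1+e^{W X+b}\right) &=e^{W X+b} \\ y &=\frac{e^{W X+b}}{1+e^{W X+b}}=\frac{1}{1+e^{-(W X+b)}}\end{aligned}$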
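The derivation can be checked numerically: applying the log-odds to the sigmoid output should recover z. This is a sketch over a handful of arbitrary z values.

```python
import torch

z = torch.linspace(-5, 5, steps=11)   # stands in for W X + b

y = torch.sigmoid(z)                  # y = 1 / (1 + e^(-z))
log_odds = torch.log(y / (1 - y))     # ln(y / (1 - y)) should give z back

labels = (y >= 0.5).long()            # class 1 when 0.5 <= y, else class 0
print(y, log_odds, labels, sep="\n")
```

The threshold y = 0.5 corresponds to z = 0, which is why the decision boundary of logistic regression is the line W X + b = 0.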

Implementing logistic regression in PyTorch

Building a model in PyTorch follows the 5 steps introduced in section 1.1: data, model, loss function, optimizer, and iterative training.

Code example:

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import numpy as np

torch.manual_seed(10)

# ============================ step 1/5: generate data ============================
sample_nums = 100
mean_value = 1.7
bias = 1
n_data = torch.ones(sample_nums, 2)

# Sample from a normal distribution: the mean is a tensor, the standard deviation a scalar
x0 = torch.normal(mean_value * n_data, 1) + bias  # class 0 data, shape=(100, 2)
# Matching labels
y0 = torch.zeros(sample_nums)  # class 0 labels, shape=(100,)
x1 = torch.normal(-mean_value * n_data, 1) + bias  # class 1 data, shape=(100, 2)
y1 = torch.ones(sample_nums)  # class 1 labels, shape=(100,)

train_x = torch.cat((x0, x1), 0)
train_y = torch.cat((y0, y1), 0)

# ============================ step 2/5: choose the model ============================
class LR(nn.Module):
    def __init__(self):
        super(LR, self).__init__()
        self.features = nn.Linear(2, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.features(x)
        x = self.sigmoid(x)
        return x

lr_net = LR()  # instantiate the logistic regression model

# ============================ step 3/5: choose the loss function ============================
loss_fn = nn.BCELoss()

# ============================ step 4/5: choose the optimizer ============================
lr = 0.01  # learning rate
optimizer = torch.optim.SGD(lr_net.parameters(), lr=lr, momentum=0.9)

# ============================ step 5/5: train the model ============================
for iteration in range(1000):
    # Forward pass
    y_pred = lr_net(train_x)

    # Compute the loss
    loss = loss_fn(y_pred.squeeze(), train_y)

    # Backward pass
    loss.backward()

    # Update the parameters
    optimizer.step()

    # Zero the gradients
    optimizer.zero_grad()

    # Plot every 20 iterations
    if iteration % 20 == 0:
        mask = y_pred.ge(0.5).float().squeeze()  # classify with a threshold of 0.5
        correct = (mask == train_y).sum()  # number of correctly predicted samples
        acc = correct.item() / train_y.size(0)  # classification accuracy

        plt.scatter(x0.data.numpy()[:, 0], x0.data.numpy()[:, 1], c='r', label='class 0')
        plt.scatter(x1.data.numpy()[:, 0], x1.data.numpy()[:, 1], c='b', label='class 1')

        w0, w1 = lr_net.features.weight[0]
        w0, w1 = float(w0.item()), float(w1.item())
        plot_b = float(lr_net.features.bias[0].item())
        plot_x = np.arange(-6, 6, 0.1)
        plot_y = (-w0 * plot_x - plot_b) / w1  # decision boundary w0*x + w1*y + b = 0

        plt.xlim(-5, 7)
        plt.ylim(-7, 7)
        plt.plot(plot_x, plot_y)

        plt.text(-5, 5, 'Loss=%.4f' % loss.data.numpy(), fontdict={'size': 20, 'color': 'red'})
        plt.title("Iteration: {}\nw0:{:.2f} w1:{:.2f} b: {:.2f} accuracy:{:.2%}".format(iteration, w0, w1, plot_b, acc))
        plt.legend()
        # plt.savefig(str(iteration / 20)+".png")
        plt.show()
        plt.pause(0.5)

        # Stop training once the accuracy exceeds 99%
        if acc > 0.99:
            break

The plots show the decision line as training progresses.
