ch01 - PyTorch Basic Concepts

古路

Last modified: 2024-06-02 17:28:42

Category: PyTorch. Tags: pytorch, deep learning, artificial intelligence

First published: 2023-04-02 21:03:14

Article link: https://blog.csdn.net/fb_941219/article/details/129895471

0. Introduction

1. Introduction to PyTorch

Omitted.

2. Environment Setup

Reference link.

1) Verify the installation

import torch
a=torch.ones(2,2)
a
tensor([[1., 1.],
        [1., 1.]])

2) Check the PyTorch version

print("hello pytorch {}".format(torch.__version__))

3) Check whether a GPU is available

print(torch.cuda.is_available())

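3. Introduction to Tensors and Tensor Creation

3.1. The concept of a tensor: a multidimensional array

Tensor and Variable:

Reference 1.
Reference 2.
Variable: a data type in torch.autograd that wraps a Tensor so that automatic differentiation can be applied to it. Variable was an important data structure before PyTorch 0.4.0; from 0.4.0 onward it has been merged into Tensor. It has five attributes:
data: the wrapped Tensor
grad: the gradient of data
grad_fn: the Function that created the Tensor; this is the key to automatic differentiation
requires_grad: whether a gradient is required
is_leaf: whether the tensor is a leaf node

Since PyTorch 0.4.0, Variable has been merged into Tensor, so a Tensor carries the five attributes above plus the following three:

dtype: the data type of the tensor. There are three broad categories and nine types in total, for example torch.FloatTensor (CPU) and torch.cuda.FloatTensor (GPU).
shape: the shape of the tensor, e.g. (64, 3, 224, 224)
device: the device on which the tensor lives (CPU or GPU)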
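To make the attribute list concrete, here is a minimal sketch that prints these attributes for one small tensor. The values and the variable names t and s are arbitrary choices for illustration, and the tensor is assumed to live on the CPU.

```python
import torch

# A small leaf tensor that tracks gradients (values chosen arbitrarily).
t = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
s = (t * 2).sum()   # a non-leaf result produced by operations on t
s.backward()        # fills t.grad with ds/dt

print(t.dtype)          # data type of the elements
print(t.shape)          # shape of the tensor
print(t.device)         # device holding the data
print(t.requires_grad)  # whether gradients are tracked
print(t.is_leaf)        # True: t was created directly by the user
print(t.grad_fn)        # None: a leaf has no creating Function
print(s.is_leaf)        # False: s was produced by an operation
print(s.grad_fn)        # the Function that produced s
print(t.grad)           # ds/dt, every entry equal to 2
```

The contrast between t and s is the point: only tensors produced by operations carry a grad_fn, and only leaves keep their grad by default.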

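3.2. Creating tensors

3.2.1. Direct creation

1. torch.tensor()

torch.tensor(data, dtype=None, device=None, requires_grad=False, pin_memory=False)

Purpose: create a tensor from data.

data: the data, which can be a list or a NumPy array
dtype: the data type; by default it matches that of data
device: the device to place the tensor on (cuda or cpu)
requires_grad: whether a gradient is required
pin_memory: whether to store the tensor in pinned (page-locked) memory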

import torch
import numpy as np

# Create tensors via torch.tensor

flag = True

if flag:
    arr = np.ones((3, 3))
    print("type of data:", arr.dtype)

    # Requires a CUDA-capable GPU; use device='cpu' if none is available
    t = torch.tensor(arr, device='cuda')
    print(t)

type of data: float64
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], device='cuda:0', dtype=torch.float64)

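Here cuda means the tensor lives on a GPU, and 0 is the GPU index; since there is only one GPU, the index is 0.

2. Creating a tensor from NumPy: torch.from_numpy(ndarray)
Note: the memory is shared. A tensor created with torch.from_numpy shares memory with the original ndarray, so modifying one also changes the other.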

# Create tensors via torch.from_numpy(ndarray)
arr = np.array([[1, 2, 3], [4, 5, 6]])
t = torch.from_numpy(arr)
print("numpy array: ", arr)
print("tensor : ", t)

print("\nModify arr")
arr[0, 0] = 0
print("numpy array: ", arr)
print("tensor : ", t)

print("\nModify tensor")
t[0, 0] = -1
print("numpy array: ", arr)
print("tensor : ", t)

numpy array:  [[1 2 3]
 [4 5 6]]
tensor :  tensor([[1, 2, 3],
        [4, 5, 6]], dtype=torch.int32)

Modify arr
numpy array:  [[0 2 3]
 [4 5 6]]
tensor :  tensor([[0, 2, 3],
        [4, 5, 6]], dtype=torch.int32)

Modify tensor
numpy array:  [[-1  2  3]
 [ 4  5  6]]
tensor :  tensor([[-1,  2,  3],
        [ 4,  5,  6]], dtype=torch.int32)

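3.2.2. Creating tensors from values

1. torch.zeros(): create an all-zeros tensor of the given size
- Purpose: create an all-zeros tensor of shape size
- size: the shape of the tensor, e.g. (3, 3) or (3, 224, 224)
- out: the output tensor
- layout: the memory layout, such as strided (the default) or sparse_coo (typically used for sparse matrices to improve read efficiency)
- device: the device to place the tensor on (GPU or CPU)
- requires_grad: whether a gradient is required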

torch.zeros(*size, out=None, dtype=None, 
	layout=torch.strided, device=None, requires_grad=False)

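The out parameter acts as an output: the generated values are written into the tensor passed as out, so the returned tensor and the out tensor hold the same data.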
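The behaviour of out can be checked with a short sketch. The names out_t and t are illustrative, and out_t is pre-allocated with the target shape so that no resizing is involved.

```python
import torch

out_t = torch.empty((3, 3))            # pre-allocated tensor with arbitrary contents
t = torch.zeros((3, 3), out=out_t)     # the zeros are written into out_t

print(t)
print(out_t)
# Both names refer to the same underlying storage
print(t.data_ptr() == out_t.data_ptr())
```

Comparing data_ptr() values is one way to confirm that no second buffer was allocated.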

2. torch.zeros_like()
- Purpose: create an all-zeros tensor with the same shape as input
- input: the tensor whose shape is used
- dtype: the data type
- layout: the memory layout

torch.zeros_like(input, dtype=None, layout=None, device=None, requires_grad=False)

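3. torch.ones()
4. torch.ones_like()
5. torch.full()
6. torch.full_like()
- Purpose: create a tensor filled with a specified value; torch.full takes the shape from size, and torch.full_like takes it from input
- size: the shape of the tensor, e.g. (3, 3)
- fill_value: the value used to fill the tensor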

t = torch.full((3, 3), 5)
print(t)

tensor([[5, 5, 5],
        [5, 5, 5],
        [5, 5, 5]])

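7. torch.arange(): create an arithmetic sequence on the interval [start, end)
- Purpose: create a 1-D tensor holding an arithmetic sequence
- Note: the interval is half-open, [start, end)
- start: the first value of the sequence
- end: the "end" value of the sequence (excluded)
- step: the common difference, 1 by default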

t = torch.arange(start=0, end=100, step=1, out=None, dtype=None, 
	layout=torch.strided, device=None, requires_grad=False)

t = torch.arange(2, 10, 2)
print(t)
# tensor([2, 4, 6, 8])

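8. torch.linspace(): create an evenly spaced sequence on the closed interval [start, end]
- Note: step is the spacing between values; steps is the number of values
- Purpose: create a 1-D tensor of evenly spaced values
- start: the first value; end: the last value; steps: the length of the sequence (a count, not a spacing)
- The spacing is (end - start) / (steps - 1).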

t = torch.linspace(start=0, end=100, steps=5, out=None, dtype=None, 
	layout=torch.strided, device=None, requires_grad=False)

t = torch.linspace(2, 10, 6)
print(t)

# tensor([ 2.0000,  3.6000,  5.2000,  6.8000,  8.4000, 10.0000])
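As a quick check of the spacing in torch.linspace, the sketch below compares the differences between neighbouring values with (end - start) / (steps - 1). The variable names are illustrative only.

```python
import torch

start, end, steps = 2, 10, 6
t = torch.linspace(start, end, steps)

expected_step = (end - start) / (steps - 1)   # 8 / 5 = 1.6
actual_steps = t[1:] - t[:-1]                 # differences between neighbours

print(expected_step)
print(actual_steps)
```

Because both endpoints are included, 6 values leave only 5 gaps, which is why the divisor is steps - 1.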

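9. torch.logspace(): create a 1-D tensor of logarithmically spaced values
- Note: steps is the length, and base is the base of the logarithm, 10 by default
- start: the first exponent; end: the last exponent; steps: the length of the sequence; base: the base, 10 by default. The values run from base**start to base**end.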

t = torch.logspace(start=0, end=100, steps=5, base=10, out=None, 
	dtype=None, layout=torch.strided, device=None, requires_grad=False)

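10. torch.eye(): create an identity matrix (a 2-D tensor with ones on the main diagonal)
- Note: the result is square by default; n is the number of rows and m is the number of columns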
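Since torch.logspace() and torch.eye() have no example of their own, here is a small sketch with arbitrarily chosen arguments.

```python
import torch

square = torch.eye(3)          # 3x3 identity matrix
rect = torch.eye(2, 3)         # 2 rows, 3 columns, ones on the main diagonal
powers = torch.logspace(0, 2, steps=3, base=10)  # 10**0, 10**1, 10**2

print(square)
print(rect)
print(powers)
```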

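3.2.3. Creating tensors from probability distributions

1. torch.normal(): sample from a normal (Gaussian) distribution; mean is the mean and std is the standard deviation
- Four modes: mean scalar and std scalar; mean scalar and std tensor; mean tensor and std scalar; mean tensor and std tensor. The last three work the same way: each output element is drawn using the mean and std at the corresponding position.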

torch.normal(mean, std, out=None)
torch.normal(mean, std, size, out=None)

# the mean and std both are tensors
mean = torch.arange(1, 5, dtype=torch.float)
std = torch.arange(1, 5, dtype=torch.float)
t_normal = torch.normal(mean, std)
print("mean:{}\nstd:{}".format(mean, std))
print(t_normal)
# As the result shows, each element is drawn using the mean and std at the same position.
# mean:tensor([1., 2., 3., 4.])
# std:tensor([1., 2., 3., 4.])
# tensor([ 0.4750,  3.6384, -2.1488,  5.3180])

Note that when mean and std are both scalars, size must be specified.

# mean: scalar std: scalar
t_normal = torch.normal(0., 1., size=(4,))
print(t_normal)
# tensor([0.6614, 0.2669, 0.0617, 0.6213])

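2. torch.randn()
3. torch.randn_like(): sample from the standard normal distribution
- Note: size is the shape of the tensor
- Purpose: sample from the standard normal distribution (mean 0, variance 1)
- size: the shape of the tensor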

torch.randn(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

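4. torch.rand()
5. torch.rand_like(): sample from the uniform distribution on [0, 1)
6. torch.randint()
7. torch.randint_like(): sample integers uniformly from [low, high)
- Purpose: sample integers uniformly from the interval [low, high)
- size: the shape of the tensor
8. torch.randperm(): generate a random permutation of 0 to n-1
- Purpose: generate a random permutation of the integers from 0 to n-1
- n: the length of the tensor
9. torch.bernoulli(): sample from a Bernoulli distribution
- Purpose: draw samples from a Bernoulli distribution (0-1 distribution, two-point distribution) using input as the probabilities
- input: the probability values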
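The sampling functions above can be exercised together in one short sketch. The seed, sizes, and probabilities are arbitrary illustrative choices; only the value ranges matter.

```python
import torch

torch.manual_seed(0)  # seed chosen arbitrarily, only for repeatability

u = torch.rand(1000)                    # uniform samples on [0, 1)
r = torch.randint(3, 10, (1000,))       # integers in [3, 10), so 10 never appears
p = torch.randperm(8)                   # a random permutation of 0..7
probs = torch.tensor([0.0, 1.0, 0.5])   # per-element success probabilities
bern = torch.bernoulli(probs)           # each entry is 0. or 1.

print(u.min().item(), u.max().item())
print(r.min().item(), r.max().item())
print(p)
print(bern)
```

A probability of 0 always yields 0 and a probability of 1 always yields 1, which makes torch.bernoulli easy to sanity-check.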

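4. Tensor Operations and Linear Regression

4.1. Tensor operations: concatenation, splitting, indexing, and transformation

4.1.1. Concatenation

torch.cat(): concatenate tensors along dimension dim
- Purpose: concatenate tensors along the existing dimension dim
- tensors: the sequence of tensors
- dim: the dimension along which to concatenate
torch.stack(): concatenate tensors along a newly created dimension dim
- Purpose: concatenate tensors along a new dimension dim
- tensors: the sequence of tensors
- dim: the position of the new dimension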

t = torch.ones((2, 3))

t_0 = torch.cat([t, t], dim=0)
t_1 = torch.stack([t, t], dim=0)

print(t_0)
print(t_0.shape)
print(t_1)
print(t_1.shape)

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
torch.Size([4, 3])
tensor([[[1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.]]])
torch.Size([2, 2, 3])

Unlike cat, stack joins the tensors along a newly created dimension.

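4.1.2. Splitting

torch.chunk(input, chunks, dim): split a tensor evenly along dimension dim

Purpose: split a tensor evenly along dimension dim
Returns: a tuple of tensors
Note: if the length is not divisible by chunks, the last piece is smaller than the others
input: the tensor to split
chunks: the number of pieces
dim: the dimension along which to split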

t = torch.ones((2, 7))
print(t)

list_of_tensor = torch.chunk(t, dim=1, chunks=3)
print(list_of_tensor)

tensor([[1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1.]])
(tensor([[1., 1., 1.],
        [1., 1., 1.]]), 
tensor([[1., 1., 1.],
        [1., 1., 1.]]),
tensor([[1.], [1.]]))

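torch.split(): split a tensor along dimension dim

Purpose: split a tensor along dimension dim
Returns: a tuple of tensors
split_size_or_sections: if an int, the length of each piece; if a list, the tensor is split into pieces with the listed lengths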

t = torch.ones((2, 7))
print(t)
list_of_tensor_2 = torch.split(t, 3, dim=1)
print(list_of_tensor_2)

list_of_tensor_3 = torch.split(t, [2, 2, 3], dim=1)
print(list_of_tensor_3)

tensor([[1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1.]])
(tensor([[1., 1., 1.],
        [1., 1., 1.]]), tensor([[1., 1., 1.],
        [1., 1., 1.]]), tensor([[1.],
        [1.]]))
(tensor([[1., 1.],
        [1., 1.]]), tensor([[1., 1.],
        [1., 1.]]), tensor([[1., 1., 1.],
        [1., 1., 1.]]))

When a list is given, its elements must sum to the length of the dimension being split.
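The piece sizes produced by chunk and split, and the constraint on the list form, can be summarized in a small sketch. The names chunk_sizes, split_sizes, list_sizes, and raised are illustrative.

```python
import torch

t = torch.ones((2, 7))

# Width of each piece along dim=1
chunk_sizes = [c.shape[1] for c in torch.chunk(t, chunks=3, dim=1)]
split_sizes = [s.shape[1] for s in torch.split(t, 3, dim=1)]
list_sizes = [s.shape[1] for s in torch.split(t, [2, 2, 3], dim=1)]
print(chunk_sizes, split_sizes, list_sizes)

raised = False
try:
    torch.split(t, [2, 2, 2], dim=1)   # sums to 6, but the dimension has length 7
except RuntimeError as err:
    raised = True
    print("split failed:", err)
```

Printing only the sizes keeps the comparison readable when the tensors themselves are large.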

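4.1.3. Indexing

torch.index_select(): select data along dimension dim using index

Purpose: select data along dimension dim according to index
Returns: a tensor formed by concatenating the selected slices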

t = torch.randint(0, 9, (3, 3))
print(t)

# index_select
idx=torch.tensor([0,2],dtype=torch.long)
t_index_select=torch.index_select(t,index=idx,dim=0)
print(t_index_select)

tensor([[7, 1, 5],
        [1, 3, 4],
        [3, 4, 0]])
tensor([[7, 1, 5],
        [3, 4, 0]])

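torch.masked_select(): select the elements where mask is True and return them as a 1-D tensor

Purpose: select the elements where mask is True
Returns: a 1-D tensor
input: the tensor to index
mask: a boolean tensor with the same shape as input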

t = torch.randint(0, 9, (3, 3))
print(t)

# masked_select
mask = t.ge(5)
print(mask)

t_masked_select = torch.masked_select(t, mask)
print(t_masked_select)

tensor([[3, 8, 1],
        [6, 4, 1],
        [4, 8, 2]])
tensor([[False,  True, False],
        [ True, False, False],
        [False,  True, False]])
tensor([8, 6, 8])

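4.1.4. Transformation

torch.reshape()

Purpose: change the shape of a tensor
Note: when the tensor is contiguous in memory, the new tensor shares memory with input
input: the tensor to reshape
shape: the shape of the new tensor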

# torch.reshape
t = torch.randperm(8)
print(t)
t_reshape = torch.reshape(t, (2, 4))  # a dimension given as -1 is inferred from the others, e.g. (-1, 4)
print(t_reshape)

tensor([2, 3, 1, 4, 0, 5, 7, 6])
tensor([[2, 3, 1, 4],
        [0, 5, 7, 6]])
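The memory-sharing note and the -1 shorthand can both be seen in a short sketch. torch.arange is used instead of torch.randperm so that the values are predictable.

```python
import torch

t = torch.arange(8)
t_reshape = torch.reshape(t, (-1, 4))   # -1 is inferred as 8 / 4 = 2

t[0] = 100                              # modify the original tensor in place
print(t_reshape.shape)
print(t_reshape[0, 0].item())           # the change is visible through t_reshape
print(t.data_ptr() == t_reshape.data_ptr())
```

If an independent copy is needed, calling clone() on the result is one option.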

torch.transpose(): swap two dimensions of a tensor

# torch.transpose
t = torch.rand((2, 3, 4))
print(t)
t_transpose = torch.transpose(t, dim0=1, dim1=2)
print(t_transpose)

tensor([[[0.5063, 0.6772, 0.8968, 0.4836],
         [0.0820, 0.5198, 0.1273, 0.1895],
         [0.3535, 0.9936, 0.7150, 0.4375]],

        [[0.7801, 0.9114, 0.2901, 0.7171],
         [0.0553, 0.9102, 0.4060, 0.4010],
         [0.1037, 0.1053, 0.7860, 0.4523]]])
tensor([[[0.5063, 0.0820, 0.3535],
         [0.6772, 0.5198, 0.9936],
         [0.8968, 0.1273, 0.7150],
         [0.4836, 0.1895, 0.4375]],

        [[0.7801, 0.0553, 0.1037],
         [0.9114, 0.9102, 0.1053],
         [0.2901, 0.4060, 0.7860],
         [0.7171, 0.4010, 0.4523]]])

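torch.t(): transpose a 2-D tensor; for a matrix it is equivalent to torch.transpose(input, 0, 1)
torch.squeeze(): remove dimensions (axes) of length 1
- Purpose: remove dimensions (axes) of length 1
- dim: if None, all axes of length 1 are removed; if a dimension is given, it is removed if and only if its length is 1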
```
# torch.squeeze
t=torch.rand((1,2,3,1))

t1=torch.squeeze(t)
print(t1.shape)

t2=torch.squeeze(t,dim=2)
print(t2.shape)
```
```
torch.Size([2, 3])
torch.Size([1, 2, 3, 1])
```
torch.unsqueeze(): insert a dimension at position dim
- Purpose: insert a dimension of length 1 at position dim
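torch.unsqueeze() has no example above, so here is a minimal sketch showing how the position given by dim affects the resulting shape. The input shape (2, 3) is an arbitrary choice.

```python
import torch

t = torch.rand((2, 3))

front = torch.unsqueeze(t, dim=0)    # new axis at the front
middle = torch.unsqueeze(t, dim=1)   # new axis between the two existing ones
back = torch.unsqueeze(t, dim=-1)    # new axis at the end
restored = torch.squeeze(middle, dim=1)  # squeeze undoes unsqueeze on the same axis

print(front.shape, middle.shape, back.shape, restored.shape)
```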

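4.2. Mathematical operations on tensors

The addition function deserves a closer look because of one detail: torch.add(input, other, *, alpha=1, out=None) computes input + alpha * other element-wise. Note the alpha argument, a scalar multiplier applied to other, much like a weight. It makes some computations more concise: for the linear regression model y = wx + b with a scalar w, a single call torch.add(b, x, alpha=w) is enough. The older positional form torch.add(input, alpha, other) is deprecated.

torch.add(): compute input + alpha × other element-wise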

# torch.add
t_0 = torch.randn((3, 3))
t_1 = torch.ones_like(t_0)
t_add = torch.add(t_0, t_1, alpha=10)  # t_0 + 10 * t_1

print("t_0:\n{}\nt_1:\n{}\nt_add_10:\n{}".format(t_0, t_1, t_add))

t_0:
tensor([[ 0.6614,  0.2669,  0.0617],
        [ 0.6213, -0.4519, -0.1661],
        [-1.5228,  0.3817, -1.0276]])
t_1:
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
t_add_10:
tensor([[10.6614, 10.2669, 10.0617],
        [10.6213,  9.5481,  9.8339],
        [ 8.4772, 10.3817,  8.9724]])

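torch.addcdiv(): compute input + value × tensor1 / tensor2 element-wise

torch.addcmul(): compute input + value × tensor1 × tensor2 element-wise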
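The three fused operations can be compared side by side. In this sketch the scalar weight w, the bias b, and the tensors t1 and t2 are hypothetical values chosen so that the results are easy to verify by hand.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
w, b = 2.0, 0.5                        # hypothetical scalar weight and bias

# y = b + w * x; alpha is keyword-only and is passed as a Python number here
y = torch.add(torch.full_like(x, b), x, alpha=w)

t1 = torch.tensor([2.0, 4.0, 6.0])
t2 = torch.tensor([1.0, 2.0, 3.0])
mul = torch.addcmul(x, t1, t2, value=0.5)   # x + 0.5 * t1 * t2
div = torch.addcdiv(x, t1, t2, value=0.5)   # x + 0.5 * t1 / t2

print(y)
print(mul)
print(div)
```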

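4.3. Linear regression

Linear regression is a method for analyzing the relationship between one variable and one or more other variables. With dependent variable y and independent variable x, the relationship is linear:

$y = wx + b$

The task is to solve for w and b.

The solution steps are:

1. Choose the model: y = wx + b
2. Choose the loss function. Here we use the mean squared error (MSE): $\mathrm{MSE}=\frac{1}{m} \sum_{i=1}^{m}\left(y_i-\hat{y}_i\right)^2$ (the code below additionally scales the loss by 0.5)
3. Compute the gradients and update w and b: w = w - lr * w.grad, b = b - lr * b.grad

This sequence of steps is what I call the code logic: it helps to have such a plan in mind before writing any code, and the code then becomes much easier to write. If you skip a systematic study of PyTorch and jump straight into complex models such as CNNs or LSTMs, this logic is hard to build up, because many details are simply unknown. So we start with the simplest case, linear regression, and work up gradually to more complex networks. Let us now write a linear regression model: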

# -*- coding:utf-8 -*-
"""
@file name  : lesson-03-Linear-Regression.py
@author     : TingsongYu https://github.com/TingsongYu
@date       : 2018-10-15
@brief      : univariate linear regression model
"""
import torch
import matplotlib.pyplot as plt
torch.manual_seed(10)

lr = 0.05  # learning rate (modified 2019-10-15)

# Create the training data
x = torch.rand(20, 1) * 10  # x data (tensor), shape=(20, 1)
y = 2*x + (5 + torch.randn(20, 1))  # y data (tensor), shape=(20, 1)

# Create the linear regression parameters
w = torch.randn((1), requires_grad=True)
b = torch.zeros((1), requires_grad=True)

for iteration in range(1000):

    # Forward pass
    wx = torch.mul(w, x)
    y_pred = torch.add(wx, b)

    # Compute the MSE loss (scaled by 0.5)
    loss = (0.5 * (y - y_pred) ** 2).mean()

    # Backward pass
    loss.backward()

    # Update the parameters
    b.data.sub_(lr * b.grad)
    w.data.sub_(lr * w.grad)

    # Zero the gradients of the tensors (added 2019-10-15)
    w.grad.zero_()
    b.grad.zero_()

    # Plot
    if iteration % 20 == 0:

        plt.scatter(x.data.numpy(), y.data.numpy())
        plt.plot(x.data.numpy(), y_pred.data.numpy(), 'r-', lw=5)
        plt.text(2, 20, 'Loss=%.4f' % loss.data.numpy(), fontdict={'size': 20, 'color':  'red'})
        plt.xlim(1.5, 10)
        plt.ylim(8, 28)
        plt.title("Iteration: {}\nw: {} b: {}".format(iteration, w.data.numpy(), b.data.numpy()))
        plt.pause(1.5)

        # Stop once the loss is small enough
        if loss.data.numpy() < 1:
            break
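To see what loss.backward() computes in the loop above, the same update can be written out by hand. This is a sketch in plain Python on noise-free toy data (an assumption made so that the exact answer w = 2, b = 5 is known); it is not the course's implementation, and the gradient formulas follow from the loss mean(0.5 * (y_pred - y)^2).

```python
# Noise-free toy data generated from y = 2x + 5 (values chosen for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 5.0 for x in xs]

w, b, lr = 0.0, 0.0, 0.05
m = len(xs)

for _ in range(5000):
    # Residuals of the current model y_pred = w * x + b
    residuals = [(w * x + b) - y for x, y in zip(xs, ys)]
    # Gradients of loss = mean(0.5 * residual**2) with respect to w and b
    grad_w = sum(r * x for r, x in zip(residuals, xs)) / m
    grad_b = sum(residuals) / m
    # Gradient-descent update, the same rule as in step 3
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 3), round(b, 3))
```

With autograd, the two grad_ lines are what w.grad and b.grad hold after loss.backward().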

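That concludes today's material. Here is a brief recap; there were quite a few small points.

First, we started from PyTorch's most basic data structure and saw what a tensor is: put simply, a multidimensional array. A tensor has many attributes: data, dtype, shape, and device describe the data itself, and requires_grad, grad, grad_fn, and is_leaf relate to differentiation.
Next, we covered ways to create tensors: directly, from arrays, from values, and from probability distributions. This involved many creation functions, including tensor(), from_numpy(), ones(), zeros(), eye(), full(), arange(), linspace(), normal(), randn(), rand(), randint(), and randperm().
Then came tensor operations, both basic operations and mathematical operations. The basic operations include two functions for concatenation (.cat, .stack), two for splitting (.chunk, .split), three for transformation (.reshape, .transpose, .t), and two for indexing (.index_select, .masked_select). The mathematical operations cover arithmetic, exponential, logarithmic, and power functions, as well as trigonometric functions.
Finally, we used all of this to build a simple linear regression.

This section collected many functions, each used differently. There is no need to memorize the details now; it is enough to know which function does what, look things up as you go, and build fluency through practice.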

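5. Computational Graphs and the Dynamic Graph Mechanism

5.1. Computational graphs

Deep learning amounts to applying a series of operations to tensors. As the variety and number of operations grow, questions arise: can several operations run in parallel, how should different underlying devices be coordinated, and how can redundant operations be avoided, so that computation is as efficient as possible and bugs are avoided? The computational graph was introduced to address these questions.

A computational graph is a directed acyclic graph that describes a computation. It has two main elements: nodes and edges.

Nodes represent data, such as vectors, matrices, and tensors.
Edges represent operations, such as addition, subtraction, multiplication, division, and convolution.

The expression y = (x + w) * (w + 1) is represented as a computational graph as follows: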

a = x + w
b = w + 1
y = a * b

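This can be viewed as $y = a \times b$, where $a = x + w$ and $b = w + 1$.

Computational graphs and gradients: here we compute the derivative of y with respect to w. By the chain rule for composite functions, with w = 1 and x = 2, we obtain the following.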

$\begin{aligned} \frac{\partial y}{\partial w} & =\frac{\partial y}{\partial a} \frac{\partial a}{\partial w}+\frac{\partial y}{\partial b} \frac{\partial b}{\partial w} \\ & =b * 1+a * 1 \\ & =b+a \\ & =(w+1)+(x+w) \\ & =2 * w+x+1 \\ & =2 * 1+2+1=5 \end{aligned}$
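
In the computational graph, there are two paths from the root node y to the leaf node w: y -> a -> w and y -> b -> w. Along each path, the derivative of each node with respect to its child is taken in turn down to the leaf w, and the results from all paths are then summed.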

The code is as follows:

import torch
w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
# y = (x + w) * (w + 1)
a = torch.add(w, x)     # call a.retain_grad() here to keep a's gradient
b = torch.add(w, 1)
y = torch.mul(a, b)
# Backpropagate from y
y.backward()
# Print the gradient of w, i.e. the derivative of y with respect to w
print(w.grad)
# Result: tensor([5.])
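As a cross-check that does not rely on autograd, the two-path chain rule can be compared with a numerical derivative. This is a plain Python sketch; the helper y_of and the step size eps are illustrative choices.

```python
def y_of(w, x):
    """Evaluate y = (x + w) * (w + 1) through its two intermediate nodes."""
    a = x + w
    b = w + 1
    return a * b

w, x = 1.0, 2.0
a, b = x + w, w + 1

# Chain rule: one term per path from y back to w
# path y -> a -> w contributes (dy/da) * (da/dw) = b * 1
# path y -> b -> w contributes (dy/db) * (db/dw) = a * 1
analytic = b * 1 + a * 1

# Central finite difference as an independent check
eps = 1e-6
numeric = (y_of(w + eps, x) - y_of(w - eps, x)) / (2 * eps)

print(analytic, numeric)
```

Both values agree with the gradient that autograd reports for w.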

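Recall that Tensor has an is_leaf attribute that marks whether a tensor is a leaf node.

In the example above, x and w are leaf nodes, and every other node depends on them. The notion of a leaf node exists mainly to save memory: after a backward pass through the graph, the gradients of non-leaf nodes are released.

Leaf node: a node created by the user, such as x and w
is_leaf: indicates whether a tensor is a leaf node
Leaf nodes mark the tensors whose gradients are stored; the gradients of intermediate variables are cleared during backpropagation to save memory. To keep the gradient of an intermediate variable, call retain_grad() on it.
grad_fn: records the method (function) used to create the tensor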

Code example:

# Inspect leaf nodes
print("is_leaf:\n", w.is_leaf, x.is_leaf, a.is_leaf, b.is_leaf, y.is_leaf)

# Inspect gradients
print("gradient:\n", w.grad, x.grad, a.grad, b.grad, y.grad)

Result:

is_leaf:
 True True False False False
gradient:
 tensor([5.]) tensor([2.]) None None None

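The gradients of non-leaf nodes are None. To keep the gradient of a non-leaf node after backpropagation, call retain_grad() on that node before calling backward().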
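A minimal sketch of retain_grad(), reusing the same w, x, a, b, and y as in the example above:

```python
import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

a = torch.add(w, x)
a.retain_grad()          # ask autograd to keep dy/da after backward
b = torch.add(w, 1)
y = torch.mul(a, b)
y.backward()

print(a.grad)            # dy/da = b = w + 1
print(w.grad)            # dy/dw, unchanged by retain_grad()
```

retain_grad() has to be called before backward(); calling it afterwards does not recover a gradient that was already released.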

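The grad_fn attribute of a Tensor records the method (function) that created the tensor. It is needed when gradients are computed during backpropagation.

Example code:

# Inspect grad_fn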
print("w.grad_fn = ", w.grad_fn)
print("x.grad_fn = ", x.grad_fn)
print("a.grad_fn = ", a.grad_fn)
print("b.grad_fn = ", b.grad_fn)
print("y.grad_fn = ", y.grad_fn)

Result:

w.grad_fn =  None
x.grad_fn =  None
a.grad_fn =  <AddBackward0 object at 0x000001D8DDD20588>
b.grad_fn =  <AddBackward0 object at 0x000001D8DDD20588>
y.grad_fn =  <MulBackward0 object at 0x000001D8DDD20588>

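5.2. PyTorch's dynamic graph mechanism

Depending on how the graph is built, computational graphs are divided into dynamic graphs and static graphs.

Dynamic graphs vs. static graphs:

Dynamic graph:
- Computation and graph construction happen at the same time
- Flexible and easy to adjust

Example of a dynamic graph: PyTorch.

Static graph:
- The graph is built first, and computation happens afterward
- Efficient but inflexible

Example of a static graph: TensorFlow.

PyTorch uses a dynamic computational graph, whereas TensorFlow (in its classic 1.x graph mode) uses a static computational graph.

With a dynamic graph, computation and graph construction proceed together: the values of earlier nodes can be computed first, and the rest of the graph can then be built based on those values. The advantages are flexibility and ease of adjustment and debugging. Much PyTorch code is written exactly the way code for other Python libraries is written, so there is little extra learning cost.

With a static graph, the graph is built first and data is then fed in for computation. The advantage is efficiency: because static computation follows a define-then-run approach, the graph does not need to be rebuilt on later runs, so it can be faster than a dynamic graph. The drawback is inflexibility. In TensorFlow's static graph mode the graph is the same on every run and cannot change, so a Python while loop cannot be used directly; the helper function tf.while_loop must be used to express the loop in TensorFlow's own form.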
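The flexibility of a dynamic graph can be illustrated with ordinary Python control flow. In this sketch the starting value 1.5 and the threshold 10 are arbitrary; the number of loop iterations depends on the values computed at run time, and autograd still differentiates through whatever graph was actually built.

```python
import torch

x = torch.tensor([1.5], requires_grad=True)

y = x
n_doublings = 0
# The loop condition reads a value computed during this very run
while y.item() < 10:
    y = y * 2
    n_doublings += 1

y.backward()   # differentiates through exactly the operations that were executed
print(n_doublings, y.item(), x.grad)
```

Here y ends up equal to 8 * x, so the gradient is 8; a different starting value would build a different graph and give a different gradient.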

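6. autograd and Logistic Regression

This lesson has two parts: the automatic differentiation system in PyTorch, and the logistic regression model. Training a deep model means repeatedly updating its weights, and updating the weights requires gradients, so gradients are central to training. Computing gradients by hand is usually tedious, which is why PyTorch provides an automatic differentiation system. In PyTorch we do not need to compute gradients manually: we build the computational graph of the forward pass, and autograd then provides the gradients of the tensors that require them.

6.1. The autograd automatic differentiation system

torch.autograd.backward()
- Purpose: automatically compute the gradients of the nodes in the computational graph.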

torch.autograd.backward(
    tensors,
    grad_tensors=None,
    retain_graph=None,
    create_graph=False
)

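Main parameters:

tensors: the tensors to differentiate, such as loss.
retain_graph: keep the computational graph. By default PyTorch discards the graph after backpropagation; set this to True to keep it.
create_graph: build the computational graph of the derivative, used for higher-order derivatives.
grad_tensors: weights for multiple gradients. When several losses need gradients, this sets the weight of each loss.

Recall how a gradient is obtained from the computational graph: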

$y = (x + w) * (w + 1)$

$a = x + w$
$b = w + 1$
$y = a * b$
$\begin{aligned} \frac{\partial y}{\partial w} & =\frac{\partial y}{\partial a} \frac{\partial a}{\partial w}+\frac{\partial y}{\partial b} \frac{\partial b}{\partial w} \\ & =b * 1+a * 1 \\ & =(w+1)+(x+w) \\ & =2 * w+x+1 \\ & =2 * 1+2+1=5 \end{aligned}$

Code example:

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

a = torch.add(w, x)
b = torch.add(w, 1)
y = torch.mul(a, b)

# To run backward through this graph again later, set retain_graph to True
# y.backward(retain_graph=True) 

y.backward()
print(w.grad)

Output:

tensor([5.])
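The effect of retain_graph can be sketched as follows. The variable names first, second, and freed are illustrative; the example also relies on the fact that gradients accumulate across backward calls.

```python
import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
y = torch.mul(torch.add(w, x), torch.add(w, 1))

y.backward(retain_graph=True)   # keep the graph for another pass
first = w.grad.clone()
y.backward()                    # second pass works; gradients accumulate
second = w.grad.clone()
print(first, second)

y2 = torch.mul(torch.add(w, x), torch.add(w, 1))
y2.backward()                   # default: the graph's buffers are freed afterwards
try:
    y2.backward()               # a second pass through the freed graph fails
except RuntimeError:
    freed = True
else:
    freed = False
print(freed)
```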

When several losses need gradients, grad_tensors sets the weight of each loss:

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

a = torch.add(w, x)
b = torch.add(w, 1)

# y0 = (x+w) * (w+1)    dy0/dw = 2*w + x + 1 = 5
y0 = torch.mul(a, b)

# y1 = (x+w) + (w+1)    dy1/dw = 2
y1 = torch.add(a, b)  

# In this case, loss is the vector [y0, y1]
loss = torch.cat([y0, y1], dim=0)

# Gradient weights: dy0/dw has weight 1, dy1/dw has weight 2
grad_tensors = torch.tensor([1., 2.])

# gradient is passed on as grad_tensors to torch.autograd.backward()
loss.backward(gradient=grad_tensors)  

print(w.grad) # 5*1 + 2*2 = 9

Output:

tensor([9.])
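One way to read grad_tensors is as the coefficients of a weighted sum of the losses. The sketch below compares passing the weights to torch.autograd.backward with folding them into a single scalar loss; the helper make_losses is hypothetical and only rebuilds the small graph twice.

```python
import torch

def make_losses():
    """Rebuild the small graph and return the leaf w with both losses."""
    w = torch.tensor([1.], requires_grad=True)
    x = torch.tensor([2.], requires_grad=True)
    a, b = torch.add(w, x), torch.add(w, 1)
    return w, torch.mul(a, b), torch.add(a, b)

# Variant 1: weights passed as grad_tensors, one per loss
w1, y0, y1 = make_losses()
torch.autograd.backward(
    [y0, y1],
    grad_tensors=[torch.tensor([1.]), torch.tensor([2.])],
)

# Variant 2: the same weights folded into a single scalar loss
w2, y0, y1 = make_losses()
(1. * y0 + 2. * y1).sum().backward()

print(w1.grad, w2.grad)
```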

torch.autograd.grad()
- Purpose: compute gradients.

torch.autograd.grad(
    outputs,
    inputs,
    grad_outputs=None,
    retain_graph=None,
    create_graph=False
)

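Main parameters:

outputs: the tensors to differentiate, such as loss.
inputs: the tensors with respect to which gradients are computed.
create_graph: build the computational graph of the derivative, used for higher-order derivatives.
retain_graph: keep the computational graph.
grad_outputs: weights for multiple gradients.

Computing a second-order gradient: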

x = torch.tensor([3.], requires_grad=True)
y = torch.pow(x, 2)  # y = x**2

# grad_1 = dy/dx = 2x = 2 * 3 = 6
grad_1 = torch.autograd.grad(y, x, create_graph=True)  
print(grad_1)

# grad_2 = d(dy/dx)/dx = d(2x)/dx = 2
grad_2 = torch.autograd.grad(grad_1[0], x)  
print(grad_2)

Output:

(tensor([6.], grad_fn=<MulBackward0>),)
(tensor([2.]),)

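Notes:

Gradients are not zeroed automatically.
Nodes that depend on a leaf node with requires_grad=True have requires_grad=True by default.
Leaf nodes that require gradients cannot be modified in place.

Code example 1: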

# 1. Gradients are not zeroed automatically; repeated backward passes accumulate. Use .grad.zero_() to reset them manually
w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

for i in range(3):
    a = torch.add(w, x)
    b = torch.add(w, 1)
    y = torch.mul(a, b)

    y.backward()
    print(w.grad)

# Zero the gradient; the trailing underscore marks an in-place operation
w.grad.zero_()

for i in range(3):
    a = torch.add(w, x)
    b = torch.add(w, 1)
    y = torch.mul(a, b)

    y.backward()
    print(w.grad)
    w.grad.zero_()

Output:

tensor([5.])
tensor([10.])
tensor([15.])
tensor([5.])
tensor([5.])
tensor([5.])

Code example 2:

# 2. Nodes that depend on a leaf node have requires_grad=True by default
w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

a = torch.add(w, x)
b = torch.add(w, 1)
y = torch.mul(a, b)

print(a.requires_grad, b.requires_grad, y.requires_grad)

Output:

True True True

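Code example 3:

# 3. Leaf nodes cannot be modified in place. When PyTorch's computational graph refers to
#    the value of a leaf node, it refers directly to the address used in the forward pass,
#    so in-place operations on leaf nodes are forbidden to prevent incorrect results.

#    In-place operation: changes the data directly at the original memory address.
#    Non-in-place operation: allocates new memory to store the changed data.

a = torch.ones((1, ))
print(id(a), a)

# Non-in-place operation
a = a + torch.ones((1, ))
print(id(a), a)

# In-place operation
a += torch.ones((1, ))
print(id(a), a)

Output: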

4875211904 tensor([1.])
4875212336 tensor([2.])
4875212336 tensor([3.])

Performing an in-place operation on a leaf node that requires gradients raises an error:

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

a = torch.add(w, x)
b = torch.add(w, 1)
y = torch.mul(a, b)

# Non-in-place operation on the non-leaf node a
print(a.add(1))

# In-place operation on the non-leaf node a
print(a.add_(1))

# Non-in-place operation on the leaf node w
print(w.add(1))

# In-place operation on the leaf node w; this raises an error
print(w.add_(1))

y.backward()

Output:

tensor([4.], grad_fn=<AddBackward0>)
tensor([4.], grad_fn=<AddBackward0>)
tensor([2.], grad_fn=<AddBackward0>)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/andy/PycharmProjects/hello_pytorch/lesson/lesson-05/lesson-05-autograd.py", line 145, in <module>
    print(w.add_(1))
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
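When a leaf really does need to be changed in place, for example when updating a weight, a common approach is to do so inside torch.no_grad(), where autograd does not record operations. This is a sketch of that pattern, not something taken from the lesson's code; the learning rate lr is a hypothetical value.

```python
import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
y = torch.mul(torch.add(w, x), torch.add(w, 1))
y.backward()                      # w.grad is now tensor([5.])

lr = 0.1                          # hypothetical learning rate
with torch.no_grad():             # operations in this block are not recorded
    w.sub_(lr * w.grad)           # in-place update of the leaf is allowed here
w.grad.zero_()                    # reset the gradient for the next iteration

print(w, w.is_leaf, w.requires_grad)
```

After the update, w is still a leaf that requires gradients, so it can be used in the next forward pass as usual.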

6.2. Logistic regression

Logistic regression is a linear binary classification model.
Model expression:
$\begin{aligned} & y=f(W X+b) \\ & f(x)=\frac{1}{1+e^{-x}} \end{aligned}$
That is:
$y=\frac{1}{1+e^{-(W X+b)}}$
Here $f(x)$ is called the Sigmoid function, also known as the Logistic function. The class is obtained by thresholding $y$ at 0.5:
$\text { class }= \begin{cases}0, & y<0.5 \\ 1, & y \geq 0.5\end{cases}$

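Linear regression analyzes the relationship between the independent variable $x$ and the dependent variable $y$ (a scalar), whereas logistic regression analyzes the relationship between $x$ and $y$ (a probability).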
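The model expression and the 0.5 threshold can be written out in a few lines of plain Python. The weights, bias, and inputs below are hypothetical values chosen so that one input lands in each class; predict_class is an illustrative helper, not part of PyTorch.

```python
import math

def sigmoid(z):
    """Logistic function f(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(w, x, b):
    """Return (probability, class) for weight list w, input list x, and bias b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    y = sigmoid(z)
    return y, (1 if y >= 0.5 else 0)

# Hypothetical parameters and inputs, chosen only to exercise both classes
w, b = [1.0, -2.0], 0.5
print(predict_class(w, [3.0, 1.0], b))   # z = 1.5, so y > 0.5
print(predict_class(w, [0.0, 1.0], b))   # z = -1.5, so y < 0.5
```

Since sigmoid(0) is exactly 0.5, thresholding y at 0.5 is the same as checking the sign of WX + b, which is why the decision boundary is linear.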

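The five steps of training a machine learning model:

1. Data: collecting, cleaning, splitting, and preprocessing the data.
2. Model: depending on the difficulty of the task, choose a simple linear model or a more complex neural network.
3. Loss function: choose a loss function suited to the task and compute its gradient. For example, mean squared error for linear regression and cross-entropy for classification.
4. Optimizer: once the gradients are available, choose an optimizer to update the weights.
5. Iterative training: with data, model, loss function, and optimizer in place, training can proceed iteratively.

Code example: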

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import numpy as np
torch.manual_seed(10)

# ============================== Step 1/5: generate the data ==========================
sample_nums = 100
mean_value = 1.7
bias = 1
n_data = torch.ones(sample_nums, 2)
x0 = torch.normal(mean_value * n_data, 1) + bias    # class 0 data, shape=(100, 2)
y0 = torch.zeros(sample_nums)                       # class 0 labels, shape=(100,)
x1 = torch.normal(-mean_value * n_data, 1) + bias   # class 1 data, shape=(100, 2)
y1 = torch.ones(sample_nums)                        # class 1 labels, shape=(100,)
train_x = torch.cat((x0, x1), 0)
train_y = torch.cat((y0, y1), 0)


# ============================== Step 2/5: choose the model ===========================
class LR(nn.Module):
    def __init__(self):
        super(LR, self).__init__()
        self.features = nn.Linear(2, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.features(x)
        x = self.sigmoid(x)
        return x


lr_net = LR()   # instantiate the logistic regression model

# ============================== Step 3/5: choose the loss function ===================
loss_fn = nn.BCELoss()  # Binary Cross Entropy Loss

# ============================== Step 4/5: choose the optimizer =======================
lr = 0.01   # learning rate
optimizer = torch.optim.SGD(lr_net.parameters(), lr=lr, momentum=0.9)    # stochastic gradient descent

# ============================== Step 5/5: train the model ============================
for iteration in range(1000):

    # Forward pass
    y_pred = lr_net(train_x)

    # Compute the loss
    loss = loss_fn(y_pred.squeeze(), train_y)

    # Clear gradients left over from the previous iteration, then backpropagate
    optimizer.zero_grad()
    loss.backward()

    # Update the parameters
    optimizer.step()

    # Plot
    if iteration % 20 == 0:

        mask = y_pred.ge(0.5).float().squeeze()  # classify with a threshold of 0.5
        correct = (mask == train_y).sum()   # number of correctly predicted samples
        acc = correct.item() / train_y.size(0)   # classification accuracy

        plt.scatter(x0.data.numpy()[:, 0], x0.data.numpy()[:, 1], c='r', label='class 0')
        plt.scatter(x1.data.numpy()[:, 0], x1.data.numpy()[:, 1], c='b', label='class 1')

        # Decision boundary: w0 * x + w1 * y + b = 0
        w0, w1 = lr_net.features.weight[0]
        w0, w1 = float(w0.item()), float(w1.item())
        plot_b = float(lr_net.features.bias[0].item())
        plot_x = np.arange(-6, 6, 0.1)
        plot_y = (-w0 * plot_x - plot_b) / w1

        plt.xlim(-5, 7)
        plt.ylim(-7, 7)
        plt.plot(plot_x, plot_y)

        plt.text(-5, 5, 'Loss=%.4f' % loss.data.numpy(), fontdict={'size': 20, 'color': 'red'})
        plt.title("Iteration: {}\nw0:{:.2f} w1:{:.2f} b:{:.2f} accuracy:{:.2%}".format(iteration, w0, w1, plot_b, acc))
        plt.legend()

        plt.show()
        plt.pause(0.5)

        # Stop once the accuracy is high enough
        if acc > 0.99:
            break

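6.3. Summary

This lesson introduced two commonly used functions in PyTorch's automatic differentiation system, torch.autograd.backward and torch.autograd.grad, and demonstrated first- and second-order differentiation. We covered the automatic differentiation system, the tensor as the data carrier, how the forward pass builds the computational graph, and how gradients are obtained from the graph. With this background, we can begin training machine learning models in earnest. By training a logistic regression model, we learned the five modules of machine learning model training: data, model, loss function, optimizer, and the iterative training process. These five modules form the main thread of the lessons that follow.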