Understand PyTorch Code in 10 Minutes

Latest recommended article published on 2025-08-11 15:19:16

liuchengxu_


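PyTorch is a deep learning framework made up of the torch, autograd, nn, and optim packages. torch provides tensor operations with GPU support, autograd computes gradients automatically, nn contains neural network layers and loss functions, and optim implements optimization algorithms. By building a computational graph and defining network layers, you can create and train a neural network with little code.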

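This article is translated from: Understand PyTorch code in 10 minutes

PyTorch is a new deep learning framework. This article is based on Justin Johnson's tutorial, which is worth studying carefully if you want more detail or have more time.

PyTorch consists of 4 main packages:

torch: a general-purpose array library similar to Numpy. Once a tensor is cast to a CUDA type (torch.cuda.FloatTensor), its computations run on the GPU.
torch.autograd: a package for building a computational graph and computing gradients automatically.
torch.nn: a neural network library with common layers and cost functions.
torch.optim: an optimization package with common optimization algorithms such as SGD and Adam.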

Imports

You can import PyTorch as follows:

import torch                      # arrays on GPU
import torch.autograd as autograd # build a computational graph
import torch.nn as nn             # neural net library
import torch.nn.functional as F   # most non-linearities are here
import torch.optim as optim       # optimization package

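torch arrays replace numpy ndarrays -> linear algebra with GPU support

PyTorch provides a multi-dimensional array similar to a Numpy array. Once its data type is cast to a CUDA type (torch.cuda.FloatTensor), it can be processed on the GPU. The array and its associated functions are general-purpose scientific computing tools.

The "Torch for numpy users" notes show how it relates to numpy: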

# 2 matrices of size 2x3 into a 3d tensor 2x2x3
d = [[[1., 2., 3.], [4., 5., 6.]], [[7., 8., 9.], [11., 12., 13.]]]
d = torch.Tensor(d)  # array from python list
print("shape of the tensor:", d.size())

# the first index is the depth
z = d[0] + d[1]
print("adding up the two matrices of the 3d tensor:", z)

shape of the tensor: torch.Size([2, 2, 3])
adding up the two matrices of the 3d tensor: tensor([[ 8., 10., 12.],
        [15., 17., 19.]])
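
To make the comparison with Numpy concrete, here is a minimal sketch of converting between the two array types. It assumes Numpy is installed alongside PyTorch; the variable names are illustrative and not taken from the original article.

```python
import numpy as np
import torch

# Build the same 2x2x3 array in NumPy, then hand it to PyTorch.
a = np.array([[[1., 2., 3.], [4., 5., 6.]],
              [[7., 8., 9.], [11., 12., 13.]]], dtype=np.float32)
t = torch.from_numpy(a)   # shares memory with `a`, no copy is made

# Indexing and arithmetic follow the NumPy conventions.
z_np = a[0] + a[1]
z_t = t[0] + t[1]

# Convert back to NumPy to compare the two results.
same = np.allclose(z_np, z_t.numpy())
print(t.shape, same)

# Because the memory is shared, an in-place change to the tensor shows up in `a`.
t[0, 0, 0] = 100.
print(a[0, 0, 0])
```

The shared memory is convenient for avoiding copies, but it also means that in-place edits on either side are visible on the other.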

# a heavily used operation is reshaping of tensors using .view()
print(d.view(2, -1))  # -1 makes torch infer the second dim

tensor([[ 1.,  2.,  3.,  4.,  5.,  6.],
        [ 7.,  8.,  9., 11., 12., 13.]])
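
The earlier note about running tensors on the GPU can be sketched with the device API of current PyTorch releases, where `.to(device)` takes the place of casting to a CUDA tensor type. The fallback to the CPU is an assumption added here so the snippet also runs on machines without a GPU.

```python
import torch

# Pick the GPU when one is visible, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

m = torch.randn(2, 3)
m_dev = m.to(device)            # copies to the GPU if `device` is CUDA

# Operations run on whichever device holds the operands.
prod = m_dev @ m_dev.t()        # (2x3) @ (3x2) -> 2x2
print(prod.device, prod.shape)

# Bring the result back to the CPU, e.g. before converting it to NumPy.
prod_cpu = prod.cpu()
```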

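`torch.autograd` -> build a computational graph and compute gradients automatically

The second feature is the autograd package, which lets us define a computational graph so that gradients can be computed automatically. In the graph, a node is an array and an edge is an operation on an array. To build a graph, we create a node by wrapping an array in torch.autograd.Variable(). Every operation applied to that node is then recorded as an edge, and the result of the operation becomes a new node in the graph. Each node has a node.data attribute, which is a multi-dimensional array, and a node.grad attribute, which holds the gradient of some scalar with respect to that node (node.grad is itself a .Variable()). Once the graph is defined, a single command (loss.backward()) computes the gradient of the loss for the nodes in the graph.

torch.autograd.Variable() turns a Tensor into a node of the computational graph.
- Use x.data to access its value
- Use x.grad to access its gradient
Applying operations to a .Variable() creates the edges of the graph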

# d is a tensor not a node, to create a node based on it:
x = autograd.Variable(d, requires_grad=True)
print("the node's data is the tensor:", x.data.size())
print("the node's gradient is empty at creation:", x.grad)  # the grad is empty right now

the node's data is the tensor: torch.Size([2, 2, 3])
the node's gradient is empty at creation: None

# do operation on the node to make a computational graph
y = x + 1
z = x + y
s = z.sum()
print(s.grad_fn)  # the operation that created s (called .creator in early releases)

<SumBackward0 object at 0x7f1e59988790>

# calculate gradients
s.backward()
print("the variable now has gradients:", x.grad)

the variable now has gradients: tensor([[[2., 2., 2.],
         [2., 2., 2.]],

        [[2., 2., 2.],
         [2., 2., 2.]]])
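
Every entry of the gradient is 2 because s = sum(x + (x + 1)) = sum(2x + 1), so the derivative of s with respect to each element of x is 2. Since PyTorch 0.4, `Variable` has been merged into `Tensor`, so the same graph can be built from a plain tensor created with `requires_grad=True`. A minimal sketch under that assumption, with names mirroring the example above and an all-ones input chosen for illustration:

```python
import torch

# A leaf tensor that tracks gradients; no Variable wrapper is needed.
x = torch.ones(2, 2, 3, requires_grad=True)

y = x + 1
z = x + y          # z = 2x + 1
s = z.sum()        # scalar, so backward() needs no argument

s.backward()       # fills x.grad with ds/dx

print(x.grad.shape)
print(torch.all(x.grad == 2).item())
```

The input values do not matter here: the graph is linear in x, so the gradient is the same constant for any x.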

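`torch.nn` contains neural network layers (a linear mapping of the rows of a tensor) + (non-linearities) -> build a neural network computational graph without handling tensors and parameters by hand

The third feature is a high-level neural network library (torch.nn). It abstracts away all parameter handling inside the layers, so a neural network can be defined in a few commands (for example, torch.nn.Conv2d). The package also ships with common loss functions (for example, torch.nn.MSELoss). We start by defining a model container, such as torch.nn.Sequential for a model made of a sequence of layers, and list the layers we want in order. The library takes care of everything else; the parameters (Variables()) are available through model.parameters().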

# linear transformation of a 2x5 matrix into a 2x3 matrix
linear_map = nn.Linear(5, 3)
print("using randomly initialized params:", linear_map.parameters)

using randomly initialized params: <bound method Module.parameters of Linear(in_features=5, out_features=3, bias=True)>

# data has 2 examples with 5 features and 3 targets
data = torch.randn(2, 5)  # training
y = autograd.Variable(torch.randn(2, 3))  # target
# make a node
x = autograd.Variable(data, requires_grad=True)
# applying a transformation to a node creates a computational graph
a = linear_map(x)
z = F.relu(a)
o = F.softmax(z, dim=1)  # normalize over the 3 outputs of each example
print("output of softmax as a probability distribution:", o.data.view(1, -1))

# loss function
loss_func = nn.MSELoss()  # instantiate loss function
L = loss_func(z, y)  # calculate MSE loss between output and target
print("Loss:", L)

output of softmax as a probability distribution: tensor([[0.2092, 0.1979, 0.5929, 0.4343, 0.3038, 0.2619]])
Loss: tensor(2.9838, grad_fn=<MseLossBackward0>)
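
The description of `torch.nn` mentions `torch.nn.Sequential` and `model.parameters()` without showing them in use. A minimal sketch, with layer sizes chosen arbitrarily for illustration:

```python
import torch
import torch.nn as nn

# A two-layer network: 5 inputs -> 4 hidden units -> 3 outputs.
model = nn.Sequential(
    nn.Linear(5, 4),
    nn.ReLU(),
    nn.Linear(4, 3),
)

batch = torch.randn(2, 5)      # 2 examples with 5 features each
out = model(batch)             # layers are applied in the listed order
print(out.shape)

# parameters() yields every weight and bias the container holds.
shapes = [tuple(p.shape) for p in model.parameters()]
print(shapes)
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```

Note that `parameters` is a method: it has to be called, and usually iterated over, to reach the actual weight tensors.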

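We can also define a custom layer by subclassing torch.nn.Module and implementing a forward() function that takes a Variable() as input and returns a Variable(). We can even define a layer that changes over time to create a dynamic network!

A custom layer needs 2 functions:
- First, define __init__ and call the parent's __init__ inside it. All parameters of the layer must then be stored as attributes on the instance (self.x)
- In forward, we take the input, apply operations to it, and return the output. The input needs to be an autograd.Variable() so that pytorch can build the computational graph of the layer.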

class Log_reg_classifier(nn.Module):
    def __init__(self, in_size, out_size):
        super(Log_reg_classifier, self).__init__()  # always call parent's init
        self.linear = nn.Linear(in_size, out_size)  # layer parameters

    def forward(self, vect):
        # log-probabilities over the classes (dim=1) of each input row
        return F.log_softmax(self.linear(vect), dim=1)
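
A short usage sketch for a module like the one above. The class is repeated so that the snippet runs on its own; the `dim=1` argument, the batch size, and the random input are choices made here for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Log_reg_classifier(nn.Module):
    def __init__(self, in_size, out_size):
        super().__init__()
        self.linear = nn.Linear(in_size, out_size)

    def forward(self, vect):
        # log-probabilities over the classes of each row
        return F.log_softmax(self.linear(vect), dim=1)

clf = Log_reg_classifier(10, 2)
batch = torch.randn(4, 10)          # 4 examples with 10 features each
log_probs = clf(batch)              # calling the module runs forward()

# exp() of a log-softmax row is a probability distribution.
row_sums = log_probs.exp().sum(dim=1)
print(log_probs.shape, row_sums)

# Attributes that are nn.Modules are registered automatically,
# which is how parameters() finds the weights of self.linear.
names = [name for name, _ in clf.named_parameters()]
print(names)
```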

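`torch.optim` handles optimization -> we build a computational graph with `torch.nn`, compute gradients with `torch.autograd`, and feed them to `torch.optim` to update the network parameters

The fourth feature is an optimization package (torch.optim) that works together with the NN library. It contains optimizers such as Adam and RMSprop. We define an optimizer and pass it the network parameters and a learning rate (opt = torch.optim.Adam(model.parameters(), lr=learning_rate)), and then call opt.step() to perform one update of the parameters.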

optimizer = optim.SGD(linear_map.parameters(), lr=1e-2)  # instantiate optimizer with model params + learning rate

# epoch loop: we run following until convergence
optimizer.zero_grad()  # make gradients zero
L.backward(retain_graph=True)  # retain_graph replaces the old retain_variables argument
optimizer.step()
print(L)  # L was computed before the step, so its value is unchanged

tensor(2.9838, grad_fn=<MseLossBackward0>)
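
The snippet above performs a single update and prints a loss that was computed before the step, so the printed value does not change. Below is a sketch of the loop its comment refers to, recomputing the loss on every iteration. The data, the seed, and the number of steps are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)                      # reproducible random data

linear_map = nn.Linear(5, 3)
x = torch.randn(16, 5)                    # 16 examples, 5 features
y = torch.randn(16, 3)                    # 3 regression targets each
loss_func = nn.MSELoss()
optimizer = optim.SGD(linear_map.parameters(), lr=1e-2)

losses = []
for step in range(100):
    optimizer.zero_grad()                 # clear gradients from the last step
    loss = loss_func(linear_map(x), y)    # forward pass builds a fresh graph
    loss.backward()                       # gradients w.r.t. the parameters
    optimizer.step()                      # in-place parameter update
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Because the forward pass is repeated inside the loop, each `backward()` call works on a new graph and no `retain_graph` argument is needed.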

Building a neural network is straightforward. Here is a complete example:

# define model
model = Log_reg_classifier(10, 2)

# define loss function
loss_func = nn.MSELoss()

# define optimizer
optimizer = optim.SGD(model.parameters(), lr=1e-1)

# send data through model in minibatches for 10 epochs
# (`data` is assumed to be an iterable of (minibatch, target) tensor pairs)
for epoch in range(10):
    for minibatch, target in data:
        model.zero_grad()  # pytorch accumulates gradients, making them zero for each minibatch

        # forward pass
        out = model(autograd.Variable(minibatch))

        # backward pass
        L = loss_func(out, target)  # calculate loss
        L.backward()  # calculate gradients
        optimizer.step()  # make an update step
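
The loop above leaves `data` undefined, so here is a self-contained sketch with synthetic data. Two things differ from the original and are assumptions of this sketch: the dataset is randomly generated, and the loss is `nn.NLLLoss`, the usual partner of a `log_softmax` output, in place of `nn.MSELoss`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)

class Log_reg_classifier(nn.Module):
    def __init__(self, in_size, out_size):
        super().__init__()
        self.linear = nn.Linear(in_size, out_size)

    def forward(self, vect):
        return F.log_softmax(self.linear(vect), dim=1)

# Synthetic, linearly separable data: the label is the sign of the first feature.
features = torch.randn(64, 10)
labels = (features[:, 0] > 0).long()
# Four minibatches of 16 examples each, as (minibatch, target) pairs.
data = list(zip(features.split(16), labels.split(16)))

model = Log_reg_classifier(10, 2)
loss_func = nn.NLLLoss()                  # expects log-probabilities and class indices
optimizer = optim.SGD(model.parameters(), lr=1e-1)

with torch.no_grad():                     # evaluation only, no graph is built
    loss_before = loss_func(model(features), labels).item()

for epoch in range(10):
    for minibatch, target in data:
        model.zero_grad()                 # gradients accumulate, so reset them
        out = model(minibatch)            # forward pass
        loss = loss_func(out, target)     # calculate loss
        loss.backward()                   # calculate gradients
        optimizer.step()                  # make an update step

with torch.no_grad():
    loss_after = loss_func(model(features), labels).item()

print(loss_before, loss_after)
```

Comparing the loss on the full dataset before and after training is a quick sanity check that the zero_grad / backward / step cycle is wired up correctly.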