D1 Environment Setup & PyTorch Basics

STUffT

Last modified on 2022-10-02 10:57:10

First published on 2022-07-12 09:25:19

Article link: https://blog.csdn.net/qq_38869560/article/details/125735852

Environment setup

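Create a virtual environment: conda create -n datawhale python=3.7
Activate the virtual environment: conda activate datawhale
Check the GPU driver: nvidia-smi
Install PyTorch: conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
Install Jupyter: pip install jupyter
Start Jupyter: jupyter notebook
Forward the remote server's port to the local machine: ssh -L 8888:localhost:8888 <username>@<server IP>
Open localhost:8888 in a local browser and create a new .ipynb file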
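
Once the environment is set up, a quick sanity check can confirm that the active interpreter actually sees PyTorch. The following is only a minimal sketch: the helper name `describe_install` is made up for illustration, and the only PyTorch attributes it touches are `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib
import importlib.util


def describe_install(package="torch"):
    """Return a one-line report on whether `package` is importable here."""
    if importlib.util.find_spec(package) is None:
        return f"{package}: not installed in this environment"
    module = importlib.import_module(package)
    report = f"{package}: version {getattr(module, '__version__', 'unknown')}"
    if package == "torch":
        # True only when a CUDA build of PyTorch and a working driver are both present.
        report += f", CUDA available: {module.cuda.is_available()}"
    return report


print(describe_install())
```

Running this in a notebook cell (or as a script) inside the `datawhale` environment should report a version; if it says "not installed", the notebook kernel is probably using a different interpreter.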

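Fundamentals

1. Tensors

Tensors in geometry: a generalization of vectors and matrices.
torch.Tensor: the main tool for storing and transforming data; similar to a multi-dimensional array, with extra features such as GPU computation and automatic differentiation.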

1.1 Creating a Tensor

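import torch
# torch.Tensor(...) is the tensor class; given data, it builds a float32 tensor by default.
# torch.tensor(data) is a function that copies data and infers the dtype from it.
a = torch.Tensor([2, 2])
b = torch.tensor([2, 2])
# ones(size), zeros(size): filled with 1s / 0s; eye(n): n x n identity matrix
c = torch.ones([3, 3])
d = torch.eye(3)
# arange(start, end, step): from start up to (but not including) end, spaced by step
# the deprecated torch.range() includes end; arange() does not
e = torch.arange(1, 10, 2)
# linspace(start, end, steps): `steps` evenly spaced points from start to end, both included
f = torch.linspace(1, 10, 2)
# rand(size): uniform on [0, 1); randn(size): standard normal N(0, 1)
g = torch.rand((2,2))
h = torch.randn((2,2))
# normal(mean, std, size): normal distribution with mean `mean` and standard deviation `std`
i = torch.normal(0, 1,size=(4,))
# randperm(n): random permutation of the integers 0 .. n-1
j = torch.randperm(4)

a,b,c,d,e,f,g,h,i,j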

(tensor([2., 2.]),
 tensor([2, 2]),
 tensor([[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]]),
 tensor([[1., 0., 0.],
         [0., 1., 0.],
         [0., 0., 1.]]),
 tensor([1, 3, 5, 7, 9]),
 tensor([ 1., 10.]),
 tensor([[0.3860, 0.0850],
         [0.2545, 0.3328]]),
 tensor([[ 0.4365, -0.6578],
         [ 0.2962,  0.5276]]),
 tensor([-0.1834, -1.5991, -1.2541, -0.0764]),
 tensor([2, 3, 0, 1]))
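
The third argument of `arange` and of `linspace` is easy to mix up: for `arange` it is a step size, while for `linspace` it is the number of points, with both endpoints included. That is why `torch.linspace(1, 10, 2)` above returns only the two endpoints. The plain-Python sketch below mimics the two rules for illustration; `arange_like` and `linspace_like` are hypothetical helper names, not PyTorch functions, and `arange_like` assumes a positive step.

```python
def arange_like(start, end, step):
    """Values start, start + step, ... strictly below end (assumes step > 0)."""
    values = []
    current = start
    while current < end:
        values.append(current)
        current += step
    return values


def linspace_like(start, end, steps):
    """`steps` evenly spaced values from start to end, both endpoints included."""
    if steps == 1:
        return [float(start)]
    gap = (end - start) / (steps - 1)
    return [start + i * gap for i in range(steps)]


print(arange_like(1, 10, 2))    # [1, 3, 5, 7, 9]
print(linspace_like(1, 10, 2))  # [1.0, 10.0]
print(linspace_like(1, 10, 4))  # [1.0, 4.0, 7.0, 10.0]
```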

1.2 Operating on Tensors

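# Addition: operator, function, and in-place form (a trailing underscore means in-place)
print(g + h)
print(torch.add(g, h))
h.add_(g)
print(h)
# Indexing: take the second column
print(h[:, 1])
# Reshaping: view() always shares memory with the original tensor;
# reshape() returns a view when it can and a copy otherwise, so sharing is not guaranteed
print(h.view(4,))
print(h.reshape(1,4))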

tensor([[ 0.8225, -0.5728],
        [ 0.5507,  0.8604]])
tensor([[ 0.8225, -0.5728],
        [ 0.5507,  0.8604]])
tensor([[ 0.8225, -0.5728],
        [ 0.5507,  0.8604]])
tensor([-0.5728,  0.8604])
tensor([ 0.8225, -0.5728,  0.5507,  0.8604])
tensor([[ 0.8225, -0.5728,  0.5507,  0.8604]])

1.3 Broadcasting

x = torch.arange(1, 3).view(1, 2)
print(x)
y = torch.arange(1, 4).view(3, 1)
print(y)
print(x + y)

tensor([[1, 2]])
tensor([[1],
        [2],
        [3]])
tensor([[2, 3],
        [3, 4],
        [4, 5]])
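
In the example above, shapes (1, 2) and (3, 1) combine into (3, 2). The rule behind this can be sketched in plain Python: shapes are aligned from the trailing dimension, and two sizes are compatible when they are equal or one of them is 1. The function name `broadcast_shape` is an illustrative assumption, not a PyTorch API, and the sketch ignores edge cases such as zero-sized dimensions.

```python
def broadcast_shape(shape_a, shape_b):
    """Shape produced by broadcasting two shapes; ValueError if incompatible."""
    # Left-pad the shorter shape with 1s so both have the same number of dimensions.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)


print(broadcast_shape((1, 2), (3, 1)))  # (3, 2)
print(broadcast_shape((3,), (2, 3)))    # (2, 3)
```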

2. Automatic differentiation

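How autograd computes derivatives
Tensor → .requires_grad (set it to True to track operations on the tensor)
Function → .grad_fn (the Function that created the tensor)
Backpropagating gradients: call .backward() on a Tensor
If the Tensor is not a scalar, .backward() needs a gradient argument, which is a tensor of matching shape
When out is a scalar, out.backward() is equivalent to out.backward(torch.tensor(1.))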

from __future__ import print_function
import torch

x = torch.ones(3, 3, requires_grad=True)
y = x ** 2
y, y.grad_fn

(tensor([[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]], grad_fn=<PowBackward0>),
 <PowBackward0 at 0x7f185b2a1550>)

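out = y.sum()  # reduce y to the scalar out
out.backward()
print(x.grad)  # d(sum(x ** 2))/dx = 2x

out1 = x.sum()
out1.backward()
print(x.grad)  # gradients accumulate across backward calls: 2 + 1 = 3

x.grad.zero_()  # clear the accumulated gradient before the next backward pass

out2 = x.sum()
out2.backward()
print(x.grad)  # now only the gradient of out2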

tensor([[2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.]])
tensor([[3., 3., 3.],
        [3., 3., 3.],
        [3., 3., 3.]])
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

# When y is not a scalar
x = torch.ones(3, 3, requires_grad=True)
y = x ** 3
v = torch.ones([3,3],dtype=torch.float)  # gradient argument, same shape as y
y.backward(v)

print(x.grad)  # v * 3 * x ** 2

tensor([[3., 3., 3.],
        [3., 3., 3.],
        [3., 3., 3.]])
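
Passing `v` to `backward` makes autograd compute a vector-Jacobian product rather than a full Jacobian. For an elementwise function such as `y = x ** 3` the Jacobian is diagonal, so the result reduces to `v * 3 * x ** 2` per element. The plain-Python sketch below reproduces that arithmetic on flattened lists; `cube_backward` is a hypothetical helper written for this illustration, not part of PyTorch.

```python
def cube_backward(x, v):
    """Values y.backward(v) would leave in x.grad for y = x ** 3 (elementwise)."""
    # dy_i/dx_j is 3 * x_i ** 2 when i == j and 0 otherwise, so the
    # vector-Jacobian product is an elementwise multiplication.
    return [v_i * 3 * x_i ** 2 for x_i, v_i in zip(x, v)]


x = [1.0] * 9  # the 3x3 tensor of ones, flattened
v = [1.0] * 9  # same shape as y, all ones
print(cube_backward(x, v))                    # nine values of 3.0
print(cube_backward([1.0, 2.0], [1.0, 0.5]))  # [3.0, 6.0]
```

With `v` set to all ones, this is the same gradient that `y.sum().backward()` would produce.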

# Wrap a code block in `with torch.no_grad():` to stop autograd from tracking history on tensors that have .requires_grad=True
print(x.requires_grad)
with torch.no_grad():
    print((x ** 2).requires_grad)

True
False

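# To change a tensor's values without autograd recording it (so backpropagation is unaffected), operate on tensor.data
x = torch.ones(1,requires_grad=True)
print(x.data) # still a tensor
print(x.data.requires_grad) # but detached from the computation graph

y = 2 * x
x.data *= 100 # changes only the values; not recorded in the graph, so the gradient is unaffected

y.backward()
print(x) # changing .data also changes the tensor's own values
print(x.grad)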

tensor([1.])
False
tensor([100.], requires_grad=True)
tensor([2.])

3. Parallel computing

GPU settings:

Set at the very top of the file

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2" # make only GPU 2 visible, so it becomes the default GPU

At launch time

CUDA_VISIBLE_DEVICES=0,1 python train.py # use GPUs 0 and 1
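
One consequence of `CUDA_VISIBLE_DEVICES` that is easy to miss: the process renumbers the visible GPUs from 0 in the listed order, so after exposing only physical GPU 2 the code still addresses it as `cuda:0`. The variable also has to be set before CUDA is initialized, which is why it goes at the top of the file. The sketch below only parses the variable to show the mapping; `visible_device_map` is a hypothetical helper, and it reads from a dictionary passed in so that it does not touch the real environment.

```python
import os


def visible_device_map(env=None):
    """Map in-process CUDA names to the physical IDs in CUDA_VISIBLE_DEVICES."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # variable unset: every GPU is visible, no renumbering
    ids = [part.strip() for part in raw.split(",") if part.strip()]
    return {f"cuda:{logical}": physical for logical, physical in enumerate(ids)}


print(visible_device_map({"CUDA_VISIBLE_DEVICES": "2"}))    # {'cuda:0': '2'}
print(visible_device_map({"CUDA_VISIBLE_DEVICES": "0,1"}))  # {'cuda:0': '0', 'cuda:1': '1'}
```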

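Common parallelization approaches:

Model parallelism
Data parallelism

Using a GPU to accelerate a model in PyTorch is straightforward: move the model and the data to the GPU. The core code is only a few lines.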

# Define the model
... 
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device) # move the model to the GPU (stays on the CPU if CUDA is unavailable)
... 

# Train the model
...
features = features.to(device) # move the data to the same device
labels = labels.to(device) # or: labels = labels.cuda() if torch.cuda.is_available() else labels
...

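Training on multiple GPUs is just as simple: wrap the model as a data-parallel model. Once the model has been moved to the GPU, a replica is placed on each GPU and each batch is split evenly across the GPUs for training. The core code is as follows.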

# Define the model
... 
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model) # wrap as a data-parallel model (nn is torch.nn)

# Train the model
...
features = features.to(device) # move the data to the GPU
labels = labels.to(device) # or: labels = labels.cuda() if torch.cuda.is_available() else labels
...
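
To make "split evenly across the GPUs" concrete, here is a plain-Python sketch of dividing one batch along its first dimension. It assumes a chunking rule like that of `torch.chunk` (chunk size rounded up, so the last chunk may be smaller); the helper `split_batch` is hypothetical and is not how `nn.DataParallel` is implemented internally.

```python
import math


def split_batch(batch, num_devices):
    """Split a list of samples into at most num_devices contiguous chunks."""
    if not batch:
        return []
    chunk_size = math.ceil(len(batch) / num_devices)
    return [batch[i:i + chunk_size] for i in range(0, len(batch), chunk_size)]


print(split_batch(list(range(8)), 2))   # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(split_batch(list(range(10)), 4))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

Each replica runs the forward pass on its own chunk, so a batch size that is a multiple of the GPU count keeps the load balanced.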