Tu Jie Learns AI Series: Using GPUs

Article link: https://blog.csdn.net/m0_62415132/article/details/141175816

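Compute devices

Querying the number of available GPUs

Two functions that let code run even when the requested GPU does not exist

Querying the device a tensor is on

Tensor computation

Neural networks and GPUs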

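Using GPUs

Compute devices

By default, PyTorch creates tensors and runs computations on the CPU. To compute on a GPU, you have to specify it explicitly.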

import torch
from torch import nn

# torch.device('cpu'): the CPU, which is the default
# torch.device('cuda'): the current GPU, which is GPU 0 unless changed, even though no 0 is written
# torch.device('cuda:1'): the second GPU
# (torch.cuda.device is a context manager for selecting a GPU, not a device object)
torch.device('cpu'), torch.device('cuda'), torch.device('cuda:1')

# Output
(device(type='cpu'), device(type='cuda'), device(type='cuda', index=1))
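To make the device objects above more concrete, here is a minimal sketch of how a torch.device can be built and inspected. Constructing one only describes a device and does not touch any hardware, so this should run even on a machine without a GPU. The variable names are illustrative.

```python
import torch

cpu = torch.device('cpu')
gpu0 = torch.device('cuda:0')    # string form with an explicit index
gpu1 = torch.device('cuda', 1)   # type and index given separately

print(cpu.type, cpu.index)       # cpu None
print(gpu0.type, gpu0.index)     # cuda 0
print(gpu1 == torch.device('cuda:1'))  # True

# Whether a GPU can actually be used is a separate question
print(torch.cuda.is_available())
```

A device object is just a label. Whether tensors can really be placed there depends on the hardware, which is what the checks in the following sections are for.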

Querying the number of available GPUs

torch.cuda.device_count()

# Output
1
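Beyond the count, it can be handy to see which GPUs are present. The sketch below uses a hypothetical helper, describe_gpus (not part of the original post), that pairs each index with the name reported by torch.cuda.get_device_name. On a machine without GPUs it simply returns an empty list.

```python
import torch

def describe_gpus():
    """Return (index, name) pairs for every visible GPU; empty if there is none."""
    return [(i, torch.cuda.get_device_name(i))
            for i in range(torch.cuda.device_count())]

for index, name in describe_gpus():
    print(f'cuda:{index} -> {name}')
```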

These two functions let us run code even when the requested GPU does not exist

def try_gpu(i=0):
    """Return gpu(i) if it exists, otherwise return cpu()."""
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

def try_all_gpus():
    """Return all available GPUs, or [cpu()] if there is no GPU."""
    devices = [
        torch.device(f'cuda:{i}') for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('cpu')]

try_gpu(), try_gpu(10), try_all_gpus()

# Output (this listing is from a machine with three GPUs)
(device(type='cuda', index=0),
 device(type='cpu'),
 [device(type='cuda', index=0),
  device(type='cuda', index=1),
  device(type='cuda', index=2)])
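As a usage sketch, a fallback helper like this lets the same script run unchanged with or without a GPU. The block below repeats a try_gpu definition so that it is self-contained; the tensors a, b, and c are made up for illustration.

```python
import torch

def try_gpu(i=0):
    """Return gpu(i) if it exists, otherwise the CPU."""
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

device = try_gpu()  # falls back to the CPU when no GPU is present

a = torch.arange(6, dtype=torch.float32, device=device).reshape(2, 3)
b = torch.ones(3, 2, device=device)
c = a @ b           # runs on whichever device was selected

print(c.device.type)
```

The rest of the code only ever refers to device, so nothing else needs to change when the hardware does.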

Querying the device a tensor is on

The code below shows that a newly created tensor lives in CPU memory by default.

x = torch.tensor([1, 2, 3])
x.device  # query the device the tensor is on

# Output
device(type='cpu')

To store a tensor on a GPU, pass a device argument when creating it.

X = torch.ones(2, 3, device = try_gpu())
X

# Output
tensor([[1., 1., 1.],
        [1., 1., 1.]], device='cuda:0')
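There are two common ways to end up with a tensor on a given device: create it there directly, or create it on the CPU and move it afterwards. The following sketch compares the two, assuming a device variable that falls back to the CPU when no GPU is available.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

direct = torch.ones(2, 3, device=device)  # allocated on the target device
moved = torch.ones(2, 3).to(device)       # allocated on the CPU, then moved if needed

print(direct.device == moved.device)  # True
print(torch.equal(direct, moved))     # True
```

Both end up in the same place, but creating the tensor directly on the device skips the intermediate CPU allocation and the copy.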

Create a random tensor on the second GPU

Y = torch.rand(2, 3, device = try_gpu(1))
Y

# Output
tensor([[0.2442, 0.8953, 0.4599],
        [0.0960, 0.4853, 0.7821]], device='cuda:1')

Tensor computation

When we compute with tensors, we need to decide where the operation runs. Wherever it runs, the resulting tensor is stored on that same device.

Z = X.cuda(1)    # copy X to the second GPU and name the result Z
print(X)
print(Z)

# Output
tensor([[1., 1., 1.],
        [1., 1., 1.]], device='cuda:0')
tensor([[1., 1., 1.],
        [1., 1., 1.]], device='cuda:1')

Now the data is on the same GPU (Z and Y are both on the second GPU), so we can add the two, and the addition runs on the second GPU.

Y + Z

# Output
tensor([[1.2442, 1.8953, 1.4599],
        [1.0960, 1.4853, 1.7821]], device='cuda:1')
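Adding two ordinary tensors that sit on different devices typically fails with a RuntimeError, which is why X was copied over before the addition. As a rough sketch, the "move, then operate" step can be wrapped in a small helper. The name add_on_first_device is hypothetical and not from the original post.

```python
import torch

def add_on_first_device(a, b):
    """Add two tensors on a's device, moving b there first if necessary."""
    return a + b.to(a.device)

devices = [torch.device(f'cuda:{i}') for i in range(torch.cuda.device_count())]
first = devices[0] if devices else torch.device('cpu')

a = torch.ones(2, 3, device=first)
b = torch.full((2, 3), 2.0)      # created on the CPU by default
result = add_on_first_device(a, b)

print(result.device == a.device)  # True
```

The helper hides the transfer but does not make it free, so it is best kept out of hot loops.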

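Copying X to the second GPU just to add it to Y works fine functionally. In practice, though, moving data between GPUs, and especially from a GPU to the CPU, is slow, so real code avoids doing this whenever it can.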
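One common way to limit transfers, shown here only as an illustrative sketch, is to keep running totals on the device and bring a single number back to the CPU at the end, instead of calling .item() at every step. The data below is random and made up for the example.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
values = [torch.rand(100, device=device) for _ in range(10)]

# Pattern to avoid: one device-to-host transfer per step
total_per_step = 0.0
for v in values:
    total_per_step += v.sum().item()

# Preferred: accumulate on the device, transfer once at the end
total = torch.zeros((), device=device)
for v in values:
    total += v.sum()
total_once = total.item()

print(abs(total_per_step - total_once) < 1e-2)
```

Both loops compute the same sum up to floating-point rounding. The second one crosses the GPU-to-CPU boundary only once.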

Z.cuda(1) is Z

# Output
True

What this code shows: when Z is already on GPU 1, calling Z.cuda(1) does not copy it again. Nothing happens, and the call returns Z itself.
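The same no-copy behaviour applies to to(). The sketch below uses a CPU tensor so that it runs on any machine: moving a tensor to the device it is already on returns the very same object, while passing copy=True asks for a fresh tensor regardless.

```python
import torch

z = torch.ones(2, 3)                 # on the CPU here, so the sketch runs anywhere
same = z.to(z.device)                # already on the target device: returns z itself
forced = z.to(z.device, copy=True)   # copy=True requests a new tensor anyway

print(same is z)              # True
print(forced is z)            # False
print(torch.equal(forced, z)) # True
```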

Neural networks and GPUs

Use to() to move an entire network onto a GPU.

This places all of the network's parameters on GPU 0. Since X is also on GPU 0, the computation runs on GPU 0.

net = nn.Sequential(nn.Linear(3, 1))
net = net.to(device=try_gpu())

net(X)

# Output
tensor([[0.6014],
        [0.6014]], device='cuda:0', grad_fn=<AddmmBackward>)

We can confirm that the model parameters are stored on the same GPU

net[0].weight.data.device

# Output
device(type='cuda', index=0)
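For a network with more than one layer, checking a single weight is not enough. A minimal sketch of checking every parameter at once follows. The two-layer network and the fallback device variable are assumptions made for illustration.

```python
import torch
from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

net = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 1)).to(device)
X = torch.ones(2, 3, device=device)

# Every parameter should report the same device as the input
on_same_device = all(p.device == X.device for p in net.parameters())
print(on_same_device)  # True

out = net(X)           # the forward pass runs on that device
print(out.shape)       # torch.Size([2, 1])
```

The parameters are compared against X.device rather than device itself because a bare 'cuda' device has no index and does not compare equal to 'cuda:0'.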

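Q&A

1. When training on a GPU, at which step is it best to move the data to the GPU?

Usually at the very end, right before the data goes into the network. Many data transformations are not necessarily well supported on the GPU. If your transformations are well supported there, you can move the transfer earlier and do more of the work on the GPU (for images, for example, some preprocessing can run on the GPU). That does consume GPU resources, though, and you should try to keep those for the forward and backward passes.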
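A minimal sketch of the "move the batch right before the network" pattern is shown below. The dataset, model, and hyperparameters are made up for illustration. The data stays on the CPU inside the DataLoader, and each batch is moved to the device just before the forward pass.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Toy data kept on the CPU; any CPU-side preprocessing would happen here
dataset = TensorDataset(torch.randn(8, 3), torch.randn(8, 1))
loader = DataLoader(dataset, batch_size=4)

net = nn.Linear(3, 1).to(device)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

for features, labels in loader:
    # Move the batch to the device just before it enters the network
    features, labels = features.to(device), labels.to(device)
    loss = loss_fn(net(features), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```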

2. What is the difference between tensor.cuda and to(device)?

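to(device) is the usual way to move a Module, and it works for tensors as well. It accepts any device, including the CPU, whereas cuda() can only target a GPU. (nn.Module does also provide a cuda() method, so a Module is not strictly limited to to(device).)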
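One practical difference worth knowing, sketched below with made-up names, is what to(device) returns. For a tensor it returns a tensor on the target device, which is a new object when a move is needed. For a Module it moves the parameters in place and returns the module itself.

```python
import torch
from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

t = torch.ones(2)
t_moved = t.to(device)      # works whether device is the CPU or a GPU

net = nn.Linear(2, 1)
same_net = net.to(device)   # parameters are moved in place; the module itself is returned

print(same_net is net)               # True
print(t_moved.device.type == device.type)  # True
```

Because to(device) accepts a CPU device as well, code written with it stays device-agnostic, while cuda() only makes sense when a GPU is present.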