[Study Notes] Deep Learning: Multi-GPU Training in PyTorch

零叁三

Published 2024-06-22 09:50:43

Tags: deep learning, study notes, multi-GPU

Article link: https://blog.csdn.net/wahzx/article/details/138007625

1. Check GPU availability

Run the following code to check how many GPUs are visible. If there is more than one, multi-GPU training can be enabled.

import torch

# Print the number of available GPUs
print("Number of GPUs:", torch.cuda.device_count())

# List every available GPU (CUDA device indices start at 0)
for i in range(torch.cuda.device_count()):
    print("GPU", i, ":", torch.cuda.get_device_name(i))
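When deciding whether multi-GPU training is worthwhile, it can also help to see how much memory each card has and to fall back to the CPU when no GPU is present. The snippet below is a minimal sketch of that idea; the helper name `describe_gpus` is hypothetical and not part of PyTorch.

```python
import torch


def describe_gpus():
    """Return (index, name, total memory in GiB) for each visible CUDA device."""
    infos = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        infos.append((i, props.name, props.total_memory / 1024**3))
    return infos


gpus = describe_gpus()
# Fall back to the CPU when no CUDA device is visible
device = torch.device("cuda" if gpus else "cpu")
print(f"Using {device}; visible GPUs: {gpus}")
```

On a machine without CUDA the list is simply empty and `device` is the CPU.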

2. Multi-GPU training with `DataParallel`

The code is as follows:

model = ConvNet()  # ConvNet is a model class defined beforehand
if torch.cuda.device_count() > 1:
    print("Using", torch.cuda.device_count(), "GPUs")
    # Wrap the model so each batch is split across all GPUs
    model = torch.nn.DataParallel(model)

model.to(torch.device("cuda"))  # move the model to the GPU
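To show how the wrapped model is used afterwards, here is a minimal sketch of one training step followed by extracting the weights for a checkpoint. `TinyNet` is a hypothetical stand-in for `ConvNet`, and the random tensors, learning rate, and CPU fallback are assumptions made only so the snippet runs anywhere.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the ConvNet class."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        return self.fc(x)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyNet()
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicates the model on every visible GPU
model = model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# One training step on random data. The batch only has to live on the primary
# device; DataParallel splits it along dim 0 and scatters the chunks itself.
inputs = torch.randn(32, 16, device=device)
targets = torch.randint(0, 4, (32,), device=device)
optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()

# DataParallel keeps the original network under `.module`; taking the state
# dict from it avoids a "module." prefix on every checkpoint key.
to_save = model.module if isinstance(model, nn.DataParallel) else model
state = to_save.state_dict()
```

The training loop itself does not change; only checkpointing needs the extra `.module` step so the weights can later be loaded into an unwrapped model.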

3. A first look at `DistributedDataParallel`

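For more efficient multi-GPU training, you can use torch.nn.parallel.DistributedDataParallel, but it needs a more involved setup than DataParallel (one process per GPU and a specific way of loading data). This approach is typically used for larger training jobs and for training across multiple nodes. I am leaving a marker here and will expand on it later when needed.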
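As a rough illustration of that extra setup, the sketch below shows what each process might do: join the process group, wrap the model, and run one step. The function name `train_one_step`, the toy `nn.Linear` model, and the `gloo` CPU fallback are assumptions for illustration, not the author's code.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def train_one_step() -> float:
    """Runs in every process. The launcher is expected to set RANK, WORLD_SIZE,
    LOCAL_RANK, MASTER_ADDR and MASTER_PORT in the environment."""
    use_cuda = torch.cuda.is_available()
    dist.init_process_group(backend="nccl" if use_cuda else "gloo")
    try:
        if use_cuda:
            local_rank = int(os.environ["LOCAL_RANK"])
            torch.cuda.set_device(local_rank)  # one GPU per process
            device = torch.device("cuda", local_rank)
            model = DDP(nn.Linear(8, 2).to(device), device_ids=[local_rank])
        else:
            device = torch.device("cpu")
            model = DDP(nn.Linear(8, 2))  # CPU fallback for smoke tests
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        x = torch.randn(4, 8, device=device)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()  # gradients are averaged across processes here
        optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()
```

Saved in a script that calls `train_one_step()`, this could be launched with something like `torchrun --nproc_per_node=2 train.py` (the script name is hypothetical); `torchrun` starts the processes and sets the environment variables for each of them.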

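DataParallel usually requires no change to how data is loaded, whereas DistributedDataParallel needs torch.utils.data.distributed.DistributedSampler to control sampling, so that each process reads a different portion of the dataset.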

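from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# A dataset class defined beforehand
dataset = YourDataset()
# Requires torch.distributed.init_process_group() to have been called already
sampler = DistributedSampler(dataset)
# batch_size (defined beforehand) is per process; the sampler handles shuffling
data_loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)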
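To see what the sampler actually does, the split can be inspected without starting any processes by passing `num_replicas` and `rank` explicitly. This is a minimal sketch on a toy dataset of ten integers; the dataset, the two-replica setup, and the batch size are assumptions chosen for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(10))  # toy dataset holding samples 0..9

# With shuffling off, each rank takes every num_replicas-th index
shards = []
for rank in range(2):
    sampler = DistributedSampler(dataset, num_replicas=2, rank=rank, shuffle=False)
    shards.append(list(sampler))
print(shards)  # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]

# With shuffling on (the default), call set_epoch() at the start of every
# epoch; otherwise the same ordering is reused each time.
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True, seed=0)
loader = DataLoader(dataset, batch_size=4, sampler=sampler)
samples_per_epoch = []
for epoch in range(2):
    sampler.set_epoch(epoch)
    samples_per_epoch.append(sum(batch.numel() for (batch,) in loader))
```

Each rank sees half of the data per epoch, which is why the batch size passed to the `DataLoader` is the per-process batch size.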