PyTorch Single-Machine Multi-GPU Training

Jeffrey-zh

Last modified 2023-10-22 22:52:25

Category: Deep Learning. Tags: pytorch, artificial intelligence, python

First published 2023-08-18 22:38:43

Article link: https://blog.csdn.net/Jeffrey_0711/article/details/132368984

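torch.distributed.init_process_group("nccl", world_size=n_gpus, rank=args.local_rank): initializes a process group. "nccl" is the communication backend, world_size is the total number of distributed processes, and rank is the id of the current process.
**torch.cuda.set_device(args.local_rank)**: makes that GPU the default CUDA device for the current process. The effect is similar to restricting the process to one GPU through the visible-devices environment variable.
model = DistributedDataParallel(model.cuda(args.local_rank), device_ids=[args.local_rank]): moves the model to this process's GPU and wraps it so that gradients are synchronized across processes during the backward pass.
train_sampler = DistributedSampler(train_dataset): splits the dataset across the processes so that each GPU works on a different shard. Calling **train_sampler.set_epoch(epoch)** at the start of every epoch changes the shuffle order, so each GPU sees a different random subset in each epoch, which generally helps training. Without that call the same ordering is reused every epoch.
train_dataloader = DataLoader(train_dataset, sampler=train_sampler): once a sampler is passed, the shuffle argument should no longer be passed. The two are mutually exclusive.
data = data.cuda(args.local_rank): moves each batch of data onto the current process's GPU.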
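
The steps above can be put together roughly as follows. This is a minimal sketch and not the author's actual script: the toy nn.Linear model, the random TensorDataset, the batch size, and the helper names build_loader and train are all assumptions made for illustration. It also assumes that the launcher starts one process per GPU and provides the rendezvous settings (MASTER_ADDR, MASTER_PORT) in the environment.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def build_loader(dataset, rank, world_size, batch_size=8):
    """Return (loader, sampler) giving this rank its own shard of the dataset."""
    # rank and world_size are passed explicitly here; if they are omitted,
    # DistributedSampler reads them from the initialized process group.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    # No shuffle argument: the sampler already decides the order.
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
    return loader, sampler


def train(local_rank, n_gpus, epochs=2):
    """Hypothetical per-process entry point; needs one CUDA GPU per process."""
    dist.init_process_group("nccl", world_size=n_gpus, rank=local_rank)
    torch.cuda.set_device(local_rank)

    model = nn.Linear(4, 1)  # toy model standing in for the real network
    model = DistributedDataParallel(model.cuda(local_rank), device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(64, 4), torch.randn(64, 1))  # toy data
    loader, sampler = build_loader(dataset, local_rank, n_gpus)

    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # use a different shuffle order each epoch
        for data, label in loader:
            data = data.cuda(local_rank)
            label = label.cuda(local_rank)
            optimizer.zero_grad()
            loss = criterion(model(data), label)
            loss.backward()  # DDP averages gradients across processes here
            optimizer.step()

    dist.destroy_process_group()
```

The training script would call train(args.local_rank, n_gpus) once per process. build_loader is kept separate so that the sharding logic can be checked on a CPU-only machine without starting a process group.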

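python -m torch.distributed.launch --nproc_per_node=n_gpus train.py
Pass the number of GPUs to train on (n_gpus) through --nproc_per_node. In recent PyTorch releases this launcher is deprecated in favor of torchrun.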
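
On the script side, the launcher starts one process per GPU and tells each process its index. A minimal sketch of reading that index is shown here. The helper name parse_local_rank is an assumption for illustration. It accepts both the --local_rank spelling used by older launchers and the --local-rank spelling used by newer ones, and falls back to the LOCAL_RANK environment variable that torchrun sets.

```python
import argparse
import os


def parse_local_rank(argv=None):
    """Return the local rank passed by the launcher (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    # Accept both spellings of the flag. If the flag is absent, fall back to
    # the LOCAL_RANK environment variable, and finally to 0.
    parser.add_argument(
        "--local-rank", "--local_rank",
        dest="local_rank",
        type=int,
        default=int(os.environ.get("LOCAL_RANK", 0)),
    )
    # parse_known_args ignores any other arguments the training script defines.
    args, _unknown = parser.parse_known_args(argv)
    return args.local_rank
```

Calling parse_local_rank() with no arguments reads sys.argv, which is what a training script would normally do.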

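torch.save: wrap the save in an if statement so that only the process with local_rank == 0 writes the checkpoint, and pass **model.module.state_dict()** so that the parameters of the underlying model, not of the DDP wrapper, are saved.
checkpoint = torch.load(resume, map_location=torch.device("cpu"))
map_location can be cpu, cuda, or cuda:index, depending on which device the model should be loaded onto.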
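
A minimal sketch of this save and load pattern follows. The helper names save_checkpoint and load_checkpoint and the "model" dictionary key are assumptions for illustration, not part of PyTorch.

```python
import torch
import torch.nn as nn


def save_checkpoint(model, path, local_rank):
    """Write the checkpoint from rank 0 only; return True if a file was written."""
    if local_rank != 0:
        return False
    # A DDP-wrapped model keeps the real network in .module. Fall back to the
    # model itself so the helper also works on an unwrapped model.
    net = getattr(model, "module", model)
    torch.save({"model": net.state_dict()}, path)
    return True


def load_checkpoint(model, path, map_location="cpu"):
    """Load weights onto map_location ("cpu", "cuda", or "cuda:index")."""
    checkpoint = torch.load(path, map_location=torch.device(map_location))
    net = getattr(model, "module", model)
    net.load_state_dict(checkpoint["model"])
    return model
```

Loading onto the CPU first and moving the model to its GPU afterwards avoids every process placing the checkpoint tensors on the GPU they were saved from.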

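Query:
- torch.cuda.device_count(): returns the number of available GPUs
- torch.cuda.is_available(): returns whether CUDA is available
Set:
- From Python, through the environment variable: os.environ["CUDA_VISIBLE_DEVICES"]
- On the command line, by prefixing the command with CUDA_VISIBLE_DEVICES="" (the GPU indices to expose go inside the quotes)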
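
A minimal sketch of the Python-side setting combined with the two query calls. The value "0" is an assumption for illustration (expose only the first GPU). The variable has to be set before CUDA is initialized, so the sketch sets it before importing torch.

```python
import os

# Assumption for illustration: expose only GPU 0 to this process.
# Set this before CUDA is initialized; doing it before importing torch is the
# simplest way to be sure.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

print("CUDA available:", torch.cuda.is_available())
print("Visible GPUs:", torch.cuda.device_count())
```

Inside the process the visible GPUs are renumbered from 0, so the single exposed device is addressed as cuda:0 whichever physical GPU it is.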

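x = x.cuda(): for a tensor, .cuda() returns a copy on the new device instead of modifying x in place, so the result has to be assigned back.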

a = b + c, d = e + a:
only d is used afterwards, yet the intermediate variables stay alive and keep occupying memory. This can be replaced with: b = b + c, b = b + e
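
The two variants can be compared in a small sketch. The function names with_intermediates and with_reuse are assumptions for illustration. Both return the same value; the difference lies only in how long the intermediate tensor stays referenced.

```python
import torch


def with_intermediates(b, c, e):
    a = b + c  # a stays referenced until the function returns
    d = e + a
    return d


def with_reuse(b, c, e):
    b = b + c  # rebind the local name b; no extra name holds the intermediate
    b = b + e  # the b + c tensor loses its last reference after this line
    return b


b, c, e = torch.ones(3), torch.full((3,), 2.0), torch.full((3,), 3.0)
print(torch.equal(with_intermediates(b, c, e), with_reuse(b, c, e)))
```

Rebinding b inside the function does not change the caller's tensor, because b + c creates a new tensor rather than writing in place.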

loss = self.criteration(output, label)
loss_sum += loss
#### change to
loss = self.criteration(output, label)
loss_sum += loss.item()

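Here only the value of loss_sum matters; it does not need to be part of the computation graph. loss_sum should stay a plain scalar (a single number) rather than a tensor.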
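
The difference can be seen in a small CPU-only sketch. The toy model, the random data, and the names tensor_sum and float_sum are assumptions for illustration.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)  # toy model
criterion = nn.MSELoss()

tensor_sum = 0.0  # accumulates tensors (the pattern to avoid)
float_sum = 0.0   # accumulates plain Python floats

for _ in range(3):
    output = model(torch.randn(8, 4))
    label = torch.randn(8, 1)
    loss = criterion(output, label)
    tensor_sum = tensor_sum + loss  # result is a tensor that still references the graph
    float_sum += loss.item()        # .item() copies the value out as a Python float

print(type(tensor_sum).__name__, tensor_sum.requires_grad)
print(type(float_sum).__name__)
```

Because tensor_sum carries a grad_fn, every iteration's graph stays reachable through it, which is why memory use can keep growing over an epoch when the tensor itself is accumulated.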
A more detailed tutorial can be consulted for reference.
