PyTorch multi-GPU training: how to write the code


贝猫说python


Original link: https://blog.csdn.net/weixin_40087578/article/details/87186613


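import os
import torch
# Set this before the first CUDA call, otherwise it may not take effect
os.environ['CUDA_VISIBLE_DEVICES'] = "2,3"  # physical GPUs 2 and 3 become devices 0 and 1 inside PyTorch
use_cuda = torch.cuda.is_available()  # True when at least one CUDA device is visible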

Number of GPUs: len(range(torch.cuda.device_count())), which is simply torch.cuda.device_count(), 2 in this case.
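The renumbering can be sketched in plain Python. The helper `visible_device_map` below is hypothetical (it is not part of PyTorch or CUDA) and assumes the variable holds a comma-separated list of integer ids; it only illustrates how the logical indices PyTorch sees line up with the physical GPU ids.

```python
def visible_device_map(cuda_visible_devices: str) -> dict:
    """Map PyTorch's logical device index to the physical GPU id.

    Hypothetical helper for illustration; assumes integer ids only.
    """
    physical_ids = [int(x) for x in cuda_visible_devices.split(",") if x.strip()]
    # Logical indices are assigned in the order the ids are listed.
    return {logical: physical for logical, physical in enumerate(physical_ids)}


mapping = visible_device_map("2,3")
print(mapping)  # {0: 2, 1: 3}
```

So with "2,3", `cuda:0` refers to physical GPU 2 and `cuda:1` to physical GPU 3.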

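The official PyTorch diagram of DataParallel (image not reproduced here) shows how it works. For an implementation modified to follow that diagram, see: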

https://blog.csdn.net/qq_19598705/article/details/80396325

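The linked post also wraps the optimizer in DataParallel. In terms of the second row of the official diagram, that corresponds to scattering the gradients (second step), updating the gradients on each model replica (third step), and then merging the updated model parameters back onto the main GPU (last step).

This is unnecessary. DataParallel replicates the model from the main GPU onto every GPU at each forward pass, and during the backward pass the gradients from the replicas are summed into the original model on the main GPU. It is therefore enough to update the parameters on the main GPU from the total loss. The copies on the other GPUs need no separate update or synchronization, because the next forward pass replicates the updated model again.

So the code in the linked post does not need to wrap the optimizer in DataParallel.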
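A minimal sketch of this point, assuming a toy `nn.Linear` model and random data (both are illustrative and not from the original post): only the model is wrapped, while the optimizer is an ordinary `torch.optim.SGD` over the model's parameters. On a machine with fewer than two GPUs the wrapping is skipped and the same code still runs.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, used only for illustration.
model = nn.Linear(8, 3)
if torch.cuda.is_available():
    model = model.cuda()
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # only the model is wrapped

# A plain optimizer over the model's parameters; no DataParallel wrapper.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

device = next(model.parameters()).device
data = torch.randn(16, 8, device=device)           # random inputs
label = torch.randint(0, 3, (16,), device=device)  # random class labels

before = [p.detach().clone() for p in model.parameters()]
loss = criterion(model(data), label)
optimizer.zero_grad()
loss.backward()   # gradients end up on the original module's parameters
optimizer.step()  # a single update of those parameters

# Check that the step actually changed the parameters.
changed = any(not torch.equal(b, p) for b, p in zip(before, model.parameters()))
print(changed)
```

The parameters returned by `model.parameters()` are those of the original module on the main GPU, which is why a single unwrapped optimizer is sufficient here.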

With DataParallel, only the model's forward pass is wrapped for multi-GPU execution:

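import os
import torch
import torch.nn as nn

args.gpu_id = "2,7"  # physical GPU ids to use
# Configure the environment before the first CUDA call.
# It can also be set at launch time: CUDA_VISIBLE_DEVICES='2,7' python train.py
os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu_id  # must be a string; a list raises an error
args.cuda = not args.no_cuda and torch.cuda.is_available()  # decides between GPU and CPU
device_ids = list(range(torch.cuda.device_count()))  # torch.cuda.device_count() == 2
# device_ids == [0, 1]: 0 is physical GPU 2 (the main GPU) and 1 is physical GPU 7.
# The main GPU distributes the model and the data.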
 
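if args.cuda:
    # Copy the model to the GPU; the default is cuda:0, i.e. the first visible GPU (physical GPU 2)
    model = model.cuda()
    if len(device_ids) > 1:
        # Wrap the model so each forward pass is split across all visible GPUs
        model = torch.nn.DataParallel(model)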
 
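criterion = nn.CrossEntropyLoss()  # create the loss function once, outside the loop

# The input data must also be moved with cuda(), i.e. copied to the main GPU
for batch_idx, (data, label) in pbar:
    if args.cuda:
        data, label = data.cuda(), label.cuda()
    # Variable wrappers are unnecessary since PyTorch 0.4; tensors are passed directly
    prediction = model(data, label, args)  # this model's forward takes (data, label, args)
    # prediction has already been gathered from both GPUs onto the main GPU.
    # In the forward pass each GPU handles batch_size / len(device_ids) samples,
    # so len(prediction) == batch_size

    loss = criterion(prediction, label)  # compute the loss on the main GPU
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()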
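The per-GPU share of a batch can be illustrated on the CPU. This sketch assumes the batch is split along dimension 0 the way `torch.chunk` splits a tensor; the batch of 10 samples and the count of 3 GPUs are hypothetical values chosen to show an uneven split.

```python
import torch

batch = torch.arange(10)        # a hypothetical batch of 10 samples
num_gpus = 3                    # hypothetical number of visible GPUs
chunks = batch.chunk(num_gpus)  # split along dim 0 into at most num_gpus pieces
sizes = [len(c) for c in chunks]
print(sizes)  # [4, 4, 2]
```

When the batch size is not divisible by the number of GPUs, the last GPU receives a smaller piece, so choosing a batch size that is a multiple of `len(device_ids)` keeps the load even.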