Low GPU utilization in PyTorch training_Day100: Multi-GPU Training in PyTorch


大菲哥艺术留学


Article link: https://blog.csdn.net/weixin_35133280/article/details/113708734


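This article explains how to train on specified GPUs in PyTorch with cuda, and what to watch out for when switching to multi-GPU training: the layers and parameters of the parallelized network are accessed by adding .module, and inputs and labels must be loaded onto the GPU.

Summary generated automatically by CSDN.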

Training on specified GPUs with cuda()

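```python
import os

# args (with a no_cuda flag), model, optimizer and pbar are defined elsewhere in the training script.
# Choose the GPUs before torch initializes CUDA. The value must be a string; a list raises an error.
# Equivalently, set it for a single run: CUDA_VISIBLE_DEVICES='2,7' python train.py
args.gpu_id = "2,7"  # GPU ids to use
os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu_id

import torch
import torch.nn as nn

# Train on the GPU unless --no_cuda is given or no GPU is available
args.cuda = not args.no_cuda and torch.cuda.is_available()

# Only the visible GPUs are counted, so torch.cuda.device_count() == 2 and device_ids == [0, 1].
# Logical id 0 is physical GPU 2, the primary GPU; logical id 1 is physical GPU 7.
# The model and data are distributed from the primary GPU.
device_ids = list(range(torch.cuda.device_count()))

if args.cuda:
    model = model.cuda()  # copy the model to the GPU; the default is cuda:0, i.e. physical GPU 2
    if len(device_ids) > 1:
        model = nn.DataParallel(model)  # requires the model to be on the GPU already

criterion = nn.CrossEntropyLoss()
for batch_idx, (data, label) in pbar:  # pbar, e.g. tqdm(enumerate(train_loader))
    if args.cuda:
        # The input data must also be copied to the primary GPU before the forward pass
        data, label = data.cuda(), label.cuda()
    # Variable() wrappers are unnecessary since PyTorch 0.4; tensors are passed directly.
    # Each GPU runs the forward pass on batch_size / len(device_ids) samples and the
    # outputs are merged on the primary GPU, so len(prediction) == batch_size.
    prediction = model(data, label, args)  # custom forward signature of the author's model
    loss = criterion(prediction, label)  # compute the loss
    optimizer.zero_grad()
    # backward() also runs on every replica; the gradients are summed into the model on the primary GPU
    loss.backward()
    optimizer.step()
```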
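The script above depends on args, a data loader and the author's own model. Below is a minimal self-contained sketch of one training step in the same style, assuming a toy nn.Linear model, a random batch and an SGD learning rate of 0.1, all made up for illustration. It wraps the model only when more than one GPU is visible and falls back to the CPU when there is none.

```python
import torch
import torch.nn as nn


def train_one_step(batch_size: int = 8, in_features: int = 4, num_classes: int = 3) -> torch.Tensor:
    """Run one training step on a toy model and return the batch predictions."""
    torch.manual_seed(0)
    use_cuda = torch.cuda.is_available()

    model = nn.Linear(in_features, num_classes)  # toy stand-in for the real network
    if use_cuda:
        model = model.cuda()  # parameters must be on cuda:0 before wrapping
        if torch.cuda.device_count() > 1:
            model = nn.DataParallel(model)  # split each batch across all visible GPUs

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()

    # Random batch standing in for one (data, label) pair from the data loader
    data = torch.randn(batch_size, in_features)
    label = torch.randint(0, num_classes, (batch_size,))
    if use_cuda:
        data, label = data.cuda(), label.cuda()  # inputs go to the primary GPU

    prediction = model(data)  # per-GPU outputs are gathered on the primary GPU
    loss = criterion(prediction, label)
    optimizer.zero_grad()
    loss.backward()  # gradients end up on the parameters held by the primary GPU
    optimizer.step()
    return prediction


if __name__ == "__main__":
    print(train_one_step().shape)  # torch.Size([8, 3])
```

Launched with CUDA_VISIBLE_DEVICES='2,7', the batch of 8 would be split into two chunks of 4, one per visible GPU.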

Calling functions of the custom model during training:

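When moving from one GPU to several, methods the model inherits from nn.Module can still be called directly on the wrapper, for example model.state_dict() and model.load_state_dict(torch.load(model_path)). One caveat: the wrapper's state_dict keys carry a "module." prefix, so a checkpoint saved from the wrapped model does not load directly into an unwrapped one, and vice versa.
Methods you wrote yourself must be called through .module. They do not run in parallel but only on the primary GPU, because DataParallel only splits the batch for calls that go through the wrapper's forward(), i.e. model(input).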
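Because DataParallel registers the original network as a submodule named module, every key in the wrapper's state_dict gains the prefix "module.". The sketch below, assuming a hypothetical one-layer Net, shows two ways to keep checkpoints interchangeable: save model.module.state_dict(), or strip the prefix from a checkpoint that was saved from the wrapper (strip_module_prefix is a helper name made up for this example).

```python
from collections import OrderedDict

import torch
import torch.nn as nn


class Net(nn.Module):  # hypothetical stand-in for the real model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


def strip_module_prefix(state_dict):
    """Hypothetical helper: drop the 'module.' prefix DataParallel adds to every key."""
    prefix = "module."
    return OrderedDict(
        (k[len(prefix):] if k.startswith(prefix) else k, v) for k, v in state_dict.items()
    )


wrapped = nn.DataParallel(Net())  # also works without a GPU; it then just calls Net
print(list(wrapped.state_dict().keys()))  # ['module.fc.weight', 'module.fc.bias']

# Option 1: save the inner model, so the checkpoint has plain keys
plain_state = wrapped.module.state_dict()

# Option 2: fix up a checkpoint that was saved from the wrapper
fixed_state = strip_module_prefix(wrapped.state_dict())

single = Net()
single.load_state_dict(fixed_state)  # keys now match an unwrapped Net
```

In a real script, option 1 would typically be torch.save(model.module.state_dict(), model_path), which keeps the checkpoint loadable on a single GPU or on the CPU.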

On a single GPU you can write:

```python
model = Net()
out = model.fc(input)
```

With DataParallel, this has to be changed to:

```python
import torch.nn as nn

model = Net()
model = nn.DataParallel(model)
out = model.module.fc(input)
```

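Printing the parallelized network shows that a module level has to be added; take care that it is module, not model. Going through it gives access to the layers defined in the parallelized network. This is because DataParallel stores the original network as its submodule named module, so the printed model shows the original network nested under (module).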

Unlike the single-GPU case there is an extra .module: with multiple GPUs, any operation that needs the parameters or functions inside model must go through model.module.
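Writing model.module everywhere breaks as soon as the same code runs without the wrapper. A common workaround, sketched here under the assumption of a hypothetical Net with a custom embed() method, is a small unwrap() helper (also a made-up name) that returns model.module only when the model is actually wrapped. The example also shows that a custom method is not reachable on the wrapper itself.

```python
import torch
import torch.nn as nn


class Net(nn.Module):  # hypothetical model with a custom method
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

    def embed(self, x):  # custom method: not exposed on the DataParallel wrapper
        return torch.relu(self.fc(x))


def unwrap(model: nn.Module) -> nn.Module:
    """Return the underlying network whether or not it is wrapped in DataParallel."""
    return model.module if isinstance(model, nn.DataParallel) else model


model = nn.DataParallel(Net())
print(model)  # the printout shows Net nested under (module)

# Follow the parameters' device, since the wrapper may have moved them to a GPU
x = torch.randn(3, 4, device=next(model.parameters()).device)
out = unwrap(model).fc(x)     # same as model.module.fc(x)
emb = unwrap(model).embed(x)  # runs on one device; the batch is not split
print(hasattr(model, "embed"))  # False
```

The same helper works unchanged when the model is not wrapped, which keeps single-GPU and multi-GPU code paths identical.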

Training on specified GPUs via the device argument

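For multi-GPU training, wrap the model: model = nn.DataParallel(model)
During both training and testing, inputs and labels must be loaded onto the GPU
Example code is available at the link in Reference 4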

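```python
import torch

device_ids = [3, 4, 6, 7]  # physical GPU ids (CUDA_VISIBLE_DEVICES is not set here)
model = Module()  # the author's model class
if torch.cuda.is_available():
    model = model.cuda(device=device_ids[0])  # put the model on the primary device first
    # Declare all devices to use; assign the wrapper back to model, not to a new name
    model = torch.nn.DataParallel(model, device_ids=device_ids)
    # Training data goes to the primary device (inside the training loop)
    images = images.cuda(device=device_ids[0])
    labels = labels.cuda(device=device_ids[0])
```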
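The same setup can also be written with torch.device and .to(). A minimal sketch, assuming a toy nn.Linear model and random tensors in place of Module(), images and labels, plus a hypothetical pick_device_ids() helper that keeps only the requested GPU ids that actually exist, so it also runs on a machine with fewer GPUs or none at all (everything then stays on the CPU).

```python
import torch
import torch.nn as nn


def pick_device_ids(wanted):
    """Hypothetical helper: keep only the requested GPU ids that exist on this machine."""
    available = torch.cuda.device_count()
    return [i for i in wanted if i < available]


device_ids = pick_device_ids([3, 4, 6, 7])
# Primary device: the first usable GPU, or the CPU if none is left
device = torch.device(f"cuda:{device_ids[0]}") if device_ids else torch.device("cpu")

model = nn.Linear(4, 2).to(device)  # toy model; move it to the primary device before wrapping
if len(device_ids) > 1:
    model = nn.DataParallel(model, device_ids=device_ids)  # assign the wrapper back to model

# Random stand-ins for one batch of images and labels, placed on the primary device
images = torch.randn(8, 4).to(device)
labels = torch.randint(0, 2, (8,)).to(device)

outputs = model(images)  # outputs are gathered on the primary device
loss = nn.CrossEntropyLoss()(outputs, labels)
loss.backward()
print(outputs.shape, outputs.device)
```

The model is moved to the primary device before wrapping because DataParallel expects the parameters on device_ids[0]; the inputs only need to reach that same device, and the wrapper scatters them from there.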

Reference 1: https://blog.csdn.net/vino_cherish/article/details/89385855

Reference 2: https://blog.csdn.net/weixin_40087578/article/details/87186613

Reference 3: https://blog.csdn.net/daydayjump/article/details/81158777

Reference 4: https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html?highlight=dataparallel

Pitfalls encountered in multi-GPU training: https://blog.csdn.net/qq_30614451/article/details/101365759
