torch distributed.init out of memory

AI算法网奇

Modified on 2022-05-13 11:33:44


Column: pytorch知识宝典 · Tags: pytorch

First published on 2022-05-13 11:22:57

Article link: https://blog.csdn.net/jacke121/article/details/124748293


Select the visible GPUs through the environment:

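```python
import os

import torch

# Takes effect only if set before this process initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2,3"

# Hard-coded to 0 here; in a multi-process launch each process should use its own local rank.
local_rank = 0
torch.cuda.set_device(local_rank)
```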

By default, cuda(0) is physical GPU 0. Once CUDA_VISIBLE_DEVICES is set, however, cuda(0) is the first GPU listed in CUDA_VISIBLE_DEVICES (physical GPU 1 in the example above).
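The renumbering can be illustrated without a GPU. The sketch below uses a hypothetical helper, `physical_gpu_id`, and assumes CUDA_VISIBLE_DEVICES holds a comma-separated list of integer indices; CUDA also accepts GPU UUIDs in this variable, which the sketch does not handle.

```python
import os
from typing import Optional


def physical_gpu_id(logical_index: int, visible: Optional[str] = None) -> int:
    """Return the physical GPU index that logical device cuda:<logical_index> maps to.

    Hypothetical helper for illustration only. `visible` defaults to the current
    CUDA_VISIBLE_DEVICES value; only integer indices are supported (no UUIDs).
    """
    if visible is None:
        visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        # Variable not set: every GPU is visible and numbering is unchanged.
        return logical_index
    # "1,2,3" -> [1, 2, 3]; int() tolerates surrounding spaces such as " 2".
    ids = [int(tok) for tok in visible.split(",") if tok.strip()]
    if not 0 <= logical_index < len(ids):
        raise IndexError(
            f"cuda:{logical_index} is not visible with CUDA_VISIBLE_DEVICES={visible!r}"
        )
    return ids[logical_index]


print(physical_gpu_id(0, "1,2,3"))  # -> 1
print(physical_gpu_id(2, "1,2,3"))  # -> 3
```

With `"1,2,3"`, logical device 0 is physical GPU 1, so `torch.cuda.set_device(0)` in the snippet above selects physical GPU 1.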

distributed.init fails with an out-of-memory error:

```python
import argparse
import logging
import os
import time

import torch
import torch.distributed as dist
import torch.nn.functional as F
import torch.utils.data.distributed


def main(args):
    try:
        world_size = int(os.environ['WORLD_SIZE'])
        rank = int(os.environ['RANK'])
        dist_url = "tcp://{}:{}".format(os.environ["MASTER_ADDR"], os.environ["MASTER_PORT"])
    except KeyError:
        world_size = 1
        rank = 0
        dist_url = "tcp://127.0.0.1
```
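One likely cause of this error, consistent with the torch.cuda.set_device call shown earlier, is that every process starts on the default device cuda:0, so all ranks create their CUDA context and communicator resources on the same GPU. The following is a minimal sketch, not the author's code: a hypothetical `init_distributed` helper that mirrors the environment-variable handling above, additionally reads LOCAL_RANK (exported by launchers such as torchrun), and calls torch.cuda.set_device(local_rank) before dist.init_process_group. For the single-process fallback it picks a free local port instead of guessing the original value, and it switches to the gloo backend when no GPU is available.

```python
import os
import socket

import torch
import torch.distributed as dist


def _free_port() -> int:
    # Ask the OS for an unused TCP port (only used by the single-process fallback).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def init_distributed() -> int:
    """Hypothetical helper: pin this process to its own GPU, then join the group.

    Assumes a launcher such as torchrun exported RANK, WORLD_SIZE, LOCAL_RANK,
    MASTER_ADDR and MASTER_PORT; otherwise runs as a single local process.
    Returns the local rank.
    """
    try:
        world_size = int(os.environ["WORLD_SIZE"])
        rank = int(os.environ["RANK"])
        local_rank = int(os.environ.get("LOCAL_RANK", "0"))
        dist_url = "tcp://{}:{}".format(os.environ["MASTER_ADDR"], os.environ["MASTER_PORT"])
    except KeyError:
        world_size, rank, local_rank = 1, 0, 0
        dist_url = "tcp://127.0.0.1:{}".format(_free_port())

    if torch.cuda.is_available():
        # Select the device BEFORE init so this rank's CUDA resources land on its
        # own GPU instead of every rank piling onto cuda:0. local_rank is a
        # logical index, i.e. a position within CUDA_VISIBLE_DEVICES.
        torch.cuda.set_device(local_rank)
        backend = "nccl"
    else:
        backend = "gloo"  # CPU-only fallback so the sketch still runs

    dist.init_process_group(backend=backend, init_method=dist_url,
                            world_size=world_size, rank=rank)
    return local_rank
```

For example, `CUDA_VISIBLE_DEVICES=1,2,3 torchrun --nproc_per_node=3 train.py` (where `train.py` is a hypothetical script that calls `init_distributed()`) starts three processes whose local ranks 0, 1 and 2 map to physical GPUs 1, 2 and 3.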
