cannot re-initialize CUDA in forked subprocess

This issue describes a conflict in PyTorch between CUDA initialization and the multiprocessing workers that the DataLoader creates via num_workers.

As the error message suggests, the fix is to pass the multiprocessing_context argument to the DataLoader and set it to 'spawn'. Worker processes are normally created with 'fork', so this setting is easy to overlook.
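For context, a minimal, hypothetical reproduction of the failure mode is sketched below. It assumes a Linux system where the default start method is 'fork' and a dataset that touches CUDA inside __getitem__; the original post does not show the triggering code, so this is only an illustration.

import torch
from torch.utils.data import Dataset, DataLoader

class CudaDataset(Dataset):
    '''Hypothetical dataset that moves each sample to the GPU in __getitem__.'''
    def __len__(self):
        return 8
    def __getitem__(self, idx):
        # touching CUDA inside a forked worker triggers the error
        return torch.randn(4, 32, 32).cuda(), 0

torch.zeros(1).cuda()  # CUDA is initialized in the main process
loader = DataLoader(CudaDataset(), batch_size=1, num_workers=2)  # workers are forked by default on Linux
for inputs, targets in loader:  # raises: cannot re-initialize CUDA in forked subprocess
    pass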

1. Fix for "cannot re-initialize CUDA in forked subprocess"

import torch
import torch.multiprocessing as mp

def get_mean_and_std_4channel(dataset):
    '''Compute the mean and std value of dataset.'''
    mp.set_start_method('spawn') # set multiprocessing context to 'spawn'
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True, num_workers=2, multiprocessing_context='spawn')
    mean = torch.zeros(4)
    std = torch.zeros(4)
    print('==> Computing the 4 channel mean and std..')
    for inputs, targets in dataloader:
        for i in range(4):
            mean[i] += inputs[:, i, :, :].mean()
            std[i] += inputs[:, i, :, :].std()
    mean.div_(len(dataset))
    std.div_(len(dataset))
    return mean, std

1.1 Explanation of the cause

The error message you are seeing indicates that there is a conflict between PyTorch’s CUDA initialization and the use of multiprocessing in the DataLoader. One way to resolve this issue is to set the multiprocessing_context argument of the DataLoader to ‘spawn’ instead of the default ‘fork’ context. This will create a new process for the dataloader that does not conflict with the CUDA initialization.

In this modified code, we first import the torch.multiprocessing module and set the multiprocessing context to ‘spawn’ using the set_start_method function. We then pass the ‘spawn’ context to the DataLoader as the multiprocessing_context argument. This should resolve the CUDA initialization conflict and allow the DataLoader to run without errors.

A multiprocessing context is the environment in which child processes run.
In Python the available start methods are spawn, fork, and forkserver; the choice determines how child processes are created and how shared memory is managed.
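As a small illustration (not from the original post): torch.multiprocessing is a drop-in wrapper around Python's multiprocessing module, so the standard helpers can be used to inspect the available start methods and choose one.

import torch.multiprocessing as mp

print(mp.get_all_start_methods())  # e.g. ['fork', 'spawn', 'forkserver'] on Linux
mp.set_start_method('spawn')       # make 'spawn' the default for new child processes
print(mp.get_start_method())       # now reports 'spawn'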

2. RuntimeError: context has already been set

If this error appears after making the change above, it means the multiprocessing context has already been set somewhere else in the code (for example by another module or an earlier call);
setting it a second time then raises the error, as the short sketch below demonstrates.
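A quick sketch of how the error arises: the start method can only be set once per interpreter, so a second call to set_start_method raises the RuntimeError.

import torch.multiprocessing as mp

mp.set_start_method('spawn')
try:
    mp.set_start_method('spawn')   # e.g. set again from another place in the code
except RuntimeError as e:
    print(e)                       # prints: context has already been set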

2.1 Solution 1

Make sure the multiprocessing context is set only in the main process, before any child processes are created.
You can check whether the current process is the main process with multiprocessing.current_process().name:

import torch.multiprocessing as mp

if __name__ == '__main__':
    if mp.current_process().name == 'MainProcess':
        mp.set_start_method('spawn')
    # rest of your code here

2.2 Solution 2

If a third-party library has already set the multiprocessing context, reuse the existing context instead of creating a new one:
pass the result of get_context() to the multiprocessing_context argument.

import torch
import torch.multiprocessing as mp

def get_mean_and_std_4channel(dataset):
    '''Compute the mean and std value of dataset.'''
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True, num_workers=2, multiprocessing_context=mp.get_context())
    mean = torch.zeros(4)
    std = torch.zeros(4)
    print('==> Computing the 4 channel mean and std..')
    for inputs, targets in dataloader:
        for i in range(4):
            mean[i] += inputs[:, i, :, :].mean()
            std[i] += inputs[:, i, :, :].std()
    mean.div_(len(dataset))
    std.div_(len(dataset))
    return mean, std

2.3 Solution 3

Set num_workers = 0, so that data loading runs entirely in the main process and no worker subprocesses are created at all.
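A minimal sketch of this workaround, using a hypothetical dummy TensorDataset just to make the example self-contained:

import torch
from torch.utils.data import TensorDataset, DataLoader

# hypothetical dummy data: 8 samples of shape (4, 32, 32) with integer labels
dataset = TensorDataset(torch.randn(8, 4, 32, 32), torch.zeros(8, dtype=torch.long))
dataloader = DataLoader(dataset, batch_size=1, shuffle=True, num_workers=0)  # no worker subprocesses
for inputs, targets in dataloader:
    pass  # loading happens in the main process, so forked CUDA re-initialization never occurs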
