Python Multiprocessing with PyCUDA
Reference: https://stackoverflow.com/questions/5904872/python-multiprocessing-with-pycuda
You need to get all your bananas lined up on the CUDA side of things first, then think about the best way to get this done in Python.
Before CUDA 4.0, the CUDA multi-GPU model is pretty straightforward: each GPU has its own context, and each context must be established by a different host thread. So the idea in pseudocode is:
- Application starts, process uses the API to determine the number of usable GPUs (beware of things like compute mode on Linux)
- Application launches a new host thread per GPU, passing a GPU id. Each thread implicitly or explicitly calls the equivalent of cuCtxCreate(), passing the GPU id it has been assigned
- Profit!
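Step 1 of the list, counting the usable GPUs while watching out for compute mode, might be sketched roughly as follows. This is an illustration under assumptions, not part of the original answer: it relies on PyCUDA's `Device.get_attribute()` with `driver.device_attribute.COMPUTE_MODE`, which is assumed to return a value comparable to the `driver.compute_mode` enum members, and the helper names `usable_gpu_ids` and `query_compute_modes` are hypothetical. Only devices in `PROHIBITED` mode are filtered out.

```python
def usable_gpu_ids(compute_modes, prohibited):
    """Return, in ascending order, the GPU ids whose compute mode is not `prohibited`.

    compute_modes: dict mapping GPU id -> compute mode value
    prohibited: the value marking a device on which no context may be created
    """
    return [gpuid for gpuid, mode in sorted(compute_modes.items())
            if mode != prohibited]


def query_compute_modes():
    """Return {GPU id: compute mode} for every device PyCUDA can see.

    Needs PyCUDA and a CUDA driver. The import is deferred so that
    usable_gpu_ids() can be used on a machine without a GPU.
    """
    from pycuda import driver
    driver.init()
    return {
        gpuid: driver.Device(gpuid).get_attribute(
            driver.device_attribute.COMPUTE_MODE)
        for gpuid in range(driver.Device.count())
    }
```

On a CUDA machine, `usable_gpu_ids(query_compute_modes(), driver.compute_mode.PROHIBITED)` would then give the ids to hand out to the per-GPU threads. Devices in an exclusive mode still pass this filter even though another process may already hold them, which is why a busy check at context-creation time is still needed.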
In Python, the whole scheme might look something like this:
import threading
from pycuda import driver

class GPUThread(threading.Thread):
    def __init__(self, gpuid):
        super().__init__()
        self.gpuid = gpuid

    def run(self):
        # Create the context here, not in __init__, so that it is established
        # by (and current in) this thread rather than the main thread
        ctx = driver.Device(self.gpuid).make_context()
        try:
            device = ctx.get_device()
            print("%s has device %s, api version %s"
                  % (self.name, device.name(), ctx.get_api_version()))
            # Profit!
        finally:
            # Release the context from the same thread that created it
            ctx.detach()

driver.init()
ngpus = driver.Device.count()
threads = [GPUThread(i) for i in range(ngpus)]
# Start every thread before joining any, so the GPUs run concurrently
for t in threads:
    t.start()
for t in threads:
    t.join()
This assumes it is safe to just establish a context without checking the device beforehand. Ideally you would check the compute mode to make sure it is safe to try, then use an exception handler in case a device is busy. But hopefully this gives the basic idea.
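The exception handler suggested above could look roughly like this. It is a sketch rather than a tested recipe: `try_open_context` and `pycuda_open_context` are hypothetical helper names, and treating `driver.Error` (PyCUDA's base exception class) as "the device is busy" is an assumption. The context factory is passed in as a callable so the error handling can be reused and exercised without a GPU.

```python
def try_open_context(gpuid, open_context, busy_errors):
    """Try to create a context on `gpuid`; return None instead of raising on failure.

    open_context: callable taking a GPU id and returning a new context
    busy_errors: exception type (or tuple of types) meaning the device cannot
                 be used right now, e.g. an exclusive-mode GPU already in use
    """
    try:
        return open_context(gpuid)
    except busy_errors as exc:
        print("GPU %s unavailable, skipping: %s" % (gpuid, exc))
        return None


def pycuda_open_context(gpuid):
    """Create a PyCUDA context on `gpuid` and make it current in the calling thread.

    Assumes PyCUDA is installed and driver.init() has already been called.
    """
    from pycuda import driver
    return driver.Device(gpuid).make_context()
```

Inside each thread's `run()`, the context creation would then become something like `ctx = try_open_context(self.gpuid, pycuda_open_context, driver.Error)`, returning early when `ctx` is `None`. Catching `driver.Error` is broad; narrowing it to the specific error your driver reports for a busy device would be safer.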