CUDA编程问题记录:能否用CPU多线程调用CUDA核函数

问题:能否在主机端创建CPU多线程,在每个线程里调用设备端核函数的caller函数,进而实现进一步的并行运行。
例如有5张图片,对于每张图片都有N个GPU线程对其进行像素操作,但是此时是逐一对这5张图片处理的,想在主机端创建5个CPU线程,每个线程里进行 传输到设备端–>设备端GPU多线程处理–>结果返回主机端 这一系列操作,实现五张图片同时处理

此方法能否实现: 不能

只存在一个流时(默认的流),所有调用核函数的指令将被存在一个队列中,依次执行。因此直接使用CPU多线程调用kernel函数不能达到并行的目的,此时即便能运行也与串行运行的效果相同,只有通过使用多流才能进一步加速。

参考资料
https://stackoverflow.com/questions/13061619/what-happen-if-a-cuda-kernel-is-called-from-multiple-pthreads-simultaneously

Question:
I have a CUDA kernel that do my hard work, but I also have some hard work that need to be done in the CPU (calculations with two positions of the same array) that I could not write in CUDA (because CUDA threads are not synchronous, I need to perform a hard work on a position X of an array and after do z[x] = y[x] - y[x - 1], where y is the array resultant of a CUDA kernel where each thread works on one position of this array and z is another array storing the result). So I’m doing this in the CPU.

I have several CPU threads to do the CPU side work, but each one is calling a CUDA kernel passing some data. My question is: what happens on the GPU side when multiple CPU threads are making GPU calls? Would be better if I do the CUDA kernel call once and then create multiple CPU threads to do the CPU side work?

回答:
Kernel calls are queued and executed one by one in single stream.

However you can specify stream during kernel execution - then CUDA operations in different streams may run concurrently and operations from different streams may be interleaved. Default stream is 0.

See:http://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf

Things are similar when different processes use the same card.
Also remember that kernels are executed asynchronously from CPU stuff.

评论 3
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值