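Concepts
1. Synchronous: the host submits a task (such as a kernel) to the device. In the synchronous case, the host blocks until the device has finished the submitted task and handed control back to the host; only then does the host program continue.
2. Asynchronous: after the host submits a task, the device starts executing it and control returns to the host immediately. The host therefore does not block but continues executing the host program; in other words, in the asynchronous case the host does not wait for the device to finish the task.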
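To make the two modes concrete, here is a minimal sketch, assuming a CUDA-capable GPU with unified (managed) memory support. The kernel name `fillIndex` and the array size are hypothetical. The kernel launch is the asynchronous step, and `cudaDeviceSynchronize()` is the synchronous point where the host blocks until the device has finished.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: each thread writes its global index.
__global__ void fillIndex(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main() {
    const int n = 1 << 20;
    int *data = nullptr;
    // Managed memory is accessible from both the host and the device.
    if (cudaMallocManaged(&data, n * sizeof(int)) != cudaSuccess) return 1;

    // Asynchronous: the launch only enqueues the kernel; control returns
    // to the host immediately, possibly before any GPU thread has run.
    fillIndex<<<(n + 255) / 256, 256>>>(data, n);

    // The host is free to do unrelated work here while the GPU runs,
    // but reading `data` now would race with the kernel.

    // Synchronous point: block the host until all previously submitted
    // device work has completed, and collect any execution error.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Only now is it safe for the host to read the kernel's results.
    std::printf("data[n-1] = %d\n", data[n - 1]);
    cudaFree(data);
    return 0;
}
```

On a working CUDA device this prints `data[n-1] = 1048575`.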
When to Call cudaDeviceSynchronize()
Why do we need cudaDeviceSynchronize() in kernels or host code?
Although CUDA kernel launches are asynchronous, all GPU-related tasks placed in one stream (the default stream, unless you specify otherwise) are executed sequentially.
So, for example:
kernel1<<<X,Y>>>(...); // kernel1 starts execution, CPU continues to the next statement
kernel2<<<X,Y>>>(...); // kernel2 is placed in the queue and starts after kernel1 finishes, CPU continues to the next statement
cudaMemcpy(...); // CPU blocks until the memory is copied; the copy starts only after kernel2 finishes
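The snippet above can be fleshed out into a runnable sketch. The bodies of `kernel1` and `kernel2` (fill with indices, then double) and the array size are hypothetical placeholders; the point is that stream ordering alone guarantees `cudaMemcpy` sees `kernel2`'s result, with no explicit `cudaDeviceSynchronize()`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-ins for kernel1 / kernel2 in the snippet.
__global__ void kernel1(float *x, int n) {  // x[i] = i
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = static_cast<float>(i);
}

__global__ void kernel2(float *x, int n) {  // x[i] = 2 * x[i]
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main() {
    const int n = 1024;
    float *d_x = nullptr;
    if (cudaMalloc(&d_x, n * sizeof(float)) != cudaSuccess) return 1;

    const int threads = 256, blocks = (n + threads - 1) / threads;
    // Both launches go into the default stream: each returns to the host
    // at once, but kernel2 does not start until kernel1 has finished.
    kernel1<<<blocks, threads>>>(d_x, n);
    kernel2<<<blocks, threads>>>(d_x, n);

    // cudaMemcpy in the same (default) stream waits for kernel2, then blocks
    // the host until the copy is complete.
    std::vector<float> h_x(n);
    cudaError_t err = cudaMemcpy(h_x.data(), d_x, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("h_x[10] = %.1f\n", h_x[10]);  // 10 doubled
    cudaFree(d_x);
    return 0;
}
```

This should print `h_x[10] = 20.0`. An explicit wait becomes necessary when no blocking call follows the kernels, for example after `cudaMemcpyAsync` into pinned host memory, which can return before the copy finishes; there `cudaStreamSynchronize()` or `cudaDeviceSynchronize()` must come before the host reads the buffer.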