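A kernel launch takes four execution-configuration parameters: the grid dimensions, the block dimensions, the number of bytes of shared memory dynamically allocated per block, and the CUDA stream.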
kernel<<<dim_grid, dim_block, num_bytes_in_SharedMem, stream>>>
The following is taken from the CUDA C Programming Guide for CUDA 8.0.
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
Section B.21, Execution Configuration, describes the parameters:
The execution configuration is specified by inserting an expression of the form <<<Dg, Db, Ns, S>>> between the function name and the parenthesized argument list, where:
‣ Dg is of type dim3 (see dim3) and specifies the dimension and size of the grid, such that Dg.x * Dg.y * Dg.z equals the number of blocks being launched;
‣ Db is of type dim3 (see dim3) and specifies the dimension and size of each block, such that Db.x * Db.y * Db.z equals the number of threads per block;
‣ Ns is of type size_t and specifies the number of bytes in shared memory that is dynamically allocated per block for this call in addition to the statically allocated memory; this dynamically allocated memory is used by any of the variables declared as an external array as mentioned in __shared__; Ns is an optional argument which defaults to 0;
‣ S is of type cudaStream_t and specifies the associated stream; S is an optional argument which defaults to 0.
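To make the four parameters concrete, here is a minimal sketch of a launch that sets all of them. The kernel reverse_in_block, the helper run_reverse_demo, the array size 1000, and the block size 256 are hypothetical choices made for illustration, not something from the guide. The kernel declares an extern __shared__ array, so its size comes from the third launch parameter (Ns), and the work is queued on an explicitly created stream (S).

```cuda
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each block reverses its own chunk of `data` in place,
// staging the chunk in dynamically allocated shared memory.
__global__ void reverse_in_block(int *data, int n)
{
    // No size here: the byte count is supplied by Ns in <<<Dg, Db, Ns, S>>>.
    extern __shared__ int tile[];

    int local = threadIdx.x;
    int block_start = blockIdx.x * blockDim.x;
    int global = block_start + local;

    // The last block may be only partially filled.
    int valid = min((int)blockDim.x, n - block_start);

    if (local < valid) tile[local] = data[global];
    __syncthreads();
    if (local < valid) data[global] = tile[valid - 1 - local];
}

// Hypothetical host helper: launches the kernel with all four
// execution-configuration parameters and checks the result on the host.
bool run_reverse_demo()
{
    const int n = 1000;
    const int threads = 256;
    std::vector<int> host(n);
    for (int i = 0; i < n; ++i) host[i] = i;

    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return false;

    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) { cudaFree(dev); return false; }

    dim3 dim_block(threads);                        // Db: threads per block
    dim3 dim_grid((n + threads - 1) / threads);     // Dg: enough blocks to cover n
    size_t num_bytes_in_SharedMem = threads * sizeof(int);  // Ns: bytes per block

    cudaMemcpyAsync(dev, host.data(), n * sizeof(int),
                    cudaMemcpyHostToDevice, stream);
    reverse_in_block<<<dim_grid, dim_block, num_bytes_in_SharedMem, stream>>>(dev, n);
    cudaError_t launch_err = cudaGetLastError();
    cudaMemcpyAsync(host.data(), dev, n * sizeof(int),
                    cudaMemcpyDeviceToHost, stream);
    cudaError_t sync_err = cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(dev);
    if (launch_err != cudaSuccess || sync_err != cudaSuccess) return false;

    // Element i held the value i; after the kernel, each chunk is reversed.
    for (int i = 0; i < n; ++i) {
        int start = (i / threads) * threads;
        int valid = std::min(threads, n - start);
        if (host[i] != start + valid - 1 - (i - start)) return false;
    }
    return true;
}
```

Calling run_reverse_demo() from main and compiling with nvcc should return true on a machine with a CUDA-capable GPU. Since Ns and S both default to 0, a kernel that uses no dynamic shared memory and runs on the default stream can be launched with just kernel<<<dim_grid, dim_block>>>(...).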