__device__
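1)Resides in global memory space, so it is shared by all threads of the grid,
2)Has the lifetime of the CUDA context in which it is created (in practice, usually the lifetime of the whole program),
3)Has a distinct object per device (each device holds its own single copy),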
4)Is accessible from all the threads within the grid and from the host through the runtime library (cudaGetSymbolAddress() / cudaGetSymbolSize() / cudaMemcpyToSymbol() / cudaMemcpyFromSymbol()).
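The four properties above can be sketched with a small program. This is only an illustration: the variable d_counter, the kernel add_to_counter and the helper run_device_counter are hypothetical names, and the launch configuration (2 blocks of 32 threads) is an arbitrary choice.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example variable: one copy lives in global memory on each device.
__device__ int d_counter;

// Every thread of the grid sees the same d_counter, so updates must be atomic.
__global__ void add_to_counter(int amount) {
    atomicAdd(&d_counter, amount);
}

// Returns the final counter value, or -1 if any CUDA call fails.
int run_device_counter() {
    int start = 10;
    // Host code cannot assign to d_counter directly; it goes through the runtime library.
    if (cudaMemcpyToSymbol(d_counter, &start, sizeof(int)) != cudaSuccess) return -1;

    add_to_counter<<<2, 32>>>(1);  // 64 threads in total, each adds 1
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;

    int result = 0;
    if (cudaMemcpyFromSymbol(&result, d_counter, sizeof(int)) != cudaSuccess) return -1;
    return result;  // 10 + 64 = 74
}

int main() {
    printf("counter = %d\n", run_device_counter());
    return 0;
}
```

The value survives between kernel launches because the variable lives as long as the CUDA context, not as long as a single kernel.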
__constant__
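Behaves like a global constant: device code can only read it, while the host can set its value through cudaMemcpyToSymbol().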
1)Resides in constant memory space,
2)Has the lifetime of the CUDA context in which it is created,
3)Has a distinct object per device,
4)Is accessible from all the threads within the grid and from the host through the runtime library (cudaGetSymbolAddress() / cudaGetSymbolSize() / cudaMemcpyToSymbol() / cudaMemcpyFromSymbol()).
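A minimal sketch of the usual pattern: the host fills the constant once, then every thread reads it. The names c_scale, scale_kernel and run_constant_scale are made up for this example, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical coefficients kept in constant memory: {multiplier, offset}.
__constant__ float c_scale[2];

// Each thread reads the same constants; device code cannot write to c_scale.
__global__ void scale_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = c_scale[0] * in[i] + c_scale[1];
}

// Returns the last output element (error checking omitted for brevity).
float run_constant_scale() {
    const int n = 4;
    float h_in[n] = {0.0f, 1.0f, 2.0f, 3.0f};
    float h_out[n] = {0.0f, 0.0f, 0.0f, 0.0f};
    float h_scale[2] = {2.0f, 1.0f};

    // The host sets the constant through the runtime library before the launch.
    cudaMemcpyToSymbol(c_scale, h_scale, sizeof(h_scale));

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    scale_kernel<<<1, n>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return h_out[n - 1];  // 2 * 3 + 1 = 7
}

int main() {
    printf("last element = %f\n", run_constant_scale());
    return 0;
}
```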
__shared__
1)Resides in the shared memory space of a thread block,
2)Has the lifetime of the block,
3)Has a distinct object per block,
4)Is only accessible from all the threads within the block,
5)Does not have a constant address.
Reference: https://www.cnblogs.com/jugg1024/p/4354672.html
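As a sketch of the per-block behaviour, the kernel below sums each block's slice of the input in a __shared__ array. The names BLOCK, block_sum and run_shared_sum are assumptions of this example, and the tree reduction is just one common way to use shared memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int BLOCK = 64;  // threads per block (arbitrary, must be a power of two here)

// Each block gets its own private copy of tile; other blocks cannot see it.
__global__ void block_sum(const int* in, int* out) {
    __shared__ int tile[BLOCK];
    int t = threadIdx.x;
    tile[t] = in[blockIdx.x * BLOCK + t];
    __syncthreads();  // make all loads visible to the whole block

    // Tree reduction inside the block.
    for (int stride = BLOCK / 2; stride > 0; stride >>= 1) {
        if (t < stride) tile[t] += tile[t + stride];
        __syncthreads();
    }
    if (t == 0) out[blockIdx.x] = tile[0];
}

// Sums 128 ones using 2 blocks and returns the total (error checking omitted).
int run_shared_sum() {
    const int blocks = 2;
    const int n = blocks * BLOCK;
    int h_in[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1;
    int h_out[blocks] = {0, 0};

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    block_sum<<<blocks, BLOCK>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return h_out[0] + h_out[1];  // 64 + 64 = 128
}

int main() {
    printf("total = %d\n", run_shared_sum());
    return 0;
}
```

The contents of tile are gone once the block finishes, which is why each block writes its partial sum to global memory.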
__managed__
1)Can be referenced from both device and host code
2)Has the lifetime of an application.
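A small sketch of a managed variable touched from both sides. The names m_value, doubler and run_managed are hypothetical, and the example assumes a device and platform that support unified (managed) memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical managed variable: the same name is usable in host and device code.
__managed__ int m_value;

__global__ void doubler() {
    m_value *= 2;
}

// Returns the doubled value, or -1 if the kernel fails.
int run_managed() {
    m_value = 21;  // the host writes directly, no cudaMemcpyToSymbol needed

    doubler<<<1, 1>>>();
    // Synchronize before the host reads the variable again.
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;

    return m_value;  // 21 * 2 = 42
}

int main() {
    printf("m_value = %d\n", run_managed());
    return 0;
}
```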
__restrict__
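Unlike the specifiers above, __restrict__ does not choose a memory space. It is a pointer qualifier by which the programmer promises the compiler that the marked pointers do not alias each other, which may allow better instruction scheduling and load reuse. A minimal sketch, with the kernel name saxpy and the helper run_restrict_saxpy chosen only for this example:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// x and y are promised not to overlap; if they did, the behaviour would be undefined.
__global__ void saxpy(int n, float a,
                      const float* __restrict__ x,
                      float* __restrict__ y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Returns the last element of y after y = a * x + y (error checking omitted).
float run_restrict_saxpy() {
    const int n = 4;
    float h_x[n] = {1.0f, 2.0f, 3.0f, 4.0f};
    float h_y[n] = {10.0f, 10.0f, 10.0f, 10.0f};

    float *d_x = nullptr, *d_y = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y, n * sizeof(float), cudaMemcpyHostToDevice);

    saxpy<<<1, n>>>(n, 2.0f, d_x, d_y);
    cudaMemcpy(h_y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_x);
    cudaFree(d_y);
    return h_y[n - 1];  // 2 * 4 + 10 = 18
}

int main() {
    printf("y[3] = %f\n", run_restrict_saxpy());
    return 0;
}
```

The qualifier does not change the result here; it only gives the compiler more freedom, so any speedup depends on the kernel and the hardware.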
Reference:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#variable-memory-space-specifiers