CUDA C Programming Guide | Programming Model

Kernels

  • A kernel is defined using the __global__ declaration specifier.
  • The number of CUDA threads that execute that kernel for a given kernel call is specified using the <<<…>>> execution configuration syntax.
  • Each thread that executes the kernel is given a unique thread ID that is accessible within the kernel through the built-in threadIdx variable.
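
The three points above can be sketched as a minimal vector addition. This is an illustrative sketch, not code from the guide: the kernel name vecAdd, the host helper runVecAdd, and the size N = 256 are assumptions chosen for the example, and it launches a single block so that threadIdx.x alone identifies each element.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: declared with __global__, executed once per CUDA thread.
__global__ void vecAdd(const float* a, const float* b, float* c)
{
    int i = threadIdx.x;   // thread ID within the single block launched below
    c[i] = a[i] + b[i];
}

// Hypothetical host helper: runs vecAdd on N elements and verifies the result.
bool runVecAdd()
{
    const int N = 256;     // assumed size; must not exceed the per-block thread limit
    const size_t bytes = N * sizeof(float);
    float hA[N], hB[N], hC[N];
    for (int i = 0; i < N; ++i) { hA[i] = float(i); hB[i] = 2.0f * i; }

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Execution configuration: 1 block of N threads.
    vecAdd<<<1, N>>>(dA, dB, dC);

    // The blocking copy also reports errors from the preceding launch.
    cudaError_t err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < N; ++i)
        if (hC[i] != hA[i] + hB[i]) return false;
    return true;
}

int main()
{
    bool ok = runVecAdd();
    printf("vecAdd %s\n", ok ? "passed" : "failed");
    return ok ? 0 : 1;
}
```

With more than one block, threadIdx.x is no longer unique across the whole launch, so the index would also have to account for the block.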

Thread Hierarchy

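  • threadIdx is a 3-component vector, so threads can be identified using a one-, two-, or three-dimensional thread index, forming a one-, two-, or three-dimensional block of threads, called a thread block.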
  • Thread blocks are required to execute independently: it must be possible to execute them in any order, in parallel or in series.
  • Threads within a block:
    Threads within a block can cooperate by sharing data through some shared memory and by synchronizing their execution to coordinate memory accesses. More precisely, one can specify synchronization points in the kernel by calling the __syncthreads() intrinsic function; __syncthreads() acts as a barrier at which all threads in the block must wait before any is allowed to proceed. The Shared Memory section of the guide gives an example of using shared memory. In addition to __syncthreads(), the Cooperative Groups API provides a rich set of thread-synchronization primitives.
    For efficient cooperation, shared memory is expected to be low-latency memory near each processor core (much like an L1 cache).
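
To make the cooperation between threads of a block concrete, here is a minimal sketch in which one block reverses an array through shared memory. The names reverseInBlock, runReverse, and BLOCK are assumptions made for this example. The __syncthreads() call is what guarantees that every thread has finished writing its element to shared memory before any thread reads an element written by another thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int BLOCK = 64;  // assumed block size for this sketch

// Reverses BLOCK integers in place using one thread block.
__global__ void reverseInBlock(int* data)
{
    __shared__ int tile[BLOCK];      // shared by all threads of the block
    int t = threadIdx.x;

    tile[t] = data[t];               // each thread stages one element
    __syncthreads();                 // barrier: all writes to tile are done
    data[t] = tile[BLOCK - 1 - t];   // read an element staged by another thread
}

// Hypothetical host helper: runs the kernel and verifies the reversal.
bool runReverse()
{
    int h[BLOCK];
    for (int i = 0; i < BLOCK; ++i) h[i] = i;

    int* d = nullptr;
    cudaMalloc(&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);

    reverseInBlock<<<1, BLOCK>>>(d);

    cudaError_t err = cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < BLOCK; ++i)
        if (h[i] != BLOCK - 1 - i) return false;
    return true;
}

int main()
{
    bool ok = runReverse();
    printf("reverseInBlock %s\n", ok ? "passed" : "failed");
    return ok ? 0 : 1;
}
```

Without the barrier, a thread could read tile[BLOCK - 1 - t] before the thread responsible for that slot has written it.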

Memory Hierarchy

  • Each thread has private local memory.
  • Each thread block has shared memory visible to all threads of the block and with the same lifetime as the block.
  • All threads have access to the same global memory.
  • There are also two additional read-only memory spaces accessible by all threads: the constant and texture memory spaces.
  • The global, constant, and texture memory spaces are persistent across kernel launches by the same application.
  • The global, constant, and texture memory spaces are optimized for different memory usages. Texture memory also offers different addressing modes, as well as data filtering, for some specific data formats.
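
As a rough illustration of these memory spaces, the sketch below keeps two coefficients in constant memory, reads and writes arrays in global memory, and uses an ordinary local variable that is private to each thread. The names coeffs, scaleAndShift, and runConstant, and the values 2 and 1, are assumptions for the example. Texture memory is left out to keep the sketch short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Constant memory: read-only inside kernels, written by the host.
__constant__ float coeffs[2];

__global__ void scaleAndShift(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    if (i < n) {
        float v = in[i];                     // v is private to this thread
        out[i] = coeffs[0] * v + coeffs[1];  // in/out live in global memory
    }
}

// Hypothetical host helper: computes 2*x + 1 for n values and verifies it.
bool runConstant()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    static float hIn[n], hOut[n];
    for (int i = 0; i < n; ++i) hIn[i] = float(i);

    const float hCoeffs[2] = {2.0f, 1.0f};   // assumed scale and offset
    cudaMemcpyToSymbol(coeffs, hCoeffs, sizeof(hCoeffs));

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    scaleAndShift<<<(n + threads - 1) / threads, threads>>>(dIn, dOut, n);

    cudaError_t err = cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn); cudaFree(dOut);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (hOut[i] != 2.0f * i + 1.0f) return false;
    return true;
}

int main()
{
    bool ok = runConstant();
    printf("scaleAndShift %s\n", ok ? "passed" : "failed");
    return ok ? 0 : 1;
}
```

Because constant memory is persistent across kernel launches by the same application, the coefficients would stay in place for later launches until the host overwrites them.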

Heterogeneous Programming

  • The CUDA programming model also assumes that both the host and the device maintain their own separate memory spaces in DRAM, referred to as host memory and device memory, respectively.
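  • CUDA runtime: a program manages the global, constant, and texture memory spaces visible to kernels through calls to the CUDA runtime. This includes device memory allocation and deallocation as well as data transfer between host and device memory.
  • Unified Memory provides managed memory to bridge the host and device memory spaces. Managed memory is accessible from all CPUs and GPUs in the system as a single, coherent memory image with a common address space. This capability enables oversubscription of device memory and can greatly simplify the task of porting applications by eliminating the need to explicitly mirror data on host and device.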
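
The following sketch shows one way managed memory can replace explicit mirroring: a single pointer from cudaMallocManaged is written by the host, updated by a kernel, and read back by the host with no cudaMemcpy. The names addOne and runManaged and the size n = 1000 are assumptions for the example, and it presumes a GPU and driver that support managed memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void addOne(int* x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1;
}

// Hypothetical host helper: increments n managed integers on the device.
bool runManaged()
{
    const int n = 1000;
    int* x = nullptr;

    // One allocation, addressable from both host and device code.
    if (cudaMallocManaged(&x, n * sizeof(int)) != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) x[i] = i;    // host writes directly

    const int threads = 256;
    addOne<<<(n + threads - 1) / threads, threads>>>(x, n);

    // Wait for the kernel before the host touches the data again.
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    for (int i = 0; ok && i < n; ++i)
        ok = (x[i] == i + 1);                // host reads directly

    cudaFree(x);
    return ok;
}

int main()
{
    bool ok = runManaged();
    printf("managed memory %s\n", ok ? "passed" : "failed");
    return ok ? 0 : 1;
}
```

The same computation with explicit host and device copies would need a host array, a cudaMalloc allocation, and two cudaMemcpy calls, which is the mirroring that managed memory removes.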

2.5 Compute Capability
