OpenCL™ Specification 3.3.2 Memory Objects

3.3.2. Memory Objects
The contents of global memory are memory objects. A memory object is a handle to a reference-counted region of global memory. Memory objects use the OpenCL type cl_mem and fall into three distinct classes:
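To make "a handle to a reference-counted region" concrete, here is a minimal plain-C model of the idea. It is a sketch, not the OpenCL implementation: the names mem_object, mem_create, mem_retain, mem_release and live_objects are hypothetical and exist only for illustration. In the real API the corresponding calls are clRetainMemObject and clReleaseMemObject on a cl_mem handle.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for cl_mem: a handle to a reference-counted region. */
typedef struct mem_object {
    unsigned refcount;  /* number of outstanding references */
    size_t   size;      /* size of the region in bytes */
    void    *storage;   /* backing storage for the region */
} mem_object;

/* Count of regions currently alive; used only to observe frees in this sketch. */
static int live_objects = 0;

/* Create a region with an initial reference count of 1. Returns NULL on failure. */
mem_object *mem_create(size_t size)
{
    mem_object *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->storage = malloc(size);
    if (m->storage == NULL) {
        free(m);
        return NULL;
    }
    m->refcount = 1;
    m->size = size;
    live_objects++;
    return m;
}

/* Take an additional reference to the region. */
void mem_retain(mem_object *m)
{
    assert(m != NULL && m->refcount > 0);
    m->refcount++;
}

/* Drop one reference; the region is freed when the count reaches zero. */
void mem_release(mem_object *m)
{
    assert(m != NULL && m->refcount > 0);
    if (--m->refcount == 0) {
        free(m->storage);
        free(m);
        live_objects--;
    }
}
```

In this model, a caller that creates an object, retains it once, and releases it twice ends with the region freed; releasing only once leaves it alive for the other holder.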

  • Buffer: A memory object stored as a block of contiguous memory and used as a general-purpose object to hold data used in an OpenCL program. The values within a buffer may be of any built-in type (such as int or float), vector type, or user-defined structure. A buffer can be manipulated through pointers, much as one would with any block of memory in C.

  • Image: An image memory object holds one-, two-, or three-dimensional images. The formats are based on the standard image formats used in graphics applications. An image is an opaque data structure managed by functions defined in the OpenCL API. To optimize the manipulation of images stored in the texture memories found in many GPUs, OpenCL kernels have traditionally been disallowed from both reading and writing a single image. OpenCL 2.0 relaxes this restriction by providing synchronization and fence operations that let programmers synchronize their code so that a kernel can safely read and write a single image.

  • Pipe: A pipe memory object is conceptually an ordered sequence of data items. A pipe has two endpoints: a write endpoint into which data items are inserted, and a read endpoint from which data items are removed. At any one time, only one kernel instance may write into a pipe, and only one kernel instance may read from a pipe. To support the producer-consumer design pattern, one kernel instance connects to the write endpoint (the producer) while another kernel instance connects to the read endpoint (the consumer). Note: Pipe memory objects are not available before OpenCL 2.0.
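The ordered, two-endpoint behaviour of a pipe can be sketched as a small FIFO in plain C. This is only a conceptual model running on the host, not the OpenCL pipe API: pipe_model, pipe_init, pipe_write, pipe_read and the fixed PIPE_CAPACITY are hypothetical names chosen for illustration, and the model assumes a single producer and a single consumer, as the text requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PIPE_CAPACITY 4  /* hypothetical fixed capacity for this sketch */

/* Conceptual pipe: an ordered sequence of items held in a ring buffer. */
typedef struct {
    int    items[PIPE_CAPACITY];
    size_t head;   /* read endpoint: index of the next item to remove */
    size_t count;  /* number of items currently in the pipe */
} pipe_model;

/* Start with an empty pipe. */
void pipe_init(pipe_model *p)
{
    assert(p != NULL);
    p->head = 0;
    p->count = 0;
}

/* Write endpoint (producer): insert one item. Returns false if the pipe is full. */
bool pipe_write(pipe_model *p, int item)
{
    assert(p != NULL);
    if (p->count == PIPE_CAPACITY)
        return false;
    p->items[(p->head + p->count) % PIPE_CAPACITY] = item;
    p->count++;
    return true;
}

/* Read endpoint (consumer): remove the oldest item. Returns false if the pipe is empty. */
bool pipe_read(pipe_model *p, int *out)
{
    assert(p != NULL && out != NULL);
    if (p->count == 0)
        return false;
    *out = p->items[p->head];
    p->head = (p->head + 1) % PIPE_CAPACITY;
    p->count--;
    return true;
}
```

Items come out of the read endpoint in exactly the order they went into the write endpoint, which is the property the producer-consumer pattern relies on.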

Memory objects are allocated by host APIs. The host program can provide the runtime with a pointer to a block of contiguous memory to hold the memory object when the object is created (CL_MEM_USE_HOST_PTR). Alternatively, the physical memory can be managed by the OpenCL runtime and not be directly accessible to the host program.

Allocation of and access to memory objects within the different memory regions vary between the host and work-items running on a device. This is summarized in the Memory Regions table, which describes whether the kernel or the host can allocate from a memory region, the type of allocation (static at compile time vs. dynamic at runtime), and the type of access allowed (i.e., whether the kernel or the host can read and/or write to a memory region).

Table 1. Memory Regions

|        |            | Global | Constant | Local | Private |
|--------|------------|--------|----------|-------|---------|
| Host   | Allocation | Dynamic | Dynamic | Dynamic | None |
| Host   | Access     | Read/Write to Buffers and Images, but not Pipes | Read/Write | None | None |
| Kernel | Allocation | Static (program scope variables) | Static (program scope variables) | Static for parent kernel, Dynamic for child kernels | Static |
| Kernel | Access     | Read/Write | Read-only | Read/Write, No access to child kernel memory | Read/Write |

The Memory Regions table shows the different memory regions in OpenCL and how memory objects are allocated and accessed by the host and by an executing instance of a kernel. For kernels, we distinguish between the behavior of local memory for a parent kernel and its child kernels.

Once allocated, a memory object is made available to kernel-instances running on one or more devices. In addition to Shared Virtual Memory, there are three basic ways to manage the contents of buffers between the host and devices.

  • Read/Write/Fill commands: The data associated with a memory object is explicitly read and written between the host and global memory regions using commands enqueued to an OpenCL command queue. Note: Fill commands are not available before OpenCL 1.2.

  • Map/Unmap commands: Data from the memory object is mapped into a contiguous block of memory accessed through a host-accessible pointer. The host program enqueues a map command on a block of a memory object before the block can be safely manipulated by the host program. When the host program has finished working with the block of memory, it enqueues an unmap command to allow a kernel-instance to safely read and/or write the buffer.

  • Copy commands: The data associated with a memory object is copied between two buffers, each of which may reside either on the host or on the device.

With Read/Write/Map, the commands can be blocking or non-blocking. The OpenCL function call for a blocking memory transfer returns once the command (memory transfer) has completed. At this point the associated memory resources on the host can be safely reused, and subsequent operations on the host can rely on the transfer having completed. For a non-blocking memory transfer, the OpenCL function call returns as soon as the command is enqueued.
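The practical difference between the two modes is when the host may reuse its own memory. The following plain-C model sketches this; it is not the OpenCL API. The names queue_model, queue_init, queue_write, queue_finish and MAX_PENDING are hypothetical, and the model assumes an in-order queue in which "completion" is simulated by performing the deferred copy. In real OpenCL the corresponding choice is the blocking_write argument of clEnqueueWriteBuffer, and completion can be awaited with clFinish or an event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 8  /* hypothetical queue depth for this sketch */

/* A deferred host-to-device copy that has been enqueued but not yet executed. */
typedef struct {
    void       *dst;
    const void *src;
    size_t      size;
} pending_copy;

/* Conceptual in-order command queue holding pending copies. */
typedef struct {
    pending_copy pending[MAX_PENDING];
    size_t       count;
} queue_model;

void queue_init(queue_model *q)
{
    assert(q != NULL);
    q->count = 0;
}

/* Execute all pending copies in order; stands in for waiting on completion. */
void queue_finish(queue_model *q)
{
    assert(q != NULL);
    for (size_t i = 0; i < q->count; i++)
        memcpy(q->pending[i].dst, q->pending[i].src, q->pending[i].size);
    q->count = 0;
}

/* Enqueue a write. A blocking write returns only after the copy has completed;
 * a non-blocking write returns as soon as the command is enqueued.
 * Returns false if the queue is full. */
bool queue_write(queue_model *q, bool blocking,
                 void *dst, const void *src, size_t size)
{
    assert(q != NULL && dst != NULL && src != NULL);
    if (q->count == MAX_PENDING)
        return false;
    q->pending[q->count++] = (pending_copy){ dst, src, size };
    if (blocking)
        queue_finish(q);
    return true;
}
```

In this model, overwriting the source after a blocking queue_write leaves the destination unaffected, whereas overwriting it after a non-blocking queue_write and before queue_finish changes what is transferred. In real OpenCL, modifying host memory before a non-blocking transfer completes has no guaranteed outcome; the model shows just one possible result.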

Memory objects are bound to a context and hence can appear in multiple kernel-instances running on more than one physical device. The OpenCL platform must support a large range of hardware platforms, including systems that do not support a single shared address space in hardware; hence the ways memory objects can be shared between kernel-instances are restricted. The basic principle is that multiple read operations on memory objects from multiple kernel-instances that overlap in time are allowed, but mixing overlapping reads and writes into the same memory objects from different kernel-instances is only allowed when fine-grained synchronization is used with Shared Virtual Memory.

When global memory is manipulated by multiple kernel-instances running on multiple devices, the OpenCL runtime system must manage the association of memory objects with a given device. In most cases the OpenCL runtime will implicitly associate a memory object with a device. A kernel-instance is naturally associated with the command-queue to which the kernel was submitted. Since a command-queue can only access a single device, the queue uniquely defines which device is involved with any given kernel-instance; this defines a clear association between memory objects, kernel-instances, and devices. Programmers may anticipate these associations in their programs and explicitly manage the association of memory objects with devices in order to improve performance.
