CUDA C编程手册: 编程接口（五）

最新推荐文章于 2024-07-05 20:32:07 发布

xholes

最新推荐文章于 2024-07-05 20:32:07 发布

阅读量596

点赞数

本文链接：https://blog.csdn.net/xholes/article/details/106621562

版权

CUDA 专栏收录该内容

9 篇文章 3 订阅

订阅专栏

CUDA C编程手册: 编程接口（五）

CUDA C 运行时

CUDA C 运行时

纹理内存

CUDA支持使用一小部分GPU用于图形显示的纹理硬件来对纹理内存和表面内存进行访问。相比于从全局内存，从纹理或者表面内存中读取数据的在之前的设备内存章节已经进行了介绍。

有两类API 用来完成对纹理和表面内存进行使用：纹理引用API在所有的设备都能使用；纹理对象API只支持在计算力大于3.0的设备上使用。纹理引用API有一些纹理对象API所有没有的限制，使用时需要有所注意。

纹理内存

核可以使用设备端的函数来读纹理内存。调用一个这样的函数来读取纹理的过程称之为texture fetch。每次fetch都会指明一个参数，texture object 或者 texture reference给相对应的API。

纹理对象或者纹理引用：

-纹理texture，是指fetch的一小片纹理内存。纹理对象是在运行时创建的且同时对纹理进行明确。纹理引用是在编译时创建并在运行时明确。一些间隔较远的纹理引用可能会界定相同的纹理或者有内存重叠的纹理。纹理可以是一段线性的内存区域或者是一个CUDA数组。

它的dimensionality明确了纹理是否指向了一个一维、二维或者三维的数组。数组中的元素称为纹理元(texels: texture elements)。纹理的宽度width、高度height和深度depth分别代表了数组各个维度的尺寸大小。不同计算力的设备所支持的纹理尺寸大小是不一样的。
纹理元的类型受限于基础的整型、单精度浮点型、1/2/4元向量。
读模式read mode 等价于cudaReadModeNormaledFloat或者cudaReadModeElementType。对于前者，如果纹理元的类型是8位或者16位的整型，纹理fetch返回的是一个浮点数。这个浮点数来自于将该类型取值映射值一个范围（无符号整型映射至[0.0, 1.0], 有符号整型映射至[-1.0, 1.0]），然后该实际值在新范围内的取值作为返回值。例如对于8位的值为0xff的纹理元，其返回值为1。然而，对于后者并不会做类型的转换。
纹理坐标是否归一化。默认地，纹理的引用用的是范围为[0， N-1]的浮点坐标，其中N是为纹理在某个维度尺寸在对应的坐标上的值。例如，大小为64x32的纹理，对于X和y两个维度，坐标的范围分别为[0, 63] 和 [0, 31]。归一化的坐标范围将使用[0, 1.0-1/N] 代替 [0, N-1]。归一化的坐标范围在一些应用的需求下更加合适，因为这样的纹理坐标独立于纹理的尺寸大小。
寻址模式。当调用一些函数接口使用超出坐标范围的值时也是可行的，因为特殊的寻址模式定义了这种行为。默认的寻址模式是是坐标值在其取值范围之内；边界模式border mode对于超出范围的寻址返回0；wrap mode下每个坐标 $x$ 都会被转换为 $f r a c ( x ) = x f l o o r ( x ) frac(x) = x floor(x)$ ; mirro mode下，当 $f l o o r (x)$ 是偶数的时候 $x$ 取 $f r a c ( x ) frac(x)$ ; 当 $f l o o r (x)$ 是奇数的时候 $x$ 取 $1 − f r a c ( x ) 1-frac(x)$ .。寻址模式的指定是通过一个三元数组来实现三个维度的不同寻址。寻址模式有cudaAddressModeBorder、cudaAddressModeClamp、cudaAddressModeWrap和cudaAddressModeMirror。后两种模式只支持归一化的纹理坐标。
filtering模式指明了在fetching纹理的时候对给定了坐标的返回值如何计算。当纹理被配置成返回浮点类型的时候会执行线性纹理filtering，它使用的在邻域内低精度的插值方法。但这个被激活时，所fetch的位置周围的纹理元将会被读取然后再使用它们来对所fetch位置进行插值处理，如线性插值、二次线性插值、三次线性插值等。filtering模式等价于cudaFilterModePoint或者<cudaFilterModeLinear/kbd>。前者使用的是最近邻插值方式，后者使用的线性插值方式且仅在返回值是浮点型的时候才可以使用。

纹理对象API

纹理对象的创建可以利用cudaCreateTextureObject来实现，这个接口在调用的时候需要使用一个称为cudaResourceDesc的结构体，它指明了纹理的一些属性。

struct cudaTextureDesc
{
	enum cudaTextureAddressMode addressMode[3]; // 寻址模式
	enum cudaTextureFilterMode filterMode; // filtering 模式
	enum cudaTextureReadMode readMode; // 读取模式
	int sRGB;
	int normalizedCoords; // 是否坐标归一化
	unsigned int maxAnisotropy; 
	enum cudaTextureFilterMode mipmapFilterMode; 
	float mipmapLevelBias;
	float minMipmapLevelClamp;
	float maxMipmapLevelClamp;
};

下示代码是纹理对象的应用场景之一：

// Simple transformation kernel
__global__ void transformKernel(float* output,
cudaTextureObject_t texObj,
int width, int height,
float theta)
{
	// Calculate normalized texture coordinates
	unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
	unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
	float u = x / (float)width;
	float v = y / (float)height;
	// Transform coordinates
	u -= 0.5f;
	v -= 0.5f;
	float tu = u * cosf(theta) - v * sinf(theta) + 0.5f;
	float tv = v * cosf(theta) + u * sinf(theta) + 0.5f;
	// Read from texture and write to global memory
	output[y * width + x] = tex2D<float>(texObj, tu, tv);
}

// Host code
int main()
{
	// Allocate CUDA array in device memory
	cudaChannelFormatDesc channelDesc =
	cudaCreateChannelDesc(32, 0, 0, 0,
	cudaChannelFormatKindFloat);
	cudaArray* cuArray;
	cudaMallocArray(&cuArray, &channelDesc, width, height);
	// Copy to device memory some data located at address h_data
	// in host memory
	cudaMemcpyToArray(cuArray, 0, 0, h_data, size,
	cudaMemcpyHostToDevice);
	// Specify texture
	struct cudaResourceDesc resDesc;
	memset(&resDesc, 0, sizeof(resDesc));
	resDesc.resType = cudaResourceTypeArray;
	resDesc.res.array.array = cuArray;
	// Specify texture object parameters
	struct cudaTextureDesc texDesc;
	memset(&texDesc, 0, sizeof(texDesc));
	texDesc.addressMode[0] = cudaAddressModeWrap;
	texDesc.addressMode[1] = cudaAddressModeWrap;
	texDesc.filterMode = cudaFilterModeLinear;
	texDesc.readMode = cudaReadModeElementType;
	texDesc.normalizedCoords = 1;
	// Create texture object
	cudaTextureObject_t texObj = 0;
	cudaCreateTextureObject(&texObj, &resDesc, &texDesc, NULL);
	// Allocate result of transformation in device memory
	float* output;
	cudaMalloc(&output, width * height * sizeof(float));
	// Invoke kernel
	dim3 dimBlock(16, 16);
	dim3 dimGrid((width + dimBlock.x - 1) / dimBlock.x,
	(height + dimBlock.y - 1) / dimBlock.y);
	transformKernel<<<dimGrid, dimBlock>>>(output,
	texObj, width, height,
	angle);
	// Destroy texture object
	cudaDestroyTextureObject(texObj);
	// Free device memory
	cudaFreeArray(cuArray);
	cudaFree(output);
	return 0;
}

纹理引用API

纹理引用的一些属性是不可更改的，因此必须在编译的时候就确定。它们将指明何时声明纹理引用。可以使用下列方式声明文件范围内(file scope)的纹理型变量：

texture<DataType, Type, ReadMode> texRef;

-DataType指明纹理元的类型
-Type指明纹理引用的类型：cudaTexture1D、cudaTextureType2D、cudaTextureType3D分别表示一维、二维和三维纹理；cudaTextureType1DLayered、cudaTextureType1DLayered分别表示一维和二维的layered纹理。这个参数的默认选项是cudaTexture1D。
-ReadMode指明了读取模式，默认值为cudaReadModeElementType。

注意纹理引用只能被生命为静态的全局变量且并不能够作为参数传递给函数使用。

纹理引用的其他变量时可变的，可以通过通过主机的运行时在执行的时候进行修改。如参考手册中所描述，运行时API有低级别的C风格接口和高级别的C++风格接口。纹理类型texture定义的一个高级别API结构体，它派生与低级别API 中的纹理引用类型textureReference。

struct textureReference {
	int normalized;  // 纹理坐标是否归一化
	enum cudaTextureFilterMode filterMode;  //filtering 模式
	enum cudaTextureAddressMode addressMode[3]; //寻址模式
	struct cudaChannelFormatDesc channelDesc; //纹理元的数据类型，必须与声明的相同
	int sRGB;
	unsigned int maxAnisotropy;
	enum cudaTextureFilterMode mipmapFilterMode;
	float mipmapLevelBias;
	float minMipmapLevelClamp;
	float maxMipmapLevelClamp;
}
struct cudaChannelFormatDesc {
	int x, y, z, w; // 返回值的每个分量的bit数
	enum cudaChannelFormatKind f; // cudaChannelFormatKindSigned、cudaChannelFormatKindUnsigned、cudaChannelFormatKindFloat
};

在核函数使用纹理引用来读取纹理内存之前，需要使用相应的接口来将纹理和纹理引用进行绑定。对于线性内存使用cudaBindTexture()或cudaBindTexture2D(), 对于CUDA 数组使用cudaBindTextureToArray()。cudaUnbindTexture()用于解绑一个纹理引用。当一个纹理引用解绑之后才能重新绑定其他数组，即使之前的绑定在核函数中并没有使用完毕也可以。

下列分别使用低级API 和高级API来展示如何进行绑定。

// 低级API

// example1
texture<float, cudaTextureType2D,
cudaReadModeElementType> texRef;
textureReference* texRefPtr;
cudaGetTextureReference(&texRefPtr, &texRef);
cudaChannelFormatDesc channelDesc =
cudaCreateChannelDesc<float>();
size_t offset;
cudaBindTexture2D(&offset, texRefPtr, devPtr, &channelDesc,
width, height, pitch);

// example2
texture<float, cudaTextureType2D,
cudaReadModeElementType> texRef;
textureReference* texRefPtr;
cudaGetTextureReference(&texRefPtr, &texRef);
cudaChannelFormatDesc channelDesc;
cudaGetChannelDesc(&channelDesc, cuArray);
cudaBindTextureToArray(texRef, cuArray, &channelDesc);


//高级API
// example1
texture<float, cudaTextureType2D,
cudaReadModeElementType> texRef;
cudaChannelFormatDesc channelDesc =
cudaCreateChannelDesc<float>();
size_t offset;
cudaBindTexture2D(&offset, texRef, devPtr, channelDesc,
width, height, pitch);

// example2
texture<float, cudaTextureType2D,
cudaReadModeElementType> texRef;
cudaBindTextureToArray(texRef, cuArray);

注意，在将一个纹理绑定至纹理引用的时候其指定的格式必须与声明纹理引用的指定是相匹配的；否则纹理fetch的结果是未定义的。同时核函数所能同时绑定的纹理个数是有限制的。

下示例代码中对纹理使用了一些简单的变化操作。

// 2D float texture
texture<float, cudaTextureType2D, cudaReadModeElementType> texRef;
// Simple transformation kernel
__global__ void transformKernel(float* output,
int width, int height,
float theta)
{
	// Calculate normalized texture coordinates
	unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
	unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
	float u = x / (float)width;
	float v = y / (float)height;
	// Transform coordinates
	u -= 0.5f;
	v -= 0.5f;
	float tu = u * cosf(theta) - v * sinf(theta) + 0.5f;
	float tv = v * cosf(theta) + u * sinf(theta) + 0.5f;
	// Read from texture and write to global memory
	output[y * width + x] = tex2D(texRef, tu, tv);
}
// Host code
int main()
{
	// Allocate CUDA array in device memory
	cudaChannelFormatDesc channelDesc =
	cudaCreateChannelDesc(32, 0, 0, 0,
	cudaChannelFormatKindFloat);
	cudaArray* cuArray;
	cudaMallocArray(&cuArray, &channelDesc, width, height);
	
	// Copy to device memory some data located at address h_data
	// in host memory
	cudaMemcpyToArray(cuArray, 0, 0, h_data, size,
	cudaMemcpyHostToDevice);
	
	// Set texture reference parameters
	texRef.addressMode[0] = cudaAddressModeWrap;
	texRef.addressMode[1] = cudaAddressModeWrap;
	texRef.filterMode = cudaFilterModeLinear;
	texRef.normalized = true;
	
	// Bind the array to the texture reference
	cudaBindTextureToArray(texRef, cuArray, channelDesc);

	// Allocate result of transformation in device memory
	float* output;
	cudaMalloc(&output, width * height * sizeof(float));
	// Invoke kernel
	dim3 dimBlock(16, 16);
	dim3 dimGrid((width + dimBlock.x - 1) / dimBlock.x,
	(height + dimBlock.y - 1) / dimBlock.y);
	transformKernel<<<dimGrid, dimBlock>>>(output, width, height,
	angle);
	// Free device memory
	cudaFreeArray(cuArray);
	cudaFree(output);
	return 0;
}

16位浮点纹理

CUDA 数组支持16位浮点数，即半精度。CUDA C 并不提供相匹配的数据类型，但是提供了内部函数来利用无符号短整型unsigned short进行从/到32位浮点数的转换：__float2half_rn(float)、__half2float(unsigned short)。这些函数只能在设备端函数内使用。相应的主机端等价函数可以在OpenEXR库中找到。

在纹理fetching的时候，执行filtering之前会将16位浮点分量提升至32位浮点类型。16位浮点形式的通道描述符可以通过调用cudaCreateChannelDescHalf*()来进行创建。

Layered 纹理

一维或二维的layered纹理（在Direct3D中陈伟纹理数组texture array, 在OpenGL中称为数组纹理array texutre）是有连续的层构成，每层都是具有相同维度、大小和数据类型的矩形纹理构成。

一维的layered纹理的寻址是通过一个整数索引和一个浮点纹理坐标，索引表示层的位置而坐标表示在该层中纹理元的位置；二维的layered纹理的寻址是通过一个整数索引和两个浮点坐标来进行的，索引表示层的位置，而两个浮点数坐标表示纹理元在该层中的位置。

layered 纹理只能通过在调用cudaMalloc3DArray()的时候传递标志cudaArrayLayered创建。layered纹理的fetching需要通过设备端函数tex1Dlayered() 和tex2dLayered()来进行。纹理的fetching只能在层中进行，不会发生跨层。

layered纹理只在计算力高于2.0的设备上才能后使用。

Cubemap 纹理

cubemap纹理是一种特殊的二维layered纹理。它由6层组成，代表了一个cube的各个面。

层的宽度与高度相同
使用三个纹理坐标 $x, y, z$ 来进行寻址，可以解释为一个方向向量，起点为cube的中心，指向纹理元所有的对应面的位置。

cubemap纹理只能通过在调用cudaMalloc3DArray()的时候传递标志cudaArrayCubemap创建。cubemap纹理的fetching需要通过设备端函数texCubemap() 和texCubemap()来进行。

cubemap纹理只在计算力高于2.0的设备上才能后使用。

纹理聚合(gather)

纹理聚合使用特殊的纹理fetch，它只适用于二维的纹理。通过调用tex2Dgather()来实现，它和tex2D()具有相同的参数，再加上一个额外的值域为{0， 1， 2, 3}的comp参数。它返回四个32位的数字，对应于四个纹理元的分量值，用于在常规纹理fetch时进行双线性filtering。例如，如果纹理元的值为(253, 20, 31, 255), (250, 25, 29, 254), (249, 16, 37, 253), (251, 22, 30, 250), comp的值是2，那么tex2Dgather返回的值是(31, 29, 37, 30)。

注意，纹理坐标的计算只有8位的小数精度。tex2Dgather()因此可能会返回与tex2D()不同的结果。

xholes

关注

0
点赞
踩
2

收藏

觉得还不错? 一键收藏
0
评论
CUDA C编程手册: 编程接口（五）

CUDA C编程手册: 编程接口（五）CUDA C 运行时纹理内存和表面内存纹理内存CUDA C 运行时纹理内存和表面内存CUDA支持使用一小部分GPU用于图形显示的纹理硬件来对纹理内存和表面内存进行访问。相比于从全局内存，从纹理或者表面内存中读取数据的在之前的设备内存章节已经进行了介绍。有两类API 用来完成对纹理和表面内存进行使用：纹理引用API在所有的设备都能使用；纹理对象API只支持在计算力大于3.0的设备上使用。纹理引用API有一些纹理对象API所有没有的限制，使用时需要有所注意。
复制链接

扫一扫

专栏目录