第8章 图形互操作性
GPU既执行渲染计算,又执行通用计算。。。
CUDA C应用可以无缝地与OpenGL和DirectX这两种实时渲染API进行交互。
本章中包含大量OpenGL和GLUT(OpenGL Utility Toolkit)代码,要有些思想准备。
8.1 本章目标
- 了解图形互操作性是什么,以及为什么需要使用它。
- 了解如何设置某个CUDA设备的图形互操作性。
- 了解如何在CUDA C核函数和OpenGL渲染函数直接共享数据。
8.2 图形互操作
为了说明在图形库与CUDA C之间的互操作机制,我们将编写一个包含两步骤的应用程序。
第一步:是使用CUDA C核函数来生成图像数据。
第二步:应用程序将这个数据传递给OpenGL驱动程序并进行渲染。
要实现这功能,我们需要使用CUDA C以及OpenGL 或者GLUT函数调用。
#define GL_GLEXT_PROTOTYPES
#include "GL/glut.h"
#include "cuda.h"
#include "cuda_gl_interop.h"
#include "../common/book.h"
#include "../common/cpu_bitmap.h"
#define DIM 512
此外,声明两个全局变量来保存句柄,这些句柄指向将要在OpenGL和CUDA C之间共享的数据,他们讲保存指向同一个缓冲区的不同句柄。
定义两个独立编译原因:OpenGL和CUDA对于这个缓冲区各自有着不同的“名字”。
bufferObj: OpenGL对这个数据的命名;
resource:CUDA C对合格变量的命名。
GLuint bufferObj;
cudaGraphicsResource *resource;
许多系统通常只包含一个支持CUDA的GPU,不过CUDA运行时提供了如何选择。
int main( int argc, char **argv ) {
cudaDeviceProp prop;
int dev;
memset( &prop, 0, sizeof( cudaDeviceProp ) );
prop.major = 1;
prop.minor = 0;
HANDLE_ERROR( cudaChooseDevice( &dev, &prop ) );
这段代码告诉运行时选择一个拥有1.0或者更高版本计算功能集的GPU。
原理:1)首先创建一个cudaDeviceProp结构并将其初始化为空;
2)将major版本设置为1,minor版本设置为0;
3)将这个cudaDeviceProp结构传递给cudaChooseDevice() ,这个函数告诉运行时选择系统中的某个满足cudaDeviceProp结构指定条件的GPU。并且在dev中返回这个设备的标识符。
缺点:1)无法确保这个设备是最好的或者是最快的GPU;
2)不能确保不同版本的GPU运行时会选择同一个设备。
我们需要指定CUDA设备的ID,这样才可以告诉CUD运行时应该使用那个设备来执行CUDA和OpenGL,可以通过cudaGLSetGLDevice()来实现这个功能,并把cudaChooseDevice()中获取的设备ID 用dev传进去。
HANDLE_ERROR( cudaGLSetGLDevice( dev ) );
// these GLUT calls need to be made before the other OpenGL
// calls, else we get a seg fault
glutInit( &argc, argv );
glutInitDisplayMode( GLUT_DOUBLE | GLUT_RGBA );
glutInitWindowSize( DIM, DIM );
glutCreateWindow( "bitmap" );
我们通过cudaGLSetGLDevice()为CUDA运行时使用OpenGL驱动程序做好准备;
然后初始化GLUT并创建一个名为“bitmap”的窗口,并将在这个窗口中绘制结果。
共享数据缓冲区是CUDA C核函数和OpenGL渲染操作之间实现互操作的关键部分。要在OpenGL和CUDA之间传递数据,我们首先要创建一个缓冲区在这两组API之间使用。
首先,在OpenGL中创建一个像素缓冲区对象,并将句柄保存在全局变量GLuint bufferObj中:
// the first three are standard OpenGL, the 4th is the CUDA reg
// of the bitmap these calls exist starting in OpenGL 1.5
glGenBuffers( 1, &bufferObj );
glBindBuffer( GL_PIXEL_UNPACK_BUFFER_ARB, bufferObj );
glBufferData( GL_PIXEL_UNPACK_BUFFER_ARB, DIM * DIM * 4,
NULL, GL_DYNAMIC_DRAW_ARB );
OpenGL中的像素缓冲区对象(Pixel Buffer Object,PBO),可以通过以下三部曲创建
1)通过glGenBuffers()
生成一个缓冲区句柄;
2)通过glBindBuffer()将句柄
绑定到像素缓冲区;
3)通过glBufferData()请求OpenGL驱动程序来
分配一个缓冲区。
GL_PIXEL_UNPACK_BUFFER_ARB: 枚举类型,表示这个缓冲区将被应用程序反复修改
分配的缓冲区来保存
DIM * DIM 个32位的值
剩下的工作就是通知CUDA运行时,缓冲区bufferObj将在CUDA与OpenGL之间共享。需要将bufferObj注册为一个图形资源(Graphics Resource)。
HANDLE_ERROR(
cudaGraphicsGLRegisterBuffer( &resource,
bufferObj,
cudaGraphicsMapFlagsNone ) );
cudaGraphicsGLRegisterBuffer()告诉CUDA运行时希望在OpenGL和CUDA中使用OpenGL PBO bufferObj;
CUDA运行时将变量
resource中返回一个句柄指向缓冲区。在随后对CUDA运行时对的调用中,将通过这个句柄来访问bufferObj;
cudaGraphicsMapFlagsNone 表示不需要为缓冲区指定特定行为,当然还有标志cudaGraphicsMapFlagsReadOnly只读,cudaGraphicsMapFlagsWriteOnly只写。
我们需要设备内存中的一个实际地址并传递给核函数。
1)告诉CUDA运行时映射这个共享资源;cudaGraphicsMapResources
2)请求一个指向被映射资源的指针。cudaGraphicsResourceGetMappedPointer
// do work with the memory dst being on the GPU, gotten via mapping
HANDLE_ERROR( cudaGraphicsMapResources( 1, &resource, NULL ) );
uchar4* devPtr;
size_t size;
HANDLE_ERROR(
cudaGraphicsResourceGetMappedPointer( (void**)&devPtr,
&size,
resource) );
devPtr: 设备指针
main() 余下工作
1)启动核函数并将指向共享缓冲区指针传递给它;
核函数的作用:生成将要显示的图像数据
2)取消对共享资源的映射;
要在执行绘制任务之前执行取消映射的调用,不但为了确保在应用程序的CUDA部分和图形部分之间实现同步;还能使得在cudaGraphicsUnmapResources()之前的所有CUDA操作完成后,才执行图形调用。
3)通过GLUT注册键盘回调函数和显示回调函数(key_func和draw_func),并通过glutMainLoop()将执行控制交个GLUT绘制循环。
dim3 grids(DIM/16,DIM/16);
dim3 threads(16,16);
kernel<<<grids,threads>>>( devPtr );
HANDLE_ERROR( cudaGraphicsUnmapResources( 1, &resource, NULL ) );
// set up GLUT and kick off main loop
glutKeyboardFunc( key_func );
glutDisplayFunc( draw_func );
glutMainLoop();
}
全部代码如下,
#include "../common/book.h"
#include "../common/cpu_bitmap.h"
#include "cuda.h"
#include "cuda_gl_interop.h"
PFNGLBINDBUFFERARBPROC glBindBuffer = NULL;
PFNGLDELETEBUFFERSARBPROC glDeleteBuffers = NULL;
PFNGLGENBUFFERSARBPROC glGenBuffers = NULL;
PFNGLBUFFERDATAARBPROC glBufferData = NULL;
#define DIM 512
GLuint bufferObj;
cudaGraphicsResource *resource;
// based on ripple code, but uses uchar4 which is the type of data
// graphic inter op uses. see screenshot - basic2.png
__global__ void kernel( uchar4 *ptr ) {
// map from threadIdx/BlockIdx to pixel position
int x = threadIdx.x + blockIdx.x * blockDim.x;
int y = threadIdx.y + blockIdx.y * blockDim.y;
int offset = x + y * blockDim.x * gridDim.x;
// now calculate the value at that position
float fx = x/(float)DIM - 0.5f;
float fy = y/(float)DIM - 0.5f;
unsigned char green = 128 + 127 *
sin( abs(fx*100) - abs(fy*100) );
// accessing uchar4 vs unsigned char*
ptr[offset].x = 0;
ptr[offset].y = green;
ptr[offset].z = 0;
ptr[offset].w = 255;
}
static void key_func( unsigned char key, int x, int y ) {
switch (key) {
case 27:
// clean up OpenGL and CUDA
HANDLE_ERROR( cudaGraphicsUnregisterResource( resource ) );
glBindBuffer( GL_PIXEL_UNPACK_BUFFER_ARB, 0 );
glDeleteBuffers( 1, &bufferObj );
exit(0);
}
}
static void draw_func( void ) {
// we pass zero as the last parameter, because out bufferObj is now
// the source, and the field switches from being a pointer to a
// bitmap to now mean an offset into a bitmap object
glDrawPixels( DIM, DIM, GL_RGBA, GL_UNSIGNED_BYTE, 0 );
glutSwapBuffers();
}
int main( int argc, char **argv ) {
cudaDeviceProp prop;
int dev;
memset( &prop, 0, sizeof( cudaDeviceProp ) );
prop.major = 1;
prop.minor = 0;
HANDLE_ERROR( cudaChooseDevice( &dev, &prop ) );
// tell CUDA which dev we will be using for graphic interop
// from the programming guide: Interoperability with OpenGL
// requires that the CUDA device be specified by
// cudaGLSetGLDevice() before any other runtime calls.
HANDLE_ERROR( cudaGLSetGLDevice( dev ) );
// these GLUT calls need to be made before the other OpenGL
// calls, else we get a seg fault
glutInit( &argc, argv );
glutInitDisplayMode( GLUT_DOUBLE | GLUT_RGBA );
glutInitWindowSize( DIM, DIM );
glutCreateWindow( "bitmap" );
glBindBuffer = (PFNGLBINDBUFFERARBPROC)GET_PROC_ADDRESS("glBindBuffer");
glDeleteBuffers = (PFNGLDELETEBUFFERSARBPROC)GET_PROC_ADDRESS("glDeleteBuffers");
glGenBuffers = (PFNGLGENBUFFERSARBPROC)GET_PROC_ADDRESS("glGenBuffers");
glBufferData = (PFNGLBUFFERDATAARBPROC)GET_PROC_ADDRESS("glBufferData");
// the first three are standard OpenGL, the 4th is the CUDA reg
// of the bitmap these calls exist starting in OpenGL 1.5
glGenBuffers( 1, &bufferObj );
glBindBuffer( GL_PIXEL_UNPACK_BUFFER_ARB, bufferObj );
glBufferData( GL_PIXEL_UNPACK_BUFFER_ARB, DIM * DIM * 4,
NULL, GL_DYNAMIC_DRAW_ARB );
HANDLE_ERROR(
cudaGraphicsGLRegisterBuffer( &resource,
bufferObj,
cudaGraphicsMapFlagsNone ) );
// do work with the memory dst being on the GPU, gotten via mapping
HANDLE_ERROR( cudaGraphicsMapResources( 1, &resource, NULL ) );
uchar4* devPtr;
size_t size;
HANDLE_ERROR(
cudaGraphicsResourceGetMappedPointer( (void**)&devPtr,
&size,
resource) );
dim3 grids(DIM/16,DIM/16);
dim3 threads(16,16);
kernel<<<grids,threads>>>( devPtr );
HANDLE_ERROR( cudaGraphicsUnmapResources( 1, &resource, NULL ) );
// set up GLUT and kick off main loop
glutKeyboardFunc( key_func );
glutDisplayFunc( draw_func );
glutMainLoop();
}
glDrawPixels()最后一个参数是一个缓冲区指针,如果没有任何缓冲区绑定为GL_PIXEL_UNPACK_BUFFER_ARB源,那么OpenGL驱动程序将从这个缓冲区中进行复制。然而,由于数据已经位于GPU上,并且我们已经将共享缓冲区绑定为GL_PIXEL_UNPACK_BUFFER_ARB源,因此最后一个参数将变成绑定缓冲区的一个偏移。由于我们要绘制整个缓冲区,因此这个偏移值就是0。
key_func() 将相应Esc响应,并退出。
8.3 基于图形互操作性的GPU波纹实例
#include "../common/book.h"
#include "../common/gpu_anim.h"
#define DIM 1024
__global__ void kernel( uchar4 *ptr, int ticks ) {
// map from threadIdx/BlockIdx to pixel position
int x = threadIdx.x + blockIdx.x * blockDim.x;
int y = threadIdx.y + blockIdx.y * blockDim.y;
int offset = x + y * blockDim.x * gridDim.x;
// now calculate the value at that position
float fx = x - DIM/2;
float fy = y - DIM/2;
float d = sqrtf( fx * fx + fy * fy );
unsigned char grey = (unsigned char)(128.0f + 127.0f *
cos(d/10.0f - ticks/7.0f) /
(d/10.0f + 1.0f));
ptr[offset].x = grey;
ptr[offset].y = grey;
ptr[offset].z = grey;
ptr[offset].w = 255;
}
void generate_frame( uchar4 *pixels, void*, int ticks ) {
dim3 grids(DIM/16,DIM/16);
dim3 threads(16,16);
kernel<<<grids,threads>>>( pixels, ticks );
}
int main( void ) {
GPUAnimBitmap bitmap( DIM, DIM, NULL );
bitmap.anim_and_exit(
(void (*)(uchar4*,void*,int))generate_frame, NULL );
}
8.3.1 GPUAnimBitmap 结构
#ifndef __GPU_ANIM_H__
#define __GPU_ANIM_H__
#include "gl_helper.h"
#include "cuda.h"
#include "cuda_gl_interop.h"
#include <iostream>
PFNGLBINDBUFFERARBPROC glBindBuffer = NULL;
PFNGLDELETEBUFFERSARBPROC glDeleteBuffers = NULL;
PFNGLGENBUFFERSARBPROC glGenBuffers = NULL;
PFNGLBUFFERDATAARBPROC glBufferData = NULL;
struct GPUAnimBitmap {
GLuint bufferObj;
cudaGraphicsResource *resource;
int width, height;
void *dataBlock;
void (*fAnim)(uchar4*,void*,int);
void (*animExit)(void*);
void (*clickDrag)(void*,int,int,int,int);
int dragStartX, dragStartY;
GPUAnimBitmap( int w, int h, void *d = NULL ) {
width = w;
height = h;
dataBlock = d;
clickDrag = NULL;
// first, find a CUDA device and set it to graphic interop
cudaDeviceProp prop;
int dev;
memset( &prop, 0, sizeof( cudaDeviceProp ) );
prop.major = 1;
prop.minor = 0;
HANDLE_ERROR( cudaChooseDevice( &dev, &prop ) );
cudaGLSetGLDevice( dev );
// a bug in the Windows GLUT implementation prevents us from
// passing zero arguments to glutInit()
int c=1;
char* dummy = "";
glutInit( &c, &dummy );
glutInitDisplayMode( GLUT_DOUBLE | GLUT_RGBA );
glutInitWindowSize( width, height );
glutCreateWindow( "bitmap" );
glBindBuffer = (PFNGLBINDBUFFERARBPROC)GET_PROC_ADDRESS("glBindBuffer");
glDeleteBuffers = (PFNGLDELETEBUFFERSARBPROC)GET_PROC_ADDRESS("glDeleteBuffers");
glGenBuffers = (PFNGLGENBUFFERSARBPROC)GET_PROC_ADDRESS("glGenBuffers");
glBufferData = (PFNGLBUFFERDATAARBPROC)GET_PROC_ADDRESS("glBufferData");
glGenBuffers( 1, &bufferObj );
glBindBuffer( GL_PIXEL_UNPACK_BUFFER_ARB, bufferObj );
glBufferData( GL_PIXEL_UNPACK_BUFFER_ARB, width * height * 4,
NULL, GL_DYNAMIC_DRAW_ARB );
HANDLE_ERROR( cudaGraphicsGLRegisterBuffer( &resource, bufferObj, cudaGraphicsMapFlagsNone ) );
}
~GPUAnimBitmap() {
free_resources();
}
void free_resources( void ) {
HANDLE_ERROR( cudaGraphicsUnregisterResource( resource ) );
glBindBuffer( GL_PIXEL_UNPACK_BUFFER_ARB, 0 );
glDeleteBuffers( 1, &bufferObj );
}
long image_size( void ) const { return width * height * 4; }
void click_drag( void (*f)(void*,int,int,int,int)) {
clickDrag = f;
}
void anim_and_exit( void (*f)(uchar4*,void*,int), void(*e)(void*) ) {
GPUAnimBitmap** bitmap = get_bitmap_ptr();
*bitmap = this;
fAnim = f;
animExit = e;
glutKeyboardFunc( Key );
glutDisplayFunc( Draw );
if (clickDrag != NULL)
glutMouseFunc( mouse_func );
glutIdleFunc( idle_func );
glutMainLoop();
}
// static method used for glut callbacks
static GPUAnimBitmap** get_bitmap_ptr( void ) {
static GPUAnimBitmap* gBitmap;
return &gBitmap;
}
// static method used for glut callbacks
static void mouse_func( int button, int state,
int mx, int my ) {
if (button == GLUT_LEFT_BUTTON) {
GPUAnimBitmap* bitmap = *(get_bitmap_ptr());
if (state == GLUT_DOWN) {
bitmap->dragStartX = mx;
bitmap->dragStartY = my;
} else if (state == GLUT_UP) {
bitmap->clickDrag( bitmap->dataBlock,
bitmap->dragStartX,
bitmap->dragStartY,
mx, my );
}
}
}
// static method used for glut callbacks
static void idle_func( void ) {
static int ticks = 1;
GPUAnimBitmap* bitmap = *(get_bitmap_ptr());
uchar4* devPtr;
size_t size;
HANDLE_ERROR( cudaGraphicsMapResources( 1, &(bitmap->resource), NULL ) );
HANDLE_ERROR( cudaGraphicsResourceGetMappedPointer( (void**)&devPtr, &size, bitmap->resource) );
bitmap->fAnim( devPtr, bitmap->dataBlock, ticks++ );
HANDLE_ERROR( cudaGraphicsUnmapResources( 1, &(bitmap->resource), NULL ) );
glutPostRedisplay();
}
// static method used for glut callbacks
static void Key(unsigned char key, int x, int y) {
switch (key) {
case 27:
GPUAnimBitmap* bitmap = *(get_bitmap_ptr());
if (bitmap->animExit)
bitmap->animExit( bitmap->dataBlock );
bitmap->free_resources();
exit(0);
}
}
// static method used for glut callbacks
static void Draw( void ) {
GPUAnimBitmap* bitmap = *(get_bitmap_ptr());
glClearColor( 0.0, 0.0, 0.0, 1.0 );
glClear( GL_COLOR_BUFFER_BIT );
glDrawPixels( bitmap->width, bitmap->height, GL_RGBA,
GL_UNSIGNED_BYTE, 0 );
glutSwapBuffers();
}
};
#endif // __GPU_ANIM_H__
animExit 动画退出时执行,清理代码;
clickDrag 响应用户的鼠标点击/拖拽等时间。鼠标初始位置(dragStartX, drawStartY)
8.3.2 重新实现基于GPU的波纹动画示例
#include "../common/book.h"
#include "../common/gpu_anim.h"
#define DIM 1024
__global__ void kernel( uchar4 *ptr, int ticks ) {
// map from threadIdx/BlockIdx to pixel position
int x = threadIdx.x + blockIdx.x * blockDim.x;
int y = threadIdx.y + blockIdx.y * blockDim.y;
int offset = x + y * blockDim.x * gridDim.x;
// now calculate the value at that position
float fx = x - DIM/2;
float fy = y - DIM/2;
float d = sqrtf( fx * fx + fy * fy );
unsigned char grey = (unsigned char)(128.0f + 127.0f *
cos(d/10.0f - ticks/7.0f) /
(d/10.0f + 1.0f));
ptr[offset].x = grey;
ptr[offset].y = grey;
ptr[offset].z = grey;
ptr[offset].w = 255;
}
void generate_frame( uchar4 *pixels, void*, int ticks ) {
dim3 grids(DIM/16,DIM/16);
dim3 threads(16,16);
kernel<<<grids,threads>>>( pixels, ticks );
}
int main( void ) {
GPUAnimBitmap bitmap( DIM, DIM, NULL );
bitmap.anim_and_exit(
(void (*)(uchar4*,void*,int))generate_frame, NULL );
}
8.4 基于图形互操作性的热传导
#include "../common/book.h"
#include "../common/gpu_anim.h"
#define DIM 1024
#define MAX_TEMP 1.0f
#define MIN_TEMP 0.0001f
#define SPEED 0.25f
// these exist on the GPU side
texture<float> texConstSrc;
texture<float> texIn;
texture<float> texOut;
// this kernel takes in a 2-d array of floats
// it updates the value-of-interest by a scaled value based
// on itself and its nearest neighbors
__global__ void blend_kernel( float *dst,
bool dstOut ) {
// map from threadIdx/BlockIdx to pixel position
int x = threadIdx.x + blockIdx.x * blockDim.x;
int y = threadIdx.y + blockIdx.y * blockDim.y;
int offset = x + y * blockDim.x * gridDim.x;
int left = offset - 1;
int right = offset + 1;
if (x == 0) left++;
if (x == DIM-1) right--;
int top = offset - DIM;
int bottom = offset + DIM;
if (y == 0) top += DIM;
if (y == DIM-1) bottom -= DIM;
float t, l, c, r, b;
if (dstOut) {
t = tex1Dfetch(texIn,top);
l = tex1Dfetch(texIn,left);
c = tex1Dfetch(texIn,offset);
r = tex1Dfetch(texIn,right);
b = tex1Dfetch(texIn,bottom);
} else {
t = tex1Dfetch(texOut,top);
l = tex1Dfetch(texOut,left);
c = tex1Dfetch(texOut,offset);
r = tex1Dfetch(texOut,right);
b = tex1Dfetch(texOut,bottom);
}
dst[offset] = c + SPEED * (t + b + r + l - 4 * c);
}
// NOTE - texOffsetConstSrc could either be passed as a
// parameter to this function, or passed in __constant__ memory
// if we declared it as a global above, it would be
// a parameter here:
// __global__ void copy_const_kernel( float *iptr,
// size_t texOffset )
__global__ void copy_const_kernel( float *iptr ) {
// map from threadIdx/BlockIdx to pixel position
int x = threadIdx.x + blockIdx.x * blockDim.x;
int y = threadIdx.y + blockIdx.y * blockDim.y;
int offset = x + y * blockDim.x * gridDim.x;
float c = tex1Dfetch(texConstSrc,offset);
if (c != 0)
iptr[offset] = c;
}
// globals needed by the update routine
struct DataBlock {
float *dev_inSrc;
float *dev_outSrc;
float *dev_constSrc;
cudaEvent_t start, stop;
float totalTime;
float frames;
};
void anim_gpu( uchar4* outputBitmap, DataBlock *d, int ticks ) {
HANDLE_ERROR( cudaEventRecord( d->start, 0 ) );
dim3 blocks(DIM/16,DIM/16);
dim3 threads(16,16);
// since tex is global and bound, we have to use a flag to
// select which is in/out per iteration
volatile bool dstOut = true;
for (int i=0; i<90; i++) {
float *in, *out;
if (dstOut) {
in = d->dev_inSrc;
out = d->dev_outSrc;
} else {
out = d->dev_inSrc;
in = d->dev_outSrc;
}
copy_const_kernel<<<blocks,threads>>>( in );
blend_kernel<<<blocks,threads>>>( out, dstOut );
dstOut = !dstOut;
}
float_to_color<<<blocks,threads>>>( outputBitmap,
d->dev_inSrc );
HANDLE_ERROR( cudaEventRecord( d->stop, 0 ) );
HANDLE_ERROR( cudaEventSynchronize( d->stop ) );
float elapsedTime;
HANDLE_ERROR( cudaEventElapsedTime( &elapsedTime,
d->start, d->stop ) );
d->totalTime += elapsedTime;
++d->frames;
printf( "Average Time per frame: %3.1f ms\n",
d->totalTime/d->frames );
}
// clean up memory allocated on the GPU
void anim_exit( DataBlock *d ) {
HANDLE_ERROR( cudaUnbindTexture( texIn ) );
HANDLE_ERROR( cudaUnbindTexture( texOut ) );
HANDLE_ERROR( cudaUnbindTexture( texConstSrc ) );
HANDLE_ERROR( cudaFree( d->dev_inSrc ) );
HANDLE_ERROR( cudaFree( d->dev_outSrc ) );
HANDLE_ERROR( cudaFree( d->dev_constSrc ) );
HANDLE_ERROR( cudaEventDestroy( d->start ) );
HANDLE_ERROR( cudaEventDestroy( d->stop ) );
}
int main( void ) {
DataBlock data;
GPUAnimBitmap bitmap( DIM, DIM, &data );
data.totalTime = 0;
data.frames = 0;
HANDLE_ERROR( cudaEventCreate( &data.start ) );
HANDLE_ERROR( cudaEventCreate( &data.stop ) );
int imageSize = bitmap.image_size();
// assume float == 4 chars in size (ie rgba)
HANDLE_ERROR( cudaMalloc( (void**)&data.dev_inSrc,
imageSize ) );
HANDLE_ERROR( cudaMalloc( (void**)&data.dev_outSrc,
imageSize ) );
HANDLE_ERROR( cudaMalloc( (void**)&data.dev_constSrc,
imageSize ) );
HANDLE_ERROR( cudaBindTexture( NULL, texConstSrc,
data.dev_constSrc,
imageSize ) );
HANDLE_ERROR( cudaBindTexture( NULL, texIn,
data.dev_inSrc,
imageSize ) );
HANDLE_ERROR( cudaBindTexture( NULL, texOut,
data.dev_outSrc,
imageSize ) );
// intialize the constant data
float *temp = (float*)malloc( imageSize );
for (int i=0; i<DIM*DIM; i++) {
temp[i] = 0;
int x = i % DIM;
int y = i / DIM;
if ((x>300) && (x<600) && (y>310) && (y<601))
temp[i] = MAX_TEMP;
}
temp[DIM*100+100] = (MAX_TEMP + MIN_TEMP)/2;
temp[DIM*700+100] = MIN_TEMP;
temp[DIM*300+300] = MIN_TEMP;
temp[DIM*200+700] = MIN_TEMP;
for (int y=800; y<900; y++) {
for (int x=400; x<500; x++) {
temp[x+y*DIM] = MIN_TEMP;
}
}
HANDLE_ERROR( cudaMemcpy( data.dev_constSrc, temp,
imageSize,
cudaMemcpyHostToDevice ) );
// initialize the input data
for (int y=800; y<DIM; y++) {
for (int x=0; x<200; x++) {
temp[x+y*DIM] = MAX_TEMP;
}
}
HANDLE_ERROR( cudaMemcpy( data.dev_inSrc, temp,
imageSize,
cudaMemcpyHostToDevice ) );
free( temp );
bitmap.anim_and_exit( (void (*)(uchar4*,void*,int))anim_gpu,
(void (*)(void*))anim_exit );
}