CUDA Timing Techniques

maowenge

Category: cuda. Tags: cuda, gpu

There are roughly three ways to measure execution time in CUDA:

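<1> Using the functions in cutil.h (a utility header shipped with older CUDA SDK samples)
unsigned int timer = 0;
// Create the timer
cutCreateTimer(&timer);
// Start timing
cutStartTimer(timer);
{
     // Code section to be timed
     …………
}

// Stop timing
cutStopTimer(timer);

// Get the time elapsed between start and stop
cutGetTimerValue(timer);

// Delete the timer
cutDeleteTimer(timer);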

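<2> The clock function from time.h
clock_t start, finish;
float costtime;
start = clock();
{
// Code section to be timed
…………
}
finish = clock();

// Time difference between the two readings, in seconds
costtime = (float)(finish - start) / CLOCKS_PER_SEC;

When CLOCKS_PER_SEC is 1000 (as in the Microsoft C runtime), one clock tick is 1 ms, so the timing resolution is also 1 ms. Because a kernel launch returns before the GPU has finished, the timed section must wait for the GPU (for example with cudaThreadSynchronize()) before the second clock() call.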
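
Where a 1 ms tick is too coarse, one possible host-side alternative is std::chrono::steady_clock from the C++ standard library. The sketch below is only an illustration: the helper name time_ms is an assumption and is not part of CUDA or of the original post.

```cpp
#include <chrono>

// Hypothetical helper (illustrative name, not a CUDA API): runs `work` once
// and returns the elapsed wall-clock time in milliseconds.
template <typename Work>
double time_ms(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();  // the code section being timed
    const auto finish = std::chrono::steady_clock::now();
    // Convert the clock's native tick count to fractional milliseconds.
    return std::chrono::duration<double, std::milli>(finish - start).count();
}
```

When the timed section launches GPU work, the callable would still need to end with a synchronization call such as cudaThreadSynchronize(), so that the measurement covers the kernel and not just its launch.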

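<3> Events
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start, 0);
{
// Code section to be timed
…………
}
cudaEventRecord(stop, 0);
// Wait until the stop event has actually been reached on the GPU
cudaEventSynchronize(stop);
float costtime;
cudaEventElapsedTime(&costtime, start, stop);
// Release the events
cudaEventDestroy(start);
cudaEventDestroy(stop);

cudaError_t cudaEventCreate( cudaEvent_t* event ) --- creates an event object;
cudaError_t cudaEventRecord( cudaEvent_t event, cudaStream_t stream ) --- records an event;
cudaError_t cudaEventElapsedTime( float* time, cudaEvent_t start, cudaEvent_t end ) --- computes the time elapsed between two events;
cudaError_t cudaEventDestroy( cudaEvent_t event ) --- destroys an event object.
cudaEventElapsedTime reports the elapsed time in milliseconds, with a resolution of about 0.5 microseconds.

If either event has not yet been recorded, the function returns cudaErrorInvalidValue. If either event was recorded in a non-zero stream, the result is undefined.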

Let's look at what the template generated for a new CUDA project uses for timing:

unsigned int timer = 0;
cutilCheckError( cutCreateTimer( &timer));
cutilCheckError( cutStartTimer( timer));

HelloCUDA<<<1, 1, 0>>>(device_result, 11);
cutilCheckMsg("Kernel execution failed\n");

cudaThreadSynchronize();
cutilCheckError( cutStopTimer( timer));
printf("Processing time: %f (ms)\n", cutGetTimerValue( timer));
cutilCheckError( cutDeleteTimer( timer));

The precision here is on the order of milliseconds.

Interested readers can try it out.

Reposted from: http://blog.csdn.net/jdhanhua/article/details/4843653

The following is not part of the reposted article:

This example is from the CUDA C Best Practices Guide:

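cudaEvent_t start, stop;
float time;

cudaEventCreate(&start);
cudaEventCreate(&stop);

// Record the start event in stream 0, launch the kernel, then record the stop event
cudaEventRecord( start, 0 );
kernel<<<grid,threads>>> ( d_odata, d_idata, size_x, size_y, NUM_REPS);
cudaEventRecord( stop, 0 );

// Block the CPU until the stop event has been recorded
cudaEventSynchronize( stop );

// Elapsed time between the two events, in milliseconds
cudaEventElapsedTime( &time, start, stop );

cudaEventDestroy( start );
cudaEventDestroy( stop );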

--------------------------------------------------------------------------------

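Next, how to measure FPS while rendering.

When rendering with OpenGL through GLUT/GLUI, glutDisplayFunc(display) is indispensable: it registers the display callback that controls what gets rendered, so the timing code also has to live in display: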

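In a typical OpenGL rendering flow, the CPU hands commands to the GPU and the GPU does the rendering. The timing, however, is done on the CPU side. Once the CPU has sent a command to the GPU, it immediately moves on to its next statement, so what we measure is only the time the CPU needs to issue the command, not the time the GPU needs to render a frame.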

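The correct approach is to make the CPU wait while the GPU renders, so that control returns to the CPU only after the GPU has finished. glFinish() provides exactly this: http://www.opengl.org/sdk/docs/man/xhtml/glFinish.xml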

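After display returns, control goes back to the GLUT main loop. The callback registered with glutIdleFunc(idle) is called whenever no other events are pending, so with glFinish() in display, idle runs only after the GPU has finished the frame: http://www.opengl.org/documentation/specs/glut/spec3/node63.html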

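To measure FPS accurately, record a timestamp at the start of display. When display finishes, control reaches idle, and idle calls glutPostRedisplay(), which brings the program back into display. There the timer at the start of display records the current time again, and the difference between the two timestamps is the time taken to render one frame.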
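
The display-to-display measurement described above could be sketched roughly as follows. FrameClock is a hypothetical class written for this illustration (it is not part of GLUT or the CUDA SDK), and it assumes std::chrono::steady_clock as the host timer.

```cpp
#include <chrono>

// Hypothetical helper (illustrative, not a GLUT or CUDA SDK class):
// call tick() at the start of every display() call. It returns the time in
// milliseconds since the previous tick, or 0 on the very first call.
class FrameClock {
public:
    double tick()
    {
        const auto now = std::chrono::steady_clock::now();
        double ms = 0.0;
        if (started_) {
            ms = std::chrono::duration<double, std::milli>(now - last_).count();
        }
        last_ = now;       // this timestamp becomes the start of the next frame
        started_ = true;
        return ms;
    }

private:
    std::chrono::steady_clock::time_point last_{};
    bool started_ = false;
};
```

The value returned by tick() covers the whole frame only if display waits for the GPU before returning, as discussed above.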

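When the FPS is very high, an accurate value can be obtained by accumulating time over many frames. For example, a single frame might take only 0.000001 ms, which is too short to measure directly, but the total time for 100000 frames is much larger; dividing the number of frames by the accumulated time gives the average FPS.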
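
A minimal sketch of this accumulation idea is shown below. FpsAccumulator and its members are assumed names for illustration and are not taken from the CUDA SDK. Frame times are summed over a fixed window, and the average FPS is computed once per window.

```cpp
#include <cstddef>

// Hypothetical helper (illustrative names, not from the CUDA SDK):
// accumulates per-frame times and reports the average FPS once per window.
struct FpsAccumulator {
    std::size_t window;       // number of frames to accumulate before reporting
    std::size_t frames = 0;   // frames accumulated so far in this window
    double total_ms = 0.0;    // accumulated frame time in milliseconds

    explicit FpsAccumulator(std::size_t window_frames) : window(window_frames) {}

    // Adds one frame time. Returns true and writes the average FPS to `fps`
    // once `window` frames have been accumulated, then starts a new window.
    bool add_frame(double frame_ms, double& fps)
    {
        total_ms += frame_ms;
        ++frames;
        if (frames < window) return false;
        // frames per second = frames / (total milliseconds / 1000)
        fps = (total_ms > 0.0) ? 1000.0 * frames / total_ms : 0.0;
        frames = 0;
        total_ms = 0.0;
        return true;
    }
};
```

For instance, four frames of 2 ms each add up to 8 ms, which this sketch reports as 500 FPS.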

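Note also that glutSwapBuffers flushes the pending OpenGL commands before it returns (the GLUT specification describes this as an implicit glFlush rather than a glFinish), and the CUDA SDK times its frames around this call. Here is the example from the CUDA SDK: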

void computeFPS()
{
    frameCount++;
    fpsCount++;

    if (fpsCount == fpsLimit-1) {
        g_Verify = true;
    }

    if (fpsCount == fpsLimit) {
        char fps[256];

        // Average frame time is in ms; convert it to frames per second
        float ifps = 1.f / (cutGetAverageTimerValue(timer) / 1000.f);

        sprintf(fps, "%sVolume Render: %3.1f fps",
                ((g_CheckRender && g_CheckRender->IsQAReadback()) ? "AutoTest: " : ""), ifps);

        glutSetWindowTitle(fps);
        fpsCount = 0;

        if (g_CheckRender && !g_CheckRender->IsQAReadback())
            fpsLimit = (int)MAX(ifps, 1.f);

        cutilCheckError(cutResetTimer(timer));

        AutoQATest();
    }
}

void display()
{
    // Stop the timer started in the previous call: one full frame has elapsed
    cutilCheckError(cutStopTimer(timer));

    computeFPS();

    cutilCheckError(cutStartTimer(timer));

    render();

    glutSwapBuffers();
}

void idle()
{
    glutPostRedisplay();
}