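1. How to measure the running time of a CUDA program
CUDA kernel launches are asynchronous with respect to the host, so if one step appears to be unusually slow, that step may not be the real cause; earlier GPU work may simply not have finished yet.
The proper way to time a CUDA kernel is as follows (checkCudaErrors comes from helper_cuda.h and the sdk*Timer functions from helper_timer.h in the CUDA Samples):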
checkCudaErrors(cudaDeviceSynchronize());
StopWatchInterface *timer = NULL;
sdkCreateTimer(&timer);
sdkStartTimer(&timer);
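// Execute the kernel (the launch itself returns immediately)
transformKernel<<<dimGrid, dimBlock, 0>>>(dData, width, height, angle);
// Check if kernel execution generated an error
getLastCudaError("Kernel execution failed");
// Wait for the kernel to finish before stopping the timer
checkCudaErrors(cudaDeviceSynchronize());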
sdkStopTimer(&timer);
printf("Processing time: %f (ms)\n", sdkGetTimerValue(&timer));
printf("%.2f Mpixels/sec\n",
       (width * height / (sdkGetTimerValue(&timer) / 1000.0f)) / 1e6);
sdkDeleteTimer(&timer);
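For comparison, the same measurement can be sketched without the CUDA Samples helpers by using CUDA events from the runtime API. This is a different technique from the host-side StopWatchInterface timer above: the events are timestamped on the GPU. The CHECK macro, scaleKernel, and timeScaleKernelMs are hypothetical names introduced only for this illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking macro standing in for checkCudaErrors.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",                \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical kernel used only as something to time.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Returns the GPU time of one scaleKernel launch in milliseconds.
float timeScaleKernelMs(float *dData, int n)
{
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaDeviceSynchronize());        // drain any earlier GPU work
    CHECK(cudaEventRecord(start, 0));      // timestamp before the kernel
    scaleKernel<<<(n + 255) / 256, 256>>>(dData, n, 2.0f);
    CHECK(cudaGetLastError());             // catch launch errors
    CHECK(cudaEventRecord(stop, 0));       // timestamp after the kernel
    CHECK(cudaEventSynchronize(stop));     // wait until stop has been reached

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms;
}
```

To try it, allocate a device buffer with cudaMalloc, call timeScaleKernelMs(dData, n), and print the returned value. The first launch usually includes one-time startup cost, so it may be worth timing a second call as well.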
To time several kernels one after another, reuse the same timer and reset it between measurements:
checkCudaErrors(cudaDeviceSynchronize());
StopWatchInterface *timer = NULL;
sdkCreateTimer(&timer);
sdkStartTimer(&timer);
// Execute the kernel
transformKernel<<<dimGrid, dimBlock, 0>>>(dData, width, height, angle);
// Check if kernel execution generated an error
getLastCudaError("Kernel execution failed");
checkCudaErrors(cudaDeviceSynchronize());
sdkStopTimer(&timer);
printf("Processing time: %f (ms)\n", sdkGetTimerValue(&timer));
// Make sure the GPU is idle, then reset and restart the timer for the next kernel
checkCudaErrors(cudaDeviceSynchronize());
sdkResetTimer(&timer);
sdkStartTimer(&timer);
transformKernel<<<dimGrid, dimBlock, 0>>>(dData, width, height, angle);
checkCudaErrors(cudaDeviceSynchronize());
sdkStopTimer(&timer);
printf("Processing time: %f (ms)\n", sdkGetTimerValue(&timer));
sdkDeleteTimer(&timer);
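When many sections need to be timed, the synchronize / start / run / synchronize / stop pattern above can be wrapped in a small helper. The following is only a sketch under some assumptions: it uses std::chrono instead of the CUDA Samples timer, and CHECK, addOneKernel, scaleKernel, timeSectionMs, and timeTwoKernels are hypothetical names that do not come from the original code.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking macro standing in for checkCudaErrors.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",                \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Two hypothetical kernels that stand in for the sections being timed.
__global__ void addOneKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Times one section: synchronize, run the section, synchronize again.
// `section` is any callable that launches GPU work.
template <typename Section>
double timeSectionMs(Section section)
{
    CHECK(cudaDeviceSynchronize());                    // earlier work must not leak in
    auto t0 = std::chrono::steady_clock::now();
    section();
    CHECK(cudaGetLastError());                         // catch launch errors
    CHECK(cudaDeviceSynchronize());                    // wait for the section to finish
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Times the two kernels separately and prints one line per section.
void timeTwoKernels(float *dData, int n, double *msAdd, double *msScale)
{
    int blocks = (n + 255) / 256;
    *msAdd = timeSectionMs([&] { addOneKernel<<<blocks, 256>>>(dData, n); });
    *msScale = timeSectionMs([&] { scaleKernel<<<blocks, 256>>>(dData, n, 2.0f); });
    printf("addOneKernel: %f (ms)\n", *msAdd);
    printf("scaleKernel:  %f (ms)\n", *msScale);
}
```

Because every section goes through the same helper, no measurement can accidentally include work left over from the previous one.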
To really find out which part of a program is slow, you need to time every section properly in this way. The key line is:
checkCudaErrors(cudaDeviceSynchronize());
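This call synchronizes the host with the device: it blocks the host until all previously issued GPU work has finished, and only then does execution move on to the next step.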
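The effect of the synchronization can be made visible by reading the host clock twice: once right after the launch returns and once after cudaDeviceSynchronize(). The sketch below assumes a deliberately slow, hypothetical busyKernel; CHECK, LaunchTiming, and measureLaunch are also names invented for this illustration. On a typical setup the first reading is far smaller than the second, although the exact numbers depend on the GPU and driver (for example, launches can be made blocking with the CUDA_LAUNCH_BLOCKING environment variable).

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking macro standing in for checkCudaErrors.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",                \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical kernel that spins for `iters` iterations per element.
__global__ void busyKernel(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

struct LaunchTiming {
    double launchOnlyMs;  // host time until the launch call returned
    double completedMs;   // host time until the kernel actually finished
};

LaunchTiming measureLaunch(float *dData, int n, int iters)
{
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    CHECK(cudaDeviceSynchronize());          // start from an idle GPU
    auto t0 = clock::now();
    busyKernel<<<(n + 255) / 256, 256>>>(dData, n, iters);
    auto t1 = clock::now();                  // launch returned; kernel may still be running
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());          // now the kernel is really done
    auto t2 = clock::now();

    LaunchTiming r;
    r.launchOnlyMs = ms(t1 - t0).count();
    r.completedMs = ms(t2 - t0).count();
    return r;
}
```

Stopping the timer at t1 would report only the launch overhead, which is why the timing code above always synchronizes before stopping the timer.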
If anything here is wrong, please point it out. Thanks.