CUDA Study Notes (3): Device Management

Device management

NVIDIA provides several ways to query and manage GPU devices. Knowing how to query GPU information matters because it helps you set up a kernel's execution configuration.

This post covers the following two topics:

  • CUDA runtime API functions
  • The NVIDIA system management command-line tool (nvidia-smi)

Querying GPU information with the runtime API

You can use the following function to query all of the information about a GPU device:

cudaError_t cudaGetDeviceProperties(cudaDeviceProp *prop, int device);

The GPU's properties are returned in the cudaDeviceProp structure.

Code

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(int argc, char **argv) {
    printf("%s Starting...\n", argv[0]);

    // Count the CUDA-capable devices in the system
    int deviceCount = 0;
    cudaError_t error_id = cudaGetDeviceCount(&deviceCount);

    if (error_id != cudaSuccess) {
        printf("cudaGetDeviceCount returned %d\n-> %s\n",
               (int)error_id, cudaGetErrorString(error_id));
        printf("Result = FAIL\n");
        exit(EXIT_FAILURE);
    }

    if (deviceCount == 0) {
        printf("There are no available device(s) that support CUDA\n");
    } else {
        printf("Detected %d CUDA Capable device(s)\n", deviceCount);
    }

    // Query and print the properties of device 0
    int dev = 0, driverVersion = 0, runtimeVersion = 0;
    cudaSetDevice(dev);
    cudaDeviceProp deviceProp;
    cudaGetDeviceProperties(&deviceProp, dev);
    printf("Device %d: \"%s\"\n", dev, deviceProp.name);

    cudaDriverGetVersion(&driverVersion);
    cudaRuntimeGetVersion(&runtimeVersion);
    printf(" CUDA Driver Version / Runtime Version %d.%d / %d.%d\n",
           driverVersion / 1000, (driverVersion % 100) / 10,
           runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    printf(" CUDA Capability Major/Minor version number: %d.%d\n",
           deviceProp.major, deviceProp.minor);
    // totalGlobalMem is divided by 1024^3, so the label is GBytes
    printf(" Total amount of global memory: %.2f GBytes (%llu bytes)\n",
           (float)deviceProp.totalGlobalMem / pow(1024.0, 3),
           (unsigned long long)deviceProp.totalGlobalMem);
    printf(" GPU Clock rate: %.0f MHz (%0.2f GHz)\n",
           deviceProp.clockRate * 1e-3f, deviceProp.clockRate * 1e-6f);
    printf(" Memory Clock rate: %.0f MHz\n", deviceProp.memoryClockRate * 1e-3f);
    printf(" Memory Bus Width: %d-bit\n", deviceProp.memoryBusWidth);
    if (deviceProp.l2CacheSize) {
        printf(" L2 Cache Size: %d bytes\n", deviceProp.l2CacheSize);
    }

    printf(" Max Texture Dimension Size (x,y,z) 1D=(%d), 2D=(%d,%d), 3D=(%d,%d,%d)\n",
           deviceProp.maxTexture1D, deviceProp.maxTexture2D[0], deviceProp.maxTexture2D[1],
           deviceProp.maxTexture3D[0], deviceProp.maxTexture3D[1], deviceProp.maxTexture3D[2]);
    printf(" Max Layered Texture Size (dim) x layers 1D=(%d) x %d, 2D=(%d,%d) x %d\n",
           deviceProp.maxTexture1DLayered[0], deviceProp.maxTexture1DLayered[1],
           deviceProp.maxTexture2DLayered[0], deviceProp.maxTexture2DLayered[1],
           deviceProp.maxTexture2DLayered[2]);

    printf(" Total amount of constant memory: %lu bytes\n", deviceProp.totalConstMem);
    printf(" Total amount of shared memory per block: %lu bytes\n", deviceProp.sharedMemPerBlock);
    printf(" Total number of registers available per block: %d\n", deviceProp.regsPerBlock);
    printf(" Warp size: %d\n", deviceProp.warpSize);
    printf(" Maximum number of threads per multiprocessor: %d\n",
           deviceProp.maxThreadsPerMultiProcessor);
    printf(" Maximum number of threads per block: %d\n", deviceProp.maxThreadsPerBlock);
    printf(" Maximum sizes of each dimension of a block: %d x %d x %d\n",
           deviceProp.maxThreadsDim[0], deviceProp.maxThreadsDim[1], deviceProp.maxThreadsDim[2]);
    printf(" Maximum sizes of each dimension of a grid: %d x %d x %d\n",
           deviceProp.maxGridSize[0], deviceProp.maxGridSize[1], deviceProp.maxGridSize[2]);
    printf(" Maximum memory pitch: %lu bytes\n", deviceProp.memPitch);

    exit(EXIT_SUCCESS);
}

Compile and run:

$ nvcc checkDeviceInfor.cu -o checkDeviceInfor
$ ./checkDeviceInfor

Output:

./checkDeviceInfor Starting...
Detected 2 CUDA Capable device(s)
Device 0: "Tesla M2070"
 CUDA Driver Version / Runtime Version 5.5 / 5.5
 CUDA Capability Major/Minor version number: 2.0
 Total amount of global memory: 5.25 GBytes (5636554752 bytes)
 GPU Clock rate: 1147 MHz (1.15 GHz)
 Memory Clock rate: 1566 MHz
 Memory Bus Width: 384-bit
 L2 Cache Size: 786432 bytes
 Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
 Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
 Total amount of constant memory: 65536 bytes
 Total amount of shared memory per block: 49152 bytes
 Total number of registers available per block: 32768
 Warp size: 32
 Maximum number of threads per multiprocessor: 1536
 Maximum number of threads per block: 1024
 Maximum sizes of each dimension of a block: 1024 x 1024 x 64
 Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
 Maximum memory pitch: 2147483647 bytes
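
As mentioned at the start of this post, the practical value of these queries is in choosing a kernel's execution configuration. Below is a minimal sketch (not part of the original example) of using a few cudaDeviceProp fields to pick a 1D launch configuration; the kernel scale, the problem size n, and the block size of 256 are illustrative assumptions.

#include <cuda_runtime.h>
#include <stdio.h>

// Illustrative kernel: scales each element of a vector (hypothetical example).
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    int n = 1 << 20;          // illustrative problem size
    int block = 256;          // a multiple of prop.warpSize (32)
    if (block > prop.maxThreadsPerBlock) block = prop.maxThreadsPerBlock;
    int grid = (n + block - 1) / block;   // enough blocks to cover all n elements

    printf("Launching on %s with grid=%d, block=%d\n", prop.name, grid, block);

    float *d_data;
    cudaMalloc((void **)&d_data, n * sizeof(float));
    scale<<<grid, block>>>(d_data, n);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}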

Choosing the best GPU

On a system with multiple GPUs, we need to pick one of them as our device. One way to identify the GPU with the best compute performance is by the number of multiprocessors it has; the following code selects the best GPU on that basis.

int numDevices = 0;
cudaGetDeviceCount(&numDevices);

if (numDevices > 1) {
    int maxMultiprocessors = 0, maxDevice = 0;

    // Pick the device with the most streaming multiprocessors
    for (int device = 0; device < numDevices; device++) {
        cudaDeviceProp props;
        cudaGetDeviceProperties(&props, device);

        if (maxMultiprocessors < props.multiProcessorCount) {
            maxMultiprocessors = props.multiProcessorCount;
            maxDevice = device;
        }
    }

    cudaSetDevice(maxDevice);
}
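
If you want to confirm which device ended up active after this selection, a small follow-up sketch (not part of the original snippet) could look like this:

int cur = 0;
cudaDeviceProp curProp;
cudaGetDevice(&cur);                      // ID of the currently active device
cudaGetDeviceProperties(&curProp, cur);
printf("Using device %d: %s (%d multiprocessors)\n",
       cur, curProp.name, curProp.multiProcessorCount);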

 

Querying GPU information with nvidia-smi

nvidia-smi is a command-line utility that helps you manage and operate GPU devices, and it allows you to query and modify device state.

nvidia-smi has many uses. For example, the following command lists the GPUs in the system:

$ nvidia-smi -L
GPU 0: Tesla M2070 (UUID: GPU-68df8aec-e85c-9934-2b81-0c9e689a43a7)
GPU 1: Tesla M2070 (UUID: GPU-382f23c1-5160-01e2-3291-ff9628930b70)

You can then use the following command to query detailed information about GPU 0:

$ nvidia-smi -q -i 0

The command also takes a -d option with the following arguments, which narrow down what nvidia-smi displays (a combined query is sketched after the list):

MEMORY

UTILIZATION

ECC

TEMPERATURE

POWER

CLOCK

COMPUTE

PIDS

PERFORMANCE

SUPPORTED_CLOCKS

PAGE_RETIREMENT

ACCOUNTING
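
For example, several of these sections can be combined with commas in a single query (a sketch; the exact fields printed will vary with the driver version):

$ nvidia-smi -q -i 0 -d MEMORY,UTILIZATION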

For example, to display only the device memory information:

$ nvidia-smi -q -i 0 -d MEMORY | tail -n 5
Memory Usage
    Total : 5375 MB
    Used : 9 MB
    Free : 5366 MB

Setting the device

On a multi-GPU system, nvidia-smi can show the properties of each GPU, numbered starting from 0. The environment variable CUDA_VISIBLE_DEVICES lets you choose which GPUs an application can use without modifying the application itself.

Setting CUDA_VISIBLE_DEVICES=2 masks the other GPUs so that only GPU 2 can be used. You can also expose several GPUs with CUDA_VISIBLE_DEVICES=2,3; inside the application their device IDs then become 0 and 1 respectively.
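
In a shell this might look as follows (a sketch reusing the checkDeviceInfor example from above):

$ export CUDA_VISIBLE_DEVICES=2      # only physical GPU 2 is visible; the application sees it as device 0
$ ./checkDeviceInfor

$ export CUDA_VISIBLE_DEVICES=2,3    # physical GPUs 2 and 3 become devices 0 and 1
$ ./checkDeviceInfor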
