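This article covers:
- Which clock frequencies exist in CUDA?
- How can the clock rate be adjusted?
In CUDA, the clock frequency (clock rate) can be adjusted through NVML functions or through the nvidia-smi command.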
Introduction
The command nvidia-smi -q -i 0 queries the device's parameters. To show only the clock-related ones, filter with the following command:
```
$ nvidia-smi -q -d CLOCK -i 0
Clocks
    Graphics Clock : 1410 MHz
    SM Clock : 1410 MHz
    Memory Clock : 1215 MHz
    Video Clock : 1275 MHz
Applications Clocks
    Graphics Clock : 1410 MHz
    Memory Clock : 1215 MHz
Default Applications Clocks
    Graphics Clock : 765 MHz
    Memory Clock : 1215 MHz
Deferred Clocks
    Memory Clock : N/A
Max Clocks
    Graphics Clock : 1410 MHz
    SM Clock : 1410 MHz
    Memory Clock : 1215 MHz
    Video Clock : 1290 MHz
Max Customer Boost Clocks
    Graphics Clock : 1410 MHz
Clock Policy
    Auto Boost : N/A # (disable/enable)
    Auto Boost Default : N/A # (disable/enable)
```
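To read these values from a script instead of by eye, the report can be parsed. The sketch below is a hypothetical helper (parseClockReport and ClockTable are our own names, not part of any NVIDIA tool). It assumes the layout shown above: a line without a colon starts a section, and "Name : <n> MHz" lines are entries. Values such as N/A are skipped. For programmatic access, NVML's nvmlDeviceGetClockInfo avoids text parsing altogether.

```cpp
#include <map>
#include <sstream>
#include <string>

// Section name -> (clock name -> MHz),
// e.g. table["Applications Clocks"]["Graphics Clock"] == 1410 for the output above.
using ClockTable = std::map<std::string, std::map<std::string, int>>;

// Strip leading and trailing spaces, tabs and carriage returns.
std::string trimSpaces(const std::string& s) {
    const auto first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// Hypothetical parser for text shaped like `nvidia-smi -q -d CLOCK` output.
ClockTable parseClockReport(const std::string& report) {
    ClockTable table;
    std::istringstream lines(report);
    std::string line, section;
    while (std::getline(lines, line)) {
        const std::string t = trimSpaces(line);
        if (t.empty() || t[0] == '$') continue;  // skip blank lines and the command echo
        const auto colon = t.find(':');
        if (colon == std::string::npos) {        // no colon: a section header such as "Max Clocks"
            section = t;
            continue;
        }
        const std::string name = trimSpaces(t.substr(0, colon));
        std::istringstream value(t.substr(colon + 1));
        int mhz = 0;
        std::string unit;
        // Keep only "<integer> MHz" entries; "N/A" fails the integer read and is skipped.
        if (value >> mhz >> unit && unit == "MHz") {
            table[section][name] = mhz;
        }
    }
    return table;
}
```

In practice the input would be the captured stdout of the command above.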
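As the output above shows, the latest CUDA 12 reports the following clocks:
- Clocks: the current, real-time clock frequencies.
- Applications Clocks: the application clocks, that is, the clocks the GPU runs at after the CUDA runtime has started. Once started, they match the first section, "Clocks".
  After application clocks have been set, the current clock is low when no program is running (roughly 200-600 MHz, the idle clock). While a kernel is running, it equals Applications Clocks.
  On devices that do not support application clocks, once a locked clock has been set, this value shows the locked clock. If the requested clock rate exceeds the boost clock, the actual clock varies dynamically above the boost clock (GPU Boost 4.0).
- Default Applications Clocks: the default Applications Clocks. After the user has changed Applications Clocks, they can be reset to this default.
- Deferred Clocks: not investigated.
- Max Clocks: the maximum clock frequencies, including overclocking.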
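Comparing "Applications Clocks" with "Default Applications Clocks" and "Max Clocks" shows whether someone has changed the settings, and whether the GPU is set below its maximum. This is a minimal sketch that assumes the per-domain values have already been collected, for instance from the report above. ClockDomain, changedFromDefault and belowMax are hypothetical names used for illustration.

```cpp
#include <string>
#include <vector>

// Per-domain values (MHz) taken from the nvidia-smi sections described above.
struct ClockDomain {
    std::string name;    // e.g. "Graphics Clock"
    int applicationMHz;  // "Applications Clocks"
    int defaultAppMHz;   // "Default Applications Clocks"
    int maxMHz;          // "Max Clocks"
};

// Domains whose application clock differs from its default (i.e. it was changed).
std::vector<std::string> changedFromDefault(const std::vector<ClockDomain>& domains) {
    std::vector<std::string> out;
    for (const auto& d : domains) {
        if (d.applicationMHz != d.defaultAppMHz) out.push_back(d.name);
    }
    return out;
}

// Domains whose application clock is below the reported maximum.
std::vector<std::string> belowMax(const std::vector<ClockDomain>& domains) {
    std::vector<std::string> out;
    for (const auto& d : domains) {
        if (d.applicationMHz < d.maxMHz) out.push_back(d.name);
    }
    return out;
}
```

With the values from the report above, Graphics Clock is flagged as changed from its default (1410 vs 765 MHz), and no domain is below Max Clocks.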
Some other clock rates are not shown above, namely the following two:
Base Clock (nvidia-smi base-clocks -i 0):
The Base Clock of a graphics card (also sometimes referred to as the “Core Clock”) is the minimum speed at which the GPU is advertised to run. In normal conditions, the GPU of the card will not drop below this clock speed unless conditions are significantly altered. This number is more significant in older cards but is becoming less and less relevant as boosting technologies take center stage.
Boost Clock:
The advertised Boost Clock of the card is the maximum clock speed that the graphics card can achieve under normal conditions before the GPU Boost is activated. This clock speed number is generally quite a bit higher than the Base Clock and the card uses up most of its power budget to achieve this number. Unless the card is thermally constrained, it will hit this advertised boost clock. This is also the parameter that is altered in “Factory Overclocked” cards from AIB partners.
They are ordered as follows: Base Clock <= Boost Clock <= Max Clocks <= max Boost Clocks.
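When you collect these numbers from the spec sheet and from nvidia-smi, the ordering above serves as a cheap sanity check on the readings. This is a minimal sketch. NamedClock and firstOrderingViolation are hypothetical names, and the values in the test are made up.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A labelled clock value in MHz, e.g. {"Boost Clock", 2520}.
struct NamedClock {
    std::string name;
    int mhz;
};

// Given clocks listed in the order Base, Boost, Max, Max Boost, return the index
// of the first entry that is lower than its predecessor, or -1 if the ordering holds.
int firstOrderingViolation(const std::vector<NamedClock>& clocks) {
    for (std::size_t i = 1; i < clocks.size(); ++i) {
        if (clocks[i].mhz < clocks[i - 1].mhz) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```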
Notes on Auto Boost:
Auto Boost is unsupported on most devices:
GPU Boost technology allows the card to boost much higher than the advertised “Boost Clock” that may be listed on the box or on the product page.
Increase Performance with GPU Boost and K80 Autoboost | NVIDIA Technical Blog
nvmlDeviceSetAutoBoostedClocksEnabled (nvidia-smi --auto-boost-default=ENABLED -i 0)
Here, "unsupported" means that Auto Boost cannot be enabled or disabled.
What does the clockRate field returned by CUDA's cudaGetDeviceProperties correspond to? It is read like this:
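```cpp
cudaDeviceProp device_prop;
cudaError_t err = cudaGetDeviceProperties(&device_prop, device);
if (err != cudaSuccess) {
    return (Error_t)err;  // Error_t: the surrounding project's own error type
}
// clockRate is reported in kHz
int clock_rate_mhz = device_prop.clockRate / 1000;
```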
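- On devices that support application clocks, it corresponds to Applications Clocks -> Graphics Clock / SM Clock above.
- On cards that do not support application clocks, the value is the boost clock, not the locked clock. Note also that this value can only be looked up in the spec sheet. NVML does not report it and only exposes the base clock.
For example, on an RTX 4090 properties.clockRate corresponds to 2520 MHz, yet 2520 MHz appears nowhere in the output of nvidia-smi -q -i 3.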
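One practical trap when comparing these numbers is that cudaDeviceProp::clockRate is expressed in kHz, while nvidia-smi and NVML report MHz. The sketch below converts and compares the runtime value against an application clock read elsewhere. kHzToMHz and clockRateMatchesAppClock are hypothetical helper names. A mismatch is only a hint that clockRate is the spec boost clock, as described above for cards without application-clock support.

```cpp
// cudaDeviceProp::clockRate is in kHz; nvidia-smi and NVML report MHz.
constexpr int kHzToMHz(int kHz) { return kHz / 1000; }

// True if the runtime's clockRate equals an application clock (MHz) read from
// nvidia-smi or NVML. On cards without application-clock support the two are
// expected to differ, because clockRate is then the spec boost clock.
constexpr bool clockRateMatchesAppClock(int clockRateKHz, int appClockMHz) {
    return kHzToMHz(clockRateKHz) == appClockMHz;
}
```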
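When benchmarking, should you set the max clock rate, or reset to the default clock rate?
Yes, you definitely should.
The running clock rates mentioned above can be changed through the command line or the API (covered in detail in Section 1.4). Our cards are shared by several people, so once someone changes these settings (for example, to a lower value), measured performance drops sharply and becomes unstable.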
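```shell
# set application clocks: memory clock first, then graphics clock (MHz)
nvidia-smi --applications-clocks=9001,2520 -i 0
# reset application clocks to their defaults
nvidia-smi --reset-applications-clocks -i
# lock the GPU clocks
nvidia-smi --lock-gpu-clocks=3
```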
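On a shared machine it can help to script these commands so that every benchmark run starts from a known state. The sketch below only builds the command strings and executes nothing. The helper names are hypothetical. It assumes the flag formats --applications-clocks=MEM_CLOCK,GRAPHICS_CLOCK and --lock-gpu-clocks=MIN,MAX, with --reset-gpu-clocks undoing a lock. These settings generally require root privileges.

```cpp
#include <string>

// Build nvidia-smi command lines for one GPU index; nothing is executed here.
std::string setAppClocksCmd(int gpu, int memMHz, int graphicsMHz) {
    // --applications-clocks takes the memory clock first, then the graphics clock.
    return "nvidia-smi --applications-clocks=" + std::to_string(memMHz) + "," +
           std::to_string(graphicsMHz) + " -i " + std::to_string(gpu);
}

std::string resetAppClocksCmd(int gpu) {
    return "nvidia-smi --reset-applications-clocks -i " + std::to_string(gpu);
}

// --lock-gpu-clocks takes a min,max range in MHz.
std::string lockGpuClocksCmd(int gpu, int minMHz, int maxMHz) {
    return "nvidia-smi --lock-gpu-clocks=" + std::to_string(minMHz) + "," +
           std::to_string(maxMHz) + " -i " + std::to_string(gpu);
}

// --reset-gpu-clocks removes a previous lock.
std::string resetGpuClocksCmd(int gpu) {
    return "nvidia-smi --reset-gpu-clocks -i " + std::to_string(gpu);
}
```

The strings could be logged before each run, so that the clock configuration of every benchmark is recorded alongside its results.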