nvidia-smi - NVIDIA System Management Interface program
1. nvidia-smi
deepnorth@deepnorth-amax:~/software$ nvidia-smi
Tue Jul 9 22:24:29 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.14 Driver Version: 430.14 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:1B:00.0 Off | N/A |
| 32% 48C P0 66W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:1C:00.0 Off | N/A |
| 34% 49C P0 60W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce RTX 208... Off | 00000000:1D:00.0 Off | N/A |
| 34% 49C P0 63W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce RTX 208... Off | 00000000:1E:00.0 Off | N/A |
| 33% 49C P0 54W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 GeForce RTX 208... Off | 00000000:89:00.0 Off | N/A |
| 32% 48C P0 78W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 GeForce RTX 208... Off | 00000000:8A:00.0 Off | N/A |
| 34% 49C P0 65W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 GeForce RTX 208... Off | 00000000:8B:00.0 Off | N/A |
| 33% 49C P0 52W / 250W | 0MiB / 11019MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 7 GeForce RTX 208... Off | 00000000:8C:00.0 Off | N/A |
| 33% 50C P0 55W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
deepnorth@deepnorth-amax:~/software$
(base) yongqiang@famu-sys:~$ nvidia-smi
Thu Jan 16 18:55:16 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48 Driver Version: 390.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:02:00.0 On | N/A |
| 58% 76C P2 120W / 250W | 5139MiB / 11177MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:03:00.0 Off | N/A |
| 60% 76C P2 95W / 250W | 4845MiB / 11178MiB | 24% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108... Off | 00000000:82:00.0 Off | N/A |
| 57% 75C P2 200W / 250W | 4845MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 108... Off | 00000000:83:00.0 Off | N/A |
| 60% 78C P2 206W / 250W | 4845MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1306 G /usr/lib/xorg/Xorg 14MiB |
| 0 2865 G /usr/lib/xorg/Xorg 155MiB |
| 0 28830 C ./darknet 4957MiB |
| 1 28830 C ./darknet 4833MiB |
| 2 28830 C ./darknet 4833MiB |
| 3 28830 C ./darknet 4833MiB |
+-----------------------------------------------------------------------------+
(base) yongqiang@famu-sys:~$
2. NVIDIA System Management Interface
NVIDIA System Management Interface
https://developer.nvidia.com/nvidia-system-management-interface
The NVIDIA System Management Interface (nvidia-smi) is a command line utility, based on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices.
This utility allows administrators to query GPU device state and, with the appropriate privileges, to modify it. It is targeted at the Tesla™, GRID™, Quadro™ and Titan X products, though limited support is also available on other NVIDIA GPUs.
nvidia-smi ships with the NVIDIA GPU display drivers on Linux, and with 64-bit Windows Server 2008 R2 and Windows 7. It can report query information as XML or human-readable plain text, to either standard output or a file. For more details, please refer to the nvidia-smi documentation.
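intend [ɪnˈtend]: vt. to plan, to want, to mean; vi. to have a plan
aid [eɪd]: n. help, assistance, helper; vt. to help, to assist; vi. to help
appropriate [əˈprəʊprɪət; (for v.) əˈprəʊprɪeɪt]: adj. suitable, proper; vt. to take for one's own use, to set aside
privilege [ˈprɪvəlɪdʒ]: n. special right, advantage; vt. to grant a privilege to, to exempt
permit [pəˈmɪt]: vi. to allow; vt. to allow, to authorize; n. license, permit
administrator [ədˈmɪnɪstreɪtə(r)]: n. manager, administrative officer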
3. nvidia-smi.txt
nvidia-smi.txt
https://developer.download.nvidia.cn/compute/DCGM/docs/nvidia-smi-367.38.pdf
nvidia-smi (also NVSMI) provides monitoring and management capabilities for each of NVIDIA’s Tesla, Quadro, GRID and GeForce devices from the Fermi and later architecture families. GeForce Titan series devices are supported for most functions, with very limited information provided for the remainder of the GeForce brand. NVSMI is a cross-platform tool that supports all standard NVIDIA driver-supported Linux distros, as well as 64-bit versions of Windows starting with Windows Server 2008 R2. Metrics can be consumed directly by users via stdout, or written to a file in CSV or XML format for scripting purposes.
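For scripting, the CSV form is usually the easiest to consume. The following is a minimal sketch, assuming the --query-gpu and --format=csv,noheader,nounits options and the field names listed in FIELDS are available on your driver (nvidia-smi --help-query-gpu prints the supported list). The helper names parse_gpu_csv and query_gpus are made up for illustration.

```python
import csv
import io
import subprocess

# Field names passed to --query-gpu (assumed to be supported by your driver).
FIELDS = ["index", "name", "temperature.gpu", "utilization.gpu",
          "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Parse text produced with --format=csv,noheader,nounits into dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        values = (value.strip() for value in record)
        rows.append(dict(zip(FIELDS, values)))
    return rows


def query_gpus():
    """Run nvidia-smi once and return one dict per GPU (needs a working driver)."""
    cmd = ["nvidia-smi",
           "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader,nounits"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

Calling query_gpus() on a machine with the driver installed should return a list with one dictionary per GPU; all values are kept as strings so that "N/A" entries survive unchanged.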
Note that much of the functionality of NVSMI is provided by the underlying NVML C-based library. See the NVIDIA developer website link below for more information about NVML. NVML-based Python bindings are also available.
The output of NVSMI is not guaranteed to be backwards compatible. However, both NVML and the Python bindings are backwards compatible, and should be the first choice when writing any tools that must be maintained across NVIDIA driver releases.
NVML SDK: http://developer.nvidia.com/nvidia-management-library-nvml/
Python bindings: http://pypi.python.org/pypi/nvidia-ml-py/
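As a rough illustration of the Python bindings route, the sketch below reads the name and frame buffer usage of every GPU through NVML. It assumes the nvidia-ml-py package is installed and importable as pynvml; the helper names bytes_to_mib and list_gpu_memory are hypothetical.

```python
def bytes_to_mib(num_bytes):
    """Convert a byte count (the unit NVML reports) to whole MiB."""
    return num_bytes // (1024 * 1024)


def list_gpu_memory():
    """Return (index, name, used MiB, total MiB) for each GPU via NVML."""
    import pynvml  # from the nvidia-ml-py package; imported lazily on purpose

    pynvml.nvmlInit()
    try:
        result = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # some binding releases return bytes
                name = name.decode()
            memory = pynvml.nvmlDeviceGetMemoryInfo(handle)
            result.append((index, name,
                           bytes_to_mib(memory.used),
                           bytes_to_mib(memory.total)))
        return result
    finally:
        pynvml.nvmlShutdown()  # always release NVML, even on errors
```

Because this talks to NVML directly, it does not depend on the layout of the nvidia-smi text output.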
consume [kənˈsjuːm]: vt. to use up, to spend, to engross, to squander; vi. to waste away, to be destroyed
guarantee [ˌɡærənˈtiː]: n. assurance, warranty, guarantor, security; vt. to assure, to warrant
3.1 OPTIONS
GENERAL OPTIONS
-h, --help
Print usage information and exit.
SUMMARY OPTIONS
-L, --list-gpus
List each of the NVIDIA GPUs in the system, along with their UUIDs.
(base) yongqiang@famu-sys:~$ nvidia-smi -L
GPU 0: GeForce GTX 1080 Ti (UUID: GPU-b6764bed-48e1-97ac-e1aa-e94916bbf983)
GPU 1: GeForce GTX 1080 Ti (UUID: GPU-dbec65c2-c9a0-393d-3d3b-2dc2e5bbaaf5)
GPU 2: GeForce GTX 1080 Ti (UUID: GPU-51321451-290e-f3ef-73a9-1571d33166cb)
(base) yongqiang@famu-sys:~$
(base) yongqiang@famu-sys:~$ nvidia-smi --list-gpus
GPU 0: GeForce GTX 1080 Ti (UUID: GPU-b6764bed-48e1-97ac-e1aa-e94916bbf983)
GPU 1: GeForce GTX 1080 Ti (UUID: GPU-dbec65c2-c9a0-393d-3d3b-2dc2e5bbaaf5)
GPU 2: GeForce GTX 1080 Ti (UUID: GPU-51321451-290e-f3ef-73a9-1571d33166cb)
(base) yongqiang@famu-sys:~$
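The -L output shown above is regular enough to split into index, name and UUID. A small sketch, assuming each line keeps the "GPU N: NAME (UUID: GPU-...)" shape seen here (the text format is not guaranteed to stay stable across driver releases); parse_gpu_list is a made-up helper name.

```python
import re

# Matches lines such as:
# GPU 0: GeForce GTX 1080 Ti (UUID: GPU-b6764bed-48e1-97ac-e1aa-e94916bbf983)
GPU_LINE = re.compile(r"^GPU (\d+): (.+) \(UUID: (GPU-[0-9a-fA-F-]+)\)$")


def parse_gpu_list(text):
    """Return a list of (index, name, uuid) tuples from `nvidia-smi -L` text."""
    gpus = []
    for line in text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:  # ignore prompts and anything else that does not match
            gpus.append((int(match.group(1)), match.group(2), match.group(3)))
    return gpus
```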
QUERY OPTIONS
-q, --query
Display GPU or Unit info. Displayed info includes all data listed in the (GPU ATTRIBUTES) or (UNIT ATTRIBUTES) sections of this document. Some devices and/or environments don’t support all possible information. Any unsupported data is indicated by "N/A" in the output.
By default, information for all available GPUs or Units is displayed. Use the -i option to restrict the output to a single GPU or Unit.
(base) yongqiang@famu-sys:~$ nvidia-smi -q
==============NVSMI LOG==============
Timestamp : Sun Nov 24 13:22:58 2019
Driver Version : 390.48
Attached GPUs : 3
GPU 00000000:02:00.0
Product Name : GeForce GTX 1080 Ti
Product Brand : GeForce
Display Mode : Disabled
Display Active : Enabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-b6764bed-48e1-97ac-e1aa-e94916bbf983
Minor Number : 0
VBIOS Version : 86.02.39.00.2A
MultiGPU Board : No
Board ID : 0x200
GPU Part Number : N/A
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
PCI
Bus : 0x02
Device : 0x00
Domain : 0x0000
Device Id : 0x1B0610DE
Bus Id : 00000000:02:00.0
Sub System Id : 0x36091462
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 29 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 11177 MiB
Used : 156 MiB
Free : 11021 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 6 MiB
Free : 250 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 2 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending : N/A
Temperature
GPU Current Temp : 15 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 9.41 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 125.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 139 MHz
SM : 139 MHz
Memory : 405 MHz
Video : 544 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1936 MHz
SM : 1936 MHz
Memory : 5505 MHz
Video : 1620 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
Process ID : 1289
Type : G
Name : /usr/lib/xorg/Xorg
Used GPU Memory : 14 MiB
Process ID : 24402
Type : G
Name : /usr/lib/xorg/Xorg
Used GPU Memory : 139 MiB
GPU 00000000:03:00.0
Product Name : GeForce GTX 1080 Ti
Product Brand : GeForce
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-dbec65c2-c9a0-393d-3d3b-2dc2e5bbaaf5
Minor Number : 1
VBIOS Version : 86.02.39.00.2A
MultiGPU Board : No
Board ID : 0x300
GPU Part Number : N/A
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
PCI
Bus : 0x03
Device : 0x00
Domain : 0x0000
Device Id : 0x1B0610DE
Bus Id : 00000000:03:00.0
Sub System Id : 0x36091462
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 29 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 11178 MiB
Used : 2 MiB
Free : 11176 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 5 MiB
Free : 251 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending : N/A
Temperature
GPU Current Temp : 18 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 8.04 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 125.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 139 MHz
SM : 139 MHz
Memory : 405 MHz
Video : 544 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1936 MHz
SM : 1936 MHz
Memory : 5505 MHz
Video : 1620 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
GPU 00000000:82:00.0
Product Name : GeForce GTX 1080 Ti
Product Brand : GeForce
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-51321451-290e-f3ef-73a9-1571d33166cb
Minor Number : 2
VBIOS Version : 86.02.39.00.2A
MultiGPU Board : No
Board ID : 0x8200
GPU Part Number : N/A
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
PCI
Bus : 0x82
Device : 0x00
Domain : 0x0000
Device Id : 0x1B0610DE
Bus Id : 00000000:82:00.0
Sub System Id : 0x36091462
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 29 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 11178 MiB
Used : 2 MiB
Free : 11176 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 5 MiB
Free : 251 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending : N/A
Temperature
GPU Current Temp : 22 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 8.82 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 125.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 139 MHz
SM : 139 MHz
Memory : 405 MHz
Video : 544 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1936 MHz
SM : 1936 MHz
Memory : 5505 MHz
Video : 1620 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
(base) yongqiang@famu-sys:~$
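The plain-text -q report is awkward to parse; adding -x (nvidia-smi -q -x) produces XML instead. The sketch below pulls a few fields out of such a document. SAMPLE_XML is a heavily trimmed, hand-written stand-in for real output, and the element names (nvidia_smi_log, gpu, product_name, fb_memory_usage) are assumptions that should be checked against what your driver actually emits; summarize_gpus is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed stand-in for `nvidia-smi -q -x` output (assumed layout).
SAMPLE_XML = """<nvidia_smi_log>
  <driver_version>390.48</driver_version>
  <attached_gpus>1</attached_gpus>
  <gpu id="00000000:02:00.0">
    <product_name>GeForce GTX 1080 Ti</product_name>
    <fb_memory_usage>
      <total>11177 MiB</total>
      <used>156 MiB</used>
      <free>11021 MiB</free>
    </fb_memory_usage>
  </gpu>
</nvidia_smi_log>"""


def summarize_gpus(xml_text):
    """Return bus id, product name and used FB memory for each <gpu> element."""
    root = ET.fromstring(xml_text)
    summary = []
    for gpu in root.iter("gpu"):
        summary.append({
            "bus_id": gpu.get("id"),
            "name": gpu.findtext("product_name", default="N/A"),
            "fb_used": gpu.findtext("fb_memory_usage/used", default="N/A"),
        })
    return summary
```

Missing elements fall back to "N/A", mirroring how nvidia-smi itself marks unsupported data.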
-l SEC, --loop=SEC
Continuously report query data at the specified interval, rather than the default of just once. The application sleeps between queries. Note that on Linux, ECC error or XID error events will print out during the sleep period if the -x flag was not specified. Pressing Ctrl+C at any time aborts the loop, which otherwise runs indefinitely. If no argument is specified for the -l form, a default interval of 5 seconds is used.
# 1 second
(base) yongqiang@famu-sys:~$ nvidia-smi -l 1
-lms ms, --loop-ms=ms
Same as -l, --loop but in milliseconds.
# 1000 milliseconds
(base) yongqiang@famu-sys:~$ nvidia-smi -lms 1000
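When launching the loop from a script, the choice between -l and -lms can be made from a single interval value. A tiny sketch; build_loop_command is a hypothetical helper, and it only builds the argument list without running anything.

```python
def build_loop_command(interval_seconds):
    """Build an nvidia-smi argument list that refreshes at the given interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    if float(interval_seconds).is_integer():
        # Whole seconds: use the -l SEC form.
        return ["nvidia-smi", "-l", str(int(interval_seconds))]
    # Fractional seconds: fall back to the -lms form, at least 1 ms.
    milliseconds = max(1, round(interval_seconds * 1000))
    return ["nvidia-smi", "-lms", str(milliseconds)]
```

The returned list can be handed to subprocess.Popen; stopping the child process then plays the role of Ctrl+C.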
continuously [kənˈtɪnjuəsli]: adv. without interruption
indefinitely [ɪnˈdefɪnətli]: adv. for an unlimited time, vaguely, imprecisely
3.2 RETURN VALUE
The return code reflects whether the operation succeeded or failed and, on failure, the reason.
Return code 0 - Success
Return code 2 - A supplied argument or flag is invalid
Return code 3 - The requested operation is not available on target device
Return code 4 - The current user does not have permission to access this device or perform this operation
Return code 6 - A query to find an object was unsuccessful
Return code 8 - A device’s external power cables are not properly attached
Return code 9 - NVIDIA driver is not loaded
Return code 10 - NVIDIA Kernel detected an interrupt issue with a GPU
Return code 12 - NVML Shared Library couldn’t be found or loaded
Return code 13 - Local version of NVML doesn’t implement this function
Return code 14 - infoROM is corrupted
Return code 15 - The GPU has fallen off the bus or has otherwise become inaccessible
Return code 255 - Other error or internal driver error occurred
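In a wrapper script, the return codes listed above can be turned into readable messages. A minimal sketch; the table is copied from the list above, and describe_return_code is a made-up helper name.

```python
# Return codes as documented in the list above.
RETURN_CODES = {
    0: "Success",
    2: "A supplied argument or flag is invalid",
    3: "The requested operation is not available on target device",
    4: "The current user does not have permission to access this device "
       "or perform this operation",
    6: "A query to find an object was unsuccessful",
    8: "A device's external power cables are not properly attached",
    9: "NVIDIA driver is not loaded",
    10: "NVIDIA Kernel detected an interrupt issue with a GPU",
    12: "NVML Shared Library couldn't be found or loaded",
    13: "Local version of NVML doesn't implement this function",
    14: "infoROM is corrupted",
    15: "The GPU has fallen off the bus or has otherwise become inaccessible",
    255: "Other error or internal driver error occurred",
}


def describe_return_code(code):
    """Map an nvidia-smi exit status to its documented meaning."""
    return RETURN_CODES.get(code, f"Undocumented return code {code}")
```

A typical use is describe_return_code(result.returncode) after a subprocess.run call on nvidia-smi.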
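corrupt [kəˈrʌpt]: adj. dishonest, depraved, damaged; vt. to spoil, to deprave, to make worse; vi. to become corrupt, to rot
occur [əˈkɜː(r)]: vi. to happen, to appear, to exist
invoke [ɪnˈvəʊk]: vt. to call (a function or program), to appeal to, to bring about, to implore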
3.3 GPU ATTRIBUTES
The following list describes all possible data returned by the -q device query option. Unless otherwise noted, all numerical results are base 10 and unitless.
Timestamp
The current system timestamp at the time nvidia-smi was invoked. Format is "Day-of-week Month Day HH:MM:SS Year".
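The timestamp format can be read back with the standard library. A short sketch, assuming English day and month abbreviations (which is what the outputs in this post show) and a process locale in which %a and %b match them; parse_timestamp is a hypothetical helper.

```python
from datetime import datetime

# Day-of-week Month Day HH:MM:SS Year, e.g. "Sun Nov 24 13:22:58 2019"
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_timestamp(text):
    """Convert an nvidia-smi timestamp string into a datetime object."""
    return datetime.strptime(text.strip(), TIMESTAMP_FORMAT)
```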
Driver Version
The version of the installed NVIDIA display driver. This is an alphanumeric string.
alphanumeric [ˌælfənjuːˈmerɪk]: adj. consisting of letters and digits
Attached GPUs
The number of NVIDIA GPUs in the system.
Product Name (Name)
The official product name of the GPU. This is an alphanumeric string. For all products.
Display Mode
A flag that indicates whether a physical display (e.g. monitor) is currently connected to any of the GPU’s connectors. Enabled indicates an attached display. Disabled indicates otherwise.
Display Active (Disp.A)
A flag that indicates whether a display is initialized on the GPU (e.g. memory is allocated on the device for display). Display can be active even when no monitor is physically attached. Enabled indicates an active display. Disabled indicates otherwise.
(base) yongqiang@famu-sys:~$ nvidia-smi
Mon Nov 25 09:03:32 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48 Driver Version: 390.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:02:00.0 On | N/A |
| 29% 15C P8 8W / 250W | 156MiB / 11177MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:03:00.0 Off | N/A |
| 29% 17C P8 8W / 250W | 2MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108... Off | 00000000:82:00.0 Off | N/A |
| 29% 22C P8 8W / 250W | 2MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1289 G /usr/lib/xorg/Xorg 14MiB |
| 0 24402 G /usr/lib/xorg/Xorg 139MiB |
+-----------------------------------------------------------------------------+
(base) yongqiang@famu-sys:~$
Persistence Mode (Persistence-M)
A flag that indicates whether persistence mode is enabled for the GPU. Value is either Enabled or Disabled. When persistence mode is enabled, the NVIDIA driver remains loaded even when no active clients, such as X11 or nvidia-smi, exist. This minimizes the driver load latency associated with running dependent apps, such as CUDA programs. For all CUDA-capable products. Linux only.
Accounting Mode
A flag that indicates whether accounting mode is enabled for the GPU. Value is either Enabled or Disabled. When accounting is enabled, statistics are calculated for each compute process running on the GPU. Statistics can be queried during the lifetime of the process or after it terminates. The execution time of a process is reported as 0 while the process is running and is updated to the actual execution time after the process has terminated. See --help-query-accounted-apps for more info.
Accounting Mode Buffer Size
Returns the size of the circular buffer that holds the list of processes that can be queried for accounting stats. This is the maximum number of processes for which accounting information is stored before information about the oldest processes is overwritten by information about new processes.
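account [əˈkaʊnt]: n. account, explanation, record, bill, reason, description; vi. to explain, to be the cause of, to give a reckoning; vt. to consider, to regard as
statistics [stə'tɪstɪks], stats: n. statistics (the science), statistical data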
Driver Model
On Windows, the TCC and WDDM driver models are supported. The driver model can be changed with the (-dm) or (-fdm) flags. The TCC driver model is optimized for compute applications, i.e. kernel launch times will be quicker with TCC. The WDDM driver model is designed for graphics applications and is not recommended for compute applications. Linux does not support multiple driver models, and will always have the value of “N/A”.
Current - The driver model currently in use. Always “N/A” on Linux.
Pending - The driver model that will be used on the next reboot. Always “N/A” on Linux.
Serial Number
This number matches the serial number physically printed on each board. It is a globally unique immutable alphanumeric value.
immutable [ɪˈmjuːtəbl]: adj. unchanging, not able to be changed
physically [ˈfɪzɪkli]: adv. bodily, in accordance with the laws of nature, fundamentally
pend [pend]: v. to hang, to await decision, to be pending
Fan Speed (Fan)
The fan speed value is the percentage of maximum speed that the device’s fan is currently intended to run at. It ranges from 0 to 100%. Note: The reported speed is the intended fan speed. If the fan is physically blocked and unable to spin, this output will not match the actual fan speed. Many parts do not report fan speeds because they rely on cooling via fans in the surrounding enclosure. For all discrete products with dedicated fans.
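spin [spɪn]: vi. to rotate, to spin yarn, to produce silk thread, to feel dizzy; vt. to rotate, to spin (yarn), to make up (a story), to weave (a web); n. rotation, fast ride
enclosure [ɪnˈkləʊʒə(r)]: n. attachment, fence, enclosed area
dedicate [ˈdedɪkeɪt]: vt. to devote, to commit, to inscribe (a work) to someone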
Performance State (Perf)
The current performance state for the GPU. States range from P0 (maximum performance) to P12 (minimum performance).
Replay counter
This is the internal counter that records various errors on the PCIe bus.
Tx Throughput
The GPU-centric transmission throughput across the PCIe bus in MB/s over the past 20 ms. Only supported on Maxwell architectures and newer.
Rx Throughput
The GPU-centric receive throughput across the PCIe bus in MB/s over the past 20 ms. Only supported on Maxwell architectures and newer.
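replay [ˈriːpleɪ]: v. to play back (a tape or film), to re-enact, to play (a match) again after a draw, to go over repeatedly in the mind; n. replayed match, playback, re-enactment, recurrence
throughput [ˈθruːpʊt]: n. amount produced or processed in a given period, number of people handled, throughput
transmission [trænzˈmɪʃn; trænsˈmɪʃn]: n. gearbox, transmission, passing on, sending, broadcast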
Compute Mode (Compute M.)
The compute mode flag indicates whether individual or multiple compute applications may run on the GPU.
Default means multiple contexts are allowed per device.
Exclusive Process means only one context is allowed per device, usable from multiple threads at a time.
Prohibited means no contexts are allowed per device (no compute apps).
EXCLUSIVE_PROCESS was added in CUDA 4.0. Prior CUDA releases supported only one exclusive mode, which is equivalent to EXCLUSIVE_THREAD in CUDA 4.0 and beyond.
For all CUDA-capable products.
exclusive [ɪkˈskluːsɪv]: adj. sole, excluding others, single-minded; n. exclusive news story, exclusively operated item, one who excludes others
Utilization
Utilization rates report how busy each GPU is over time, and can be used to determine how much an application is using the GPUs in the system.
Note: During driver initialization when ECC is enabled, one can see high GPU and Memory Utilization readings. This is caused by the ECC Memory Scrubbing mechanism that is performed during driver initialization.
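scrub [skrʌb]: n. low bushes, a scrubbing, one who scrubs, a small person; vt. to rub hard to clean, to purify; vi. to scrub, to disinfect the hands and arms; adj. small, makeshift, inferior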
GPU - Percent of time over the past sample period during which one or more kernels was executing on the GPU. The sample period may be between 1 second and 1/6 second depending on the product.
Memory - Percent of time over the past sample period during which global (device) memory was being read or written. The sample period may be between 1 second and 1/6 second depending on the product.
Encoder - Percent of time over the past sample period during which the GPU’s video encoder was being used. The sampling rate is variable and can be obtained directly via the nvmlDeviceGetEncoderUtilization() API.
Decoder - Percent of time over the past sample period during which the GPU’s video decoder was being used. The sampling rate is variable and can be obtained directly via the nvmlDeviceGetDecoderUtilization() API.
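To make the "percent of time" definition concrete, the sketch below computes it from a list of kernel execution intervals within one sample window. This only illustrates the arithmetic behind the definition; it is not how the driver measures utilization, and utilization_percent and its inputs are invented for the example. Overlapping kernels count once, since the definition says "one or more kernels".

```python
def utilization_percent(busy_intervals, window_start, window_end):
    """Percent of the window during which at least one interval was active."""
    if window_end <= window_start:
        raise ValueError("window must have positive length")
    # Clip every interval to the window and sort by start time.
    clipped = sorted(
        (max(start, window_start), min(end, window_end))
        for start, end in busy_intervals
        if end > window_start and start < window_end
    )
    busy = 0.0
    current_start = current_end = None
    for start, end in clipped:
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged run.
            if current_end is not None:
                busy += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the current merged run.
            current_end = max(current_end, end)
    if current_end is not None:
        busy += current_end - current_start
    return 100.0 * busy / (window_end - window_start)
```

For example, kernels running during 0 to 0.25 s, 0.125 to 0.375 s and 0.75 to 1.0 s of a 1 s window cover 0.625 s in total, so the function returns 62.5.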
FB Memory Usage
On-board frame buffer memory information. Reported total memory is affected by ECC state. If ECC is enabled, the total available memory is decreased by several percent, due to the requisite parity bits. The driver may also reserve a small amount of memory for internal use, even without active work on the GPU. For all products.
Total - Total size of FB memory.
Used - Used size of FB memory.
Free - Available size of FB memory.
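The Memory-Usage cell of the summary table ("156MiB / 11177MiB") carries the Used and Total values, and Free can be derived from them. A small sketch, assuming free = total - used, which matches the 390.48 output in this post (Total 11177 MiB, Used 156 MiB, Free 11021 MiB); other driver versions may account for reserved memory separately. parse_memory_usage is a hypothetical helper.

```python
import re

# Matches cells such as "156MiB / 11177MiB" from the summary table.
MEMORY_CELL = re.compile(r"(\d+)\s*MiB\s*/\s*(\d+)\s*MiB")


def parse_memory_usage(cell):
    """Return total, used and derived free FB memory in MiB."""
    match = MEMORY_CELL.search(cell)
    if match is None:
        raise ValueError(f"unrecognized memory cell: {cell!r}")
    used, total = int(match.group(1)), int(match.group(2))
    # Assumption: free is simply total minus used.
    return {"total": total, "used": used, "free": total - used}
```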
on-board [ˌɑːn ˈbɔːrd]: adj. on a ship (or aircraft, vehicle), built into the main board
parity [ˈpærəti]: n. par value, equality, equivalence, number of births, childbirth