Enabling NVLink on Dual RTX 2080 Ti Cards

flymyd

Article link: https://blog.csdn.net/flymyd/article/details/145675586

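This article describes how to enable NVLink between two RTX 2080 Ti cards on Ubuntu 22.04.
Before starting, make sure CUDA is installed and its environment variables are configured.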
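
As a quick sanity check of the prerequisite above, the following minimal sketch reports whether the relevant tools are reachable on `PATH`. The helper name `check_tools` is hypothetical, and the sketch assumes that a working CUDA setup exposes both `nvidia-smi` and `nvcc`; it does not verify versions or library paths.

```shell
# check_tools TOOL...: report which of the given commands are on PATH.
# Returns 0 if all are found, 1 if at least one is missing.
# (Hypothetical helper for illustration; not part of CUDA.)
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Assumption: nvidia-smi comes with the driver and nvcc with the CUDA toolkit.
check_tools nvidia-smi nvcc || echo "Install CUDA or fix PATH before continuing"
```

If `nvcc` is reported missing even though CUDA is installed, the CUDA `bin` directory is probably not on `PATH` yet.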

1. Enabling NVLink

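# Enable persistence mode on all GPUs (requires root)
sudo nvidia-smi -pm 1
sudo reboot
# After the reboot, print the GPU topology matrix
nvidia-smi topo -m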

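The output looks like this (NV2 between GPU0 and GPU1 means the two cards are connected by a bonded set of 2 NVLinks):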

(base) root@myd-gpu:~# nvidia-smi topo -m
        GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV2     0-11,24-35      0               N/A
GPU1    NV2      X      12-23,36-47     1               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
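
To check the topology matrix from a script rather than by eye, a small filter along these lines may help. It is only a sketch: the function name `nvlink_types` is hypothetical, and it assumes the matrix rows start with `GPU<n>` at the beginning of the line, as in the output above. The sample input is copied from that output so the block can be run without a GPU.

```shell
# nvlink_types: read `nvidia-smi topo -m` output on stdin and print the
# distinct NV# link types that appear in the GPU rows (nothing if none).
# (Hypothetical helper for illustration.)
nvlink_types() {
  grep -E '^GPU[0-9]+' | grep -Eo 'NV[0-9]+' | sort -u
}

# Demo on the two matrix rows shown in this article.
nvlink_types <<'EOF'
        GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV2     0-11,24-35      0               N/A
GPU1    NV2      X      12-23,36-47     1               N/A
EOF
# prints: NV2
```

On a real machine the same function can be fed live output with `nvidia-smi topo -m | nvlink_types`; empty output would suggest the GPUs are only connected over PCIe.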

2. Testing

Download the official CUDA samples:

git clone https://github.com/NVIDIA/cuda-samples.git

Build and run the P2P bandwidth and latency test:

pip install cmake
cd cuda-samples/Samples/5_Domain_Specific/p2pBandwidthLatencyTest
mkdir build && cd build
cmake ..
make -j$(nproc)
./p2pBandwidthLatencyTest

Check the results:

(base) root@myd-gpu:~/cuda-samples/Samples/5_Domain_Specific/p2pBandwidthLatencyTest/build# ./p2pBandwidthLatencyTest 
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA GeForce RTX 2080 Ti, pciBusID: 4, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA GeForce RTX 2080 Ti, pciBusID: 81, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0

***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.

P2P Connectivity Matrix
     D\D     0     1
     0       1     1
     1       1     1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1 
     0 541.95   5.67 
     1   5.72 536.94 
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0      1 
     0 523.63  47.11 
     1  47.11 536.57 
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1 
     0 535.84   8.49 
     1   8.44 533.98 
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1 
     0 534.00  94.18 
     1  94.13 533.34 
P2P=Disabled Latency Matrix (us)
   GPU     0      1 
     0   1.48  16.92 
     1  14.64   1.34 

   CPU     0      1 
     0   3.10   9.39 
     1   9.35   3.30 
P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU     0      1 
     0   1.34   1.46 
     1   1.53   1.34 

   CPU     0      1 
     0   2.96   2.60 
     1   2.73   3.30 

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
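
To put the matrices above in perspective, the sketch below computes how much the GPU0 to GPU1 figures change when P2P is enabled. The numbers are copied from the output above, and the helper name `ratio` is hypothetical. Given the sample's own note that it is not meant for performance measurement, the ratios should be read as rough indications only.

```shell
# ratio A B: print A / B rounded to one decimal place.
# awk is used because POSIX shell arithmetic is integer-only.
# (Hypothetical helper for illustration.)
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# GPU0 -> GPU1 values taken from the test output in this article.
echo "Unidirectional bandwidth gain: $(ratio 47.11 5.67)x"   # 5.67 -> 47.11 GB/s
echo "Bidirectional bandwidth gain:  $(ratio 94.18 8.49)x"   # 8.49 -> 94.18 GB/s
echo "GPU latency reduction:         $(ratio 16.92 1.46)x"   # 16.92 -> 1.46 us
```

With these inputs the three lines report roughly 8.3x, 11.1x, and 11.6x respectively.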