Running the OSU Micro Benchmark over Soft-RoCE

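An earlier post described how to run the benchmark, but that run used TCP. This time the goal is a result over RoCEv2. The virtual machines have no InfiniBand-capable NIC, so Soft-RoCE (the rxe software driver) is used instead.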

Installing and debugging Soft-RoCE

  • System version information
admin@osu-1:~$ uname -a
Linux osu-1 5.11.0-44-generic #48~20.04.2-Ubuntu SMP Tue Dec 14 15:36:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Install rdma-core and the verbs utilities
admin@osu-1:~$ sudo apt install rdma-core ibverbs-utils -y
  • Add an RDMA link named ib5 (type rxe) on top of the existing network interface ens8
admin@osu-1:~$ sudo rdma link add ib5 type rxe netdev ens8
admin@osu-1:~$ rdma link show
link ib5/1 state ACTIVE physical_state LINK_UP netdev ens8 
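Before moving on to MPI, it can help to confirm that the rxe link really is ACTIVE and visible to the verbs layer. The following is a minimal sketch and not part of the original procedure: `link_state` is a hypothetical helper that parses text in the `rdma link show` format shown above, and the guarded section assumes `rdma` (from iproute2) and `ibv_devices` (from ibverbs-utils) are installed.

```shell
# Hypothetical helper: print the state of a named RDMA link, reading
# `rdma link show`-style text on stdin (format as in the output above).
link_state() {
    # $1 = link name, e.g. ib5; matches "link ib5/1 state ACTIVE ..."
    awk -v dev="$1" '$1 == "link" && index($2, dev "/") == 1 {
        for (i = 3; i < NF; i++) if ($i == "state") { print $(i + 1); exit }
    }'
}

# Only touch the real system if the tools are present.
if command -v rdma >/dev/null 2>&1; then
    state=$(rdma link show | link_state ib5)
    echo "ib5 state: ${state:-not found}"
fi
if command -v ibv_devices >/dev/null 2>&1; then
    # ib5 should be listed here if the rxe link exists
    ibv_devices || true
fi
```

If the link does not show up at all, loading the kernel module by hand (`sudo modprobe rdma_rxe`) before `rdma link add` is a reasonable first thing to try.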

Installing and debugging MPI

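  • There are several MPI implementations to choose from: Open MPI, MPICH, and MVAPICH
  • After a fair amount of trial and error, I settled on MVAPICH2; after all, it comes from the same group at Ohio State as the OSU Micro Benchmarks
  • Install the packages needed for the build first
admin@osu-1:~$ sudo apt install byacc -y
  • Fetch the source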
admin@osu-1:~$ wget http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.3.7-1.tar.gz
  • Extract the archive and enter the directory
admin@osu-1:~$ tar zxvf mvapich2-2.3.7-1.tar.gz 
admin@osu-1:~$ cd mvapich2-2.3.7-1/
admin@osu-1:~/mvapich2-2.3.7-1$ 
  • Pay attention to the options passed to configure
admin@osu-1:~/mvapich2-2.3.7-1$ ./configure --with-device=ch3:mrail --with-rdma=gen2
  • Then build and install
admin@osu-1:~/mvapich2-2.3.7-1$ make -j$(nproc) 
admin@osu-1:~/mvapich2-2.3.7-1$ sudo make install
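A quick way to check that the installed library was built with the intended device is to look at what `mpiname -a` reports. This is only a sketch: `built_with_mrail` is a hypothetical helper that searches for the `ch3:mrail` string from the configure line above, and the exact layout of the `mpiname` output is not assumed beyond that string appearing in it.

```shell
# Hypothetical helper: succeed if the text on stdin mentions the
# ch3:mrail device that was requested at configure time.
built_with_mrail() {
    grep -q 'ch3:mrail'
}

# mpiname ships with MVAPICH2; guard the call so the sketch is harmless
# on machines where it is not installed.
if command -v mpiname >/dev/null 2>&1; then
    if mpiname -a | built_with_mrail; then
        echo "MVAPICH2 build uses ch3:mrail"
    else
        echo "ch3:mrail not found in mpiname output" >&2
    fi
fi
```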
  • The benchmarks have already been built along with it
admin@osu-1:~/mvapich2-2.3.7-1$ cd osu_benchmarks/mpi/pt2pt/
admin@osu-1:~/mvapich2-2.3.7-1/osu_benchmarks/mpi/pt2pt$ 
admin@osu-1:~/mvapich2-2.3.7-1/osu_benchmarks/mpi/pt2pt$ ls -lt
total 320
-rwxrwxr-x 1 admin admin  6332 Nov 17 10:40 osu_multi_lat
-rwxrwxr-x 1 admin admin  6342 Nov 17 10:40 osu_latency_mt
-rwxrwxr-x 1 admin admin  6312 Nov 17 10:40 osu_latency
-rwxrwxr-x 1 admin admin  6262 Nov 17 10:40 osu_bw
-rwxrwxr-x 1 admin admin  6342 Nov 17 10:40 osu_latency_mp
-rwxrwxr-x 1 admin admin  6302 Nov 17 10:40 osu_mbw_mr
-rwxrwxr-x 1 admin admin  6282 Nov 17 10:40 osu_bibw
-rw-rw-r-- 1 admin admin 11904 Nov 17 10:40 osu_bibw.o
-rw-rw-r-- 1 admin admin 18072 Nov 17 10:40 osu_mbw_mr.o
-rw-rw-r-- 1 admin admin 16872 Nov 17 10:40 osu_latency_mt.o
-rw-rw-r-- 1 admin admin 11456 Nov 17 10:40 osu_bw.o
-rw-rw-r-- 1 admin admin 10976 Nov 17 10:40 osu_latency_mp.o
-rw-rw-r-- 1 admin admin  9688 Nov 17 10:40 osu_latency.o
-rw-rw-r-- 1 admin admin  9872 Nov 17 10:40 osu_multi_lat.o
-rw-rw-r-- 1 admin admin 28374 Nov 17 10:23 Makefile
-rw-r--r-- 1 admin admin 28795 May 24 01:46 Makefile.in
-rw-r--r-- 1 admin admin  1446 May 17  2022 Makefile.am
-rw-r--r-- 1 admin admin 13925 May 17  2022 osu_bibw.c
-rw-r--r-- 1 admin admin 13046 May 17  2022 osu_bw.c
-rw-r--r-- 1 admin admin  9926 May 17  2022 osu_latency.c
-rw-r--r-- 1 admin admin  7763 May 17  2022 osu_latency_mp.c
-rw-r--r-- 1 admin admin 12654 May 17  2022 osu_latency_mt.c
-rw-r--r-- 1 admin admin 19056 May 17  2022 osu_mbw_mr.c
-rw-r--r-- 1 admin admin 10070 May 17  2022 osu_multi_lat.c
admin@osu-1:~/mvapich2-2.3.7-1/osu_benchmarks/mpi/pt2pt$ 
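  • Make sure the MPI install path is on PATH (in this session /usr/local/bin was already present, so the final assignment is only needed when it is missing)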
admin@osu-1:~/mvapich2-2.3.7-1/osu_benchmarks/mpi/pt2pt$ which mpirun
/usr/local/bin/mpirun
admin@osu-1:~/mvapich2-2.3.7-1/osu_benchmarks/mpi/pt2pt$ 
admin@osu-1:~/mvapich2-2.3.7-1/osu_benchmarks/mpi/pt2pt$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
admin@osu-1:~/mvapich2-2.3.7-1/osu_benchmarks/mpi/pt2pt$ 
admin@osu-1:~/mvapich2-2.3.7-1/osu_benchmarks/mpi/pt2pt$ PATH=$PATH:/usr/local/bin
admin@osu-1:~/mvapich2-2.3.7-1/osu_benchmarks/mpi/pt2pt$ 
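Appending to PATH unconditionally, as in the transcript above, adds a duplicate entry when the directory is already there. A small sketch of an idempotent alternative follows; `path_append` is a hypothetical helper name, not something provided by MVAPICH2.

```shell
# Hypothetical helper: append a directory to PATH only if it is not
# already one of its entries.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

path_append /usr/local/bin
export PATH

# Report where mpirun resolves from, or complain if it is missing.
command -v mpirun || echo "mpirun not found on PATH" >&2
```

Putting the `path_append` call in the shell startup file on both virtual machines would keep the setting across logins, assuming the same install prefix on each.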

Running

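  • Clone the virtual machine above; the two VMs can ping each other at 5.5.5.3 and 5.5.5.4
  • Run osu_latency; it prints some WARNING messages at startup, which can be ignored for now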
admin@osu-1:~/mvapich2-2.3.7-1/osu_benchmarks/mpi/pt2pt$ mpirun_rsh -np 2 5.5.5.3 5.5.5.4 MV2_USE_RoCE=1 MV2_IBA_HCA=ib5 ./osu_latency
[osu-1:mpi_rank_0][rdma_find_network_type] Unable to find the numa process is bound to. Disabling process placement aware hca mapping.
[osu-1:mpi_rank_0][mv2_get_hca_type] **********************WARNING***********************
[osu-1:mpi_rank_0][mv2_get_hca_type] Failed to automatically detect the HCA architecture.
[osu-1:mpi_rank_0][mv2_get_hca_type] This may lead to subpar communication performance.
[osu-1:mpi_rank_0][mv2_get_hca_type] ****************************************************
[osu-1:mpi_rank_0][mv2_get_hca_type] **********************WARNING***********************
[osu-1:mpi_rank_0][mv2_get_hca_type] Failed to automatically detect the HCA architecture.
[osu-1:mpi_rank_0][mv2_get_hca_type] This may lead to subpar communication performance.
[osu-1:mpi_rank_0][mv2_get_hca_type] ****************************************************
[osu-1:mpi_rank_0][mv2_get_hca_type] **********************WARNING***********************
[osu-1:mpi_rank_0][mv2_get_hca_type] Failed to automatically detect the HCA architecture.
[osu-1:mpi_rank_0][mv2_get_hca_type] This may lead to subpar communication performance.
[osu-1:mpi_rank_0][mv2_get_hca_type] ****************************************************
[osu-1:mpi_rank_0][rdma_open_hca] Unknown HCA type: this build of MVAPICH2 does not fully support the HCA found on the system (try with other build options)
[osu-1:mpi_rank_0][mv2_new_get_hca_type] **********************WARNING***********************
[osu-1:mpi_rank_0][mv2_new_get_hca_type] Failed to automatically detect the HCA architecture.
[osu-1:mpi_rank_0][mv2_new_get_hca_type] This may lead to subpar communication performance.
[osu-1:mpi_rank_0][mv2_new_get_hca_type] ****************************************************
[osu-2:mpi_rank_1][rdma_find_network_type] Unable to find the numa process is bound to. Disabling process placement aware hca mapping.
[osu-2:mpi_rank_1][rdma_open_hca] Unknown HCA type: this build of MVAPICH2 does not fully support the HCA found on the system (try with other build options)
[osu-1:mpi_rank_0][rdma_param_handle_heterogeneity] All nodes involved in the job were detected to be homogeneous in terms of processors and interconnects. Setting MV2_HOMOGENEOUS_CLUSTER=1 can improve job startup performance on such systems. The following link has more details on enhancing job startup performance. http://mvapich.cse.ohio-state.edu/performance/job-startup/.
[osu-1:mpi_rank_0][rdma_param_handle_heterogeneity] To suppress this warning, please set MV2_SUPPRESS_JOB_STARTUP_PERFORMANCE_WARNING to 1
# OSU MPI Latency Test v5.9
# Size          Latency (us)
0                     139.61
1                     144.72
2                     141.35
4                     140.04
8                     139.94
16                    140.42
32                    139.10
64                    137.50
128                   142.40
256                   143.07
512                   140.62
1024                  143.64
2048                  175.03
4096                  222.74
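Because the startup warnings are mixed in with the results, it can be convenient to pull a single figure out of a saved run. The sketch below assumes the output was saved to a file (the name `latency.log` is hypothetical, for example via `2>&1 | tee latency.log` on the command above); `latency_for_size` is likewise a hypothetical helper.

```shell
# Hypothetical helper: print the latency (us) reported for one message
# size, reading osu_latency output on stdin. Header lines starting with
# '#' and MVAPICH2 warning lines starting with '[' are skipped.
latency_for_size() {
    awk -v size="$1" '
        /^#/ || /^\[/ { next }
        NF == 2 && $1 == size { print $2; exit }
    '
}

# latency.log is a hypothetical file holding a saved osu_latency run.
if [ -f latency.log ]; then
    echo "8-byte latency (us): $(latency_for_size 8 < latency.log)"
fi
```

Applied to the run above, the 8-byte row gives 139.94 us, which is the kind of number to compare against the earlier TCP result.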
  • Meanwhile, running tcpdump on ens8 on the other VM captures UDP packets with destination port 4791, which is the port used by RoCEv2
10:51:43.782588 52:54:00:28:f8:36 > 52:54:00:3c:a8:a3, ethertype IPv4 (0x0800), length 222: 5.5.5.4.63843 > 5.5.5.3.4791: UDP, length 180
10:51:43.782725 52:54:00:3c:a8:a3 > 52:54:00:28:f8:36, ethertype IPv4 (0x0800), length 62: 5.5.5.3.63843 > 5.5.5.4.4791: UDP, length 20
10:51:43.782857 52:54:00:3c:a8:a3 > 52:54:00:28:f8:36, ethertype IPv4 (0x0800), length 222: 5.5.5.3.63843 > 5.5.5.4.4791: UDP, length 180
10:51:43.782865 52:54:00:28:f8:36 > 52:54:00:3c:a8:a3, ethertype IPv4 (0x0800), length 62: 5.5.5.4.63843 > 5.5.5.3.4791: UDP, length 20
10:51:43.782885 52:54:00:28:f8:36 > 52:54:00:3c:a8:a3, ethertype IPv4 (0x0800), length 222: 5.5.5.4.63843 > 5.5.5.3.4791: UDP, length 180
10:51:43.783040 52:54:00:3c:a8:a3 > 52:54:00:28:f8:36, ethertype IPv4 (0x0800), length 62: 5.5.5.3.63843 > 5.5.5.4.4791: UDP, length 20
10:51:43.783146 52:54:00:3c:a8:a3 > 52:54:00:28:f8:36, ethertype IPv4 (0x0800), length 222: 5.5.5.3.63843 > 5.5.5.4.4791: UDP, length 180
10:51:43.783154 52:54:00:28:f8:36 > 52:54:00:3c:a8:a3, ethertype IPv4 (0x0800), length 62: 5.5.5.4.63843 > 5.5.5.3.4791: UDP, length 20
10:51:43.783173 52:54:00:28:f8:36 > 52:54:00:3c:a8:a3, ethertype IPv4 (0x0800), length 222: 5.5.5.4.63843 > 5.5.5.3.4791: UDP, length 180
10:51:43.783312 52:54:00:3c:a8:a3 > 52:54:00:28:f8:36, ethertype IPv4 (0x0800), length 62: 5.5.5.3.63843 > 5.5.5.4.4791: UDP, length 20
10:51:43.783423 52:54:00:3c:a8:a3 > 52:54:00:28:f8:36, ethertype IPv4 (0x0800), length 222: 5.5.5.3.63843 > 5.5.5.4.4791: UDP, length 180
10:51:43.783431 52:54:00:28:f8:36 > 52:54:00:3c:a8:a3, ethertype IPv4 (0x0800), length 62: 5.5.5.4.63843 > 5.5.5.3.4791: UDP, length 20
10:51:43.783451 52:54:00:28:f8:36 > 52:54:00:3c:a8:a3, ethertype IPv4 (0x0800), length 222: 5.5.5.4.63843 > 5.5.5.3.4791: UDP, length 180
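For reference, a capture like the one above can be narrowed to RoCEv2 traffic with a port filter, and the text output can be counted with a small helper. This is a sketch under assumptions: `count_rocev2` is a hypothetical helper, the interface name ens8 is taken from this setup, and the tcpdump invocations are left as comments because they need root.

```shell
# Hypothetical helper: count packets whose UDP destination port is 4791
# in tcpdump text output read from stdin (format as in the capture above).
count_rocev2() {
    # grep -c prints 0 but exits non-zero when nothing matches
    grep -c '\.4791: UDP' || true
}

# Possible invocations on the receiving VM (run manually, as root):
#   sudo tcpdump -i ens8 -e -nn udp dst port 4791
#   sudo tcpdump -i ens8 -e -nn -c 20 udp dst port 4791 | count_rocev2
```

Both directions of the conversation use destination port 4791, so the request and acknowledgement packets are all counted.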
  • If the packets are written to a file and opened in Wireshark, they are decoded as RoCEv2 RC (Reliable Connection) packets