I usually use the Intel MPI Benchmarks (IMB) as a quick test to check that everything is OK from a network point of view (connectivity and performance). Here is the environment used for this walkthrough:
- OpenMPI version: 1.6.3
- Mellanox OFED version: MLNX_OFED_LINUX-1.5.3-4.0.8 (OFED-1.5.3-4.0.8)
- Clustering suite: IBM Platform HPC 3.2 cluster (RHEL 6.2)
- Nodes:
- 1 head node
- 6 compute nodes (5 package-based installs + 1 diskless)
Note: the diskless node is not used here
1. MLX OFED install
The OS_OFED components need to be disabled both for the compute nodegroups that are in use (compute-rhel and compute-diskless-rhel in my case) and for the installer nodegroup.
You can just mount the ISO image and run the install script. The HCA firmware will be flashed if needed during the process.
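In practice that boils down to something like this (a sketch; the exact ISO file name is whatever you downloaded from Mellanox):
mount -o ro,loop MLNX_OFED_LINUX-1.5.3-4.0.8-rhel6.2-x86_64.iso /mnt
cd /mnt
./mlnxofedinstall
# reload the stack (or reboot) once the install / firmware flash is done
/etc/init.d/openibd restart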
2. Open MPI install and configuration
2.1 download OpenMPI and uncompress the tarball
mkdir /shared/compile_temp/openmpi/
cd /shared/compile_temp/openmpi/
wget http://www.open-mpi.org/software/ompi/v1.6/downloads/openmpi-1.6.3.tar.bz2
tar -xjf openmpi-1.6.3.tar.bz2
2.2 Configure, compile and install OpenMPI
Note: if LSF is in the PATH, then support for it will be enabled automatically. If you are not sure, you can force it:
cd openmpi-1.6.3
./configure --prefix=/shared/ompi-1.6.3-lsf --enable-orterun-prefix-by-default --with-lsf=/opt/lsf/8.3 --with-lsf-libdir=/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/lib
Note: the path used for the option --with-lsf-libdir is $LSF_LIBDIR
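You can quickly confirm that LSF is visible in the build environment (and that $LSF_LIBDIR points at the directory used above) with:
which bsub
echo $LSF_LIBDIR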
make
make check
make install
2.3 create an environment module for OpenMPI
2.3.1 add the new PATH to MODULEPATH
I added the following to my .bashrc: export MODULEPATH=$MODULEPATH:/home/mehdi/modules
2.3.2 create the module file (/home/mehdi/modules/ompilsf):
#%Module1.0
##
## dot modulefile
##
proc ModulesHelp { } {
global openmpiversion
puts stderr "\tAdds OpenMPI to your environment variables,"
}
module-whatis "adds OpenMPI to your environment variables"
set openmpiversion 1.6.3
set root /shared/ompi-1.6.3-lsf/
prepend-path PATH $root/bin
prepend-path MANPATH $root/man
setenv MPI_HOME $root
setenv MPI_RUN $root/bin/mpirun
prepend-path LD_RUN_PATH $root/lib
prepend-path LD_LIBRARY_PATH $root/lib
2.3.3 check the module works as expected
The first step is to check which modules are currently loaded:
[mehdi@atsplat1 ~]$ module list
Currently Loaded Modulefiles:
1) null
Then we can check for the available modules:
[mehdi@atsplat1 ~]$ module avail
----------------------------------------------------- /usr/share/Modules/modulefiles -----------------------------------------------------
PMPI/modulefile dot module-cvs module-info modules null use.own
---------------------------------------------------------- /home/mehdi/modules -----------------------------------------------------------
ompi ompilsf
And finally, load the module we just created and double-check that the environment is properly defined:
[mehdi@atsplat1 ~]$ module load ompilsf
[mehdi@atsplat1 ~]$ module list
Currently Loaded Modulefiles:
1) null 2) ompilsf
[mehdi@atsplat1 ~]$ echo $MPI_HOME
/shared/ompi-1.6.3-lsf/
[mehdi@atsplat1 ~]$ which mpicc
/shared/ompi-1.6.3-lsf/bin/mpicc
[mehdi@atsplat1 ~]$
2.4 check OpenMPI
2.4.1 verify that support for openib and lsf is there:
[mehdi@atsplat1 ~]$ ompi_info | grep openib
MCA btl: openib (MCA v2.0, API v2.0, Component v1.6.3)
[mehdi@atsplat1 ~]$ ompi_info | grep lsf
Prefix: /shared/ompi-1.6.3-lsf
MCA ras: lsf (MCA v2.0, API v2.0, Component v1.6.3)
MCA plm: lsf (MCA v2.0, API v2.0, Component v1.6.3)
MCA ess: lsf (MCA v2.0, API v2.0, Component v1.6.3)
[mehdi@atsplat1 ~]$
2.4.2 copy over an example from the OpenMPI distribution and modify it so the output contains the hostname of the node running the MPI instance:
cp /shared/compile_temp/openmpi/openmpi-1.6.3/examples/hello_c.c .
cp hello_c.c hello_c.orig
vim hello_c.c
The diff is below:
[mehdi@atsplat1 ~]$ diff -U4 hello_c.orig hello_c.c
--- hello_c.orig 2013-02-13 13:04:42.074961444 -0600
+++ hello_c.c 2013-02-13 13:06:02.216989733 -0600
@@ -12,13 +12,15 @@
int main(int argc, char* argv[])
{
int rank, size;
+ char hostname[256];
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
- printf("Hello, world, I am %d of %d\n", rank, size);
+ gethostname(hostname,255);
+ printf("Hello, world, I am %d of %d on host %s\n", rank, size, hostname);
MPI_Finalize();
return 0;
}
[mehdi@atsplat1 ~]$
2.4.3 compile the modified example (note: gethostname() is declared in <unistd.h>, so adding that include to hello_c.c avoids an implicit-declaration warning):
mpicc -o hello hello_c.c
2.4.4 run the example manually, using a host file and outside of LSF
My hostfile is:
[mehdi@atsplat1 ~]$ cat hosts
compute002
compute004
[mehdi@atsplat1 ~]$
2.4.4.1 run with no specific options and fix any problem
[mehdi@atsplat1 ~]$ mpirun -np 2 --hostfile ./hosts ./hello
--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.
This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.
See this Open MPI FAQ item for more information on these Linux kernel module
parameters:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
Local host: compute002
Registerable memory: 4096 MiB
Total memory: 32745 MiB
Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------
Hello, world, I am 0 of 2 on host compute002
Hello, world, I am 1 of 2 on host compute004
[atsplat1:24675] 1 more process has sent help message help-mpi-btl-openib.txt / reg mem limit low
[atsplat1:24675] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
As you can see, the first try is not really great. Let's see if everything is fine when using the TCP fabric:
[mehdi@atsplat1 ~]$ mpirun -np 2 --hostfile ./hosts --mca btl self,tcp ./hello
Hello, world, I am 1 of 2 on host compute004
Hello, world, I am 0 of 2 on host compute002
[mehdi@atsplat1 ~]$
So the problem really comes from my IB setup. Looking at the FAQ page mentioned in the output of the first job (the one that used the IB fabric), there are two possible causes for the warning: one related to the Linux kernel module parameters and one related to the locked-memory limits.
The max locked memory should not be an issue as my /etc/security/limits.conf on the compute nodes ends with a nice:
* soft memlock unlimited
* hard memlock unlimited
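A quick way to confirm which limit is actually applied on a compute node (compute002 is just an example host; it should report "unlimited"):
ssh compute002 'ulimit -l'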
So my issue must be at the kernel-module level. An article on IBM developerWorks explains very well how to solve this kind of problem.
Looking at the mlx4_core kernel module parameters tells me that I need to change them:
[root@compute004 ~]# cat /sys/module/mlx4_core/parameters/log_num_mtt
0
[root@compute004 ~]# cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
0
[root@compute004 ~]#
And the PAGE_SIZE value is:
[root@compute004 ~]# getconf PAGE_SIZE
4096
[root@compute004 ~]#
My nodes have 32 GB of memory, and the recommendation is to be able to register roughly twice the physical memory. The registerable memory is given by:
max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE
so with:
log_num_mtt = 23
log_mtts_per_seg = 1
we get:
max_reg_mem = (2^23) * (2^1) * (4 kB) = 64 GB
which is exactly twice the 32 GB of physical memory.
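A quick sanity check of that arithmetic in plain bash (nothing cluster-specific here):
echo $(( (1 << 23) * (1 << 1) * 4096 / 1024 / 1024 / 1024 ))    # prints 64, i.e. 64 GB of registerable memory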
The way to set the proper values persistently is to add the following line to /etc/modprobe.d/mlx4_en.conf:
options mlx4_core log_num_mtt=23 log_mtts_per_seg=1
Then we need to restart openibd on the compute nodes so the mlx4 modules are unloaded and reloaded with the new parameters.
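A minimal sketch, assuming the stock MLNX OFED init script (run as root on each compute node):
/etc/init.d/openibd restart
Once the modules are back, check the new values of log_num_mtt and log_mtts_per_seg: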
[root@compute004 ~]# cat /sys/module/mlx4_core/parameters/log_num_mtt
23
[root@compute004 ~]# cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
1
Perfect, now it is time to run the example again:
[mehdi@atsplat1 ~]$ mpirun -np 2 --hostfile ./hosts ./hello
Hello, world, I am 1 of 2 on host compute004
Hello, world, I am 0 of 2 on host compute002
[mehdi@atsplat1 ~]$
So now we can force the use of the IB fabric only:
[mehdi@atsplat1 ~]$ mpirun -np 2 --hostfile ./hosts --mca btl self,openib ./hello
Hello, world, I am 1 of 2 on host compute004
Hello, world, I am 0 of 2 on host compute002
[mehdi@atsplat1 ~]$
2.4.4.2 Final fix in the IBM Platform HPC framework
We use the CFM framework to append the options to the /etc/modprobe.d/mlx4_en.conf file on all the nodes of the nodegroup.
In the CFM directory for the compute nodegroup in use (in my case /etc/cfm/compute-rhel-6.2-x86_64eth0/), create the appropriate subdirectory and file:
etc/modprobe.d/mlx4_en.conf.append
The mlx4_en.conf.append file contains only 2 lines: the content to be appended to mlx4_en.conf on the nodes, i.e. the mlx4_core options shown above.
Then push the change out with cfmsync -f -n compute-rhel-6.2-x86_64eth0
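A sketch of those steps from the head node (assuming the append file only needs to carry the options line; adjust to whatever your append file should actually contain):
mkdir -p /etc/cfm/compute-rhel-6.2-x86_64eth0/etc/modprobe.d/
echo "options mlx4_core log_num_mtt=23 log_mtts_per_seg=1" >> /etc/cfm/compute-rhel-6.2-x86_64eth0/etc/modprobe.d/mlx4_en.conf.append
cfmsync -f -n compute-rhel-6.2-x86_64eth0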
2.5 compile IMB
mkdir /shared/compile_temp/IMB/
cd /shared/compile_temp/IMB/
Assuming the tarball is already there:
tar -xzf IMB_3.2.3.tgz
Then we need to go into the src directory and create a makefile for OpenMPI:
[root@atsplat1 IMB]# cd imb_3.2.3/src/
[root@atsplat1 src]# cp make_mpich make_ompi
The MPI_HOME assignment is commented out, as the environment module already sets it.
The diff:
[root@atsplat1 src]# diff -U4 make_mpich make_ompi
--- make_mpich 2011-11-07 16:09:42.000000000 -0600
+++ make_ompi 2013-02-13 09:36:45.616993735 -0600
@@ -1,6 +1,6 @@
# Enter root directory of mpich install
-MPI_HOME=
+#MPI_HOME=
MPICC=$(shell find ${MPI_HOME} -name mpicc -print)
NULL_STRING :=
[root@atsplat1 src]#
Time to compile:
[root@atsplat1 src]# make -f make_ompi
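The build should produce the IMB-MPI1 binary (depending on the target it can also build IMB-EXT and IMB-IO). I then copy it somewhere convenient for the runs below, for example:
ls -l IMB-MPI1
cp IMB-MPI1 /home/mehdi/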
2.6 run IMB to get the bandwidth associated with each fabric
One of my nodes showed errors on its InfiniBand card, so from now on I'll be using a different host file:
[mehdi@atsplat1 ~]$ cat hosts
compute005
compute002
[mehdi@atsplat1 ~]$
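As an aside, the infiniband-diags tools installed with OFED make it easy to spot a flaky port; a quick sketch (run on the suspect node; the counters are cumulative until reset):
ibstatus      # link state and rate of each port
perfquery     # per-port error counters (SymbolErrorCounter, LinkDownedCounter, ...)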
2.6.1 run using native InfiniBand:
[mehdi@atsplat1 ~]$ mpirun -np 2 --mca btl self,openib --hostfile ./hosts ./IMB-MPI1 pingpong
benchmarks to run pingpong
#---------------------------------------------------
# Intel (R) MPI Benchmark Suite V3.2.3, MPI-1 part
#---------------------------------------------------
# Date : Thu Feb 14 08:45:12 2013
# Machine : x86_64
# System : Linux
# Release : 2.6.32-220.el6.x86_64
# Version : #1 SMP Wed Nov 9 08:03:13 EST 2011
# MPI Version : 2.1
# MPI Thread Environment:
# New default behavior from Version 3.2 on:
# the number of iterations per message size is cut down
# dynamically when a certain run time (per message size sample)
# is expected to be exceeded. Time limit is defined by variable
# "SECS_PER_SAMPLE" (=> IMB_settings.h)
# or through the flag => -time
# Calling sequence was:
# ./IMB-MPI1 pingpong
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# PingPong
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 1.50 0.00
1 1000 1.49 0.64
2 1000 1.26 1.52
4 1000 1.25 3.05
8 1000 1.30 5.89
16 1000 1.30 11.69
32 1000 1.33 22.87
64 1000 1.37 44.60
128 1000 1.98 61.61
256 1000 2.12 115.41
512 1000 2.29 213.54
1024 1000 2.63 370.96
2048 1000 3.46 563.82
4096 1000 4.14 942.99
8192 1000 5.63 1387.80
16384 1000 8.12 1924.25
32768 1000 11.63 2686.07
65536 640 18.67 3347.84
131072 320 32.46 3851.17
262144 160 60.21 4152.16
524288 80 115.53 4327.82
1048576 40 226.39 4417.21
2097152 20 447.33 4471.00
4194304 10 890.95 4489.61
# All processes entering MPI_Finalize
[mehdi@atsplat1 ~]$
The bandwidth and latency are consistent with the hardware capability:
[root@compute004 ~]# ibstatus
Infiniband device 'mlx4_0' port 1 status:
default gid: fe80:0000:0000:0000:5cf3:fc00:0004:ec2b
base lid: 0x0
sm lid: 0x0
state: 1: DOWN
phys state: 3: Disabled
rate: 40 Gb/sec (4X QDR)
link_layer: InfiniBand
Infiniband device 'mlx4_0' port 2 status:
default gid: fe80:0000:0000:0000:5cf3:fc00:0004:ec2c
base lid: 0x10
sm lid: 0x2
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X FDR10)
link_layer: InfiniBand
[root@compute004 ~]#
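As a rough sanity check on the numbers (assuming the active port really runs 4X FDR10, i.e. 4 lanes at 10.3125 Gb/s with 64/66b encoding):
echo "scale=2; 4 * 10.3125 * 64 / 66 / 8" | bc    # ~5.00 GB/s of theoretical usable bandwidth
The measured ~4.5 GB/s is therefore close to the link's theoretical rate, which is what you would expect from a PingPong test on a healthy fabric.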
2.6.2 run using the TCP fabric: surprise!
[mehdi@atsplat1 ~]$ mpirun -np 2 --mca btl self,tcp --hostfile ./hosts ./IMB-MPI1 pingpong
...
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 17.15 0.00
1 1000 17.35 0.05
2 1000 17.34 0.11
4 1000 17.36 0.22
8 1000 17.41 0.44
16 1000 17.43 0.88
32 1000 17.74 1.72
64 1000 18.55 3.29
128 1000 19.08 6.40
256 1000 20.74 11.77
512 1000 23.93 20.41
1024 1000 29.15 33.50
2048 1000 269.92 7.24
4096 1000 271.10 14.41
8192 1000 272.24 28.70
16384 1000 273.65 57.10
32768 1000 289.68 107.88
65536 640 790.10 79.10
131072 320 1054.85 118.50
262144 160 1397.37 178.91
524288 80 2552.95 195.85
1048576 40 4664.89 214.37
2097152 20 9089.75 220.03
4194304 10 18022.91 221.94
# All processes entering MPI_Finalize
[mehdi@atsplat1 ~]$
The big surprise is the bandwidth, which is about twice what you would expect from a Gigabit network. The explanation is that Open MPI aggregates (stripes traffic across) all the TCP interfaces it considers to be on compatible networks, here the Gigabit interface and the IPoIB interface; this behaviour is described in more detail in the Open MPI FAQ. A noticeable side effect of this aggregation is that you are limited by the slowest interface, which is why in this case we only get about twice the bandwidth of a single Gigabit interface.
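The next two runs use the btl_tcp_if_include MCA parameter to pin the TCP traffic to a single interface. For completeness, the opposite parameter also exists; a sketch (interface names are whatever your nodes actually use, and lo should stay in the exclude list):
mpirun -np 2 --hostfile ./hosts --mca btl self,tcp --mca btl_tcp_if_exclude lo,ib1 ./IMB-MPI1 pingpong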
2.6.3 run using only the eth0 interface
[mehdi@atsplat1 ~]$ mpirun -np 2 --mca btl self,tcp --mca btl_tcp_if_include eth0 --hostfile ./hosts ./IMB-MPI1 pingpong
...
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 24.52 0.00
1 1000 24.52 0.04
2 1000 24.53 0.08
4 1000 24.53 0.16
8 1000 24.50 0.31
16 1000 24.53 0.62
32 1000 24.53 1.24
64 1000 24.50 2.49
128 1000 24.51 4.98
256 1000 24.55 9.95
512 1000 29.45 16.58
1024 1000 47.34 20.63
2048 1000 551.01 3.54
4096 1000 547.24 7.14
8192 1000 545.31 14.33
16384 1000 551.50 28.33
32768 1000 559.61 55.84
65536 640 1056.22 59.17
131072 320 1391.63 89.82
262144 160 2544.80 98.24
524288 80 4658.02 107.34
1048576 40 9076.14 110.18
2097152 20 18080.40 110.62
4194304 10 35776.65 111.80
# All processes entering MPI_Finalize
[mehdi@atsplat1 ~]$
2.6.4 run using only the IPoIB interface
[mehdi@atsplat1 ~]$ mpirun -np 2 --mca btl self,tcp --mca btl_tcp_if_include ib1 --hostfile ./hosts ./IMB-MPI1 pingpong
...
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 9.33 0.00
1 1000 9.54 0.10
2 1000 9.54 0.20
4 1000 9.57 0.40
8 1000 9.54 0.80
16 1000 9.53 1.60
32 1000 9.54 3.20
64 1000 9.57 6.38
128 1000 9.72 12.56
256 1000 10.42 23.43
512 1000 10.73 45.50
1024 1000 11.32 86.25
2048 1000 12.62 154.76
4096 1000 14.84 263.19
8192 1000 18.66 418.77
16384 1000 26.97 579.31
32768 1000 47.33 660.20
65536 640 95.72 652.96
131072 320 120.42 1038.04
262144 160 198.02 1262.50
524288 80 342.29 1460.73
1048576 40 656.27 1523.77
2097152 20 1273.80 1570.10
4194304 10 2526.25 1583.38
# All processes entering MPI_Finalize
[mehdi@atsplat1 ~]$
2.7 Using LSF to submit jobs
2.7.1 basic submission and how to get information about the LSF job
As mentioned earlier, OpenMPI was compiled with support for LSF, which means we can use mpirun natively in bsub scripts / invocations. For example, the following will request 2 job slots (equivalent by default to 2 cores) on 2 different nodes:
[mehdi@atsplat1 ~]$ bsub -n 2 -R span[ptile=1] mpirun ./IMB-MPI1 pingpong
Job <3525> is submitted to default queue <medium_priority>.
We can see which nodes are running the job:
[mehdi@atsplat1 ~]$ bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
3525 mehdi RUN medium_pri atsplat1 compute002 * pingpong Feb 14 09:03
compute003
And also get detailed information about the job itself, once done:
[mehdi@atsplat1 ~]$ bhist -l 3525
Job <3525>, User <mehdi>, Project <default>, Command <mpirun ./IMB-MPI1 pingpon
g>
Thu Feb 14 09:03:49: Submitted from host <atsplat1>, to Queue <medium_priority>
, CWD <$HOME>, 2 Processors Requested, Requested Resources
<span[ptile=1]>;
Thu Feb 14 09:03:51: Dispatched to 2 Hosts/Processors <compute002> <compute003>
;
Thu Feb 14 09:03:51: Starting (Pid 28169);
Thu Feb 14 09:03:51: Running with execution home </home/mehdi>, Execution CWD <
/home/mehdi>, Execution Pid <28169>;
Thu Feb 14 09:03:53: Done successfully. The CPU time used is 1.3 seconds;
Thu Feb 14 09:04:01: Post job process done successfully;
Summary of time in seconds spent in various states by Thu Feb 14 09:04:01
PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
2 0 2 0 0 0 4
[mehdi@atsplat1 ~]$
2.7.2 how to get the output and verify everything is fine
The best way to double-check that the information reported by LSF is accurate is to run our hello job again and have the output written to a file. In our case we'll tag the output and error files with the job number (-o%J.out and -e%J.err):
[mehdi@atsplat1 ~]$ bsub -o%J.out -e%J.err -n 2 -R span[ptile=1] mpirun ./hello
Job <3527> is submitted to default queue <medium_priority>.
[mehdi@atsplat1 ~]$
[mehdi@atsplat1 ~]$ bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
3527 mehdi RUN medium_pri atsplat1 compute003 *n ./hello Feb 14 09:10
compute004
[mehdi@atsplat1 ~]$ ls 3527*
3527.err 3527.out
[mehdi@atsplat1 ~]$
Good news: the error file is empty, and the output file shows a perfect match between the hostnames retrieved as part of the MPI job and the nodes allocated by LSF:
[mehdi@atsplat1 ~]$ cat 3527.err
[mehdi@atsplat1 ~]$ cat 3527.out
Sender: LSF System <hpcadmin@compute003>
Subject: Job 3527: <mpirun ./hello> Done
Job <mpirun ./hello> was submitted from host <atsplat1> by user <mehdi> in cluster <atsplat1_cluster1>.
Job was executed on host(s) <1*compute003>, in queue <medium_priority>, as user <mehdi> in cluster <atsplat1_cluster1>.
<1*compute004>
</home/mehdi> was used as the home directory.
</home/mehdi> was used as the working directory.
Started at Thu Feb 14 09:10:20 2013
Results reported at Thu Feb 14 09:10:28 2013
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
mpirun ./hello
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 0.19 sec.
Max Memory : 1 MB
Max Swap : 30 MB
Max Processes : 1
Max Threads : 1
The output (if any) follows:
Hello, world, I am 0 of 2 on host compute003
Hello, world, I am 1 of 2 on host compute004
PS:
Read file <3527.err> for stderr output of this job.
[mehdi@atsplat1 ~]$
2.7.3 Submitting the IMB jobs through LSF
2.7.3.1 using the IB fabric
[mehdi@atsplat1 ~]$ bsub -o%J.out -e%J.err -n 2 -R span[ptile=1] mpirun --mca btl self,openib ./IMB-MPI1 pingpong
Job <3529> is submitted to default queue <medium_priority>.
And the output:
[mehdi@atsplat1 ~]$ cat 3529.out
Sender: LSF System <hpcadmin@compute005>
Subject: Job 3529: <mpirun --mca btl self,openib ./IMB-MPI1 pingpong> Done
Job <mpirun --mca btl self,openib ./IMB-MPI1 pingpong> was submitted from host <atsplat1> by user <mehdi> in cluster <atsplat1_cluster1>.
Job was executed on host(s) <1*compute005>, in queue <medium_priority>, as user <mehdi> in cluster <atsplat1_cluster1>.
<1*compute006>
</home/mehdi> was used as the home directory.
</home/mehdi> was used as the working directory.
Started at Thu Feb 14 09:28:54 2013
Results reported at Thu Feb 14 09:28:56 2013
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
mpirun --mca btl self,openib ./IMB-MPI1 pingpong
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 1.25 sec.
Max Memory : 1 MB
Max Swap : 30 MB
Max Processes : 1
Max Threads : 1
The output (if any) follows:
benchmarks to run pingpong
#---------------------------------------------------
# Intel (R) MPI Benchmark Suite V3.2.3, MPI-1 part
#---------------------------------------------------
# Date : Thu Feb 14 09:28:55 2013
# Machine : x86_64
# System : Linux
# Release : 2.6.32-220.el6.x86_64
# Version : #1 SMP Wed Nov 9 08:03:13 EST 2011
# MPI Version : 2.1
# MPI Thread Environment:
# New default behavior from Version 3.2 on:
# the number of iterations per message size is cut down
# dynamically when a certain run time (per message size sample)
# is expected to be exceeded. Time limit is defined by variable
# "SECS_PER_SAMPLE" (=> IMB_settings.h)
# or through the flag => -time
# Calling sequence was:
# ./IMB-MPI1 pingpong
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# PingPong
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 1.42 0.00
1 1000 1.48 0.65
2 1000 1.29 1.48
4 1000 1.25 3.06
8 1000 1.27 5.99
16 1000 1.28 11.90
32 1000 1.30 23.46
64 1000 1.34 45.67
128 1000 1.97 61.82
256 1000 2.11 115.84
512 1000 2.29 213.60
1024 1000 2.64 370.53
2048 1000 3.46 564.81
4096 1000 4.14 943.10
8192 1000 5.65 1383.23
16384 1000 8.13 1922.95
32768 1000 11.67 2678.25
65536 640 18.59 3361.36
131072 320 32.47 3849.49
262144 160 60.17 4155.19
524288 80 115.51 4328.54
1048576 40 226.20 4420.87
2097152 20 447.60 4468.26
4194304 10 890.91 4489.79
# All processes entering MPI_Finalize
PS:
Read file <3529.err> for stderr output of this job.
[mehdi@atsplat1 ~]$
2.7.3.2 using the TCP fabric
[mehdi@atsplat1 ~]$ bsub -o%J.out -e%J.err -n 2 -R span[ptile=1] mpirun --mca btl self,tcp ./IMB-MPI1 pingpong
Job <3533> is submitted to default queue <medium_priority>.
The output:
[mehdi@atsplat1 ~]$ cat 3533.out
Sender: LSF System <hpcadmin@compute002>
Subject: Job 3533: <mpirun --mca btl self,tcp ./IMB-MPI1 pingpong> Done
Job <mpirun --mca btl self,tcp ./IMB-MPI1 pingpong> was submitted from host <atsplat1> by user <mehdi> in cluster <atsplat1_cluster1>.
Job was executed on host(s) <1*compute002>, in queue <medium_priority>, as user <mehdi> in cluster <atsplat1_cluster1>.
<1*compute005>
</home/mehdi> was used as the home directory.
</home/mehdi> was used as the working directory.
Started at Thu Feb 14 09:37:45 2013
Results reported at Thu Feb 14 09:37:58 2013
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
mpirun --mca btl self,tcp ./IMB-MPI1 pingpong
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 25.44 sec.
Max Memory : 1 MB
Max Swap : 30 MB
Max Processes : 1
Max Threads : 1
The output (if any) follows:
benchmarks to run pingpong
#---------------------------------------------------
# Intel (R) MPI Benchmark Suite V3.2.3, MPI-1 part
#---------------------------------------------------
# Date : Thu Feb 14 09:37:46 2013
# Machine : x86_64
# System : Linux
# Release : 2.6.32-220.el6.x86_64
# Version : #1 SMP Wed Nov 9 08:03:13 EST 2011
# MPI Version : 2.1
# MPI Thread Environment:
# New default behavior from Version 3.2 on:
# the number of iterations per message size is cut down
# dynamically when a certain run time (per message size sample)
# is expected to be exceeded. Time limit is defined by variable
# "SECS_PER_SAMPLE" (=> IMB_settings.h)
# or through the flag => -time
# Calling sequence was:
# ./IMB-MPI1 pingpong
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# PingPong
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 15.44 0.00
1 1000 15.64 0.06
2 1000 15.63 0.12
4 1000 15.67 0.24
8 1000 15.68 0.49
16 1000 15.75 0.97
32 1000 15.98 1.91
64 1000 16.89 3.61
128 1000 17.40 7.01
256 1000 18.98 12.87
512 1000 22.07 22.12
1024 1000 27.37 35.68
2048 1000 151.39 12.90
4096 1000 183.53 21.28
8192 1000 272.62 28.66
16384 1000 273.90 57.05
32768 1000 284.92 109.68
65536 640 787.86 79.33
131072 320 1054.46 118.54
262144 160 1398.88 178.71
524288 80 2559.74 195.33
1048576 40 4671.30 214.07
2097152 20 9101.65 219.74
4194304 10 17931.90 223.07
# All processes entering MPI_Finalize
PS:
Read file <3533.err> for stderr output of this job.
[mehdi@atsplat1 ~]$
2.7.3.3 using the interface eth0 only
[mehdi@atsplat1 ~]$ bsub -o%J.out -e%J.err -n 2 -R span[ptile=1] mpirun --mca btl self,tcp --mca btl_tcp_if_include eth0 ./IMB-MPI1 pingpong
Job <3535> is submitted to default queue <medium_priority>.
And the output:
[mehdi@atsplat1 ~]$ cat 3535.out
Sender: LSF System <hpcadmin@compute002>
Subject: Job 3535: <mpirun --mca btl self,tcp --mca btl_tcp_if_include eth0 ./IMB-MPI1 pingpong> Done
Job <mpirun --mca btl self,tcp --mca btl_tcp_if_include eth0 ./IMB-MPI1 pingpong> was submitted from host <atsplat1> by user <mehdi> in cluster <atsplat1_cluster1>.
Job was executed on host(s) <1*compute002>, in queue <medium_priority>, as user <mehdi> in cluster <atsplat1_cluster1>.
<1*compute005>
</home/mehdi> was used as the home directory.
</home/mehdi> was used as the working directory.
Started at Thu Feb 14 09:40:37 2013
Results reported at Thu Feb 14 09:40:58 2013
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
mpirun --mca btl self,tcp --mca btl_tcp_if_include eth0 ./IMB-MPI1 pingpong
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 40.69 sec.
Max Memory : 1 MB
Max Swap : 30 MB
Max Processes : 1
Max Threads : 1
The output (if any) follows:
benchmarks to run pingpong
#---------------------------------------------------
# Intel (R) MPI Benchmark Suite V3.2.3, MPI-1 part
#---------------------------------------------------
# Date : Thu Feb 14 09:40:38 2013
# Machine : x86_64
# System : Linux
# Release : 2.6.32-220.el6.x86_64
# Version : #1 SMP Wed Nov 9 08:03:13 EST 2011
# MPI Version : 2.1
# MPI Thread Environment:
# New default behavior from Version 3.2 on:
# the number of iterations per message size is cut down
# dynamically when a certain run time (per message size sample)
# is expected to be exceeded. Time limit is defined by variable
# "SECS_PER_SAMPLE" (=> IMB_settings.h)
# or through the flag => -time
# Calling sequence was:
# ./IMB-MPI1 pingpong
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# PingPong
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 24.55 0.00
1 1000 24.50 0.04
2 1000 24.52 0.08
4 1000 24.52 0.16
8 1000 24.50 0.31
16 1000 24.52 0.62
32 1000 24.57 1.24
64 1000 24.65 2.48
128 1000 24.89 4.90
256 1000 29.11 8.39
512 1000 51.64 9.45
1024 1000 47.34 20.63
2048 1000 218.24 8.95
4096 1000 240.50 16.24
8192 1000 245.51 31.82
16384 1000 324.48 48.15
32768 1000 704.06 44.39
65536 640 1338.30 46.70
131072 320 1702.87 73.41
262144 160 2638.45 94.75
524288 80 4792.39 104.33
1048576 40 9190.72 108.81
2097152 20 18132.53 110.30
4194304 10 35818.20 111.68
# All processes entering MPI_Finalize
PS:
Read file <3535.err> for stderr output of this job.
[mehdi@atsplat1 ~]$
2.7.3.4 using the interface ib1 only
[mehdi@atsplat1 ~]$ bsub -o%J.out -e%J.err -n 2 -R span[ptile=1] mpirun --mca btl self,tcp --mca btl_tcp_if_include ib1 ./IMB-MPI1 pingpong
Job <3534> is submitted to default queue <medium_priority>.
[mehdi@atsplat1 ~]$ cat 3534.out
Sender: LSF System <hpcadmin@compute002>
Subject: Job 3534: <mpirun --mca btl self,tcp --mca btl_tcp_if_include ib1 ./IMB-MPI1 pingpong> Done
Job <mpirun --mca btl self,tcp --mca btl_tcp_if_include ib1 ./IMB-MPI1 pingpong> was submitted from host <atsplat1> by user <mehdi> in cluster <atsplat1_cluster1>.
Job was executed on host(s) <1*compute002>, in queue <medium_priority>, as user <mehdi> in cluster <atsplat1_cluster1>.
<1*compute005>
</home/mehdi> was used as the home directory.
</home/mehdi> was used as the working directory.
Started at Thu Feb 14 09:38:26 2013
Results reported at Thu Feb 14 09:38:29 2013
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
mpirun --mca btl self,tcp --mca btl_tcp_if_include ib1 ./IMB-MPI1 pingpong
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 4.05 sec.
Max Memory : 1 MB
Max Swap : 30 MB
Max Processes : 1
Max Threads : 1
The output (if any) follows:
benchmarks to run pingpong
#---------------------------------------------------
# Intel (R) MPI Benchmark Suite V3.2.3, MPI-1 part
#---------------------------------------------------
# Date : Thu Feb 14 09:38:27 2013
# Machine : x86_64
# System : Linux
# Release : 2.6.32-220.el6.x86_64
# Version : #1 SMP Wed Nov 9 08:03:13 EST 2011
# MPI Version : 2.1
# MPI Thread Environment:
# New default behavior from Version 3.2 on:
# the number of iterations per message size is cut down
# dynamically when a certain run time (per message size sample)
# is expected to be exceeded. Time limit is defined by variable
# "SECS_PER_SAMPLE" (=> IMB_settings.h)
# or through the flag => -time
# Calling sequence was:
# ./IMB-MPI1 pingpong
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# PingPong
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 9.20 0.00
1 1000 9.43 0.10
2 1000 9.48 0.20
4 1000 9.47 0.40
8 1000 9.45 0.81
16 1000 9.45 1.62
32 1000 9.46 3.23
64 1000 9.48 6.44
128 1000 9.62 12.69
256 1000 10.35 23.58
512 1000 10.67 45.75
1024 1000 11.37 85.87
2048 1000 12.60 155.04
4096 1000 15.05 259.50
8192 1000 18.58 420.59
16384 1000 26.89 581.07
32768 1000 47.50 657.89
65536 640 95.89 651.77
131072 320 120.43 1037.97
262144 160 197.98 1262.77
524288 80 342.43 1460.15
1048576 40 657.13 1521.78
2097152 20 1275.20 1568.38
4194304 10 2520.35 1587.08
# All processes entering MPI_Finalize
PS:
Read file <3534.err> for stderr output of this job.
[mehdi@atsplat1 ~]$