Testing the Open MPI installation
The ompi_info command can be used to check the state of your Open MPI installation; it returns summary information about the installation.
Note that the ompi_info command is extremely helpful in determining which components are installed as well as listing all the run-time settable parameters that are available in each component (as well as their default values).
The following options may be helpful:
--all       Show a *lot* of information about your Open MPI installation.
--parsable  Display all the information in an easily grep/cut/awk/sed-able format.
--param <framework> <component>
            Show the run-time parameters of the given framework/component; "all" may be used for either <framework> or <component>.
Changing the values of these parameters is explained in the "The Modular Component Architecture (MCA)" section, below.
Compiling Open MPI applications
Open MPI provides "wrapper" compilers that can be used to compile MPI applications:
C:          mpicc
C++:        mpiCC (or mpic++ on case-insensitive filesystems)
Fortran 77: mpif77
Fortran 90: mpif90
For example:
shell$ mpicc hello_world_mpi.c -o hello_world_mpi -g
All the wrapper compilers do is add a variety of compiler and linker flags to the command line and then invoke a back-end compiler. To be specific: the wrapper compilers do not parse source code at all; they are solely command-line manipulators, and have nothing to do with the actual compilation or linking of programs. The end result is an MPI executable that is properly linked to all the relevant libraries.
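As an illustration of that point, a wrapper compiler can be sketched as nothing more than a command-line manipulator. The function name mpicc_sketch, the /opt/openmpi prefix, the flags, and the cc back end below are all assumptions for the sketch, not Open MPI's real wrapper data:

```shell
# Hypothetical sketch of a wrapper compiler -- NOT the real mpicc.
# It only splices assumed include/link flags around the user's arguments
# and prints the back-end command line it would run.
mpicc_sketch() {
    echo cc -I/opt/openmpi/include "$@" -L/opt/openmpi/lib -lmpi
}

mpicc_sketch hello_world_mpi.c -o hello_world_mpi -g
```

The real wrappers can reveal the flags they add: running mpicc --showme prints the full back-end command line without compiling anything.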
Running Open MPI applications
Open MPI provides both mpirun and mpiexec (they are identical). For example:
shell$ mpirun -np 2 hello_world_mpi
and
shell$ mpiexec -np 1 hello_world_mpi : -np 1 hello_world_mpi
are identical. Some of mpiexec's switches (such as -host and -arch) are not yet functional, although you will not receive an error if you try to use them.
The rsh launcher accepts a -hostfile parameter (the option "-machinefile" is equivalent); you can specify a -hostfile parameter that names a standard mpirun-style hostfile (one hostname per line):
shell$ mpirun -hostfile my_hostfile -np 2 hello_world_mpi
If you want to run multiple processes on a node, the hostfile can use the "slots" attribute. If "slots" is not specified, a count of 1 is assumed. For example, using the following hostfile:
---------------------------------------------------------------------------
node1.example.com
node2.example.com
node3.example.com slots=2
node4.example.com slots=4
---------------------------------------------------------------------------
shell$ mpirun -hostfile my_hostfile -np 8 hello_world_mpi
will launch MPI_COMM_WORLD rank 0 on node1, rank 1 on node2, ranks 2 and 3 on node3, and ranks 4 through 7 on node4.
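That by-slot placement can be sketched with a small script (a hypothetical helper, not an Open MPI tool) that expands a hostfile read on stdin into a rank-to-host map:

```shell
# Hypothetical helper: expand an mpirun-style hostfile (read from stdin)
# into the rank placement described above. One slot is assumed for a host
# with no "slots" attribute.
expand_hostfile() {
    rank=0
    while read -r host attr; do
        case "$attr" in
            slots=*) slots=${attr#slots=} ;;  # explicit slot count
            *)       slots=1 ;;               # default: one slot
        esac
        i=0
        while [ "$i" -lt "$slots" ]; do
            echo "rank $rank -> $host"
            rank=$((rank + 1))
            i=$((i + 1))
        done
    done
}

printf 'node1.example.com\nnode2.example.com\nnode3.example.com slots=2\nnode4.example.com slots=4\n' |
    expand_hostfile
```

Feeding it the example hostfile reproduces the mapping the text describes: rank 0 on node1, rank 1 on node2, ranks 2 and 3 on node3, and ranks 4 through 7 on node4.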
Other starters, such as the batch scheduling environments, do not require hostfiles (and will ignore the hostfile if it is supplied). They will also launch as many processes as slots have been allocated by the scheduler if no "-np" argument has been provided. For example, running an interactive SLURM job with 8 processors:
shell$ srun -n 8 -A
shell$ mpirun a.out
The above command will launch 8 copies of a.out in a single MPI_COMM_WORLD on the processors that were allocated by SLURM.
Note that the values of component parameters can be changed on the mpirun / mpiexec command line. This is explained in the section below, "The Modular Component Architecture (MCA)".
The Modular Component Architecture (MCA)
The MCA is the backbone of Open MPI -- most services and functionality are implemented through MCA components. Here is a list of all the component frameworks in Open MPI:
---------------------------------------------------------------------------
MPI component frameworks:
-------------------------
allocator - Memory allocator
bml       - BTL management layer
btl       - MPI point-to-point byte transfer layer
coll      - MPI collective algorithms
io        - MPI-2 I/O
mpool     - Memory pooling
mtl       - Matching transport layer, used for MPI point-to-point messages on some types of networks
osc       - MPI-2 one-sided communications
pml       - MPI point-to-point management layer
rcache    - Memory registration cache
topo      - MPI topology routines
Back-end run-time environment component frameworks:
---------------------------------------------------
errmgr - RTE error manager
gpr    - General purpose registry
iof    - I/O forwarding
ns     - Name server
odls   - OpenRTE daemon local launch subsystem
oob    - Out of band messaging
pls    - Process launch system
ras    - Resource allocation system
rds    - Resource discovery system
rmaps  - Resource mapping system
rmgr   - Resource manager
rml    - RTE message layer
schema - Name schemas
sds    - Startup / discovery service
smr    - State-of-health monitoring subsystem
Miscellaneous frameworks:
-------------------------
backtrace - Debugging call stack backtrace support
maffinity - Memory affinity
memcpy    - Memcpy copy support
memory    - Memory management hooks
paffinity - Processor affinity
timer     - High-resolution timers
---------------------------------------------------------------------------
Each framework typically has one or more components that are used at run-time. For example, the btl framework is used by MPI to send bytes across underlying networks. The tcp btl, for example, sends messages across TCP-based networks; the gm btl sends messages across GM Myrinet-based networks.
Each component typically has some tunable parameters that can be changed at run-time. Use the ompi_info command to check a component to see what its tunable parameters are. For example:
shell$ ompi_info --param btl tcp
shows all the parameters (and default values) for the tcp btl component.
These values can be overridden at run-time in several ways. At run-time, the following locations are examined (in order) for new values of parameters:
1. /etc/openmpi-mca-params.conf
This file is intended to set any system-wide default MCA parameter values -- it will apply, by default, to all users who use this Open MPI installation. The default file that is installed contains many comments explaining its format.
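For reference, the format is plain "<name> = <value>" pairs, one per line, with '#' beginning a comment. A small illustrative fragment (the values here are examples, not recommended settings):

```
# Illustrative /etc/openmpi-mca-params.conf fragment.
# Format: one "<name> = <value>" pair per line; '#' starts a comment.
btl = tcp,self
btl_tcp_frag_size = 65536
```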
2. $HOME/.openmpi/mca-params.conf
If this file exists, it should be in the same format as /etc/openmpi-mca-params.conf. It is intended to provide per-user default parameter values.
3. environment variables of the form OMPI_MCA_<name> set equal to a <value>
   Where <name> is the name of the parameter. For example, set the variable named OMPI_MCA_btl_tcp_frag_size to the value 65536 (Bourne-style shells):
   shell$ OMPI_MCA_btl_tcp_frag_size=65536
   shell$ export OMPI_MCA_btl_tcp_frag_size
4. the mpirun command line: --mca <name> <value>
   For example:
   shell$ mpirun --mca btl_tcp_frag_size 65536 -np 2 hello_world_mpi
These locations are checked in order. For example, a parameter value passed on the mpirun command line will override an environment variable; an environment variable will override the system-wide defaults.
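The override order can be sketched as a first-match lookup. The resolve_mca_param helper below is hypothetical, an illustration of the precedence rule rather than Open MPI's actual code:

```shell
# Hypothetical sketch of MCA parameter precedence: scan the sources from
# highest to lowest priority (command line, environment, per-user file,
# system-wide file) and take the first non-empty value.
resolve_mca_param() {
    for source in "$@"; do
        if [ -n "$source" ]; then
            echo "$source"
            return 0
        fi
    done
    return 1   # not set anywhere; a built-in default would apply
}

# btl_tcp_frag_size set in both the environment and the system-wide file:
# the environment wins because it is checked first.
resolve_mca_param "" "65536" "" "32768"
```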
---------------------------------------------------------------------------
1. Running mpirun produces the following error:
duanple@node1:~/project/test$ mpirun -hostfile hosts -np 2 ./a.out
bash: orted: command not found
--------------------------------------------------------------------------
A daemon (pid 27974) died unexpectedly with status 127 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished
Solution: even though PATH and LD_LIBRARY_PATH were already set in /etc/profile, the run only succeeds after PATH and LD_LIBRARY_PATH are also set in ~/.bashrc. This is likely because ssh launches orted in a non-interactive, non-login shell, which reads ~/.bashrc but not /etc/profile.
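Assuming the install prefix used elsewhere in these notes (/home/duanple/mpi/openmpi), the lines to append to ~/.bashrc on every node would look like this:

```shell
# Append to ~/.bashrc on every node; the prefix matches the configure
# --prefix used in these notes.
export PATH=/home/duanple/mpi/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/home/duanple/mpi/openmpi/lib:$LD_LIBRARY_PATH
```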
2. Of the three nodes running Open MPI, two were 32-bit x86 nodes and one was x64; runs failed with an error.
The error message suggests: reconfigure Open MPI with --enable-heterogeneous.
In other words, the default Open MPI build does not support heterogeneous nodes; to support them, this option must be enabled. For the configure options, see the README file under openmpi-1.4.1.
Open MPI therefore needs to be reconfigured and reinstalled; the commands are as follows:
cd /home/duanple/mpi/openmpi;rm -rf *
cd /home/duanple/project/install/openmpi-1.4.1;./configure --prefix=/home/duanple/mpi/openmpi --enable-heterogeneous --enable-sparse-groups --enable-static
make all install
scp mpi.c client:/home/duanple/project/test;scp mpi.c node2:/home/duanple/project/test
cd /home/duanple/project/test
mpirun -hostfile hosts -np 3 ./a.out
3. Running service sshd restart or ifconfig always produces the message: command not found.
The cause is that /sbin has not been added to PATH; just modify /etc/profile or ~/.bashrc and then source the file.