NS3 - mpirun - MPI for Distributed Simulation


https://www.nsnam.org/docs/models/html/distributed.html


MPI for Distributed Simulation

Parallel and distributed discrete event simulation allows the execution of a single simulation program on multiple processors. By splitting up the simulation into logical processes (LPs), each LP can be executed by a different processor. This simulation methodology enables very large-scale simulations by leveraging increased processing power and memory availability. In order to ensure proper execution of a distributed simulation, message passing between LPs is required. To support distributed simulation in ns-3, the standard Message Passing Interface (MPI) is used, along with a new distributed simulator class. Currently, dividing a simulation for distributed purposes in ns-3 can only occur across point-to-point links.

Current Implementation Details

During the course of a distributed simulation, many packets must cross simulator boundaries. In other words, a packet that originated on one LP is destined for a different LP, and in order to make this transition, a message containing the packet contents must be sent to the remote LP. Upon receiving this message, the remote LP can rebuild the packet and proceed as normal. The process of sending and receiving messages between LPs is handled by the new MPI interface in ns-3.

Along with simple message passing between LPs, a distributed simulator is used on each LP to determine which events to process. It is important to process events in time-stamped order to ensure proper simulation execution. If an LP receives a message containing an event from the past, clearly this is an issue, since this event could change other events which have already been executed. To address this problem, two conservative synchronization algorithms with lookahead are used in ns-3. For more information on different synchronization approaches and parallel and distributed simulation in general, please refer to “Parallel and Distributed Simulation Systems” by Richard Fujimoto.

The default parallel synchronization strategy implemented in the DistributedSimulatorImpl class is based on a globally synchronized algorithm using an MPI collective operation to synchronize simulation time across all LPs. A second synchronization strategy based on local communication and null messages is implemented in the NullMessageSimulatorImpl class. For the null message strategy the global all-to-all gather is not required; LPs only need to communicate with LPs that have shared point-to-point links. The algorithm to use is controlled by the ns-3 global value SimulatorImplementationType.

The best algorithm to use depends on the communication and event scheduling pattern of the application. In general, null message synchronization algorithms will scale better, because local communication scales better than the global all-to-all gather required by DistributedSimulatorImpl. There are two known cases where the global synchronization performs better. The first is when most LPs have point-to-point links with most other LPs; in other words, the LPs are nearly fully connected. In this case the null message algorithm will generate more message passing traffic than the all-to-all gather. A second case where the global all-to-all gather is more efficient is when there are long periods of simulation time when no events are occurring. The all-to-all gather algorithm is able to quickly determine the next event time globally. The nearest neighbor behavior of the null message algorithm requires more communications to propagate that knowledge, since each LP is only aware of its neighbors' next event times.

Distributing the topology

Currently, the full topology is created on each rank, regardless of the individual node system ids. Only the applications are specific to a rank. For example, consider node 1 on LP 1 and node 2 on LP 2, with a traffic generator on node 1. Both node 1 and node 2 will be created on both LP1 and LP2; however, the traffic generator will only be installed on LP1. While this is not optimal for memory efficiency, it does simplify routing, since all current routing implementations in ns-3 will work with distributed simulation.

Running Distributed Simulations

Prerequisites

Ensure that MPI is installed, as well as mpic++. In Ubuntu repositories, these are openmpi-bin, openmpi-common, openmpi-doc, and libopenmpi-dev. In Fedora, these are openmpi and openmpi-devel.

Note:

There is a conflict on some Fedora systems between libotf and openmpi. A possible “quick-fix” is to yum remove libotf before installing openmpi. This will remove the conflict, but it will also remove emacs. Alternatively, these steps could be followed to resolve the conflict:

  1. Rename the tiny otfdump which emacs says it needs:

    $ mv /usr/bin/otfdump /usr/bin/otfdump.emacs-version
    
  2. Manually resolve openmpi dependencies:

    $ sudo yum install libgfortran libtorque numactl
    
  3. Download rpm packages:

    openmpi-1.3.1-1.fc11.i586.rpm
    openmpi-devel-1.3.1-1.fc11.i586.rpm
    openmpi-libs-1.3.1-1.fc11.i586.rpm
    openmpi-vt-1.3.1-1.fc11.i586.rpm
    

    from http://mirrors.kernel.org/fedora/releases/11/Everything/i386/os/Packages/

  4. Force the packages in:

    $ sudo rpm -ivh --force \
    openmpi-1.3.1-1.fc11.i586.rpm \
    openmpi-libs-1.3.1-1.fc11.i586.rpm \
    openmpi-devel-1.3.1-1.fc11.i586.rpm \
    openmpi-vt-1.3.1-1.fc11.i586.rpm
    

Also, it may be necessary to add the openmpi bin directory to PATH in order to execute mpic++ and mpirun from the command line. Alternatively, the full path to these executables can be used. Finally, if openmpi complains about the inability to open shared libraries, such as libmpi_cxx.so.0, it may be necessary to add the openmpi lib directory to LD_LIBRARY_PATH.

Here is an example of setting up PATH and LD_LIBRARY_PATH using a bash shell:

  • For a 32-bit Linux distribution:

    $ export PATH=$PATH:/usr/lib/openmpi/bin
    $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib
    

  • For a 64-bit Linux distribution:

    $ export PATH=$PATH:/usr/lib64/openmpi/bin
    $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/openmpi/lib

These lines can be added into ~/.bash_profile or ~/.bashrc to avoid having to retype them when a new shell is opened.

Building and Running Examples

If you already built ns-3 without MPI enabled, you must re-build:

$ ./waf distclean

Configure ns-3 with the --enable-mpi option:

$ ./waf -d debug configure --enable-examples --enable-tests --enable-mpi

Ensure that MPI is enabled by checking the optional features shown in the output of configure.

Next, build ns-3:

$ ./waf

After building ns-3 with MPI enabled, the example programs are now ready to run with mpirun. Here are a few examples (from the root ns-3 directory):

$ mpirun -np 2 ./waf --run simple-distributed
$ mpirun -np 4 -machinefile mpihosts ./waf --run 'nms-udp-nix --LAN=2 --CN=4 --nix=1'

An example using the null message synchronization algorithm:

$ mpirun -np 2 ./waf --run simple-distributed --nullmsg

The -np switch is the number of logical processors to use. The -machinefile switch specifies which machines to use. In order to use machinefile, the target file must exist (in this case mpihosts). This can simply contain something like:

localhost
localhost
localhost
...

Or if you have a cluster of machines, you can name them.

NOTE: Some users have experienced issues using mpirun and waf together. An alternative way to run distributed examples is shown below:

$ ./waf shell
$ cd build/debug
$ mpirun -np 2 src/mpi/examples/simple-distributed

Setting synchronization algorithm to use

The global value SimulatorImplementationType is used to select the synchronization algorithm. If the default DistributedSimulatorImpl is not used, this value must be set before the MpiInterface::Enable method is invoked. Here is an example code snippet showing how to add a command line argument to control the synchronization algorithm choice:

cmd.AddValue ("nullmsg", "Enable the use of null-message synchronization", nullmsg);
if (nullmsg)
  {
    GlobalValue::Bind ("SimulatorImplementationType",
                       StringValue ("ns3::NullMessageSimulatorImpl"));
  }
else
  {
    GlobalValue::Bind ("SimulatorImplementationType",
                       StringValue ("ns3::DistributedSimulatorImpl"));
  }

// Enable parallel simulator with the command line arguments
MpiInterface::Enable (&argc, &argv);

Creating custom topologies

The example programs in src/mpi/examples give a good idea of how to create different topologies for distributed simulation. The main points are assigning system ids to individual nodes, creating point-to-point links where the simulation should be divided, and installing applications only on the LP associated with the target node.

Assigning system ids to nodes is simple and can be handled two different ways. First, a NodeContainer can be used to create the nodes and assign system ids:

NodeContainer nodes;
nodes.Create (5, 1); // Creates 5 nodes with system id 1.

Alternatively, nodes can be created individually, assigned system ids, and added to a NodeContainer. This is useful if a NodeContainer holds nodes with different system ids:

NodeContainer nodes;
Ptr<Node> node1 = CreateObject<Node> (0); // Create node1 with system id 0
Ptr<Node> node2 = CreateObject<Node> (1); // Create node2 with system id 1
nodes.Add (node1);
nodes.Add (node2);

Next, where the simulation is divided is determined by the placement of point-to-point links. If a point-to-point link is created between two nodes with different system ids, a remote point-to-point link is created, as described in Current Implementation Details.

Finally, installing applications only on the LP associated with the target node is very important. For example, if a traffic generator is to be placed on node 0, which is on LP0, only LP0 should install this application. This is easily accomplished by first checking the simulator system id and ensuring that it matches the system id of the target node before installing the application.

Tracing During Distributed Simulations

Depending on the system id (rank) of the simulator, the information traced will be different, since traffic originating on one simulator is not seen by another simulator until it reaches nodes specific to that simulator. The easiest way to keep track of different traces is to name the trace files or pcaps differently, based on the system id of the simulator. For example, something like this should work well, assuming all of these local variables were previously defined:

if (MpiInterface::GetSystemId () == 0)
  {
    pointToPoint.EnablePcapAll ("distributed-rank0");
    phy.EnablePcap ("distributed-rank0", apDevices.Get (0));
    csma.EnablePcap ("distributed-rank0", csmaDevices.Get (0), true);
  }
else if (MpiInterface::GetSystemId () == 1)
  {
    pointToPoint.EnablePcapAll ("distributed-rank1");
    phy.EnablePcap ("distributed-rank1", apDevices.Get (0));
    csma.EnablePcap ("distributed-rank1", csmaDevices.Get (0), true);
  }
