Getting Started with mpi4py

In the previous post we covered how to install and use mpi4py. Below we use a few simple examples to show how to do parallel programming with mpi4py, so that readers can get up and running quickly. The examples come from the mpi4py documentation, some with minor modifications.

Point-to-Point Communication

Passing generic Python objects (blocking)

This approach is very simple and easy to use, and works for any Python object that can be serialized with pickle. However, the pickle and unpickle operations on the sending and receiving sides are not efficient, especially when transferring large amounts of data. In addition, blocking communication blocks the execution of the process while the message is being transferred.
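The pickle round trip that the lowercase methods pay for this generality can be inspected with the standard pickle module alone; below is a minimal, MPI-free sketch (the array size is chosen arbitrarily, just for illustration):

```python
import pickle

import numpy as np

# The size is arbitrary; larger arrays make the overhead more visible.
a = np.arange(100000, dtype=np.float64)

# send() first pickles the object into an intermediate byte string ...
buf = pickle.dumps(a)

# ... and recv() unpickles it again, so the data is copied extra times
# compared with handing the raw buffer to MPI directly.
b = pickle.loads(buf)

assert (a == b).all()
print('payload: %d bytes for %d bytes of data' % (len(buf), a.nbytes))
```

The uppercase methods introduced later avoid both the intermediate byte string and the extra copies.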

# p2p_blocking.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = {'a': 7, 'b': 3.14}
    print('process %d sends %s' % (rank, data))
    comm.send(data, dest=1, tag=11)
elif rank == 1:
    data = comm.recv(source=0, tag=11)
    print('process %d receives %s' % (rank, data))

Running it gives:

$ mpiexec -n 2 python p2p_blocking.py
process 0 sends {'a': 7, 'b': 3.14}
process 1 receives {'a': 7, 'b': 3.14}

Passing generic Python objects (non-blocking)

This approach is just as simple and easy to use, and works for any picklable Python object, but again the pickle and unpickle operations on the two ends are not efficient, especially with large amounts of data. Non-blocking communication, however, allows communication and computation to overlap, which can greatly improve performance.

# p2p_non_blocking.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = {'a': 7, 'b': 3.14}
    print('process %d sends %s' % (rank, data))
    req = comm.isend(data, dest=1, tag=11)
    req.wait()
elif rank == 1:
    req = comm.irecv(source=0, tag=11)
    data = req.wait()
    print('process %d receives %s' % (rank, data))

Running it gives:

$ mpiexec -n 2 python p2p_non_blocking.py
process 0 sends {'a': 7, 'b': 3.14}
process 1 receives {'a': 7, 'b': 3.14}

Passing numpy arrays (the fast, efficient way)

For array-like data, more precisely Python objects with a single-segment buffer interface, such as numpy arrays and the built-in bytes/string/array types, the data can be transferred directly in a much more efficient way, without pickling and unpickling. Transferring data this way requires the communicator methods that start with an uppercase letter, such as Send(), Recv(), Bcast(), Scatter(), and Gather().
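Whether an object qualifies for these uppercase methods can be checked with Python's own buffer protocol, which is what "single-segment buffer interface" refers to; a minimal, MPI-free sketch (the array contents are arbitrary):

```python
import numpy as np

a = np.arange(10, dtype='i')

# numpy arrays expose their memory via the buffer protocol, so the
# uppercase methods can hand the raw block to MPI without pickling.
view = memoryview(a)
assert view.contiguous          # a single contiguous segment
assert view.nbytes == a.nbytes  # exactly the array's raw bytes

# Built-in bytes objects expose the same interface ...
assert memoryview(b'abc').nbytes == 3

# ... but a plain list does not, so it must use the lowercase methods.
try:
    memoryview([1, 2, 3])
except TypeError:
    print('list has no buffer interface')
```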

# p2p_numpy_array.py
import numpy
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# passing MPI datatypes explicitly
if rank == 0:
    data = numpy.arange(10, dtype='i')
    print('process %d sends %s' % (rank, data))
    comm.Send([data, MPI.INT], dest=1, tag=77)
elif rank == 1:
    data = numpy.empty(10, dtype='i')
    comm.Recv([data, MPI.INT], source=0, tag=77)
    print('process %d receives %s' % (rank, data))

# automatic MPI datatype discovery
if rank == 0:
    data = numpy.arange(10, dtype=numpy.float64)
    print('process %d sends %s' % (rank, data))
    comm.Send(data, dest=1, tag=13)
elif rank == 1:
    data = numpy.empty(10, dtype=numpy.float64)
    comm.Recv(data, source=0, tag=13)
    print('process %d receives %s' % (rank, data))

Running it gives:

$ mpiexec -n 2 python p2p_numpy_array.py
process 0 sends [0 1 2 3 4 5 6 7 8 9]
process 1 receives [0 1 2 3 4 5 6 7 8 9]
process 0 sends [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
process 1 receives [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]

Collective Communication

Broadcast

A broadcast copies the data of the root process to every other process in the group.

Broadcasting generic Python objects

# bcast.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = {'key1': [7, 2.72, 2+3j],
            'key2': ('abc', 'xyz')}
    print('before broadcasting: process %d has %s' % (rank, data))
else:
    data = None
    print('before broadcasting: process %d has %s' % (rank, data))

data = comm.bcast(data, root=0)
print('after broadcasting: process %d has %s' % (rank, data))

Running it gives:

$ mpiexec -n 2 python bcast.py
before broadcasting: process 0 has {'key2': ('abc', 'xyz'), 'key1': [7, 2.72, (2+3j)]}
after broadcasting: process 0 has {'key2': ('abc', 'xyz'), 'key1': [7, 2.72, (2+3j)]}
before broadcasting: process 1 has None
after broadcasting: process 1 has {'key2': ('abc', 'xyz'), 'key1': [7, 2.72, (2+3j)]}

Broadcasting numpy arrays

# Bcast.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = np.arange(10, dtype='i')
    print('before broadcasting: process %d has %s' % (rank, data))
else:
    data = np.zeros(10, dtype='i')
    print('before broadcasting: process %d has %s' % (rank, data))

comm.Bcast(data, root=0)
print('after broadcasting: process %d has %s' % (rank, data))

Running it gives:

$ mpiexec -n 2 python Bcast.py
before broadcasting: process 0 has [0 1 2 3 4 5 6 7 8 9]
after broadcasting: process 0 has [0 1 2 3 4 5 6 7 8 9]
before broadcasting: process 1 has [0 0 0 0 0 0 0 0 0 0]
after broadcasting: process 1 has [0 1 2 3 4 5 6 7 8 9]

Scatter

A scatter sends a different message from the root process to each of the other processes in the group.

Scattering generic Python objects

# scatter.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

if rank == 0:
    data = [(i + 1)**2 for i in range(size)]
    print('before scattering: process %d has %s' % (rank, data))
else:
    data = None
    print('before scattering: process %d has %s' % (rank, data))

data = comm.scatter(data, root=0)
print('after scattering: process %d has %s' % (rank, data))

Running it gives:

$ mpiexec -n 3 python scatter.py
before scattering: process 0 has [1, 4, 9]
after scattering: process 0 has 1
before scattering: process 1 has None
after scattering: process 1 has 4
before scattering: process 2 has None
after scattering: process 2 has 9

Scattering numpy arrays

# Scatter.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

sendbuf = None
if rank == 0:
    sendbuf = np.empty([size, 10], dtype='i')
    sendbuf.T[:, :] = range(size)
print('before scattering: process %d has %s' % (rank, sendbuf))

recvbuf = np.empty(10, dtype='i')
comm.Scatter(sendbuf, recvbuf, root=0)
print('after scattering: process %d has %s' % (rank, recvbuf))

Running it gives:

$ mpiexec -n 3 python Scatter.py
before scattering: process 0 has [[0 0 0 0 0 0 0 0 0 0]
 [1 1 1 1 1 1 1 1 1 1]
 [2 2 2 2 2 2 2 2 2 2]]
before scattering: process 1 has None
before scattering: process 2 has None
after scattering: process 0 has [0 0 0 0 0 0 0 0 0 0]
after scattering: process 2 has [2 2 2 2 2 2 2 2 2 2]
after scattering: process 1 has [1 1 1 1 1 1 1 1 1 1]

Gather

A gather is the inverse of a scatter: the root process collects a different message from each of the other processes and places them, in order, in its receive buffer.

Gathering generic Python objects

# gather.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

data = (rank + 1)**2
print('before gathering: process %d has %s' % (rank, data))

data = comm.gather(data, root=0)
print('after gathering: process %d has %s' % (rank, data))

Running it gives:

$ mpiexec -n 3 python gather.py
before gathering: process 0 has 1
after gathering: process 0 has [1, 4, 9]
before gathering: process 1 has 4
after gathering: process 1 has None
before gathering: process 2 has 9
after gathering: process 2 has None

Gathering numpy arrays

# Gather.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

sendbuf = np.zeros(10, dtype='i') + rank
print('before gathering: process %d has %s' % (rank, sendbuf))

recvbuf = None
if rank == 0:
    recvbuf = np.empty([size, 10], dtype='i')

comm.Gather(sendbuf, recvbuf, root=0)
print('after gathering: process %d has %s' % (rank, recvbuf))

Running it gives:

$ mpiexec -n 3 python Gather.py
before gathering: process 0 has [0 0 0 0 0 0 0 0 0 0]
after gathering: process 0 has [[0 0 0 0 0 0 0 0 0 0]
 [1 1 1 1 1 1 1 1 1 1]
 [2 2 2 2 2 2 2 2 2 2]]
before gathering: process 1 has [1 1 1 1 1 1 1 1 1 1]
after gathering: process 1 has None
before gathering: process 2 has [2 2 2 2 2 2 2 2 2 2]
after gathering: process 2 has None

Finally, let us compare the performance of the lowercase send()/recv() methods against the uppercase Send()/Recv() methods when passing numpy arrays.

Comparing send()/recv() with Send()/Recv()

# send_recv_timing.py
import time

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = np.random.randn(10000).astype(np.float64)
else:
    data = np.empty(10000, dtype=np.float64)

comm.barrier()

# use comm.send() and comm.recv()
t1 = time.time()
if rank == 0:
    comm.send(data, dest=1, tag=1)
else:
    comm.recv(source=0, tag=1)
t2 = time.time()
if rank == 0:
    print('time used by send/recv: %f seconds' % (t2 - t1))

comm.barrier()

# use comm.Send() and comm.Recv()
t1 = time.time()
if rank == 0:
    comm.Send(data, dest=1, tag=2)
else:
    comm.Recv(data, source=0, tag=2)
t2 = time.time()
if rank == 0:
    print('time used by Send/Recv: %f seconds' % (t2 - t1))

Running it gives:

$ mpiexec -n 2 python send_recv_timing.py
time used by send/recv: 0.000412 seconds
time used by Send/Recv: 0.000091 seconds

With almost identical code, the uppercase Send()/Recv() methods transfer the numpy array far more efficiently, so whenever parallel operations involve numpy arrays, the uppercase communication methods should be preferred.

The simple examples above show how to do parallel programming in Python with mpi4py. As you can see, mpi4py makes MPI parallel programming in Python very easy, and considerably more convenient and flexible than calling the MPI interfaces from C, C++, or Fortran. In particular, the pickle-based mechanism for passing generic Python objects means we do not have to worry at all about the type or length of the data being transferred. This flexibility and ease of use comes with some performance cost, but when the amount of data is small the cost is negligible. When large amounts of array-like data need to be transferred, the uppercase communication methods let the data move between processes at close to C/C++/Fortran speed. For numpy arrays this efficiency sacrifices little or none of the flexibility and ease of use, because mpi4py can automatically infer the type and length of a numpy array, so they normally do not need to be specified explicitly. This makes high-performance parallel programming with numpy arrays very convenient.

In later posts we will cover the various methods provided by mpi4py and their usage in detail.
