The BSP Model


weixin_41678425



Original link: https://blog.csdn.net/baimafujinji/article/details/51208900


Computational models

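A computational model is essentially a bridge between hardware and software: we can use it to design and analyze algorithms, high-level languages can be compiled onto it efficiently, and it can be implemented in hardware. For serial computation, the von Neumann machine is an ideal model. Hardware designers can build many different von Neumann machines without considering the software that will run on them, and software engineers can write programs that execute efficiently on the model without considering the hardware underneath.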

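Unfortunately, parallel computing has no truly general-purpose computational model comparable to the von Neumann machine. The models in common use are either too simple and abstract (such as PRAM) or too specialized (such as interconnection-network models and VLSI computational models). There is therefore a pressing need for a more practical parallel computational model that reflects the performance of modern parallel machines more realistically. We discussed the PRAM model in an earlier article; readers can refer to my blog post "The PRAM Model and Amdahl's Law".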

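In brief, the PRAM (Parallel Random Access Machine) model, also called the shared-memory SIMD model, is an abstract parallel computational model. It assumes a shared memory of unlimited capacity and a finite (or unlimited) number of identical processors, each capable of simple arithmetic and logical operations. At any time the processors can exchange data with one another through shared memory cells. Depending on whether processors may read the same shared cell concurrently and write it concurrently, PRAM models are further divided into several variants: EREW (exclusive read, exclusive write), CREW (concurrent read, exclusive write), CRCW (concurrent read, concurrent write), and so on.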
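
The difference between the PRAM variants can be made concrete with a small check. The sketch below is only an illustration: the function name `allowed` and the representation of one step as two lists of memory addresses are assumptions made here, and it ignores the case of a cell being read and written in the same step.

```python
from collections import Counter

def allowed(mode, reads, writes):
    """Return True if one PRAM step's memory accesses are legal under `mode`.

    reads / writes: lists of shared-memory addresses, one entry per access
    made by some processor during this step (hypothetical representation).
    """
    # A concurrent access means two or more processors touch the same cell.
    concurrent_read = any(n > 1 for n in Counter(reads).values())
    concurrent_write = any(n > 1 for n in Counter(writes).values())
    if mode == "EREW":   # exclusive read, exclusive write
        return not concurrent_read and not concurrent_write
    if mode == "CREW":   # concurrent read, exclusive write
        return not concurrent_write
    if mode == "CRCW":   # concurrent read, concurrent write
        return True
    raise ValueError(f"unknown PRAM variant: {mode}")
```

For example, two processors reading address 0 in the same step is rejected under EREW but accepted under CREW and CRCW.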

This article introduces another parallel computational model: the BSP model.

The BSP model

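The BSP (Bulk Synchronous Parallel) model, whose name literally means a "bulk" synchronous model, was first proposed by Leslie Valiant in 1990. As a bridge between programming languages and architectures, BSP describes a distributed-memory multiprocessor using the following three parameters (or attributes):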

Processor/memory modules: a BSP abstract machine consists of a collection of p abstract processors, each with local memory, connected by an interconnection network.
A barrier synchronizer that operates with period L: the time to do a barrier synchronization.
A router that delivers point-to-point messages between pairs of processor/memory modules: the rate at which continuous randomly addressed data can be delivered.

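The BSP model thus abstracts the characteristics of a parallel machine into three quantitative parameters, p, g, and L, which correspond to the number of processors, the router throughput (also called the bandwidth factor), and the time interval between global synchronizations, respectively.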

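Computation in the BSP model: a computation consists of a sequence of supersteps of period L, separated by global synchronizations. An (abstract) BSP program consists of p threads and is divided into supersteps. After each period of L, a global check determines whether every processor has completed the superstep. If so, the machine proceeds to the next superstep; otherwise the next period of L is allocated to the unfinished superstep.
Each superstep consists of: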

a computation where each processor (executing the threads assigned to it) uses only locally held values;
a global message transmission from each processor to any subset of the others;
a barrier synchronization.

At the end of a superstep, the transmitted messages become available as local data for the next superstep.
[Figure: the computation pattern within a single BSP superstep]
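
The three phases of a superstep can be imitated with ordinary threads. The following is a minimal sketch under stated assumptions, not a real BSP runtime: the function `run_bsp`, the ring communication pattern, and the initial values are all invented for illustration. It relies on Python's standard `threading.Barrier`, whose `action` callback runs once when all threads have arrived and before any of them is released.

```python
import threading

def run_bsp(p=4, supersteps=2):
    """Simulate p BSP processors arranged in a ring (hypothetical example)."""
    local = [pid + 1 for pid in range(p)]   # each processor's private value
    inbox = [[] for _ in range(p)]          # messages readable in this superstep
    outbox = [[] for _ in range(p)]         # messages in transit until the barrier
    lock = threading.Lock()

    def deliver():
        # Runs once per barrier, before any thread is released:
        # transmitted messages become local data for the next superstep.
        for i in range(p):
            inbox[i][:] = outbox[i]
            outbox[i].clear()

    barrier = threading.Barrier(p, action=deliver)

    def worker(pid):
        for _ in range(supersteps):
            # 1. Computation using only locally held values.
            local[pid] += sum(inbox[pid])
            # 2. Communication: send the result to the right-hand neighbour.
            with lock:
                outbox[(pid + 1) % p].append(local[pid])
            # 3. Barrier synchronization ends the superstep.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(pid,)) for pid in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return local
```

Because a message sent during one superstep only becomes visible after the barrier, the result does not depend on how the threads are scheduled: `run_bsp(4, 2)` returns `[5, 3, 5, 7]`.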

Properties and characteristics of the BSP model: BSP is a distributed-memory MIMD computational model with the following characteristics:

It separates the processors from the router, emphasizing the separation of computation from communication. The router performs only point-to-point message delivery and provides no combining, replication, or broadcast functions, which both hides the specific interconnection topology and simplifies the communication protocol. With the program divided into supersteps it is easier to provide performance guarantees than with unregulated message-passing systems. Because communication all happens together at the end of the computation phase of the superstep, it is possible to perform automatic optimisation of the communications pattern. This is particularly important on machines where the start-up cost of a communication is high: if during a superstep processor i sends two messages to processor j, then it will often be quicker to bundle the messages together and send the bundle from i to j than it would be to send each message separately. Similarly, the communication pattern can be reshuffled to avoid network congestion, and intelligent routing techniques can be used to detect and avoid hot spots.
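Global synchronization, implemented in hardware as a barrier, takes place at a controllable coarse-grained level. This provides an efficient way to execute tightly coupled synchronous parallel algorithms without placing an undue burden on the programmer: the BSP model eliminates the need for programmers to manage memory, assign communication, and perform low-level synchronization. Threads of the program are assigned (typically in a randomized way) by the machine to the processors.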
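When analyzing the performance of the BSP model, local operations are assumed to complete in one time step, and in each superstep a processor sends or receives at most h messages (called an h-relation). Assume s [...] an optimal simulation can also be achieved in that time.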
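
Two of the ideas above, bundling messages that share a source and destination and measuring the h-relation of a superstep, can be sketched in a few lines. The helper names `bundle` and `h_relation` and the `(source, destination, payload)` tuple format are assumptions made for this illustration; they do not come from any BSP library.

```python
from collections import defaultdict

def bundle(messages):
    """Group payloads by (source, destination) so each pair sends one bundle."""
    bundles = defaultdict(list)
    for src, dst, payload in messages:
        bundles[(src, dst)].append(payload)
    return dict(bundles)

def h_relation(messages, p):
    """Return h: the largest number of messages any processor sends or receives."""
    sent = [0] * p
    received = [0] * p
    for src, dst, _ in messages:
        sent[src] += 1
        received[dst] += 1
    return max(sent + received, default=0)
```

With the messages `[(0, 1, "a"), (0, 1, "b"), (2, 1, "c"), (3, 0, "d")]` on four processors, processor 1 receives three messages, so the superstep realizes a 3-relation, and the two messages from processor 0 to processor 1 collapse into a single bundle.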

BSP cost analysis (computational analysis): Consider a BSP program consisting of S supersteps. Then, the execution time for superstep i is
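
A commonly cited form of the BSP cost, given here as a sketch rather than as the author's exact expression, charges each superstep for its longest local computation, its communication, and one barrier. The symbols w_i (the largest amount of local computation performed by any processor in superstep i) and h_i (the h-relation realized in superstep i) are notation assumed here; g and L are the parameters defined earlier.

```latex
T_i = w_i + g\,h_i + L,
\qquad
T = \sum_{i=1}^{S} T_i
  = \sum_{i=1}^{S} w_i + g \sum_{i=1}^{S} h_i + S\,L
```

Under this form, a program with many short supersteps pays the barrier cost L many times, which is one reason BSP algorithms aim for few, coarse-grained supersteps.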