Two main architecture types
- Master/Slave
- Peer to Peer
1. Architectures
1.1 Master/Slave
- All communication passes through a single master node, which controls a set of slave nodes that carry out tasks on its behalf
- Advantages/Disadvantages:
(1) Advantages:
- Usually relatively simple to implement
- A single process has access to all of the data
(2) Disadvantage - scalability problems, since communication with the master can become a bottleneck
- When to consider a Master/Slave architecture:
(1) Trivially parallel problems
- Problems whose sub-tasks can be executed completely independently
- The master distributes the problem to all of the slave nodes and waits for the answers to come back
(2) Problems that require collating data from all nodes - information is not just exchanged between nodes but must also be merged
- Might form part of an otherwise peer-to-peer architecture – see hybrid architectures
- Types of communications to be used –
(1) A Master/Slave architecture will usually rely heavily on collective communications. This spreads some of the communication load away from the master node
(2) Scatter/Gather or Scatter/Reduce are the usual communication methods
(3) Can use non-blocking point-to-point communication if you wish to respond to individual slave nodes as they complete, sending new data to a node without waiting for all nodes to finish
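The Master/Slave scatter/work/gather cycle for a trivially parallel problem can be sketched in plain Python. This is an illustration of the pattern only, not MPI: a thread pool stands in for the slave nodes, and `work` is a hypothetical independent sub-task.

```python
from concurrent.futures import ThreadPoolExecutor

def work(task):
    # Hypothetical slave-side computation: any fully independent
    # sub-task would do here.
    return task * task

def master(tasks, n_workers=4):
    # The master "scatters" tasks to the workers, waits for all of them,
    # and "gathers" the answers back in order - mirroring Scatter/Gather.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(work, tasks))

print(master(range(8)))  # each task is computed independently of the others
```

In a real MPI code the scatter and gather would be the collective calls themselves, with the bottleneck being the master's bandwidth rather than the pool size.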
1.2 Peer to Peer
- Every process has equal precedence and communicates directly with the other processes (or, more usually, a subset of them)
- Advantages/Disadvantages
(1) Advantages
- Very well suited to strongly coupled problems, especially when each node only needs to communicate with a subset of its neighbours
- Good scalability: if the number of communicating neighbours is independent of system size, the communication load on a given node is usually independent of (or only weakly dependent on) the number of processes used
(2) Disadvantages - usually harder to code, since no single node is in charge of the system
- Typically no node knows the whole solution
- Results from different nodes need post-processing
- Types of communications to be used –
(1) Will rely mainly on non-blocking point-to-point communications - MPI_Isend and MPI_Irecv
(2) Most appropriate when each node communicates with only a subset of the other nodes, rather than requiring fully interconnected all-to-all communication
(3) Sometimes data may need to be communicated between all nodes (e.g. agreeing a single time step when using dynamic time-stepping) - MPI_Allgather or MPI_Allreduce are most appropriate here
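The neighbour-subset communication pattern above is the classic halo exchange. The sketch below simulates it in pure Python with all "ranks" in one process, purely to show the data flow; in a real peer-to-peer MPI code each rank would post MPI_Isend/MPI_Irecv to its two ring neighbours instead of indexing a shared list.

```python
# Pure-Python simulation of a halo exchange on a 1D periodic ring.
# Each "rank" owns one segment of data and needs the edge values of
# its left and right neighbours.

def halo_exchange(segments):
    """Return, for each rank, the (left, right) halo values it would receive."""
    n = len(segments)
    halos = []
    for rank in range(n):
        left = segments[(rank - 1) % n][-1]   # value arriving from left neighbour
        right = segments[(rank + 1) % n][0]   # value arriving from right neighbour
        halos.append((left, right))
    return halos

segments = [[0, 1], [2, 3], [4, 5], [6, 7]]
print(halo_exchange(segments))
```

Because each rank talks to a fixed number of neighbours, the per-rank communication volume here is independent of the total number of ranks - the scalability property noted above.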
1.3 Hybrid Architectures
- A program does not need to follow a single communication architecture faithfully
- An example of where a hybrid architecture is appropriate might be domain decomposition with dynamic load balancing
2. Parallel Performance
2.1 Introduction
- Two metrics for evaluating the performance of parallel code
(1) Speedup ratio
How many times faster the code executes in parallel relative to the serial code
$S=\frac{T_1}{T_N}$
(2) Parallel Efficiency
How fast the code is relative to the ideal speedup
$E=\frac{T_1}{NT_N}=\frac{S}{N}$
- Parallel efficiency usually falls as the number of cores increases. Super-linear speedup (efficiency greater than one) is possible but rare, and is usually caused by the smaller per-core tasks making more effective use of cache
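The two definitions can be written directly as functions. The timings below (100 s serial, 30 s on 4 cores) are made-up numbers chosen only to illustrate the arithmetic.

```python
def speedup(t1, tn):
    """Speedup ratio S = T1 / TN."""
    return t1 / tn

def efficiency(t1, tn, n):
    """Parallel efficiency E = T1 / (N * TN) = S / N."""
    return t1 / (n * tn)

# Hypothetical timings: 100 s serial, 30 s on 4 cores.
print(speedup(100.0, 30.0))        # S ~ 3.33: 3.33x faster than serial
print(efficiency(100.0, 30.0, 4))  # E ~ 0.83: 83% of ideal 4x speedup
```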
2.2 Amdahl’s Law
- Based on the idea that part of the solution is parallel and part is serial
(1) f is the fraction of the code that executes in parallel
(2) Assumes that the parallel portion has an efficiency of 1
$T_N=(1-f)T_1+\frac{f}{N}T_1=T_1\left(1-f+\frac{f}{N}\right)$
$S=\frac{1}{1-f+\frac{f}{N}}$
$E=\frac{1}{N(1-f)+f}$
This means that no matter how many cores are used, the speedup can never exceed $\frac{1}{1-f}$
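Amdahl's law is easy to check numerically. With a parallel fraction f = 0.9 (a value chosen just for illustration), the speedup approaches but never reaches the 1/(1-f) = 10x ceiling:

```python
def amdahl_speedup(f, n):
    """Amdahl's law: S = 1 / (1 - f + f/N) for parallel fraction f on N cores."""
    return 1.0 / (1.0 - f + f / n)

# Speedup saturates well below N as N grows, capped at 1/(1-f) = 10.
for n in (4, 16, 256, 10**6):
    print(n, amdahl_speedup(0.9, n))
```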
2.3 Communication and Parallel Efficiency
- In distributed-memory codes most of the computation is parallel. Inefficiency often comes from the time spent transferring data (or waiting for communications) relative to the time spent doing calculations
$T_{total}=T_{calculate}+T_{communicate}$
If we assume the problem has size P and can be decomposed perfectly, then
$T_{calculate}\propto\frac{P}{N}$
The communications are often associated with the "edges" of the data
$T_{communicate}\propto\left(\frac{P}{N}\right)^n$
where n is usually between zero and one and typically depends on the dimensionality of the data
- We can combine these to estimate how the speedup and parallel efficiency are expected to change as the problem size and number of cores vary. Note that these are only approximations, but they are useful for understanding the expected trends
$E\approx\frac{1}{1+kP^{n-1}N^{1-n}}$
(1) where k is problem-specific and relates to the relative cost of communication and calculation
(2) Assuming 0 < n < 1, this equation implies:
For a given problem size P, efficiency falls as N increases. It also means that a larger problem will have higher efficiency on the same number of cores
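Both trends can be read off directly from the model. The values k = 1 and n = 0.5 below are arbitrary placeholders (both are problem-specific), used only to show the direction of the trends:

```python
def model_efficiency(p, n_cores, k=1.0, n=0.5):
    """Efficiency model E ~ 1 / (1 + k * P**(n-1) * N**(1-n)).

    k and n are problem-specific; the values used here are illustrative.
    """
    return 1.0 / (1.0 + k * p ** (n - 1) * n_cores ** (1 - n))

# For fixed P, efficiency falls as N grows; a bigger P recovers it.
print(model_efficiency(1e6, 16))    # fewer cores: higher efficiency
print(model_efficiency(1e6, 256))   # more cores: lower efficiency
print(model_efficiency(1e8, 256))   # larger problem, same cores: higher efficiency
```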
2.4 Efficiency of domain decomposition
- If we assume constant resolution, then for a 3D system the computational time will be roughly proportional to the volume of a domain and the communication to the surface area of the domain (the volume of the domain raised to the power 2/3)
In a 2D system the equivalent is the computational time varying with the area of the domain and the communications with the perimeter of the domain
For domain decomposition
$n\approx\frac{d-1}{d}$
$E_{3D}\approx\frac{1}{1+k\left(\frac{V}{N}\right)^{-\frac{1}{3}}}$
$E_{2D}\approx\frac{1}{1+k\left(\frac{V}{N}\right)^{-\frac{1}{2}}}$
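Substituting n = (d-1)/d into the efficiency model gives an exponent of -1/d on the per-domain size V/N, which can be evaluated for any dimensionality. The value k = 1 and the system sizes below are illustrative only:

```python
def dd_efficiency(v, n_cores, d, k=1.0):
    """Domain-decomposition efficiency E ~ 1 / (1 + k * (V/N)**(-1/d)).

    v is the total system size (volume in 3D, area in 2D); k is illustrative.
    """
    return 1.0 / (1.0 + k * (v / n_cores) ** (-1.0 / d))

# Same per-core subdomain size (512 cells) in 2D and 3D:
print(dd_efficiency(4096, 8, d=3))  # 3D: surface/volume ratio hurts more
print(dd_efficiency(4096, 8, d=2))  # 2D: perimeter/area ratio is milder
```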
2.5 Note
- It is almost always more efficient to execute tasks in parallel than to parallelise over the data. (1) For example, if you have 10 large simulations to complete, the simulation code is parallel, and 100 cores are available, there are two options:
- Carry out each of the simulations on 100 cores, doing this for each of the 10 simulations
- Carry out all 10 simulations at the same time, each using 10 cores.
(2) The second option is usually the best choice - the parallel efficiency of an individual simulation falls as the number of cores increases
- The exception to this heuristic will be when some of the simulations to be carried out are computationally much more expensive than others
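The comparison between the two options can be made concrete with the efficiency model from section 2.3. All numbers here (P = 10^6, k = 1, n = 0.5) are illustrative assumptions, not measurements; the point is only the direction of the inequality for identical simulations:

```python
def model_efficiency(p, n_cores, k=1.0, n=0.5):
    """Same illustrative efficiency model as in section 2.3."""
    return 1.0 / (1.0 + k * p ** (n - 1) * n_cores ** (1 - n))

def total_time(p, n_cores, n_sims, concurrent):
    """Hypothetical wall-clock time (arbitrary units) for n_sims identical runs."""
    t_one = p / (n_cores * model_efficiency(p, n_cores))  # T_N = T_1 / (N * E)
    return t_one if concurrent else n_sims * t_one

P, SIMS, CORES = 1e6, 10, 100
one_after_another = total_time(P, CORES, SIMS, concurrent=False)        # 10 runs, 100 cores each
all_at_once = total_time(P, CORES // SIMS, SIMS, concurrent=True)       # 10 runs at once, 10 cores each
print(one_after_another, all_at_once)  # the concurrent option finishes sooner
```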