Performance analysis
General speedup formula
$Speedup=\frac{T_{serial}}{T_{parallel}}$
- Linear speedup: $S_p=p$
- Superlinear speedup: $S_p>p$
- Usually the speedup is $S_p<p$, because parallel programs have overheads: the parallel version needs extra computation, and processes spend time communicating with each other.
Execution time components
Inherently sequential computations: $\sigma(n)$
Potentially parallel computations: $\varphi(n)$
Communication operations (needed by the parallel part): $\kappa(n,p)$
Speedup: $\psi(n,p)\leqslant\frac{\sigma(n)+\varphi(n)}{\sigma(n)+\varphi(n)/p+\kappa(n,p)}$
Efficiency
$E=\frac{Speedup}{p}$
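A minimal sketch of the two definitions above, computed from measured run times. The function names and the timings are made up for illustration; they are not from any particular library.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup = T_serial / T_parallel."""
    if t_parallel <= 0:
        raise ValueError("t_parallel must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """Efficiency = Speedup / p, where p is the number of processors."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return speedup(t_serial, t_parallel) / p


# Hypothetical timings: 120 s on one processor, 20 s on 8 processors.
s = speedup(120.0, 20.0)
e = efficiency(120.0, 20.0, 8)
print(f"speedup = {s:.2f}, efficiency = {e:.2f}")
```

An efficiency below 1 means the speedup is sublinear ($S_p<p$), which is the usual case.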
Amdahl’s Law
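Amdahl’s Law says that if a program has a parallel part and a serial part, then as the number of processors grows, the parallel execution time keeps shrinking and eventually tends to 0, while the serial time stays the same; this is what limits the speedup. (Overheads are ignored in the calculation.)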
Let $f$ be the inherently sequential fraction of the computation: $f=\frac{\sigma(n)}{\sigma(n)+\varphi(n)}$, with $0\leqslant f\leqslant 1$
Speedup: $\psi\leqslant\frac{1}{f+(1-f)/p}$
This value is the maximum parallel speedup on $p$ processors; the actual speedup never exceeds the bound given by Amdahl’s Law.
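A sketch of where the Amdahl bound comes from, assuming the usual general bound on speedup built from the execution time components $\sigma(n)$, $\varphi(n)$ and $\kappa(n,p)$, and using $\kappa(n,p)\geqslant 0$ to drop the communication term:

$$
\begin{aligned}
\psi(n,p) &\leqslant \frac{\sigma(n)+\varphi(n)}{\sigma(n)+\varphi(n)/p+\kappa(n,p)}
\leqslant \frac{\sigma(n)+\varphi(n)}{\sigma(n)+\varphi(n)/p} \\
&= \frac{1}{\dfrac{\sigma(n)}{\sigma(n)+\varphi(n)}+\dfrac{1}{p}\cdot\dfrac{\varphi(n)}{\sigma(n)+\varphi(n)}}
= \frac{1}{f+(1-f)/p}
\end{aligned}
$$

Letting $p\to\infty$ gives the limiting bound $\psi\leqslant\frac{1}{f}$.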
Examples:
- 95% of a program’s execution time occurs inside a loop that can be executed in parallel. What is the maximum speedup we should expect from a parallel version of the program executing on 8 CPUs?
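Solution: "maximum speedup" calls for Amdahl’s Law.
95% parallel -> $f=0.05$, so $\psi\leqslant\frac{1}{0.05+(1-0.05)/8}\approx 5.9$
Variation: if "of the program executing on 8 CPUs" is removed from the question, $p$ tends to infinity and the speedup is $\psi\leqslant\frac{1}{0.05}=20$
- 20% of a program’s execution time is spent within inherently sequential code. What is the limit to the speedup achievable by a parallel version of the program?
Solution: "limit to the speedup" again calls for Amdahl’s Law, with $p$ tending to infinity: $f=0.2$, so $\psi\leqslant\frac{1}{0.2}=5$
The smaller the inherently sequential fraction, the better the achievable performance.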
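The two exercises above can be checked with a few lines of Python. This is only a sketch: `amdahl_speedup` is a made-up helper name, and it assumes the standard form of the bound, $\psi\leqslant\frac{1}{f+(1-f)/p}$, with `p = math.inf` standing in for "arbitrarily many processors".

```python
import math


def amdahl_speedup(f: float, p: float = math.inf) -> float:
    """Amdahl upper bound on speedup: 1 / (f + (1 - f) / p).

    f: inherently sequential fraction, 0 <= f <= 1
    p: number of processors (math.inf gives the limiting bound 1 / f)
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    denom = f + (1.0 - f) / p
    # f == 0 with unlimited processors means no bound at all.
    return math.inf if denom == 0 else 1.0 / denom


print(round(amdahl_speedup(0.05, 8), 2))  # 95% parallel, 8 CPUs
print(round(amdahl_speedup(0.05), 2))     # same program, p -> infinity
print(round(amdahl_speedup(0.20), 2))     # 20% sequential, p -> infinity
```

Trying a few values of `f` shows how quickly the limit $1/f$ drops as the sequential fraction grows.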
Limitations of Amdahl’s Law
Ignores $\kappa(n,p)$
Overestimates speedup achievable
Amdahl Effect
Typically $\kappa(n,p)$ has lower complexity than $\varphi(n)/p$.
As $n$ increases, $\varphi(n)/p$ dominates $\kappa(n,p)$.
As $n$ increases, speedup increases.
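Simply adding more CPUs does not necessarily improve system performance. Only by raising the share of parallelizable code in the system, while adding processors sensibly, can the largest speedup be obtained for the smallest investment.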
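The Amdahl effect can be illustrated numerically. The cost functions below are purely hypothetical, chosen only so that the communication cost ($n\log_2 p$) has lower complexity than the parallel work per processor ($n^2/p$); the speedup bound used is the general one that includes $\kappa(n,p)$.

```python
import math


# Hypothetical cost models, for illustration only.
def sigma(n: int) -> float:
    """Inherently sequential work, assumed linear in n."""
    return float(n)


def phi(n: int) -> float:
    """Potentially parallel work, assumed quadratic in n."""
    return float(n) ** 2


def kappa(n: int, p: int) -> float:
    """Communication cost, assumed n * log2(p)."""
    return n * math.log2(p)


def speedup_bound(n: int, p: int) -> float:
    """(sigma + phi) / (sigma + phi / p + kappa)."""
    return (sigma(n) + phi(n)) / (sigma(n) + phi(n) / p + kappa(n, p))


p = 8
for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}: speedup bound on {p} processors = {speedup_bound(n, p):.2f}")
```

With these assumed costs the bound climbs toward $p$ as $n$ grows, but never reaches it.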
Gustafson-Barsis’s Law
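If execution time is held constant, the problem size grows with the number of processors, so the inherently serial part takes up a smaller share of the computation and the speedup increases.
Gustafson’s Law addresses the limitation above: it too relates the number of processors, the serial fraction, and the speedup.
How to choose between the two laws: if the execution time is fixed and the problem size varies, use Gustafson-Barsis’s Law.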
Speedup formula:
Let $s=\frac{\sigma(n)}{\sigma(n)+\varphi(n)/p}$ (the fraction of the actual parallel execution time spent in the serial part). Then $\psi\leqslant p+(1-p)s$
In computer architecture, Gustafson’s Law gives the theoretical speedup in latency of the execution of a task at fixed execution time that can be expected of a system whose resources are improved.
Problem size is an increasing function of p
Predicts scaled speedup
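A sketch of how the scaled speedup bound $\psi\leqslant p+(1-p)s$ follows, assuming communication cost is ignored and $s=\frac{\sigma(n)}{\sigma(n)+\varphi(n)/p}$ is the serial fraction of the parallel execution time, so that $1-s=\frac{\varphi(n)/p}{\sigma(n)+\varphi(n)/p}$:

$$
\begin{aligned}
\psi(n,p) &\leqslant \frac{\sigma(n)+\varphi(n)}{\sigma(n)+\varphi(n)/p}
= \frac{\sigma(n)}{\sigma(n)+\varphi(n)/p}+p\cdot\frac{\varphi(n)/p}{\sigma(n)+\varphi(n)/p} \\
&= s+p(1-s) = p+(1-p)s
\end{aligned}
$$

The difference from Amdahl’s Law is the reference point: $f$ is measured against the serial run, while $s$ is measured against the parallel run.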
Examples:
- An application running on 10 processors spends 3% of its time in serial code. What is the scaled speedup of the application?
Here $s=0.03$ is given directly by the problem, so it does not need to be computed: $\psi=10+(1-10)\times 0.03=9.73$
- What is the maximum fraction of a program’s parallel execution time that can be spent in serial code if it is to achieve a scaled speedup of 7 on 8 processors?
Solving $7=8+(1-8)s$ gives $s=\frac{1}{7}\approx 0.14$
- A parallel program executing on 32 processors spends 5% of its time in sequential code. What is the scaled speedup of this program?
Sequential code is the serial part, so $s=0.05$
$\psi=32+(1-32)s=30.45$
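The three exercises can be reproduced with a short Python sketch. The helper names are invented for illustration, and the code assumes the scaled speedup formula $\psi=p+(1-p)s$ used in the last example.

```python
def scaled_speedup(s: float, p: int) -> float:
    """Gustafson-Barsis scaled speedup: p + (1 - p) * s.

    s: fraction of the parallel execution time spent in serial code
    p: number of processors
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    return p + (1 - p) * s


def max_serial_fraction(target: float, p: int) -> float:
    """Solve target = p + (1 - p) * s for s (requires p > 1)."""
    if p <= 1:
        raise ValueError("p must be greater than 1")
    return (p - target) / (p - 1)


print(round(scaled_speedup(0.03, 10), 2))     # 10 processors, 3% serial
print(round(max_serial_fraction(7, 8), 4))    # scaled speedup 7 on 8 processors
print(round(scaled_speedup(0.05, 32), 2))     # 32 processors, 5% serial
```

Note that `s` here is the serial share of the parallel run, not the serial share of the sequential run used in Amdahl’s Law, so the two helpers are not interchangeable with an Amdahl calculation.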