Computer composition and design work01 (1.6-1.9), fifth version

Article link: https://blog.csdn.net/JamSlade/article/details/123262359

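This article examines how processor performance metrics such as CPI and clock rate affect program execution time, compares how different compilers affect execution time and CPI, and analyzes the relationship between power consumption and technology improvements. Worked examples show the effect of parallelization on performance and the effect of CPI changes on execution time.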


1.6

[20] <§1.6> Consider two different implementations of the same instruction
set architecture. The instructions can be divided into four classes according to
their CPI (class A, B, C, and D). P1 with a clock rate of 2.5 GHz and CPIs of 1, 2, 3,
and 3, and P2 with a clock rate of 3 GHz and CPIs of 2, 2, 2, and 2.

Given a program with a dynamic instruction count of 1.0E6 instructions divided
into classes as follows: 10% class A, 20% class B, 50% class C, and 20% class D,
which implementation is faster?

a. What is the global CPI for each implementation?

b. Find the clock cycles required in both cases.

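$CPI_1 = 1 \times 10\% + 2\times20\%+3\times 50\%+3\times20\%=2.6\\CPI_2 = 2 \times 10\% + 2\times20\%+2\times 50\%+2\times20\%=2$

$CPU\ clock\ cycles = instruction\ count \times CPI$

$T_1 = 2.6\times 1.0\times 10^6 = 2.6\times 10^6\ cycles\\ T_2 = 2\times 1.0\times 10^6 = 2\times 10^6\ cycles$

Dividing the cycle counts by the clock rates gives the execution times, so P2 is faster:

$Time_1 = \frac{2.6\times 10^6}{2.5\times 10^9} = 1.04\times 10^{-3}\ s\\ Time_2 = \frac{2\times 10^6}{3\times 10^9} \approx 6.67\times 10^{-4}\ s$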
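
The arithmetic for exercise 1.6 can be checked with a short script. This is only a sketch: the function and variable names (`global_cpi`, `summarize`, `MIX`) are made up for illustration, and the numbers are the ones given in the problem statement. It also divides the cycle count by the clock rate to get an execution time, which is what decides which implementation is faster.

```python
# Illustrative check of exercise 1.6; names are hypothetical, values come from the problem.
MIX = [0.10, 0.20, 0.50, 0.20]   # fractions of instruction classes A, B, C, D
INSTRUCTIONS = 1.0e6             # dynamic instruction count


def global_cpi(cpis, mix=MIX):
    """Weighted-average CPI over the instruction mix."""
    return sum(c * m for c, m in zip(cpis, mix))


def summarize(cpis, clock_hz):
    """Return (global CPI, clock cycles, execution time in seconds)."""
    cpi = global_cpi(cpis)
    cycles = cpi * INSTRUCTIONS
    seconds = cycles / clock_hz
    return cpi, cycles, seconds


p1 = summarize([1, 2, 3, 3], 2.5e9)
p2 = summarize([2, 2, 2, 2], 3.0e9)

for name, (cpi, cycles, seconds) in (("P1", p1), ("P2", p2)):
    print(f"{name}: CPI={cpi:.2f}, cycles={cycles:.3g}, time={seconds * 1e3:.3f} ms")
```

P2 needs fewer cycles and also has the higher clock rate, so it finishes first.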

1.7

[15] <§1.6> Compilers can have a profound impact on the performance
of an application. Assume that for a program, compiler A results in a dynamic
instruction count of 1.0E9 and has an execution time of 1.1 s, while compiler B
results in a dynamic instruction count of 1.2E9 and an execution time of 1.5 s.

a. Find the average CPI for each program given that the processor has a clock cycle
time of 1 ns.

$CPI = \frac{CPU\ clock\ cycles}{instruction\ count} = \frac{CPU\ execution\ time}{clock\ cycle\ time\times instruction\ count}$

$CPI_A=\frac{1.1}{(1.0\times10^{-9})(1.0\times 10^{9})} = 1.1$
$CPI_B=\frac{1.5}{(1.0\times10^{-9})(1.2\times 10^{9})} = 1.25$

b. Assume the compiled programs run on two different processors. If the execution
times on the two processors are the same, how much faster is the clock of the
processor running compiler A’s code versus the clock of the processor running
compiler B’s code?

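With equal execution times, the clock rate is proportional to instruction count times CPI, which on the original processor is proportional to the execution time:
$\frac{f_B}{f_A}=\frac{TIME_B}{TIME_A}=\frac{1.5}{1.1} \approx 1.36$
So the clock of the processor running compiler A's code can be about 27% slower than the clock of the processor running compiler B's code.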

c. A new compiler is developed that uses only 6.0E8 instructions and has an
average CPI of 1.1. What is the speedup of using this new compiler versus using
compiler A or B on the original processor?
$speedup_A = \frac{Time_A}{Time_{new}} = \frac{1.1\times 10^9}{1.1\times 6\times10^8} = 1.67$
$speedup_B = \frac{Time_B}{Time_{new}} = \frac{1.25\times1.2\times 10^9}{1.1\times 6\times10^8}=2.27$
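
The three parts of exercise 1.7 can be reproduced numerically. The following is a minimal sketch with hypothetical names (`cpi`, `clock_ratio_b_over_a`, and so on). It assumes the 1 ns clock cycle time from part a also applies to the new compiler in part c.

```python
# Illustrative check of exercise 1.7; names are hypothetical.
CLOCK_CYCLE_S = 1e-9   # clock cycle time of the original processor


def cpi(exec_time_s, instructions, cycle_s=CLOCK_CYCLE_S):
    """CPI = execution time / (clock cycle time * instruction count)."""
    return exec_time_s / (cycle_s * instructions)


# Part a: average CPI under each compiler.
cpi_a = cpi(1.1, 1.0e9)
cpi_b = cpi(1.5, 1.2e9)

# Part b: equal execution times imply f_B / f_A = (I_B * CPI_B) / (I_A * CPI_A).
clock_ratio_b_over_a = (1.2e9 * cpi_b) / (1.0e9 * cpi_a)

# Part c: new compiler with 6.0E8 instructions and CPI 1.1 on the same processor.
new_time_s = 6.0e8 * 1.1 * CLOCK_CYCLE_S
speedup_vs_a = 1.1 / new_time_s
speedup_vs_b = 1.5 / new_time_s

print(f"CPI_A={cpi_a:.2f}, CPI_B={cpi_b:.2f}")
print(f"f_B/f_A={clock_ratio_b_over_a:.3f}")
print(f"speedup vs A={speedup_vs_a:.2f}, vs B={speedup_vs_b:.2f}")
```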

1.8

The Pentium 4 Prescott processor, released in 2004, had a clock rate of 3.6
GHz and voltage of 1.25 V. Assume that, on average, it consumed 10 W of static
power and 90 W of dynamic power.
The Core i5 Ivy Bridge, released in 2012, had a clock rate of 3.4 GHz and voltage
of 0.9 V. Assume that, on average, it consumed 30 W of static power and 40 W of
dynamic power.

1.8.1 [5] <§1.7> For each processor find the average capacitive loads.
$C = \frac{2\times dynamic\ power}{voltage^2\times switching\ frequency}$
Pentium 4 Prescott
$C_1 = \frac{2\times 90}{1.25^2\times3.6\times10^9}=3.2\times 10^{-8}\ F$
Core i5 Ivy Bridge
$C_2 = \frac{2\times 40}{0.9^2\times 3.4\times10^9} = 2.9\times 10^{-8}\ F$
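
As a quick check of the capacitive-load arithmetic, the formula can be wrapped in a small helper. This is a sketch only: `capacitive_load` is a hypothetical name, and it assumes the dynamic-power model P = ½·C·V²·f that the formula above rearranges.

```python
# Illustrative check of exercise 1.8.1; the helper name is hypothetical.
def capacitive_load(dynamic_power_w, voltage_v, freq_hz):
    """Rearranged from P_dynamic = 0.5 * C * V^2 * f; result in farads."""
    return 2 * dynamic_power_w / (voltage_v ** 2 * freq_hz)


c_pentium4 = capacitive_load(90, 1.25, 3.6e9)
c_core_i5 = capacitive_load(40, 0.9, 3.4e9)

print(f"Pentium 4 Prescott: {c_pentium4:.2e} F")
print(f"Core i5 Ivy Bridge: {c_core_i5:.2e} F")
```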
1.8.2 [5] <§1.7> Find the percentage of the total dissipated power comprised by
static power and the ratio of static power to dynamic power for each technology.

Pentium 4 Prescott
$\frac{10}{10+90} = 10\%\quad \frac{10}{90} = 11.1\%$
Core i5 Ivy Bridge
$\frac{30}{30+40} = 42.9\% \quad \frac{30}{40} = 75\%$
1.8.3 [15] <§1.7> If the total dissipated power is to be reduced by 10%, how much
should the voltage be reduced to maintain the same leakage current? Note: power
is defined as the product of voltage and current.
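$\frac{P_{total,new}}{P_{total,old}}=\frac{P_{static,new}+P_{dynamic,new}}{P_{static,old}+P_{dynamic,old}} = 90\%$
$P_{static} = U\times I$
$P_{dynamic} = \frac{1}{2}\times C\times U^2\times f$

Here the subscripts o and n denote old and new, and s and d denote static and dynamic.

Pentium 4 Prescott
$C=3.2\times 10^{-8} \quad U=1.25 \quad P_{os} = 10\quad P_{od} = 90\quad f_{nd} = f_{od} = 3.6\times 10^9$
The leakage current stays the same, so
$I_{ns} = I_{os} = \frac{10}{1.25} = 8$

Substituting into the formulas above, with the new total power equal to $0.9\times100 = 90$, gives
$\frac{1}{2}fCU_n^2 + IU_n = P_{ns}+P_{nd} \\ \frac{1}{2}\times3.2\times 10^{-8}\times 3.6\times 10^9\times U_{n}^2 +U_n\times 8 = 90\\ U_n \approx 1.18V$

Core i5 Ivy Bridge
$\frac{1}{2}\times2.9\times 10^{-8}\times 3.4\times 10^9\times U_{n}^2 +U_n\times 33.3 = 63\\U_n\approx0.84V$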
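
The voltage in 1.8.3 is the positive root of a quadratic, which can be solved directly. The sketch below uses a hypothetical helper, `new_voltage`. It assumes the leakage current is held constant and keeps the ½ factor from the capacitive-load formula in 1.8.1, so the dynamic term is ½·C·U²·f. With that assumption the roots come out near 1.18 V and 0.84 V.

```python
# Illustrative solver for exercise 1.8.3; the helper name is hypothetical.
import math


def new_voltage(static_w, dynamic_w, voltage_v, reduction=0.10):
    """Solve a*U^2 + I*U - P_target = 0 for the positive root U.

    a = 0.5 * C * f, which equals dynamic_w / voltage_v**2 because
    P_dynamic = 0.5 * C * U^2 * f at the original voltage.
    """
    a = dynamic_w / voltage_v ** 2
    leakage_a = static_w / voltage_v          # leakage current, held constant
    target_w = (1 - reduction) * (static_w + dynamic_w)
    disc = leakage_a ** 2 + 4 * a * target_w
    return (-leakage_a + math.sqrt(disc)) / (2 * a)


v_pentium4 = new_voltage(10, 90, 1.25)
v_core_i5 = new_voltage(30, 40, 0.9)

print(f"Pentium 4 Prescott: {v_pentium4:.2f} V")
print(f"Core i5 Ivy Bridge: {v_core_i5:.2f} V")
```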

1.9

Assume for arithmetic, load/store, and branch instructions, a processor has
CPIs of 1, 12, and 5, respectively. Also assume that on a single processor a program
requires the execution of 2.56E9 arithmetic instructions, 1.28E9 load/store
instructions, and 256 million branch instructions. Assume that each processor has
a 2 GHz clock frequency.
Assume that, as the program is parallelized to run over multiple cores, the number
of arithmetic and load/store instructions per processor is divided by 0.7 x p (where
p is the number of processors) but the number of branch instructions per processor
remains the same.

1.9.1 [5] <§1.7> Find the total execution time for this program on 1, 2, 4, and 8
processors, and show the relative speedup of the 2, 4, and 8 processor result relative
to the single processor result.
$T = \frac{instruction\ count\times CPI}{clock\ rate}$
$T_1 = \frac{(2.56\times 10^9)+ (12\times 1.28\times 10^9)+(5\times 2.56\times10^8)}{2\times10^9 }=9.6$
$T_2 = \frac{(2.56\times 10^9 /1.4)+ (12\times 1.28\times 10^9/1.4)+(5\times2.56\times10^8)}{2\times10^9}=7.04$
$T_4 = \frac{(2.56\times 10^9 /2.8)+ (12\times 1.28\times 10^9/2.8)+(5\times2.56\times10^8)}{2\times10^9}=3.84$
$T_8 = \frac{(2.56\times 10^9 /5.6)+ (12\times 1.28\times 10^9/5.6)+(5\times2.56\times10^8)}{2\times10^9}=2.24$
The speedups relative to the single processor are, respectively,
$\frac{T_1}{T_2}=1.36\quad\frac{T_1}{T_4}=2.5\quad\frac{T_1}{T_8}=4.29$
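
The four execution times and the speedups can be generated from one function. This is a sketch with hypothetical names (`exec_time`, `times`, `speedups`). It follows the interpretation used in the calculations above: the 0.7 x p divisor applies only when p > 1, and the single-processor case uses the full instruction counts.

```python
# Illustrative check of exercise 1.9.1; names are hypothetical.
CLOCK_HZ = 2e9
ARITH, LOADSTORE, BRANCH = 2.56e9, 1.28e9, 2.56e8   # instruction counts
CPI_ARITH, CPI_LS, CPI_BRANCH = 1, 12, 5


def exec_time(p):
    """Execution time in seconds on p processors."""
    # Assumption: no 0.7 factor for the single-processor baseline.
    scale = 1.0 if p == 1 else 0.7 * p
    cycles = (CPI_ARITH * ARITH + CPI_LS * LOADSTORE) / scale + CPI_BRANCH * BRANCH
    return cycles / CLOCK_HZ


times = {p: exec_time(p) for p in (1, 2, 4, 8)}
speedups = {p: times[1] / times[p] for p in (2, 4, 8)}

for p, t in times.items():
    print(f"p={p}: {t:.2f} s")
for p, s in speedups.items():
    print(f"speedup with p={p}: {s:.2f}")
```

The branch term does not shrink with p, so the speedup stays well below the processor count.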

1.9.2 [10] <§§1.6, 1.8> If the CPI of the arithmetic instructions was doubled,
what would the impact be on the execution time of the program on 1, 2, 4, or 8
processors?
$T_1 = \frac{(2\times2.56\times 10^9)+ (12\times 1.28\times 10^9)+(5\times2.56\times10^8)}{2\times10^9 }=10.88$
$T_2 = \frac{(2\times2.56\times 10^9 /1.4)+ (12\times 1.28\times 10^9/1.4)+(5\times2.56\times10^8)}{2\times10^9}=7.95$
$T_4 = \frac{(2\times2.56\times 10^9 /2.8)+ (12\times 1.28\times 10^9/2.8)+(5\times2.56\times10^8)}{2\times10^9}=4.3$
$T_8 = \frac{(2\times2.56\times 10^9 /5.6)+ (12\times 1.28\times 10^9/5.6)+(5\times2.56\times10^8)}{2\times10^9}=2.47$
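
To express the impact as a relative slowdown, the doubled-CPI times can be divided by the original ones. The sketch below uses hypothetical names and the same interpretation as the formulas above (no 0.7 factor when p = 1).

```python
# Illustrative check of exercise 1.9.2; names are hypothetical.
CLOCK_HZ = 2e9
ARITH, LOADSTORE, BRANCH = 2.56e9, 1.28e9, 2.56e8   # instruction counts


def exec_time(p, cpi_arith):
    """Execution time in seconds on p processors for a given arithmetic CPI."""
    scale = 1.0 if p == 1 else 0.7 * p   # assumption: no 0.7 factor for p = 1
    cycles = (cpi_arith * ARITH + 12 * LOADSTORE) / scale + 5 * BRANCH
    return cycles / CLOCK_HZ


# Ratio of the time with arithmetic CPI = 2 to the time with arithmetic CPI = 1.
slowdown = {p: exec_time(p, 2) / exec_time(p, 1) for p in (1, 2, 4, 8)}

for p, ratio in slowdown.items():
    print(f"p={p}: {exec_time(p, 2):.2f} s, {100 * (ratio - 1):.1f}% slower")
```

The relative penalty shrinks as p grows, because the branch instructions, which are not parallelized, make up a larger share of the total time.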

1.9.3 [10] <§§1.6, 1.8> To what should the CPI of load/store instructions be
reduced in order for a single processor to match the performance of four processors
using the original CPI values?
$\frac{(2.56\times 10^9)+ (x\times 1.28\times 10^9)+(5\times 2.56\times10^8)}{2\times10^9 } = \frac{(2.56\times 10^9 /2.8)+ (12\times 1.28\times 10^9/2.8)+(5\times2.56\times10^8)}{2\times10^9}$
$x = 3$
That is, the load/store CPI must be reduced by 9, from 12 to 3.
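
The equation for x is linear, so it can be solved in two lines. This is a sketch with hypothetical variable names. It equates the cycle counts directly, since both sides share the same 2 GHz clock.

```python
# Illustrative check of exercise 1.9.3; names are hypothetical.
ARITH, LOADSTORE, BRANCH = 2.56e9, 1.28e9, 2.56e8   # instruction counts

# Cycles per processor on four processors with the original CPIs (1, 12, 5).
target_cycles = (1 * ARITH + 12 * LOADSTORE) / (0.7 * 4) + 5 * BRANCH

# Single processor: 1*ARITH + x*LOADSTORE + 5*BRANCH = target_cycles, solved for x.
x = (target_cycles - 1 * ARITH - 5 * BRANCH) / LOADSTORE
reduction = 12 - x

print(f"required load/store CPI: {x:.2f} (a reduction of {reduction:.2f})")
```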