Computer Organization and Design, Homework 01 (Exercises 1.6-1.9), Fifth Edition

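This post examines how processor performance metrics (such as CPI and clock rate) affect program execution efficiency, compares how different compilers contribute to execution time and CPI, and analyzes the relationship between power consumption and technology improvements. Worked calculations show the effect of parallelization on performance and the effect of CPI changes on execution time.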


1.6

[20] <§1.6> Consider two different implementations of the same instruction
set architecture. The instructions can be divided into four classes according to
their CPI (class A, B, C, and D). P1 with a clock rate of 2.5 GHz and CPIs of 1, 2, 3,
and 3, and P2 with a clock rate of 3 GHz and CPIs of 2, 2, 2, and 2.

Given a program with a dynamic instruction count of 1.0E6 instructions divided
into classes as follows: 10% class A, 20% class B, 50% class C, and 20% class D,
which implementation is faster?

a. What is the global CPI for each implementation?

b. Find the clock cycles required in both cases.

$$
CPI_1 = 1 \times 10\% + 2 \times 20\% + 3 \times 50\% + 3 \times 20\% = 2.6 \\
CPI_2 = 2 \times 10\% + 2 \times 20\% + 2 \times 50\% + 2 \times 20\% = 2
$$

$$
\text{CPU clock cycles} = CPI \times \text{instruction count of the program}
$$

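$$
T_1 = 2.6 \times 1.0 \times 10^6 = 2.6 \times 10^6 \\
T_2 = 2 \times 1.0 \times 10^6 = 2 \times 10^6
$$

Here $T_1$ and $T_2$ are clock cycle counts. Dividing by the clock rates gives execution times of $2.6 \times 10^6 / (2.5 \times 10^9) = 1.04$ ms for P1 and $2 \times 10^6 / (3 \times 10^9) \approx 0.67$ ms for P2, so P2 is faster.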
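The arithmetic for exercise 1.6 can be checked with a short script. This is only a sketch: the names `MIX`, `global_cpi`, and `execution_time` are chosen here for illustration and do not come from the textbook.

```python
# Class mix of the program (fractions of the dynamic instruction count).
MIX = {"A": 0.10, "B": 0.20, "C": 0.50, "D": 0.20}
INSTRUCTIONS = 1.0e6


def global_cpi(cpi_by_class, mix=MIX):
    """Weighted-average CPI over the instruction classes."""
    return sum(cpi_by_class[c] * mix[c] for c in mix)


def execution_time(cpi_by_class, clock_hz, instructions=INSTRUCTIONS):
    """Return (clock cycles, seconds) for the program."""
    cycles = global_cpi(cpi_by_class) * instructions
    return cycles, cycles / clock_hz


# (per-class CPIs, clock rate in Hz) for each implementation.
P1 = ({"A": 1, "B": 2, "C": 3, "D": 3}, 2.5e9)
P2 = ({"A": 2, "B": 2, "C": 2, "D": 2}, 3.0e9)

for name, (cpis, clock) in (("P1", P1), ("P2", P2)):
    cycles, seconds = execution_time(cpis, clock)
    print(f"{name}: CPI={global_cpi(cpis):.2f} "
          f"cycles={cycles:.3g} time={seconds * 1e3:.3f} ms")
```

P2 needs fewer cycles and also has the higher clock rate, so it finishes first on both counts.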

1.7

[15] <§1.6> Compilers can have a profound impact on the performance
of an application. Assume that for a program, compiler A results in a dynamic
instruction count of 1.0E9 and has an execution time of 1.1 s, while compiler B
results in a dynamic instruction count of 1.2E9 and an execution time of 1.5 s.

a. Find the average CPI for each program given that the processor has a clock cycle
time of 1 ns.

$$
CPI = \frac{\text{CPU clock cycles}}{\text{instruction count}} = \frac{\text{CPU execution time}}{\text{clock cycle time} \times \text{instruction count}}
$$

$$
CPI_A = \frac{1.1}{(1.0\times 10^{-9})(1.0\times 10^{9})} = 1.1
$$
$$
CPI_B = \frac{1.5}{(1.0\times 10^{-9})(1.2\times 10^{9})} = 1.25
$$

b. Assume the compiled programs run on two different processors. If the execution
times on the two processors are the same, how much faster is the clock of the
processor running compiler A’s code versus the clock of the processor running
compiler B’s code?

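With equal execution times, $\text{cycles}_A / f_A = \text{cycles}_B / f_B$, where the cycle counts are $1.0 \times 10^9 \times 1.1 = 1.1 \times 10^9$ for compiler A and $1.2 \times 10^9 \times 1.25 = 1.5 \times 10^9$ for compiler B:
$$
\frac{f_B}{f_A} = \frac{1.5 \times 10^9}{1.1 \times 10^9} \approx 1.36
$$
So the processor running compiler B's code needs a clock about 1.36 times faster. Equivalently, the clock of the processor running compiler A's code can be about 27% slower ($f_A \approx 0.73 f_B$).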

c. A new compiler is developed that uses only 6.0E8 instructions and has an
average CPI of 1.1. What is the speedup of using this new compiler versus using
compiler A or B on the original processor?
$$
faster_A = \frac{Time_A}{Time} = \frac{1.1\times 10^9}{1.1\times 6\times 10^8} = 1.67
$$
$$
faster_B = \frac{1.25\times 1.2\times 10^9}{1.1\times 6\times 10^8} = 2.27
$$
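The three parts of exercise 1.7 reduce to the same relation, execution time = instruction count × CPI × clock cycle time. A minimal sketch follows; the helper names `cpi` and `exec_time` are illustrative, and part b assumes the CPIs from part a carry over to both processors.

```python
CYCLE_TIME = 1e-9  # seconds per clock cycle (1 ns), given in part a


def cpi(exec_time_s, instructions, cycle_time_s=CYCLE_TIME):
    """Average CPI from a measured execution time."""
    return exec_time_s / (cycle_time_s * instructions)


def exec_time(instructions, cpi_value, cycle_time_s=CYCLE_TIME):
    """Execution time in seconds."""
    return instructions * cpi_value * cycle_time_s


# Part a: average CPI under each compiler.
cpi_a = cpi(1.1, 1.0e9)
cpi_b = cpi(1.5, 1.2e9)

# Part b: equal execution times imply f_B / f_A = cycles_B / cycles_A.
clock_ratio_b_over_a = (1.2e9 * cpi_b) / (1.0e9 * cpi_a)

# Part c: new compiler with 6.0E8 instructions and an average CPI of 1.1.
t_new = exec_time(6.0e8, 1.1)
speedup_vs_a = 1.1 / t_new
speedup_vs_b = 1.5 / t_new

print(f"CPI_A={cpi_a:.2f} CPI_B={cpi_b:.2f}")
print(f"f_B/f_A={clock_ratio_b_over_a:.2f}")
print(f"speedup vs A={speedup_vs_a:.2f}, vs B={speedup_vs_b:.2f}")
```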

1.8

1.8 The Pentium 4 Prescott processor, released in 2004, had a clock rate of 3.6
GHz and voltage of 1.25 V. Assume that, on average, it consumed 10 W of static
power and 90 W of dynamic power.
The Core i5 Ivy Bridge, released in 2012, had a clock rate of 3.4 GHz and voltage
of 0.9 V. Assume that, on average, it consumed 30 W of static power and 40 W of
dynamic power.

1.8.1 [5] <§1.7> For each processor find the average capacitive loads.
$$
\text{average capacitive load} = \frac{2\times \text{dynamic power}}{\text{voltage}^2\times \text{switching frequency}}
$$
Pentium 4 Prescott
$$
C_1 = \frac{2\times 90}{1.25^2\times 3.6\times 10^9} = 3.2\times 10^{-8}\ \text{F}
$$
Core i5 Ivy Bridge
$$
C_2 = \frac{2\times 40}{0.9^2\times 3.4\times 10^9} = 2.9\times 10^{-8}\ \text{F}
$$
1.8.2 [5] <§1.7> Find the percentage of the total dissipated power comprised by
static power and the ratio of static power to dynamic power for each technology.

Pentium 4 Prescott
$$
\frac{10}{10+90} = 10\% \quad \frac{10}{90} = 11.1\%
$$
Core i5 Ivy Bridge
$$
\frac{30}{30+40} = 42.9\% \quad \frac{30}{40} = 75\%
$$
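Parts 1.8.1 and 1.8.2 can be reproduced numerically. The sketch below assumes the usual dynamic power relation, dynamic power = ½ × C × V² × f, which is what the capacitive load formula above rearranges. The `CHIPS` table and the function names are illustrative.

```python
def capacitive_load(dynamic_w, voltage_v, clock_hz):
    """C = 2 * P_dynamic / (V^2 * f), in farads."""
    return 2 * dynamic_w / (voltage_v ** 2 * clock_hz)


def static_share(static_w, dynamic_w):
    """Return (static / total, static / dynamic)."""
    return static_w / (static_w + dynamic_w), static_w / dynamic_w


# Values taken from the exercise statement.
CHIPS = {
    "Pentium 4 Prescott": dict(static_w=10, dynamic_w=90,
                               voltage_v=1.25, clock_hz=3.6e9),
    "Core i5 Ivy Bridge": dict(static_w=30, dynamic_w=40,
                               voltage_v=0.9, clock_hz=3.4e9),
}

for name, c in CHIPS.items():
    load = capacitive_load(c["dynamic_w"], c["voltage_v"], c["clock_hz"])
    of_total, of_dynamic = static_share(c["static_w"], c["dynamic_w"])
    print(f"{name}: C={load:.2e} F, "
          f"static/total={of_total:.1%}, static/dynamic={of_dynamic:.1%}")
```

The newer chip has a slightly smaller capacitive load, but static power has grown from a tenth of the total to well over a third.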
1.8.3 [15] <§1.7> If the total dissipated power is to be reduced by 10%, how much
should the voltage be reduced to maintain the same leakage current? Note: power
is defined as the product of voltage and current.
$$
\frac{P_{total,new}}{P_{total,old}} = \frac{P_{static,new} + P_{dynamic,new}}{P_{static,old} + P_{dynamic,old}} = 90\%
$$
$$
P_{static} = U \times I
$$
$$
P_{dynamic} = \frac{1}{2} \times C \times U^2 \times f
$$

Here $U$ is the voltage, $I$ the leakage current, $C$ the capacitive load, and $f$ the switching frequency. The factor $\frac{1}{2}$ matches the capacitive load formula used in 1.8.1.

Pentium 4 Prescott
$$
C = 3.2\times 10^{-8} \quad U = 1.25 \quad P_{os} = 10 \quad P_{od} = 90 \quad f_{nd} = f_{od} = 3.6\times 10^9
$$
The leakage current is unchanged, so
$$
I_{ns} = I_{os} = \frac{10}{1.25} = 8
$$

Substituting into the formulas above, with a target total power of $0.9 \times 100 = 90$ W:
$$
\frac{1}{2} f C U_n^2 + I U_n = P_{ns} + P_{nd} \\
\frac{1}{2} \times 3.2\times 10^{-8} \times 3.6\times 10^9 \times U_n^2 + 8 \times U_n = 90 \\
U_n \approx 1.18\ \text{V}
$$

Core i5 Ivy Bridge, with $I = 30 / 0.9 \approx 33.3$ A and a target total power of $0.9 \times 70 = 63$ W:
$$
\frac{1}{2} \times 2.9\times 10^{-8} \times 3.4\times 10^9 \times U_n^2 + 33.3 \times U_n = 63 \\
U_n \approx 0.84\ \text{V}
$$
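As a numeric cross-check of 1.8.3, the sketch below solves the quadratic in the new voltage directly. It assumes dynamic power = ½ × C × V² × f (the relation behind the capacitive load formula in 1.8.1), a fixed clock frequency, and a fixed leakage current. Under those assumptions it gives roughly 1.18 V for the Pentium 4 Prescott and 0.84 V for the Core i5 Ivy Bridge. The function name `reduced_voltage` is illustrative.

```python
import math


def reduced_voltage(static_w, dynamic_w, voltage_v, reduction=0.10):
    """Voltage at which total power drops by `reduction`, holding the
    leakage current and the clock frequency fixed."""
    leakage_a = static_w / voltage_v       # I = P_static / V
    k = dynamic_w / voltage_v ** 2         # k = 1/2 * C * f
    target_w = (1 - reduction) * (static_w + dynamic_w)
    # Solve k*V^2 + I*V - target = 0 and keep the positive root.
    disc = leakage_a ** 2 + 4 * k * target_w
    return (-leakage_a + math.sqrt(disc)) / (2 * k)


print(f"Pentium 4 Prescott: {reduced_voltage(10, 90, 1.25):.2f} V")
print(f"Core i5 Ivy Bridge: {reduced_voltage(30, 40, 0.9):.2f} V")
```

Computing `k` from the given dynamic power and voltage avoids carrying the rounded capacitance values through the calculation.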

1.9

Assume for arithmetic, load/store, and branch instructions, a processor has
CPIs of 1, 12, and 5, respectively. Also assume that on a single processor a program
requires the execution of 2.56E9 arithmetic instructions, 1.28E9 load/store
instructions, and 256 million branch instructions. Assume that each processor has
a 2 GHz clock frequency.
Assume that, as the program is parallelized to run over multiple cores, the number
of arithmetic and load/store instructions per processor is divided by 0.7 x p (where
p is the number of processors) but the number of branch instructions per processor
remains the same.

1.9.1 [5] <§1.7> Find the total execution time for this program on 1, 2, 4, and 8
processors, and show the relative speedup of the 2, 4, and 8 processor result relative
to the single processor result.
$$
T = \frac{\text{instruction count}\times CPI}{\text{clock rate}}
$$
$$
T_1 = \frac{(2.56\times 10^9) + (12\times 1.28\times 10^9) + (5\times 2.56\times 10^8)}{2\times 10^9} = 9.6
$$
$$
T_2 = \frac{(2.56\times 10^9 / 1.4) + (12\times 1.28\times 10^9 / 1.4) + (5\times 2.56\times 10^8)}{2\times 10^9} = 7.04
$$
$$
T_4 = \frac{(2.56\times 10^9 / 2.8) + (12\times 1.28\times 10^9 / 2.8) + (5\times 2.56\times 10^8)}{2\times 10^9} = 3.84
$$
$$
T_8 = \frac{(2.56\times 10^9 / 5.6) + (12\times 1.28\times 10^9 / 5.6) + (5\times 2.56\times 10^8)}{2\times 10^9} = 2.24
$$
The speedups are
$$
\frac{T_1}{T_2} = 1.36 \quad \frac{T_1}{T_4} = 2.5 \quad \frac{T_1}{T_8} = 4.29
$$
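The four execution times for 1.9.1 follow one pattern, so they can be generated in a loop. This is a sketch under the exercise's stated model: for more than one processor, the arithmetic and load/store counts per processor are divided by 0.7 × p, and the branch count is not divided. The names `COUNTS`, `CPIS`, and `exec_time` are illustrative.

```python
CLOCK_HZ = 2e9
COUNTS = {"arith": 2.56e9, "load_store": 1.28e9, "branch": 2.56e8}
CPIS = {"arith": 1, "load_store": 12, "branch": 5}


def exec_time(p, cpis=CPIS, counts=COUNTS, clock_hz=CLOCK_HZ):
    """Execution time in seconds on p processors."""
    # A single processor runs the full counts; otherwise divide by 0.7 * p.
    scale = 1.0 if p == 1 else 0.7 * p
    cycles = (cpis["arith"] * counts["arith"]
              + cpis["load_store"] * counts["load_store"]) / scale
    cycles += cpis["branch"] * counts["branch"]  # branches are not divided
    return cycles / clock_hz


t1 = exec_time(1)
for p in (1, 2, 4, 8):
    t = exec_time(p)
    print(f"p={p}: T={t:.2f} s, speedup={t1 / t:.2f}")
```

The speedup stays well below p because of the 0.7 factor and because the branch cycles are paid in full on every processor.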

1.9.2 [10] <§§1.6, 1.8> If the CPI of the arithmetic instructions was doubled,
what would the impact be on the execution time of the program on 1, 2, 4, or 8
processors?
$$
T_1 = \frac{(2\times 2.56\times 10^9) + (12\times 1.28\times 10^9) + (5\times 2.56\times 10^8)}{2\times 10^9} = 10.88
$$
$$
T_2 = \frac{(2\times 2.56\times 10^9 / 1.4) + (12\times 1.28\times 10^9 / 1.4) + (5\times 2.56\times 10^8)}{2\times 10^9} = 7.95
$$
$$
T_4 = \frac{(2\times 2.56\times 10^9 / 2.8) + (12\times 1.28\times 10^9 / 2.8) + (5\times 2.56\times 10^8)}{2\times 10^9} = 4.3
$$
$$
T_8 = \frac{(2\times 2.56\times 10^9 / 5.6) + (12\times 1.28\times 10^9 / 5.6) + (5\times 2.56\times 10^8)}{2\times 10^9} = 2.47
$$

1.9.3 [10] <§§1.6, 1.8> To what should the CPI of load/store instructions be
reduced in order for a single processor to match the performance of four processors
using the original CPI values?
$$
\frac{(2.56\times 10^9) + (x\times 1.28\times 10^9) + (5\times 2.56\times 10^8)}{2\times 10^9} = \frac{(2.56\times 10^9 / 2.8) + (12\times 1.28\times 10^9 / 2.8) + (5\times 2.56\times 10^8)}{2\times 10^9}
$$
$$
x = 3
$$
The load/store CPI must therefore be reduced by 9, from 12 to 3.
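Parts 1.9.2 and 1.9.3 reuse the same timing model with different CPI values, so both can be checked with one parameterized helper. This is a sketch: the tuple layout (arithmetic, load/store, branch) and the names below are illustrative choices.

```python
CLOCK_HZ = 2e9
COUNTS = (2.56e9, 1.28e9, 2.56e8)  # arithmetic, load/store, branch


def exec_time(p, cpis, counts=COUNTS, clock_hz=CLOCK_HZ):
    """Seconds on p processors; only arithmetic and load/store work is split."""
    arith, mem, branch = (c * n for c, n in zip(cpis, counts))
    scale = 1.0 if p == 1 else 0.7 * p
    return ((arith + mem) / scale + branch) / clock_hz


ORIGINAL = (1, 12, 5)
DOUBLED_ARITH = (2, 12, 5)

# 1.9.2: execution time with the arithmetic CPI doubled.
for p in (1, 2, 4, 8):
    before, after = exec_time(p, ORIGINAL), exec_time(p, DOUBLED_ARITH)
    print(f"p={p}: {before:.2f} s -> {after:.2f} s ({after / before - 1:+.1%})")

# 1.9.3: load/store CPI at which one processor matches four processors.
target_cycles = exec_time(4, ORIGINAL) * CLOCK_HZ
load_store_cpi = (target_cycles - 1 * COUNTS[0] - 5 * COUNTS[2]) / COUNTS[1]
print(f"required load/store CPI: {load_store_cpi:.2f}")
```

Doubling the arithmetic CPI adds a little over a tenth to the execution time at each processor count, since load/store cycles dominate the total.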
