Computer Organization and Design, Homework 2 (Fifth Edition)

Computer Organization and Design: Homework 2

11

1.11 The results of the SPEC CPU2006 bzip2 benchmark running on an AMD
Barcelona has an instruction count of 2.389E12, an execution time of 750 s, and a
reference time of 9650 s.

1.11.1 [5] <§§1.6, 1.9> Find the CPI if the clock cycle time is 0.333 ns.
$$\text{CPI} = \frac{\text{CPU clock cycles}}{\text{instruction count}}$$
$$\text{CPU time} = \text{CPU clock cycles} \times \text{clock cycle time}$$
Therefore
$$\text{CPI} = \frac{\text{CPU time}}{\text{clock cycle time} \times \text{instruction count}} = \frac{750}{0.333\times10^{-9}\times2.389\times10^{12}} \approx 0.94$$
1.11.2 [5] <§1.9> Find the SPECratio.
$$\text{SPECratio} = \frac{\text{reference time}}{\text{execution time}} = \frac{9650}{750} \approx 12.9$$
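The two formulas above can be checked numerically. The following is a minimal sketch that evaluates them without intermediate rounding; the function and constant names are illustrative and not part of the exercise.

```python
# Given values from exercise 1.11 (bzip2 on an AMD Barcelona).
INSTRUCTION_COUNT = 2.389e12   # instructions
EXECUTION_TIME = 750.0         # seconds
REFERENCE_TIME = 9650.0        # seconds
CYCLE_TIME = 0.333e-9          # seconds per clock cycle


def cpi(cpu_time, cycle_time, instruction_count):
    """CPI = CPU time / (clock cycle time * instruction count)."""
    return cpu_time / (cycle_time * instruction_count)


def spec_ratio(reference_time, execution_time):
    """SPECratio = reference time / measured execution time."""
    return reference_time / execution_time


print(f"CPI       = {cpi(EXECUTION_TIME, CYCLE_TIME, INSTRUCTION_COUNT):.3f}")
print(f"SPECratio = {spec_ratio(REFERENCE_TIME, EXECUTION_TIME):.2f}")
```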

1.11.3 [5] <§§1.6, 1.9> Find the increase in CPU time if the number of instructions
of the benchmark is increased by 10% without affecting the CPI.
With the CPI and clock cycle time unchanged, CPU time scales linearly with the instruction count:
$$Time_{new} = 1.1 \times \text{CPI} \times \text{instruction count} \times \text{clock cycle time} = 1.1 \times 750 = 825\ \text{s}$$
$$\Delta T = 825 - 750 = 75\ \text{s}$$
Equivalently, the CPU time increases by 10%.

1.11.4 [5] <§§1.6, 1.9> Find the increase in CPU time if the number of instructions
of the benchmark is increased by 10% and the CPI is increased by 5%.
$$Time_{new} = 1.05 \times 1.1 \times 750 = 866.25\ \text{s}$$
$$\Delta T = 866.25 - 750 = 116.25\ \text{s}$$

Equivalently, the CPU time increases by $1.05 \times 1.1 - 1 = 15.5\%$.

1.11.5 [5] <§§1.6, 1.9> Find the change in the SPECratio for this change.
$$\text{SPECratio}_{new} = \frac{9650}{750\times1.155} \approx 11.14$$
The SPECratio falls from about 12.87 to about 11.14, a decrease of roughly 13.4%.
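Parts 1.11.3 to 1.11.5 all rescale the same baseline, so they can be sketched with one helper. This is only an illustration under the assumption that the clock cycle time stays fixed; `scaled_cpu_time` is a hypothetical name.

```python
def scaled_cpu_time(base_time, instruction_factor=1.0, cpi_factor=1.0):
    """CPU time after scaling instruction count and CPI (cycle time fixed)."""
    return base_time * instruction_factor * cpi_factor


BASE_TIME = 750.0        # seconds, from the exercise
REFERENCE_TIME = 9650.0  # seconds, from the exercise

# 1.11.3: 10% more instructions, same CPI.
time_more_instr = scaled_cpu_time(BASE_TIME, instruction_factor=1.10)
# 1.11.4: 10% more instructions and 5% higher CPI.
time_both = scaled_cpu_time(BASE_TIME, instruction_factor=1.10, cpi_factor=1.05)
# 1.11.5: SPECratio for the 1.11.4 scenario.
spec_ratio_both = REFERENCE_TIME / time_both

print(f"1.11.3: {time_more_instr:.2f} s (+{time_more_instr - BASE_TIME:.2f} s)")
print(f"1.11.4: {time_both:.2f} s (+{time_both - BASE_TIME:.2f} s)")
print(f"1.11.5: SPECratio = {spec_ratio_both:.2f}")
```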

1.11.6 [10] <§1.6> Suppose that we are developing a new version of the AMD
Barcelona processor with a 4 GHz clock rate. We have added some additional
instructions to the instruction set in such a way that the number of instructions
has been reduced by 15%. The execution time is reduced to 700 s and the new
SPECratio is 13.7. Find the new CPI.
$$\text{CPI} = \frac{\text{CPU time} \times \text{clock rate}}{\text{instruction count}} = \frac{700\times 4\times10^9}{2.389\times 10^{12}\times(1-0.15)} \approx 1.38$$

1.11.7 [10] <§1.6> This CPI value is larger than obtained in 1.11.1 as the clock
rate was increased from 3 GHz to 4 GHz. Determine whether the increase in the
CPI is similar to that of the clock rate. If they are dissimilar, why?
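$$\text{CPI} = \frac{\text{CPU time} \times \text{clock rate}}{\text{instruction count}}$$
With everything else held constant, the CPI would rise by the same factor as the clock rate, because CPI is directly proportional to clock rate.

Here, however, the instruction count and the execution time also changed, so the two increases are dissimilar: the clock rate rose by a factor of $4/3 \approx 1.33$, while the CPI rose by a factor of $\frac{(700/750) \times (4/3)}{0.85} \approx 1.46$.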
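To make the comparison concrete, here is a small sketch that computes both ratios. It assumes the nominal 3 GHz clock rate stated in the exercise for the original processor; the helper name is hypothetical.

```python
def cpi_from_clock_rate(cpu_time, clock_rate, instruction_count):
    """CPI = CPU time * clock rate / instruction count."""
    return cpu_time * clock_rate / instruction_count


OLD_INSTRUCTIONS = 2.389e12  # from the exercise

old_cpi = cpi_from_clock_rate(750.0, 3e9, OLD_INSTRUCTIONS)
# New design: 4 GHz, 15% fewer instructions, 700 s execution time.
new_cpi = cpi_from_clock_rate(700.0, 4e9, OLD_INSTRUCTIONS * 0.85)

cpi_ratio = new_cpi / old_cpi
clock_ratio = 4e9 / 3e9

print(f"CPI ratio   = {cpi_ratio:.2f}")
print(f"clock ratio = {clock_ratio:.2f}")
```

The CPI ratio exceeds the clock ratio because the 15% cut in instruction count was not matched by an equal cut in execution time.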

1.11.8 [5] <§1.6> By how much has the CPU time been reduced?

$$1-\frac{Time_{new}}{Time_{old}} = 1-\frac{700}{750} \approx 6.7\%$$

1.11.9 [10] <§1.6> For a second benchmark, libquantum, assume an execution
time of 960 ns, CPI of 1.61, and clock rate of 3 GHz. If the execution time is
reduced by an additional 10% without affecting the CPI and with a clock rate of
4 GHz, determine the number of instructions.

$$\text{instruction count} = \frac{\text{CPU time}\times\text{clock rate}}{\text{CPI}} = \frac{0.9\times960\times10^{-9}\times4\times 10^9}{1.61} \approx 2147$$
The execution time is given in nanoseconds, so the result is about 2147 instructions.

1.11.10 [10] <§1.6> Determine the clock rate required to give a further 10%
reduction in CPU time while maintaining the number of instructions and with the
CPI unchanged.

This part refers to the first benchmark (bzip2).

$$\text{clock rate} = \frac{\text{CPI}\times \text{instruction count}}{\text{CPU time}}$$
$$\text{clock rate}_{new} = \frac{1}{0.9}\times3\ \text{GHz} \approx 3.33\ \text{GHz}$$

1.11.11 [10] <§1.6> Determine the clock rate if the CPI is reduced by 15% and
the CPU time by 20% while the number of instructions is unchanged.
$$\text{clock rate}_{new} = \frac{0.85}{0.8}\times3\ \text{GHz} \approx 3.19\ \text{GHz}$$
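Parts 1.11.9 to 1.11.11 rearrange the same performance equation. The sketch below evaluates them; both function names are illustrative, and the 960 ns figure is taken literally from the exercise text.

```python
def instruction_count(cpu_time, clock_rate, cpi):
    """Instruction count = CPU time * clock rate / CPI."""
    return cpu_time * clock_rate / cpi


def required_clock_rate(old_rate, cpi_factor, time_factor):
    """Clock rate = IC * CPI / time, so with IC fixed the rate scales
    by cpi_factor / time_factor."""
    return old_rate * cpi_factor / time_factor


# 1.11.9: libquantum, 10% shorter execution time, 4 GHz, CPI 1.61.
libquantum_instructions = instruction_count(0.9 * 960e-9, 4e9, 1.61)
# 1.11.10: CPI unchanged, CPU time reduced by 10%.
rate_10 = required_clock_rate(3e9, cpi_factor=1.0, time_factor=0.9)
# 1.11.11: CPI reduced by 15%, CPU time reduced by 20%.
rate_11 = required_clock_rate(3e9, cpi_factor=0.85, time_factor=0.8)

print(f"1.11.9 : {libquantum_instructions:.1f} instructions")
print(f"1.11.10: {rate_10 / 1e9:.3f} GHz")
print(f"1.11.11: {rate_11 / 1e9:.4f} GHz")
```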



12

1.12 Section 1.10 cites as a pitfall the utilization of a subset of the performance
equation as a performance metric. To illustrate this, consider the following two
processors. P1 has a clock rate of 4 GHz, average CPI of 0.9, and requires the
execution of 5.0E9 instructions. P2 has a clock rate of 3 GHz, an average CPI of
0.75, and requires the execution of 1.0E9 instructions.
$$\text{MIPS} = \frac{\text{instruction count}}{\text{execution time}\times10^6}=\frac{\text{clock rate}}{\text{CPI}\times 10^6}$$

1.12.1 [5] <§§1.6, 1.10> One usual fallacy is to consider the computer with the
largest clock rate as having the largest performance. Check if this is true for P1 and
P2.
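$$T = \frac{\text{instruction count}\times \text{CPI}}{\text{clock rate}}$$
Substituting the given values:
$$T_1 = \frac{5.0\times10^9\times0.9}{4\times10^9} = 1.125\ \text{s}\quad T_2 = \frac{1.0\times10^9\times0.75}{3\times10^9} = 0.25\ \text{s}$$
P2 takes less time even though P1 has the higher clock rate, so the claim is false here.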
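As a quick check of the substitution above, the following sketch models each processor and its program as a small record. The `Workload` class is a hypothetical helper, not something defined in the exercise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    clock_rate: float     # Hz
    cpi: float            # average cycles per instruction
    instructions: float   # instruction count of the program

    def cpu_time(self):
        """CPU time = instruction count * CPI / clock rate."""
        return self.instructions * self.cpi / self.clock_rate


P1 = Workload(clock_rate=4e9, cpi=0.9, instructions=5.0e9)
P2 = Workload(clock_rate=3e9, cpi=0.75, instructions=1.0e9)

print(f"T1 = {P1.cpu_time():.3f} s, T2 = {P2.cpu_time():.3f} s")
```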

1.12.2 [10] <§§1.6, 1.10> Another fallacy is to consider that the processor executing
the largest number of instructions will need a larger CPU time. Considering that
processor P1 is executing a sequence of 1.0E9 instructions and that the CPI of
processors P1 and P2 do not change, determine the number of instructions that P2
can execute in the same time that P1 needs to execute 1.0E9 instructions.

$$\text{instruction count}_2 = \frac{10^9\times0.9}{4\times10^9}\times\frac{3\times 10^9}{0.75}=9\times 10^8$$
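The same result can be reached in two explicit steps: first the time P1 needs, then how many instructions P2 fits into that time. A minimal sketch, with an illustrative function name:

```python
def instructions_in_time(time_budget, clock_rate, cpi):
    """Instructions a processor completes in time_budget seconds."""
    return time_budget * clock_rate / cpi


# Time P1 needs for 1.0E9 instructions at CPI 0.9 and 4 GHz.
p1_time = 1.0e9 * 0.9 / 4e9
# Instructions P2 (CPI 0.75, 3 GHz) executes in that same time.
p2_instructions = instructions_in_time(p1_time, clock_rate=3e9, cpi=0.75)

print(f"P1 time = {p1_time:.3f} s, P2 instructions = {p2_instructions:.3e}")
```

P2 executes fewer instructions than P1 in the same time, so a larger instruction count does not by itself imply a larger CPU time.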
1.12.3 [10] <§§1.6, 1.10> A common fallacy is to use MIPS (millions of
instructions per second) to compare the performance of two different processors,
and consider that the processor with the largest MIPS has the largest performance.
Check if this is true for P1 and P2.
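$$\text{MIPS} = \frac{\text{instruction count}}{\text{execution time}\times 10^6}=\frac{\text{clock rate}}{\text{CPI}\times 10^6}$$
$$\text{MIPS}_1 = \frac{4\times10^9}{0.9\times10^6} \approx 4.444\times 10^3\quad \text{MIPS}_2 = \frac{3\times10^9}{0.75\times10^6} = 4\times 10^3$$
P1 has the larger MIPS rating, yet P2 finishes its program sooner (0.25 s versus 1.125 s), so the claim is false here.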

1.12.4 [10] <§1.10> Another common performance figure is MFLOPS (millions
of floating-point operations per second), defined as
MFLOPS = No. FP operations / (execution time × 1E6)
but this figure has the same problems as MIPS. Assume that 40% of the instructions
executed on both P1 and P2 are floating-point instructions. Find the MFLOPS
figures for the programs.

$$\text{MFLOPS}_1 = 0.4\times\text{MIPS}_1 \approx 1.78\times10^3\quad \text{MFLOPS}_2 = 0.4\times\text{MIPS}_2 = 1.6\times10^3$$
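Both rate metrics follow directly from the instruction counts and execution times. The sketch below computes them from the definitions; the function names are illustrative, and the execution times are the ones derived in 1.12.1.

```python
def mips(instructions, exec_time):
    """Millions of instructions per second."""
    return instructions / (exec_time * 1e6)


def mflops(instructions, fp_fraction, exec_time):
    """Millions of FP operations per second, assuming fp_fraction of
    all executed instructions are floating-point."""
    return fp_fraction * instructions / (exec_time * 1e6)


# (instruction count, execution time in seconds) for each processor.
P1 = (5.0e9, 1.125)
P2 = (1.0e9, 0.25)

for name, (count, time_s) in (("P1", P1), ("P2", P2)):
    print(f"{name}: MIPS = {mips(count, time_s):.1f}, "
          f"MFLOPS = {mflops(count, 0.4, time_s):.1f}")
```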


13

1.13 Another pitfall cited in Section 1.10 is expecting to improve the overall
performance of a computer by improving only one aspect of the computer. Consider
a computer running a program that requires 250 s, with 70 s spent executing FP
instructions, 85 s executed L/S instructions, and 40 s spent executing branch
instructions.

1.13.1 [5] <§1.10> By how much is the total time reduced if the time for FP
operations is reduced by 20%?
$$\Delta T = 70\times20\% = 14\ \text{s}\quad\quad \frac{14}{250}=5.6\%$$

1.13.2 [5] <§1.10> By how much is the time for INT operations reduced if the
total time is reduced by 20%?
$$\Delta T=250\times20\% = 50\ \text{s}$$
$$T_{int} = 250-85-40-70=55\ \text{s}$$
$$\frac{50}{55}\approx90.9\%$$

1.13.3 [5] <§1.10> Can the total time be reduced by 20% by reducing only
the time for branch instructions?
$$\frac{40}{250}=16\%<20\%$$
No. Even if the branch time were reduced to zero, the total time would fall by only 16%, short of the required 20%.
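All three parts of 1.13 relate a saving in one component to the saving in the total. A minimal sketch of that relationship, with hypothetical function names:

```python
TOTAL = 250.0                      # seconds
FP, LS, BRANCH = 70.0, 85.0, 40.0  # seconds per instruction class
INT = TOTAL - FP - LS - BRANCH     # remaining time, 55 s


def total_reduction(total, part, part_reduction):
    """Fraction of total time saved when one part shrinks by part_reduction."""
    return part * part_reduction / total


def required_part_reduction(total, part, target):
    """Fraction by which one part must shrink to cut the total by target.
    A result above 1.0 means the target is unreachable via this part alone."""
    return total * target / part


print(f"1.13.1: {total_reduction(TOTAL, FP, 0.20):.1%}")
print(f"1.13.2: {required_part_reduction(TOTAL, INT, 0.20):.1%}")
print(f"1.13.3 feasible: {required_part_reduction(TOTAL, BRANCH, 0.20) <= 1.0}")
```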


14

1.14 Assume a program requires the execution of 50 × 10^6 FP instructions,
110 × 10^6 INT instructions, 80 × 10^6 L/S instructions, and 16 × 10^6 branch
instructions. The CPI for each type of instruction is 1, 1, 4, and 2, respectively.
Assume that the processor has a 2 GHz clock rate.

1.14.1 [10] <§1.10> By how much must we improve the CPI of FP instructions if
we want the program to run two times faster?

$$T_{old}=\frac{50\times10^6\times1+110\times10^6\times1+80\times10^6\times4+16\times10^6\times2}{2\times10^9}=0.256\ \text{s}$$
$$T_{new} = T_{old}/2 = 0.128\ \text{s}$$
Let $x=\text{CPI}_{FP}$. The new cycle count, in millions, must be $0.128\times2\times10^9/10^6=256$:
$$50x+110+320+32=256\quad\Rightarrow\quad x=-4.12<0$$
A negative CPI is impossible, so there is no solution: improving the FP CPI alone cannot make the program run twice as fast.

1.14.2 [10] <§1.10> By how much must we improve the CPI of L/S instructions
if we want the program to run two times faster?
Let $x=\text{CPI}_{L/S}$. Substituting into the cycle count (in millions):
$$50+110+80x+32=256$$
$$x=0.8$$
The L/S CPI must drop from 4 to 0.8, an improvement by a factor of 5.
1.14.3 [5] <§1.10> By how much is the execution time of the program improved
if the CPI of INT and FP instructions is reduced by 40% and the CPI of L/S and
Branch is reduced by 30%?
$$T_{new}=\frac{\left(50\times(1-0.4)+110\times(1-0.4)+80\times4\times(1-0.3)+16\times2\times(1-0.3)\right)\times10^6}{2\times10^9}\approx0.171\ \text{s}$$
$$1-\frac{T_{new}}{T_{old}}=1-\frac{0.1712}{0.256}\approx33.1\%$$
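The three parts of 1.14 can be reproduced by summing cycles per instruction class. This is a sketch under the exercise's stated counts and CPIs; the dictionary keys and helper names are illustrative.

```python
CLOCK_RATE = 2e9  # Hz
COUNTS = {"FP": 50e6, "INT": 110e6, "LS": 80e6, "BRANCH": 16e6}
CPIS = {"FP": 1.0, "INT": 1.0, "LS": 4.0, "BRANCH": 2.0}


def cpu_time(counts, cpis, clock_rate=CLOCK_RATE):
    """CPU time = sum(count_i * CPI_i) / clock rate."""
    return sum(counts[k] * cpis[k] for k in counts) / clock_rate


def required_cpi(kind, speedup, counts=COUNTS, cpis=CPIS):
    """CPI that class `kind` would need for an overall `speedup`,
    with every other class unchanged. Negative means impossible."""
    total_cycles = sum(counts[k] * cpis[k] for k in counts)
    other_cycles = total_cycles - counts[kind] * cpis[kind]
    return (total_cycles / speedup - other_cycles) / counts[kind]


# 1.14.3: INT and FP CPI reduced by 40%, L/S and branch CPI by 30%.
REDUCED = {"FP": 0.6, "INT": 0.6, "LS": 4.0 * 0.7, "BRANCH": 2.0 * 0.7}

t_old = cpu_time(COUNTS, CPIS)
t_new = cpu_time(COUNTS, REDUCED)
print(f"1.14.1: required FP CPI  = {required_cpi('FP', 2):.2f}")
print(f"1.14.2: required L/S CPI = {required_cpi('LS', 2):.2f}")
print(f"1.14.3: improvement      = {1 - t_new / t_old:.1%}")
```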


15

1.15 [5] <§1.8> When a program is adapted to run on multiple processors in
a multiprocessor system, the execution time on each processor is comprised of
computing time and the overhead time required for locked critical sections and/or
to send data from one processor to another.
Assume a program requires t = 100 s of execution time on one processor. When run on
p processors, each processor requires t/p s, as well as an additional 4 s of overhead,
irrespective of the number of processors. Compute the per-processor execution
time for 2, 4, 8, 16, 32, 64, and 128 processors. For each case, list the corresponding
speedup relative to a single processor and the ratio between actual speedup versus
ideal speedup (speedup if there was no overhead).

$$T_1 = 100\ \text{s} \quad T_n = \frac{100}{n}+4\ \text{s}$$
where n is an integer with n ≥ 2.

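| Processors | Time per processor (s) | Speedup | Actual / ideal speedup |
|---|---|---|---|
| 1 | 100 | 1 | 1 |
| 2 | 54 | 1.85 | 1.85/2 = 0.93 |
| 4 | 29 | 3.45 | 0.86 |
| 8 | 16.5 | 6.06 | 0.76 |
| 16 | 10.25 | 9.76 | 0.61 |
| 32 | 7.13 | 14.04 | 0.44 |
| 64 | 5.56 | 17.98 | 0.28 |
| 128 | 4.78 | 20.92 | 0.16 |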
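The table rows can be regenerated with a short loop. This sketch assumes, as the solution does, that the single-processor run incurs no overhead and that ideal speedup on p processors is p; the function names are illustrative.

```python
def per_processor_time(p, t=100.0, overhead=4.0):
    """Execution time per processor; the 1-processor run has no overhead."""
    return t if p == 1 else t / p + overhead


def speedup_row(p, t=100.0, overhead=4.0):
    """Return (processors, time, speedup, actual/ideal speedup)."""
    time_p = per_processor_time(p, t, overhead)
    speedup = t / time_p
    return p, time_p, speedup, speedup / p


for p in (1, 2, 4, 8, 16, 32, 64, 128):
    procs, time_p, speedup, ratio = speedup_row(p)
    print(f"{procs:>4} {time_p:>8.2f} {speedup:>7.2f} {ratio:>6.2f}")
```

Because the 4 s overhead is constant, the speedup can never exceed 100/4 = 25 no matter how many processors are added.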