Computer composition and design work02: fifth version

JamSlade

Last modified: 2022-08-26 10:49:12


Category: Computer Organization and Architecture. Tags: computer organization

First published: 2022-03-25 10:19:16

Article link: https://blog.csdn.net/JamSlade/article/details/123728952


Computer Composition and Design Homework 2

11

1.11 The results of the SPEC CPU2006 bzip2 benchmark running on an AMD
Barcelona has an instruction count of 2.389E12, an execution time of 750 s, and a
reference time of 9650 s.

1.11.1 [5] <§§1.6, 1.9> Find the CPI if the clock cycle time is 0.333 ns.
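$CPI=\frac{\text{CPU clock cycles}}{\text{instruction count}}$
$\text{CPU time}=\text{CPU clock cycles}\times\text{clock cycle time}$
Therefore
$CPI=\frac{\text{CPU time}}{\text{clock cycle time}\times\text{instruction count}}\\\quad\\=\frac{750}{0.333\times10^{-9}\times2.389\times10^{12}}\\\quad\\\approx0.94$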
1.11.2 [5] <§1.9> Find the SPECratio.
$SPECratio=\frac{\text{reference time}}{\text{execution time}}=\frac{9650}{750}\approx12.9$
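
The two formulas used so far can be checked numerically. The following is a minimal Python sketch, not part of the original solution; the function names `cpi` and `spec_ratio` are chosen here for illustration only.

```python
def cpi(cpu_time_s, cycle_time_s, instruction_count):
    """CPI = CPU time / (clock cycle time * instruction count)."""
    return cpu_time_s / (cycle_time_s * instruction_count)


def spec_ratio(reference_time_s, execution_time_s):
    """SPECratio = reference time / measured execution time."""
    return reference_time_s / execution_time_s


# Values given in problem 1.11 for bzip2 on the AMD Barcelona.
bzip2_cpi = cpi(750, 0.333e-9, 2.389e12)
bzip2_ratio = spec_ratio(9650, 750)

print(f"CPI       = {bzip2_cpi:.3f}")
print(f"SPECratio = {bzip2_ratio:.2f}")
```

Printing three decimals for the CPI makes the effect of rounding visible before the value is reused in later parts.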

1.11.3 [5] <§§1.6, 1.9> Find the increase in CPU time if the number of instructions
of the benchmark is increased by 10% without affecting the CPI.
With the CPI and the clock cycle time fixed, CPU time is proportional to the instruction count, so
$Time_{new}=1.1\times750=825\ s$
$\Delta T=825-750=75\ s$
That is, the CPU time increases by 10%.

1.11.4 [5] <§§1.6, 1.9> Find the increase in CPU time if the number of instructions
of the benchmark is increased by 10% and the CPI is increased by 5%.
$Time_{new}=1.1\times1.05\times750=866.25\ s$
$\Delta T=866.25-750=116.25\ s$

That is, the CPU time increases by 15.5%.

1.11.5 [5] <§§1.6, 1.9> Find the change in the SPECratio for this change.
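$SPECratio_{new}=\frac{9650}{750\times1.155}\approx11.14$
The SPECratio therefore drops from about 12.87 to about 11.14, a decrease of roughly 13.4%.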
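
Because CPU time is the product of instruction count, CPI, and clock cycle time, relative changes simply multiply. The sketch below illustrates this for parts 1.11.3 to 1.11.5; the helper `scaled_time` and the constant names are assumptions made for this example.

```python
BASE_TIME_S = 750.0        # measured bzip2 execution time
REFERENCE_TIME_S = 9650.0  # SPEC reference time


def scaled_time(base_time_s, instr_factor=1.0, cpi_factor=1.0):
    """Scale CPU time by relative changes in instruction count and CPI.

    The clock cycle time is assumed to stay unchanged.
    """
    return base_time_s * instr_factor * cpi_factor


t_more_instr = scaled_time(BASE_TIME_S, instr_factor=1.10)
t_more_both = scaled_time(BASE_TIME_S, instr_factor=1.10, cpi_factor=1.05)
new_spec_ratio = REFERENCE_TIME_S / t_more_both

print(f"+10% instructions:         {t_more_instr:.2f} s")
print(f"+10% instructions, +5% CPI: {t_more_both:.2f} s")
print(f"new SPECratio:              {new_spec_ratio:.2f}")
```

Working from the measured 750 s rather than from a rounded CPI avoids carrying rounding error into the result.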

1.11.6 [10] <§1.6> Suppose that we are developing a new version of the AMD
Barcelona processor with a 4 GHz clock rate. We have added some additional
instructions to the instruction set in such a way that the number of instructions
has been reduced by 15%. The execution time is reduced to 700 s and the new
SPECratio is 13.7. Find the new CPI.
$CPI_{new}=\frac{\text{CPU time}\times\text{clock rate}}{\text{instruction count}} = \frac{700\times 4\times10^9}{2.389\times 10^{12}\times(1-0.15)}\approx1.38$

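1.11.7 [10] <§1.6> This CPI value is larger than obtained in 1.11.1 as the clock
rate was increased from 3 GHz to 4 GHz. Determine whether the increase in the
CPI is similar to that of the clock rate. If they are dissimilar, why?
$CPI=\frac{\text{CPU time}\times\text{clock rate}}{\text{instruction count}}$
With the CPU time and the instruction count held fixed, CPI is directly proportional to the clock rate, so the two would rise by the same factor.

Here, however, the execution time and the instruction count also changed, so the increases are dissimilar: the clock rate rose by a factor of $4/3\approx1.33$, while the CPI rose by a factor of $\frac{(700/750)\times(4/3)}{0.85}\approx1.46$.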

1.11.8 [5] <§1.6> By how much has the CPU time been reduced?

$1-\frac{TIME_{new}}{TIME_{old}} = 1-\frac{700}{750}=6.7\%$

1.11.9 [10] <§1.6> For a second benchmark, libquantum, assume an execution
time of 960 ns, CPI of 1.61, and clock rate of 3 GHz. If the execution time is
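reduced by an additional 10% without affecting the CPI and with a clock rate of
4 GHz, determine the number of instructions.

Taking the stated execution time of 960 ns at face value:
$\text{instruction count}=\text{CPU time}\times\text{clock rate}/CPI\\=0.9\times960\times10^{-9}\times4\times 10^9/1.61\\\approx2147$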

1.11.10 [10] <§1.6> Determine the clock rate required to give a further 10%
reduction in CPU time while maintaining the number of instructions and with the
CPI unchanged.

This part refers to the first benchmark (bzip2).

$\text{clock rate}=CPI\times\text{instruction count}/\text{CPU time} \\=(1\times1/0.9)\times3\ GHz \approx 3.33\ GHz$

1.11.11 [10] <§1.6> Determine the clock rate if the CPI is reduced by 15% and
the CPU time by 20% while the number of instructions is unchanged.
$\text{clock rate}=(0.85\times1/0.8)\times3\ GHz \approx 3.19\ GHz$
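
Parts 1.11.10 and 1.11.11 both rearrange the performance equation to solve for the clock rate. A small Python sketch of that rearrangement follows, assuming the 3 GHz baseline used in the solution; the function name `required_clock_rate` is hypothetical.

```python
def required_clock_rate(old_rate_hz, cpi_factor=1.0, instr_factor=1.0,
                        time_factor=1.0):
    """Clock rate needed after relative changes to CPI, instructions, time.

    From clock rate = CPI * instruction count / CPU time, each factor is
    the ratio new/old of the corresponding quantity.
    """
    return old_rate_hz * cpi_factor * instr_factor / time_factor


# 1.11.10: CPU time reduced by 10%, CPI and instruction count unchanged.
rate_a = required_clock_rate(3e9, time_factor=0.9)
# 1.11.11: CPI reduced by 15%, CPU time reduced by 20%.
rate_b = required_clock_rate(3e9, cpi_factor=0.85, time_factor=0.8)

print(f"1.11.10: {rate_a / 1e9:.2f} GHz")
print(f"1.11.11: {rate_b / 1e9:.2f} GHz")
```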

12

1.12 Section 1.10 cites as a pitfall the utilization of a subset of the performance
equation as a performance metric. To illustrate this, consider the following two
processors. P1 has a clock rate of 4 GHz, average CPI of 0.9, and requires the
execution of 5.0E9 instructions. P2 has a clock rate of 3 GHz, an average CPI of
0.75, and requires the execution of 1.0E9 instructions.
$MIPS=\frac{\text{instruction count}}{\text{execution time}\times10^6}=\frac{\text{clock rate}}{CPI\times 10^6}$

1.12.1 [5] <§§1.6, 1.10> One usual fallacy is to consider the computer with the
largest clock rate as having the largest performance. Check if this is true for P1 and
P2.
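$\text{CPU time}=\text{instruction count}\times CPI/\text{clock rate}$
Substituting gives
$T_1 = 1.125\ s\quad T_2 = 0.25\ s$
P2 takes less time even though P1 has the higher clock rate, so the claim is false for this pair.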

1.12.2 [10] <§§1.6, 1.10> Another fallacy is to consider that the processor executing
the largest number of instructions will need a larger CPU time. Considering that
processor P1 is executing a sequence of 1.0E9 instructions and that the CPI of
processors P1 and P2 do not change, determine the number of instructions that P2
can execute in the same time that P1 needs to execute 1.0E9 instructions.

$N_2 = \frac{10^9\times0.9}{4\times10^9}\times\frac{3\times 10^9}{0.75}=9\times 10^8$
1.12.3 [10] <§§1.6, 1.10> A common fallacy is to use MIPS (millions of
instructions per second) to compare the performance of two different processors,
and consider that the processor with the largest MIPS has the largest performance.
Check if this is true for P1 and P2.
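$MIPS=\frac{\text{instruction count}}{\text{execution time}\times 10^6}=\frac{\text{clock rate}}{CPI\times 10^6}$
$MIPS_1 \approx4.44\times 10^3\quad MIPS_2 =4.0\times 10^3$
P1 has the higher MIPS rating, yet P2 finishes its program sooner (see 1.12.1), so the claim is false for this pair.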

1.12.4 [10] <§1.10> Another common performance figure is MFLOPS (millions
of floating-point operations per second), defined as
MFLOPS = No. FP operations / (execution time × 1E6)
but this figure has the same problems as MIPS. Assume that 40% of the instructions
executed on both P1 and P2 are floating-point instructions. Find the MFLOPS
figures for the programs.

$MFLOPS_1 = 0.4\times MIPS_1 \approx 1.78\times10^3\quad MFLOPS_2 = 0.4\times MIPS_2 = 1.6\times10^3$
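
The three metrics compared in problem 1.12 can be computed side by side. This is an illustrative Python sketch only; the `Processor` class and its method names are assumptions introduced here, not taken from the textbook.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    clock_rate_hz: float
    cpi: float
    instruction_count: float

    def cpu_time_s(self):
        # CPU time = instruction count * CPI / clock rate
        return self.instruction_count * self.cpi / self.clock_rate_hz

    def mips(self):
        # MIPS = clock rate / (CPI * 1e6)
        return self.clock_rate_hz / (self.cpi * 1e6)

    def mflops(self, fp_fraction):
        # MFLOPS = FP operations / (execution time * 1e6)
        fp_ops = fp_fraction * self.instruction_count
        return fp_ops / (self.cpu_time_s() * 1e6)


p1 = Processor(clock_rate_hz=4e9, cpi=0.9, instruction_count=5.0e9)
p2 = Processor(clock_rate_hz=3e9, cpi=0.75, instruction_count=1.0e9)

for name, p in (("P1", p1), ("P2", p2)):
    print(f"{name}: time={p.cpu_time_s():.3f} s, "
          f"MIPS={p.mips():.0f}, MFLOPS={p.mflops(0.4):.0f}")
```

Seeing the CPU time next to MIPS and MFLOPS shows why a rate metric alone does not rank the two processors by execution time.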

13

1.13 Another pitfall cited in Section 1.10 is expecting to improve the overall
performance of a computer by improving only one aspect of the computer. Consider
a computer running a program that requires 250 s, with 70 s spent executing FP
instructions, 85 s executed L/S instructions, and 40 s spent executing branch
instructions.

1.13.1 [5] <§1.10> By how much is the total time reduced if the time for FP
operations is reduced by 20%?
$\Delta T = 70\times(20\%)=14\quad\quad \frac{14}{250}=5.6\%$

1.13.2 [5] <§1.10> By how much is the time for INT operations reduced if the
total time is reduced by 20%?
$\Delta T=250\times20\% = 50$
$T_{int} = 250-85-40-70=55$
$\frac{50}{55}=90.9\%$

1.13.3 [5] <§1.10> Can the total time be reduced by 20% by reducing only
the time for branch instructions?
$\frac{40}{250}=16\%<20\%$
Even if the branch time were reduced to 0, the total time could not be reduced by 20%.
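
The reasoning in problem 1.13 amounts to asking how large a share of the total each component represents. A minimal Python sketch, with hypothetical helper names `total_reduction` and `required_component_cut`:

```python
TOTAL_S = 250.0
time_s = {"fp": 70.0, "ls": 85.0, "branch": 40.0}
# Whatever is not FP, L/S, or branch time is attributed to INT instructions.
time_s["int"] = TOTAL_S - sum(time_s.values())


def total_reduction(component, component_cut):
    """Fraction of total time saved when one component shrinks by component_cut."""
    return time_s[component] * component_cut / TOTAL_S


def required_component_cut(component, total_cut):
    """Fraction one component must shrink by to cut total time by total_cut.

    A result above 1.0 means the target cannot be reached with that
    component alone.
    """
    return TOTAL_S * total_cut / time_s[component]


print(f"1.13.1: {total_reduction('fp', 0.20):.1%} of total time saved")
print(f"1.13.2: INT time must shrink by {required_component_cut('int', 0.20):.1%}")
print(f"1.13.3: branch time would need to shrink by "
      f"{required_component_cut('branch', 0.20):.0%}")
```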

14

1.14 Assume a program requires the execution of 50 × 10^6 FP instructions,
110 × 10^6 INT instructions, 80 × 10^6 L/S instructions, and 16 × 10^6 branch
instructions. The CPI for each type of instruction is 1, 1, 4, and 2, respectively.
Assume that the processor has a 2 GHz clock rate.

1.14.1 [10] <§1.10> By how much must we improve the CPI of FP instructions if
we want the program to run two times faster?

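$T_{old}=\frac{50\times10^6\times1+110\times10^6\times1+80\times10^6\times4+16\times10^6\times2}{2\times10^9}=0.256\ s$
$T_{new} = T_{old}/2=0.128\ s$
Let $x=CPI_{FP}$.
Substituting (in units of $10^6$ clock cycles) gives $50x+110+320+32=256$, so $x<0$.
There is no solution: a CPI cannot be negative, so improving the FP CPI alone cannot make the program run twice as fast.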

1.14.2 [10] <§1.10> By how much must we improve the CPI of L/S instructions
if we want the program to run two times faster?
Let $x=CPI_{L/S}$.
Substituting gives $50 + 110 + 80 x + 32 = 256$
$x = 0.8$, so the L/S CPI must drop from 4 to 0.8, an 80% reduction.
1.14.3 [5] <§1.10> By how much is the execution time of the program improved
if the CPI of INT and FP instructions is reduced by 40% and the CPI of L/S and
Branch is reduced by 30%?
$T_{new}=\frac{(50\times(1-0.4)+110\times(1-0.4)+80\times4\times(1-0.3)+16\times2\times(1-0.3))\times10^6}{2\times10^9}\approx0.171\ s$
$1-\frac{T_{new}}{T_{old}}\approx33.1\%$
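
The three parts of problem 1.14 all follow from summing clock cycles per instruction class. The sketch below is one way to check them in Python; the dictionary layout and the helper names `exec_time_s` and `cpi_needed` are assumptions for this example.

```python
CLOCK_RATE_HZ = 2e9
COUNTS = {"fp": 50e6, "int": 110e6, "ls": 80e6, "branch": 16e6}
BASE_CPI = {"fp": 1.0, "int": 1.0, "ls": 4.0, "branch": 2.0}


def exec_time_s(cpi):
    """Execution time = sum(count * CPI) / clock rate."""
    return sum(COUNTS[k] * cpi[k] for k in COUNTS) / CLOCK_RATE_HZ


def cpi_needed(kind, target_time_s):
    """CPI one class needs to hit target_time_s, other classes unchanged.

    A negative result means the target is unreachable by changing only
    that class.
    """
    other_cycles = sum(COUNTS[k] * BASE_CPI[k] for k in COUNTS if k != kind)
    return (target_time_s * CLOCK_RATE_HZ - other_cycles) / COUNTS[kind]


t_old = exec_time_s(BASE_CPI)
fp_cpi = cpi_needed("fp", t_old / 2)  # 1.14.1
ls_cpi = cpi_needed("ls", t_old / 2)  # 1.14.2

# 1.14.3: INT and FP CPI reduced by 40%, L/S and branch CPI by 30%.
reduced = {"fp": 0.6, "int": 0.6, "ls": 0.7 * 4.0, "branch": 0.7 * 2.0}
t_new = exec_time_s(reduced)
improvement = 1 - t_new / t_old

print(f"T_old = {t_old:.3f} s")
print(f"FP CPI needed  = {fp_cpi:.2f}")
print(f"L/S CPI needed = {ls_cpi:.2f}")
print(f"T_new = {t_new:.4f} s, improvement = {improvement:.1%}")
```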

15

1.15 [5] <§1.8> When a program is adapted to run on multiple processors in
a multiprocessor system, the execution time on each processor is comprised of
computing time and the overhead time required for locked critical sections and/or
to send data from one processor to another.
Assume a program requires t = 100 s of execution time on one processor. When run
on p processors, each processor requires t/p s, as well as an additional 4 s of overhead,
irrespective of the number of processors. Compute the per-processor execution
time for 2, 4, 8, 16, 32, 64, and 128 processors. For each case, list the corresponding
speedup relative to a single processor and the ratio between actual speedup versus
ideal speedup (speedup if there was no overhead).

$T_1 = 100 \quad T_n = 100/n+4$
where n is an integer with $n\ge2$.

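| Processors | Execution time (s) | Speedup | Actual speedup / ideal speedup |
|---|---|---|---|
| 1 | 100 | 1 | 1 |
| 2 | 54 | 1.85 | 1.85/2 = 0.93 |
| 4 | 29 | 3.45 | 0.86 |
| 8 | 16.5 | 6.06 | 0.76 |
| 16 | 10.25 | 9.76 | 0.61 |
| 32 | 7.13 | 14.04 | 0.44 |
| 64 | 5.56 | 17.98 | 0.28 |
| 128 | 4.78 | 20.92 | 0.16 |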
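
The table can be regenerated from the formula $T_n = 100/n + 4$. A short Python sketch follows; the function name `per_processor_time` and the `rows` list are illustrative choices, and the single-processor case is assumed to have no overhead, as in the table.

```python
T1_S = 100.0      # execution time on one processor
OVERHEAD_S = 4.0  # fixed overhead per processor when running in parallel


def per_processor_time(p):
    """Per-processor time: t/p plus overhead (no overhead for p == 1)."""
    return T1_S if p == 1 else T1_S / p + OVERHEAD_S


rows = []
for p in (2, 4, 8, 16, 32, 64, 128):
    t = per_processor_time(p)
    speedup = T1_S / t
    # Ideal speedup with no overhead is p, so this ratio is actual / ideal.
    ratio = speedup / p
    rows.append((p, t, speedup, ratio))
    print(f"{p:>4} {t:>7.2f} {speedup:>6.2f} {ratio:>5.2f}")
```

Computing the speedup from the unrounded time avoids small discrepancies in the last digit for the larger processor counts.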
