Computer Architecture: Computer Organization and Design RISC-V Edition, Chapter 1 Exercises

伍小拾

Published 2023-11-12 06:19:59


Article link: https://blog.csdn.net/Yvonne_wxs/article/details/134356825


- I. English textbook
  - 1.
  - 2.
  - 3.
  - 4.
  - 5.
  - 6.
  - 7.
  - 8.
  - 9.
  - 10.
  - 11.
  - 12.
  - 13.
  - 14.
  - 15.
- II. Homework

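❗️❗️❗️Read this first❗️❗️❗️

I can't guarantee any of this is correct, so it's best to work the problems out yourself. That said, if you find a mistake, feel free to point it out.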

I. English textbook

1.

Aside from the smart cell phones used by a billion people, list and describe four other types of computers.

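- Supercomputers: the most powerful and most expensive class, built from many processors and used for high-end scientific and engineering computation such as weather forecasting.
- Servers: accessed over a network, they run large programs for many users at once or handle one large workload such as a web service.
- Embedded computers: computers inside another device (cars, TVs, network routers) that run one application or a fixed set of related applications.
- Personal computers: designed for use by a single person, usually with a display, keyboard and mouse, emphasizing good performance at low cost.
- Cloud computing: large collections of servers (warehouse-scale computers) that provide services over the Internet.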

2.

The eight great ideas in computer architecture are similar to ideas from other fields. Match the eight ideas from computer architecture, “Design for Moore’s Law,” “Use Abstraction to Simplify Design,” “Make the Common Case Fast,” “Performance via Parallelism,” “Performance via Pipelining,” “Performance via Prediction,” “Hierarchy of Memories,” and “Dependability via Redundancy” to the following ideas from other fields:
a. Assembly lines in automobile manufacturing
b. Suspension bridge cables
c. Aircraft and marine navigation systems that incorporate wind information
d. Express elevators in buildings
e. Library reserve desk
f. Increasing the gate area on a CMOS transistor to decrease its switching time
g. Adding electromagnetic aircraft catapults (which are electrically powered as opposed to current steam-powered models), allowed by the increased power generation offered by the new reactor technology
h. Building self-driving cars whose control systems partially rely on existing sensor systems already installed into the base vehicle, such as lane departure systems and smart cruise control systems

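a. Performance via Pipelining

b. Dependability via Redundancy

c. Performance via Prediction

d. Make the Common Case Fast

e. Hierarchy of Memories

f. Performance via Parallelism

g. Design for Moore's Law

h. Use Abstraction to Simplify Design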

3.

Describe the steps that transform a program written in a high-level language such as C into a representation that is directly executed by a computer processor.

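$\text{C source (.c)}\stackrel{\text{preprocessor}}{\longrightarrow}\text{preprocessed source (.i)}\stackrel{\text{compiler}}{\longrightarrow}\text{assembly source (.s)}\stackrel{\text{assembler}}{\longrightarrow}\text{relocatable object file (.o)}\stackrel{\text{linker}}{\longrightarrow}\text{executable object file (binary machine code)}$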

4.

Assume a color display using 8 bits for each of the primary colors (red, green, blue) per pixel and a frame size of 1280 ×1024.

a. What is the minimum size in bytes of the frame buffer to store a frame?

b. How long would it take, at a minimum, for the frame to be sent over a 100 Mbit/s network?


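a. $1\ byte \times 3\ (colors) \times 1280 \times 1024 = 3932160\ bytes$ (that is $8 \times 3 \times 1280 \times 1024 = 31457280$ bits, or 3.75 MiB)

b. $\frac{31457280\ bit}{100 \times 10^{6}\ bit/s} \approx 0.31\ s$ (exactly $0.3\ s$ if 1 Mbit is taken as $2^{20}$ bits)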
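A short Python check of the frame-buffer arithmetic. It is only a sketch: the function names are my own, and it assumes 1 byte per color channel with no padding and a decimal 100 × 10⁶ bit/s link with no protocol overhead.

```python
# Exercise 1.4 sketch. Assumptions: 1 byte per color channel, no padding,
# and a link rate of 100 * 10**6 bit/s (decimal megabits).
WIDTH, HEIGHT = 1280, 1024
BYTES_PER_PIXEL = 3  # 8 bits each for red, green and blue


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Minimum frame-buffer size in bytes."""
    return width * height * bytes_per_pixel


def transfer_seconds(n_bytes: int, bits_per_second: float) -> float:
    """Minimum time to push n_bytes through a link, ignoring protocol overhead."""
    return n_bytes * 8 / bits_per_second


size = frame_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
print(size)                                     # 3932160
print(round(transfer_seconds(size, 100e6), 4))  # 0.3146
print(transfer_seconds(size, 100 * 2**20))      # 0.3 when 1 Mbit = 2**20 bit
```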

5.

Consider three different processors P1, P2, and P3 executing the same instruction set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0 GHz clock rate and has a CPI of 2.2.

a. Which processor has the highest performance expressed in instructions per second?

b. If the processors each execute a program in 10 seconds, find the number of cycles and the number of instructions.

c. We are trying to reduce the execution time by 30%, but this leads to an increase of 20% in the CPI. What clock rate should we have to get this time reduction?

a. $Instructions\ per\ second = \frac{Clock\ Rate}{CPI} \rightarrow 2 \times 10^{9} / 2.5 \times 10^{9} / 1.818 \times 10^{9}$

P2 has the highest performance expressed in instructions per second.

b. $number\ of\ cycles = time \times clock\ rate \rightarrow 3 \times 10^{10} / 2.5 \times 10^{10} / 4 \times 10^{10}$

$number\ of\ instructions = \frac{number\ of\ cycles}{CPI} \rightarrow 2 \times 10^{10} / 2.5 \times 10^{10} / 1.818 \times 10^{10}$

c. $time = \frac{instruction\ count \times CPI}{clock\ rate}$, so with the same instruction count $0.7 \times time = \frac{instruction\ count \times 1.2 \times CPI}{clock\ rate_{new}} \rightarrow clock\ rate_{new} = \frac{1.2}{0.7} \times clock\ rate \approx 1.714 \times clock\ rate$

$5.14\ GHz / 4.29\ GHz / 6.86\ GHz$
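The same three processors in a quick Python sketch (the names are illustrative). It just applies instructions/s = clock rate / CPI and, for part c, $f_{new} = f \times 1.2 / 0.7$.

```python
# Exercise 1.5 sketch: throughput, cycles/instructions in 10 s, and the
# clock rate needed to cut time by 30% while CPI rises by 20%.
processors = {  # name: (clock rate in Hz, CPI)
    "P1": (3.0e9, 1.5),
    "P2": (2.5e9, 1.0),
    "P3": (4.0e9, 2.2),
}


def metrics(clock_hz: float, cpi: float, seconds: float = 10.0):
    ips = clock_hz / cpi                  # a. instructions per second
    cycles = seconds * clock_hz           # b. cycles in the given time
    instructions = cycles / cpi           # b. instructions in the given time
    # c. T = IC * CPI / f and 0.7 T = IC * (1.2 CPI) / f'  =>  f' = f * 1.2 / 0.7
    new_clock = clock_hz * 1.2 / 0.7
    return ips, cycles, instructions, new_clock


for name, (clock_hz, cpi) in processors.items():
    ips, cycles, instructions, new_clock = metrics(clock_hz, cpi)
    print(f"{name}: {ips:.4g} instr/s, {cycles:.3g} cycles, "
          f"{instructions:.4g} instructions, new clock {new_clock / 1e9:.2f} GHz")

best = max(processors, key=lambda n: metrics(*processors[n])[0])
print("highest instructions per second:", best)
```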

6.

Consider two different implementations of the same instruction set architecture. The instructions can be divided into four classes according to their CPI (classes A, B, C, and D). P1 with a clock rate of 2.5 GHz and CPIs of 1, 2, 3, and 3, and P2 with a clock rate of 3 GHz and CPIs of 2, 2, 2, and 2.

Given a program with a dynamic instruction count of 1.0E6 instructions divided into classes as follows: 10% class A, 20% class B, 50% class C, and 20% class D, which is faster: P1 or P2?

a. What is the global CPI for each implementation?

b. Find the clock cycles required in both cases.

$global\ CPI\ P1 = 0.1 \times 1 + 0.2 \times 2 + 0.7 \times 3 = 2.6$

$global\ CPI\ P2 = 2$

$clock\ cycles\ P1 = 2.6 \times 10^{6}$

$clock\ cycles\ P2 = 2 \times 10^{6}$

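P2 is faster: $time_{P1} = \frac{2.6 \times 10^{6}}{2.5 \times 10^{9}} = 1.04\ ms$, while $time_{P2} = \frac{2 \times 10^{6}}{3 \times 10^{9}} \approx 0.67\ ms$.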
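A small Python sketch of the weighted-CPI calculation, using the class mix and per-class CPIs from the exercise; the dictionary layout is just one convenient choice.

```python
# Exercise 1.6 sketch: global CPI, clock cycles and time for 1.0E6 instructions.
IC = 1.0e6
MIX = [0.10, 0.20, 0.50, 0.20]  # fractions of classes A, B, C, D

machines = {  # name: (clock rate in Hz, CPI of classes A..D)
    "P1": (2.5e9, [1, 2, 3, 3]),
    "P2": (3.0e9, [2, 2, 2, 2]),
}


def global_cpi(cpis, mix):
    """Weighted-average CPI over the instruction classes."""
    return sum(c * f for c, f in zip(cpis, mix))


results = {}
for name, (clock_hz, cpis) in machines.items():
    cpi = global_cpi(cpis, MIX)
    cycles = cpi * IC
    results[name] = (cpi, cycles, cycles / clock_hz)
    print(f"{name}: CPI {cpi:.2f}, {cycles:.3g} cycles, {cycles / clock_hz * 1e3:.3f} ms")

print("faster:", min(results, key=lambda n: results[n][2]))
```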

7.

Compilers can have a profound impact on the performance of an application. Assume that for a program, compiler A results in a dynamic instruction count of 1.0E9 and has an execution time of 1.1 s, while compiler B results in a dynamic instruction count of 1.2E9 and an execution time of 1.5 s.

a. Find the average CPI for each program given that the processor has a clock cycle time of 1 ns.

b. Assume the compiled programs run on two different processors. If the execution times on the two processors are the same, how much faster is the clock of the processor running compiler A’s code versus the clock of the processor running compiler B’s code?

c. A new compiler is developed that uses only 6.0E8 instructions and has an average CPI of 1.1. What is the speedup of using this new compiler versus using compiler A or B on the original processor?

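a. $CPI = \frac{execution\ time}{clock\ cycle\ time \times instruction\ count} \rightarrow 1.1 / 1.25$

b. $Clock\ Rate = \frac{instruction\ count \times CPI}{execution\ time}$; with equal execution times, $\frac{f_A}{f_B} = \frac{1.0 \times 10^{9} \times 1.1}{1.2 \times 10^{9} \times 1.25} \approx 0.733$, so the processor running A's code actually needs only about 0.73 times the clock rate of B's (equivalently, B's clock must be about 1.36 times faster).

c. $execution\ time_{new} = 6.0 \times 10^{8} \times 1.1 \times 1\ ns = 0.66\ s$, so the speedup is $\frac{1.1}{0.66} \approx 1.67$ versus compiler A and $\frac{1.5}{0.66} \approx 2.27$ versus compiler B.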
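The three parts can be checked with a few lines of Python. This is a sketch; it assumes the 1 ns cycle time of part a also applies to the new compiler in part c.

```python
# Exercise 1.7 sketch. Assumes the 1 ns cycle time from part a throughout.
CYCLE_TIME = 1e-9  # seconds


def cpi(exec_time: float, instr_count: float, cycle_time: float = CYCLE_TIME) -> float:
    """CPI = execution time / (instruction count * clock cycle time)."""
    return exec_time / (instr_count * cycle_time)


cpi_a = cpi(1.1, 1.0e9)
cpi_b = cpi(1.5, 1.2e9)

# b. Equal execution times: f = IC * CPI / T, so f_A / f_B is the ratio of cycle counts.
clock_ratio_a_over_b = (1.0e9 * cpi_a) / (1.2e9 * cpi_b)

# c. New compiler: 6.0E8 instructions at CPI 1.1 on the original processor.
time_new = 6.0e8 * 1.1 * CYCLE_TIME
speedup_vs_a = 1.1 / time_new
speedup_vs_b = 1.5 / time_new

print(f"CPI_A {cpi_a:.2f}, CPI_B {cpi_b:.2f}, f_A/f_B {clock_ratio_a_over_b:.3f}")
print(f"new compiler: {time_new:.2f} s, speedup {speedup_vs_a:.2f} vs A, {speedup_vs_b:.2f} vs B")
```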

8.

The Pentium 4 Prescott processor, released in 2004, had a clock rate of 3.6 GHz and voltage of 1.25 V. Assume that, on average, it consumed 10 W of static power and 90 W of dynamic power. The Core i5 Ivy Bridge, released in 2012, has a clock rate of 3.4 GHz and voltage of 0.9 V. Assume that, on average, it consumed 30 W of static power and 40 W of dynamic power.

1.8.1 [5] <§1.7> For each processor find the average capacitive loads.

1.8.2 [5] <§1.7> Find the percentage of the total dissipated power comprised by static power and the ratio of static power to dynamic power for each technology.

1.8.3 [15] <§1.7> If the total dissipated power is to be reduced by 10%, how much should the voltage be reduced to maintain the same leakage current? Note: power is defined as the product of voltage and current.

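1.8.1

$dynamic\ power = \frac{1}{2} \times capacitive\ load \times voltage^{2} \times frequency \rightarrow capacitive\ load = \frac{2 \times dynamic\ power}{voltage^2 \times frequency}$

$3.2 \times 10^{-8}\ F / 2.905 \times 10^{-8}\ F$

1.8.2

$\frac{static\ power}{total\ power} = \frac{10}{100} = 10\% / \frac{30}{70} \approx 42.9\%$

$\frac{static\ power}{dynamic\ power} = \frac{1}{9} \approx 0.11 / \frac{3}{4} = 0.75$

1.8.3

Static power is $V \times I$ with a fixed leakage current, so it scales with $V$; dynamic power scales with $V^2$ (frequency and capacitance unchanged). Writing $x = V_{new}/V_{old}$:

$\frac{Power_{new}}{Power_{old}} = \frac{P_{static\ old} \times x + P_{dynamic\ old} \times x^2}{P_{static\ old} + P_{dynamic\ old}} = 0.9$

$\frac{P_{static}}{V} = I_{leak} \rightarrow unchanged$

Pentium 4: $90x^2 + 10x = 90 \rightarrow x \approx 0.946$, $V_{new} \approx 1.18\ V$ (about 5.4% lower). Core i5: $40x^2 + 30x = 63 \rightarrow x \approx 0.935$, $V_{new} \approx 0.84\ V$ (about 6.5% lower).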
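A Python sketch of all three parts, assuming dynamic power $= \frac{1}{2} C V^2 f$ and static power $= V \times I_{leak}$ with constant leakage current and unchanged frequency. `voltage_for_power_cut` is a hypothetical helper that solves the quadratic for $x = V_{new}/V_{old}$.

```python
import math

# Exercise 1.8 sketch. Assumptions: dynamic power = 1/2 * C * V^2 * f,
# static power = V * I_leak with I_leak fixed, and f unchanged in 1.8.3.
chips = {  # name: (clock Hz, voltage V, static W, dynamic W)
    "Pentium 4 Prescott": (3.6e9, 1.25, 10.0, 90.0),
    "Core i5 Ivy Bridge": (3.4e9, 0.90, 30.0, 40.0),
}


def capacitive_load(p_dyn: float, volts: float, freq: float) -> float:
    return 2 * p_dyn / (volts ** 2 * freq)


def voltage_for_power_cut(p_static: float, p_dyn: float, volts: float, cut: float = 0.10) -> float:
    """Positive root of p_dyn*x^2 + p_static*x = (1 - cut)*(p_static + p_dyn), x = V_new/V_old."""
    a, b, c = p_dyn, p_static, -(1 - cut) * (p_static + p_dyn)
    x = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    return x * volts


for name, (f, v, ps, pd) in chips.items():
    print(f"{name}: C {capacitive_load(pd, v, f):.3e} F, "
          f"static share {ps / (ps + pd):.1%}, static/dynamic {ps / pd:.3f}, "
          f"V_new {voltage_for_power_cut(ps, pd, v):.3f} V")
```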

9.

Assume for arithmetic, load/store, and branch instructions, a processor has CPIs of 1, 12, and 5, respectively. Also assume that on a single processor a program requires the execution of 2.56E9 arithmetic instructions, 1.28E9 load/store instructions, and 256 million branch instructions. Assume that each processor has a 2 GHz clock frequency. Assume that, as the program is parallelized to run over multiple cores, the number of arithmetic and load/store instructions per processor is divided by 0.7 × p (where p is the number of processors) but the number of branch instructions per processor remains the same.

1.9.1 [5] <§1.7> Find the total execution time for this program on 1, 2, 4, and 8 processors, and show the relative speedup of the 2, 4, and 8 processors result relative to the single processor result.

1.9.2 [10] <§§1.6, 1.8> If the CPI of the arithmetic instructions was doubled, what would the impact be on the execution time of the program on 1, 2, 4, or 8 processors?

1.9.3 [10] <§§1.6, 1.8> To what should the CPI of load/store instructions be reduced in order for a single processor to match the performance of four processors using the original CPI values?

1.9.1

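| Instruction class | CPI | IC |
| --- | --- | --- |
| Arithmetic | 1 | $2.56 \times 10^9$ |
| Load/store | 12 | $1.28 \times 10^9$ |
| Branch | 5 | $2.56 \times 10^8$ |

$T_p = \frac{(IC_{arith} \times CPI_{arith} + IC_{LS} \times CPI_{LS}) / (0.7 \times p) + IC_{branch} \times CPI_{branch}}{clock\ rate}$, where the single-processor case uses the original counts.

| Processors | Execution time (s) | Speedup vs. 1 processor |
| --- | --- | --- |
| 1 | 9.6 | 1 (baseline) |
| 2 | 7.04 | 1.36 |
| 4 | 3.84 | 2.5 |
| 8 | 2.24 | 4.29 |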

1.9.2

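| Instruction class | CPI | IC |
| --- | --- | --- |
| Arithmetic | 2 | $2.56 \times 10^9$ |
| Load/store | 12 | $1.28 \times 10^9$ |
| Branch | 5 | $2.56 \times 10^8$ |

| Processors | Execution time (s) | Increase vs. 1.9.1 |
| --- | --- | --- |
| 1 | 10.88 | +13.3% |
| 2 | 7.954286 | +13.0% |
| 4 | 4.297143 | +11.9% |
| 8 | 2.468571 | +10.2% |

The relative penalty shrinks as p grows because the unparallelized branch cycles make up a growing share of the total.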

1.9.3

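The clock rates are equal, so the total cycle counts just have to match. The branch cycles ($5 \times 2.56 \times 10^8$) appear on both sides and cancel (units of $10^9$ cycles below):

$CPI_{arith}\times IC_{arith} + CPI_{LS,new}\times IC_{LS} = \frac{CPI_{arith}\times IC_{arith} + CPI_{LS}\times IC_{LS}}{0.7 \times 4} \rightarrow 2.56 + 1.28 \times CPI_{LS,new} = \frac{2.56 + 15.36}{2.8} = 6.4 \rightarrow CPI_{LS,new} = 3$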
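A Python sketch of the multicore model, assuming (as in the tables) that the single-processor run uses the original counts and that only the arithmetic and load/store counts are divided by $0.7 \times p$.

```python
# Exercise 1.9 sketch: arithmetic and load/store counts are divided by
# 0.7 * p on p cores; branch counts are not. p == 1 uses the original counts.
CLOCK_HZ = 2e9
IC = {"arith": 2.56e9, "ls": 1.28e9, "branch": 2.56e8}


def exec_time(p: int, cpi: dict) -> float:
    """Execution time in seconds on p processors."""
    div = 1 if p == 1 else 0.7 * p
    cycles = ((IC["arith"] * cpi["arith"] + IC["ls"] * cpi["ls"]) / div
              + IC["branch"] * cpi["branch"])
    return cycles / CLOCK_HZ


base = {"arith": 1, "ls": 12, "branch": 5}
doubled = {**base, "arith": 2}  # 1.9.2: arithmetic CPI doubled

t1 = exec_time(1, base)
for p in (1, 2, 4, 8):
    t, t2 = exec_time(p, base), exec_time(p, doubled)
    print(f"p={p}: {t:.3f} s, speedup {t1 / t:.2f}; arith CPI 2: {t2:.3f} s (+{t2 / t - 1:.1%})")

# 1.9.3: single-core load/store CPI that matches 4 cores with the original CPIs
target_cycles = exec_time(4, base) * CLOCK_HZ
other_cycles = IC["arith"] * base["arith"] + IC["branch"] * base["branch"]
required_ls_cpi = (target_cycles - other_cycles) / IC["ls"]
print(f"required L/S CPI: {required_ls_cpi:.2f}")
```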

10.

Assume a 15 cm diameter wafer has a cost of 12, contains 84 dies, and has 0.020 defects/cm². Assume a 20 cm diameter wafer has a cost of 15, contains 100 dies, and has 0.031 defects/cm².

1.10.1 [10] <§1.5> Find the yield for both wafers.

1.10.2 [5] <§1.5> Find the cost per die for both wafers.

1.10.3 [5] <§1.5> If the number of dies per wafer is increased by 10% and the defects per area unit increases by 15%, find the die area and yield.

1.10.4 [5] <§1.5> Assume a fabrication process improves the yield from 0.92 to 0.95. Find the defects per area unit for each version of the technology given a die area of 200 mm².

1.10.1

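15 cm wafer: $wafer\ area = \pi r^2 \approx 3.14 \times 7.5^2 = 176.625\ cm^2$

Dies per wafer: 84

$die\ area = \frac{176.625}{84} \approx 2.103\ cm^2$

$yield = \frac{1}{(1 + (0.02 \times \frac{2.103}{2}))^2} \approx 0.9592$

20 cm wafer: $wafer\ area = \pi r^2 \approx 3.14 \times 10^2 = 314\ cm^2$

Dies per wafer: 100

$die\ area = \frac{314}{100} = 3.14\ cm^2$

$yield = \frac{1}{(1 + (0.031 \times \frac{3.14}{2}))^2} \approx 0.9093$

1.10.2

Damn, if I keep writing I'm going to die.

$Cost\ per\ die = \frac{cost\ per\ wafer}{dies\ per\ wafer \times yield} = \frac{12}{84 \times 0.9592} \approx 0.149\ /\ \frac{15}{100 \times 0.9093} \approx 0.165$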

1.10.3

$die\ area1 = \frac{2.103}{1.1} = 1.9118$

$\frac{1}{(1+0.02\times \frac{1.15 \times 1.9118}{2})^2} = 0.9574$

$die\ area2 = \frac{3.14}{1.1} = 2.855$

$\frac{1}{(1+0.031\times \frac{1.15 \times 2.855}{2})^2} = 0.9055$

1.10.4

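$\frac{1}{(1 + defects\ per\ area \times \frac{2}{2})^2} = yield \rightarrow defects\ per\ area = \frac{1}{\sqrt{yield}} - 1$, which gives $0.043$ defects/cm² for a yield of 0.92 and $0.026$ defects/cm² for a yield of 0.95 (die area 200 mm² = 2 cm²)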
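The yield model is easy to script. This Python sketch uses `math.pi` rather than 3.14 and ignores edge loss when estimating die area, so its last digits may differ slightly from the hand calculation; the helper names are my own.

```python
import math

# Exercise 1.10 sketch using yield = 1 / (1 + defects_per_area * die_area / 2) ** 2.
# Assumption: die area = wafer area / dies per wafer (edge loss ignored),
# computed with math.pi instead of 3.14.


def die_yield(defects_per_cm2: float, die_area_cm2: float) -> float:
    return 1 / (1 + defects_per_cm2 * die_area_cm2 / 2) ** 2


def defects_for_yield(y: float, die_area_cm2: float) -> float:
    """Invert the yield model: solve for defects per cm^2."""
    return (1 / math.sqrt(y) - 1) * 2 / die_area_cm2


wafers = {  # diameter in cm: (cost, dies per wafer, defects per cm^2)
    15: (12, 84, 0.020),
    20: (15, 100, 0.031),
}

for diameter, (cost, dies, defects) in wafers.items():
    area = math.pi * (diameter / 2) ** 2
    die_area = area / dies
    y = die_yield(defects, die_area)
    # 1.10.3: 10% more dies per wafer, 15% more defects per area
    die_area_3 = area / (dies * 1.1)
    y_3 = die_yield(defects * 1.15, die_area_3)
    print(f"{diameter} cm: die {die_area:.3f} cm^2, yield {y:.4f}, "
          f"cost/die {cost / (dies * y):.3f}; "
          f"1.10.3: die {die_area_3:.3f} cm^2, yield {y_3:.4f}")

# 1.10.4: a 200 mm^2 die is 2 cm^2
for y in (0.92, 0.95):
    print(f"yield {y}: {defects_for_yield(y, 2.0):.3f} defects/cm^2")
```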

11.

The results of the SPEC CPU2006 bzip2 benchmark running on an AMD Barcelona has an instruction count of 2.389E12, an execution time of 750 s, and a reference time of 9650 s.

1.11.1 [5] <§§1.6, 1.9> Find the CPI if the clock cycle time is 0.333 ns.

1.11.2 [5] <§1.9> Find the SPECratio.

1.11.3 [5] <§§1.6, 1.9> Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% without affecting the CPI.

1.11.4 [5] <§§1.6, 1.9> Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% and the CPI is increased by 5%.

1.11.5 [5] <§§1.6, 1.9> Find the change in the SPECratio for this change.

1.11.6 [10] <§1.6> Suppose that we are developing a new version of the AMD Barcelona processor with a 4 GHz clock rate. We have added some additional instructions to the instruction set in such a way that the number of instructions has been reduced by 15%. The execution time is reduced to 700 s and the new SPECratio is 13.7. Find the new CPI.

1.11.7 [10] <§1.6> This CPI value is larger than obtained in 1.11.1 as the clock rate was increased from 3 GHz to 4 GHz. Determine whether the increase in the CPI is similar to that of the clock rate. If they are dissimilar, why?

1.11.8 [5] <§1.6> By how much has the CPU time been reduced?

1.11.9 [10] <§1.6> For a second benchmark, libquantum, assume an execution time of 960 ns, CPI of 1.61, and clock rate of 3 GHz. If the execution time is reduced by an additional 10% without affecting the CPI and with a clock rate of 4 GHz, determine the number of instructions.

1.11.10 [10] <§1.6> Determine the clock rate required to give a further 10% reduction in CPU time while maintaining the number of instructions and with the CPI unchanged.

1.11.11 [10] <§1.6> Determine the clock rate if the CPI is reduced by 15% and the CPU time by 20% while the number of instructions is unchanged.

1.11.1

$CPI = \frac{execution\ time}{instruction\ count \times clock\ cycle\ time} = \frac{750}{2.389 \times 10^{12} \times 0.333 \times 10^{-9}} \approx 0.94$

1.11.2

$SPECratio = \frac{reference\ time}{execution\ time} = \frac{9650}{750} \approx 12.9$

1.11.3

$\Delta time = CPI \times (0.1 \times instruction\ count) \times clock\ cycle\ time = 0.1 \times 750 = 75\ s$

1.11.4

$\Delta time = time \times 1.1 \times 1.05 - time = 750 \times 0.155 = 116.25\ s$

1.11.5

$SPECratio_{new} = \frac{9650}{750 \times 1.1 \times 1.05} = \frac{9650}{866.25} \approx 11.14$, down from about 12.87, i.e. roughly 13.4% lower

1.11.6

$\frac{execution\ time\times clock \ rate}{instruction\ count} = \frac{700\times 4 \times 10^9}{2.389 \times 10^{12} \times(1-0.15)} = 1.38$

1.11.7

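$CPI = \frac{execution\ time \times clock\ rate}{instruction\ count}$

So with everything else unchanged, CPI would grow in proportion to the clock rate, i.e. by $\frac{4}{3} \approx 1.33$. Here CPI grows from about 0.94 to 1.38, a factor of about 1.46, which is more than the clock-rate increase. The two differ because the instruction count also fell by 15% while the execution time only fell from 750 s to 700 s: $\frac{700}{750} \times \frac{4}{3} \div 0.85 \approx 1.46$.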

1.11.8

$1-\frac{700}{750} \approx 0.0667$, i.e. the CPU time drops by about 6.7%

Damn, I've written this much and still haven't lost my mind. Not bad.

1.11.9

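$instruction\ count = \frac{execution\ time \times clock\ rate}{CPI} = \frac{0.9 \times 960 \times 10^{-9} \times 4 \times 10^{9}}{1.61} \approx 2147$ (if the 960 is meant as seconds rather than ns, this becomes $\approx 2.147 \times 10^{12}$)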

1.11.10

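$clock\ rate = \frac{CPI \times instruction\ count}{execution\ time}$; keeping the instruction count and CPI fixed and cutting the time of the original 3 GHz run by 10% gives $\frac{3\ GHz}{0.9} \approx 3.33\ GHz$ (measured from the 4 GHz setup of 1.11.9 instead, it would be $\frac{4\ GHz}{0.9} \approx 4.44\ GHz$)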

1.11.11

$clock\ rate = \frac{CPI \times instruction\ count}{execution\ time} \rightarrow 3\ GHz \times \frac{0.85}{0.8} \approx 3.19\ GHz$
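A Python sketch that recomputes the bzip2 and libquantum parts. The variable names are my own; it takes the 960 as nanoseconds, as printed, and for 1.11.10 and 1.11.11 it measures the change from the original 3 GHz run (measuring from the 4 GHz setup of 1.11.9 would instead give about 4.44 GHz for 1.11.10).

```python
# Exercise 1.11 sketch. bzip2: IC = 2.389E12, 750 s, reference 9650 s,
# 0.333 ns cycle. libquantum: 960 taken as ns, CPI 1.61, 3 GHz baseline.
IC = 2.389e12
T = 750.0
REF = 9650.0
CYCLE = 0.333e-9

cpi = T / (IC * CYCLE)                  # 1.11.1
spec = REF / T                          # 1.11.2
dt_ic = T * 1.10 - T                    # 1.11.3: 10% more instructions
t_ic_cpi = T * 1.10 * 1.05              # 1.11.4: plus 5% higher CPI
spec_new = REF / t_ic_cpi               # 1.11.5
cpi_new = 700 * 4e9 / (IC * 0.85)       # 1.11.6: 4 GHz, 15% fewer instructions
reduction = 1 - 700 / T                 # 1.11.8

ic_lq = 0.9 * 960e-9 * 4e9 / 1.61       # 1.11.9: time x0.9 at 4 GHz
f_1110 = 3e9 / 0.9                      # 1.11.10: same IC and CPI, time x0.9 vs. 3 GHz run
f_1111 = 3e9 * 0.85 / 0.80              # 1.11.11: CPI x0.85, time x0.8 vs. 3 GHz run

print(f"CPI {cpi:.3f}, SPECratio {spec:.2f}, +{dt_ic:.1f} s, "
      f"+{t_ic_cpi - T:.2f} s, new SPECratio {spec_new:.2f}")
print(f"CPI at 4 GHz {cpi_new:.3f} (x{cpi_new / cpi:.2f} vs. clock x{4 / 3:.2f}), "
      f"CPU time cut {reduction:.2%}")
print(f"libquantum IC {ic_lq:.0f}, clock {f_1110 / 1e9:.3f} GHz, {f_1111 / 1e9:.4f} GHz")
```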

12.

1.12 Section 1.10 cites as a pitfall the utilization of a subset of the performance equation as a performance metric. To illustrate this, consider the following two processors. P1 has a clock rate of 4 GHz, average CPI of 0.9, and requires the execution of 5.0E9 instructions. P2 has a clock rate of 3 GHz, an average CPI of 0.75, and requires the execution of 1.0E9 instructions.

1.12.1 [5] <§§1.6, 1.10> One usual fallacy is to consider the computer with the largest clock rate as having the highest performance. Check if this is true for P1 and P2.

1.12.2 [10] <§§1.6, 1.10> Another fallacy is to consider that the processor executing the largest number of instructions will need a larger CPU time. Considering that processor P1 is executing a sequence of 1.0E9 instructions and that the CPI of processors P1 and P2 do not change, determine the number of instructions that P2 can execute in the same time that P1 needs to execute 1.0E9 instructions.

1.12.3 [10] <§§1.6, 1.10> A common fallacy is to use MIPS (millions of instructions per second) to compare the performance of two different processors, and consider that the processor with the largest MIPS has the largest performance. Check if this is true for P1 and P2.

1.12.4 [10] <§1.10> Another common performance figure is MFLOPS (millions of floating-point operations per second), defined as
$MFLOPS = \frac{No.\ FP\ operations}{execution\ time \times 1E6}$
but this figure has the same problems as MIPS. Assume that 40% of the instructions executed on both P1 and P2 are floating-point instructions. Find the MFLOPS figures for the processors.

Tsk, I'm too tired to write these out, so I'll just post the answers.

1.12.1

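P1: $\frac{5 \times 10^{9} \times 0.9}{4 \times 10^{9}} = 1.125\ s$, P2: $\frac{1 \times 10^{9} \times 0.75}{3 \times 10^{9}} = 0.25\ s$. P1 has the higher clock rate, yet P2 is 4.5 times faster, so the claim is false.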

1.12.2

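P1 needs $\frac{10^{9} \times 0.9}{4 \times 10^{9}} = 0.225\ s$ for $10^9$ instructions; in that time P2 executes $\frac{0.225 \times 3 \times 10^{9}}{0.75} = 9\times 10^8$ instructions.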

1.12.3

This one is worth writing out, though.

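$MIPS = \frac{clock\ rate}{CPI \times 10^6}$ (millions of instructions per second)

$MIPS_1 = \frac{4 \times 10^{9}}{0.9 \times 10^{6}} \approx 4444$

$MIPS_2 = \frac{3 \times 10^{9}}{0.75 \times 10^{6}} = 4000$

P1 has the higher MIPS rating, yet P2 finishes its program in 0.25 s versus 1.125 s for P1, so the claim is false.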

1.12.4

$MFLOPS_1 = 4444 \times 0.4 \approx 1778$

$MFLOPS_2 = 4000 \times 0.4 = 1600$
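A quick Python sketch of the 1.12 numbers. It assumes, as 1.12.4 states, that 40% of the executed instructions are floating point, so MFLOPS is simply 0.4 × MIPS; the helper names are illustrative.

```python
# Exercise 1.12 sketch: CPU time, MIPS and MFLOPS for P1 and P2.
# Assumption from 1.12.4: 40% of executed instructions are floating point,
# so MFLOPS = 0.4 * MIPS (same run time, 40% of the instruction count).
procs = {  # name: (clock rate in Hz, CPI, instruction count)
    "P1": (4e9, 0.90, 5.0e9),
    "P2": (3e9, 0.75, 1.0e9),
}
FP_FRACTION = 0.40


def cpu_time(clock_hz, cpi, ic):
    return ic * cpi / clock_hz


def mips(clock_hz, cpi):
    return clock_hz / (cpi * 1e6)


for name, (f, cpi, ic) in procs.items():
    m = mips(f, cpi)
    print(f"{name}: time {cpu_time(f, cpi, ic):.3f} s, "
          f"MIPS {m:.1f}, MFLOPS {FP_FRACTION * m:.1f}")

# 1.12.2: how many instructions P2 runs while P1 runs 1.0E9 instructions
f1, cpi1, _ = procs["P1"]
f2, cpi2, _ = procs["P2"]
t_p1 = cpu_time(f1, cpi1, 1.0e9)
p2_instructions = t_p1 * f2 / cpi2
print(f"P2 instructions in {t_p1:.3f} s: {p2_instructions:.3g}")
```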

13.

Another pitfall cited in Section 1.10 is expecting to improve the overall performance of a computer by improving only one aspect of the computer. Consider a computer running a program that requires 250 s, with 70 s spent executing FP instructions, 85 s spent executing L/S instructions, and 40 s spent executing branch instructions.

1.13.1 [5] <§1.10> By how much is the total time reduced if the time for FP operations is reduced by 20%?

1.13.2 [5] <§1.10> By how much is the time for INT operations reduced if the total time is reduced by 20%?

1.13.3 [5] <§1.10> Can the total time be reduced by 20% by reducing only the time for branch instructions?

1.13.1

$\frac{70\times 0.2}{250} = \frac{14}{250} = 0.056$, i.e. the total time drops by 14 s (5.6%)

1.13.2

$T_{int} = 250 - 85 - 40 - 70 = 55$

$\frac{250\times0.2}{55} \approx 90.9\%$

1.13.3

$250\times0.2 = 50 > 40s$

So even if the branch instructions took no time at all, we still couldn't cut 20%.

But you're welcome to blow me up instead.
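The 1.13 reasoning is a simple case of Amdahl's Law; this Python sketch just redoes the subtraction and the ratios (the dictionary keys are my own labels).

```python
# Exercise 1.13 sketch: how much one class of instructions can move a 250 s run.
TOTAL = 250.0
parts = {"fp": 70.0, "ls": 85.0, "branch": 40.0}
parts["int"] = TOTAL - sum(parts.values())  # whatever time remains is INT

# 1.13.1: FP time reduced by 20%
saved = 0.20 * parts["fp"]
print(f"FP -20%: total drops by {saved:.0f} s ({saved / TOTAL:.1%})")

# 1.13.2: INT reduction needed for a 20% cut in total time
needed = 0.20 * TOTAL
print(f"INT time must drop by {needed / parts['int']:.1%}")

# 1.13.3: the branch time alone is smaller than the 50 s we need to save
print("branch alone enough?", parts["branch"] >= needed)
```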

14.

Assume a program requires the execution of 50 × 10⁶ FP instructions, 110 × 10⁶ INT instructions, 80 × 10⁶ L/S instructions, and 16 × 10⁶ branch instructions. The CPI for each type of instruction is 1, 1, 4, and 2, respectively. Assume that the processor has a 2 GHz clock rate.

1.14.1 [10] <§1.10> By how much must we improve the CPI of FP instructions if we want the program to run two times faster?

1.14.2 [10] <§1.10> By how much must we improve the CPI of L/S instructions if we want the program to run two times faster?

1.14.3 [5] <§1.10> By how much is the execution time of the program improved if the CPI of INT and FP instructions is reduced by 40% and the CPI of L/S and Branch is reduced by 30%?

1.14.1

Never mind, let's just compute each class's share of the cycles:

$\frac{50 ×10^6\times1}{50 ×10^6\times1 + 110 \times 10^6 \times 1 + 80 ×10^6 \times 4 + 16 ×10^6 \times 2}$

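That's only about 0.098 (9.8%).

So even if the FP instructions took no time at all, we couldn't cut the time by 50%; improving FP alone cannot make the program twice as fast.

But you're welcome to blow me up instead.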

1.14.2

$\frac{80 ×10^6 \times 4}{50 ×10^6\times1 + 110 \times 10^6 \times 1 + 80 ×10^6 \times 4 + 16 ×10^6 \times 2} = 0.625$

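Oh, this one really does take up that much.

Running twice as fast means the total must shrink from 1 to 0.5 of the original cycles, so the L/S share must shrink from 0.625 to 0.125, i.e. to $\frac{0.125}{0.625}=\frac{1}{5}$ of its original value.

So the L/S CPI must drop from 4 to $0.2\times4 = 0.8$.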

1.14.3

$\frac{50 ×10^6\times1 + 110 \times 10^6 \times 1 + 80 ×10^6 \times 4 + 16 ×10^6 \times 2}{0.6\times 50 ×10^6\times1 + 0.6\times 110 \times 10^6 \times 1 + 0.7\times 80 ×10^6 \times 4 + 0.7 \times 16 ×10^6 \times 2} = 1.4953$
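A Python sketch of the 1.14 cycle budget at 2 GHz. `required_cpi` is a hypothetical helper: it asks what CPI one class would need if it alone had to remove half of all cycles, and a non-positive result means the target is impossible.

```python
# Exercise 1.14 sketch: cycle budget per instruction class at 2 GHz.
CLOCK_HZ = 2e9
ic = {"fp": 50e6, "int": 110e6, "ls": 80e6, "branch": 16e6}
cpi = {"fp": 1, "int": 1, "ls": 4, "branch": 2}


def cycles(cpis: dict) -> float:
    return sum(ic[k] * cpis[k] for k in ic)


base = cycles(cpi)
print(f"baseline: {base:.3g} cycles, {base / CLOCK_HZ:.3f} s")


def required_cpi(k: str) -> float:
    """CPI class k would need for the whole program to run 2x faster."""
    others = base - ic[k] * cpi[k]
    return (base / 2 - others) / ic[k]


for k in ("fp", "ls"):
    need = required_cpi(k)
    print(k, f"needs CPI {need:.2f}" if need > 0 else "cannot reach 2x alone")

# 1.14.3: INT/FP CPI -40%, L/S and branch CPI -30%
scale = {"fp": 0.6, "int": 0.6, "ls": 0.7, "branch": 0.7}
speedup = base / cycles({k: cpi[k] * scale[k] for k in cpi})
print(f"speedup: {speedup:.4f}")
```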

15.

When a program is adapted to run on multiple processors in a multiprocessor system, the execution time on each processor is comprised of computing time and the overhead time required for locked critical sections and/or to send data from one processor to another. Assume a program requires t = 100 s of execution time on one processor. When run on p processors, each processor requires t/p s, as well as an additional 4 s of overhead, irrespective of the number of processors. Compute the per-processor execution time for 2, 4, 8, 16, 32, 64, and 128 processors. For each case, list the corresponding speedup relative to a single processor and the ratio between actual speedup versus ideal speedup (speedup if there was no overhead).

$T_1 = 100s$