Computer Architecture: Computer Organization and Design RISC-V Edition, Chapter 1 Exercises


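❗️❗️❗️ Read this first ❗️❗️❗️

I can't guarantee any of this is correct, so it's best to work the problems yourself. Of course, if you spot a mistake, feel free to point it out.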

I. The English Textbook

1.

Aside from the smart cell phones used by a billion people, list and describe four other types of computers.

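  1. Supercomputers: the highest-performance (and most expensive) computers, used for large scientific and engineering calculations
  2. Servers: run large programs, often for many users at once, and are usually accessed over a network
  3. Embedded computers: built into other devices such as cars, appliances, and TVs to run one application or a small set of related applications
  4. Personal computers: desktops and laptops for a single user, designed to deliver good performance at low cost
  5. Cloud computing: warehouse-scale computers (huge collections of servers) that provide services over the Internet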

2.

The eight great ideas in computer architecture are similar to ideas from other fields. Match the eight ideas from computer architecture, “Design for Moore’s Law,” “Use Abstraction to Simplify Design,” “Make the Common Case Fast,” “Performance via Parallelism,” “Performance via Pipelining,” “Performance via Prediction,” “Hierarchy of Memories,” and “Dependability via Redundancy” to the following ideas from other fields:
a. Assembly lines in automobile manufacturing
b. Suspension bridge cables
c. Aircraft and marine navigation systems that incorporate wind information
d. Express elevators in buildings
e. Library reserve desk
f. Increasing the gate area on a CMOS transistor to decrease its switching time
g. Adding electromagnetic aircraft catapults (which are electrically powered as opposed to current steam-powered models), allowed by the increased power generation offered by the new reactor technology
h. Building self-driving cars whose control systems partially rely on existing sensor systems already installed into the base vehicle, such as lane departure systems and smart cruise control systems

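a. Performance via Pipelining

b. Dependability via Redundancy

c. Performance via Prediction

d. Make the Common Case Fast

e. Hierarchy of Memories

f. Performance via Parallelism

g. Design for Moore's Law

h. Use Abstraction to Simplify Design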

3.

Describe the steps that transform a program written in a high-level language such as C into a representation that is directly executed by a computer processor.

$$\text{C source (.c)} \xrightarrow{\text{preprocessor}} \text{preprocessed source (.i)} \xrightarrow{\text{compiler}} \text{assembly source (.s)} \xrightarrow{\text{assembler}} \text{relocatable object (.o)} \xrightarrow{\text{linker}} \text{executable (binary machine code)}$$

4.

Assume a color display using 8 bits for each of the primary colors (red, green, blue) per pixel and a frame size of 1280 ×1024.

a. What is the minimum size in bytes of the frame buffer to store a frame?

b. How long would it take, at a minimum, for the frame to be sent over a 100 Mbit/s network?


a. $8\ \text{bit} \times 3\ (\text{colors}) \times 1280 \times 1024 = 31457280\ \text{bit} = 3932160$ bytes (3.75 MiB)

b. $31457280\ \text{bit} \div (100 \times 10^6\ \text{bit/s}) \approx 0.315$ s (network rates are decimal, so 1 Mbit = $10^6$ bit)
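To double-check the arithmetic, here's a small Python sketch of parts a and b. The helper names are my own. It assumes 100 Mbit/s means 100 × 10^6 bit/s and ignores protocol overhead, so the time is only a lower bound.

```python
# Exercise 4 (helper names are hypothetical): frame-buffer size and minimum send time.
# Assumption: 100 Mbit/s = 100 * 10**6 bit/s, with no protocol overhead.

WIDTH, HEIGHT = 1280, 1024
BITS_PER_PIXEL = 3 * 8  # 8 bits each for red, green, blue


def frame_buffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Minimum frame-buffer size in bytes for one frame."""
    return width * height * bits_per_pixel // 8


def send_time_seconds(n_bytes: int, link_bits_per_second: float) -> float:
    """Lower bound on the time to send n_bytes over the link."""
    return n_bytes * 8 / link_bits_per_second


size_bytes = frame_buffer_bytes(WIDTH, HEIGHT, BITS_PER_PIXEL)
seconds = send_time_seconds(size_bytes, 100e6)
print(size_bytes)          # 3932160
print(round(seconds, 4))   # 0.3146
```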

5.

Consider three different processors P1, P2, and P3 executing the same instruction set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0 GHz clock rate and has a CPI of 2.2.

a. Which processor has the highest performance expressed in instructions per second?

b. If the processors each execute a program in 10 seconds, find the number of cycles and the number of instructions.

c. We are trying to reduce the execution time by 30%, but this leads to an increase of 20% in the CPI. What clock rate should we have to get this time reduction?

a.

$$\text{Instructions per second} = \frac{\text{Clock rate}}{CPI} \rightarrow 2\times10^9 / 2.5\times10^9 / 1.82\times10^9$$

P2 has the highest performance expressed in instructions per second.

b.

$$\text{number of cycles} = \text{time} \times \text{clock rate} \rightarrow 30\,\text{G} / 25\,\text{G} / 40\,\text{G}$$

$$\text{number of instructions} = \frac{\text{number of cycles}}{CPI} \rightarrow 20\,\text{G} / 25\,\text{G} / 18.18\,\text{G}$$

c.

$$\text{execution time} = \frac{IC \times CPI}{\text{clock rate}} \Rightarrow \text{clock rate}_{new} = \frac{IC \times 1.2\,CPI}{0.7 \times \text{execution time}} = \frac{1.2}{0.7} \times \text{clock rate} \approx 1.714 \times \text{clock rate}$$

$$5.14\ \text{GHz} / 4.29\ \text{GHz} / 6.86\ \text{GHz}$$
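The same formulas in Python, as a quick sketch. `analyze` is just an illustrative helper, not anything from the book. Part c assumes the instruction count stays fixed while the time shrinks to 0.7× and the CPI grows to 1.2×.

```python
# Exercise 5 (hypothetical helper). Part c assumes IC is fixed, time x0.7, CPI x1.2.
PROCESSORS = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}
RUN_TIME = 10.0  # seconds, part b


def analyze(clock_hz: float, cpi: float, seconds: float = RUN_TIME):
    ips = clock_hz / cpi                 # a. instructions per second
    cycles = clock_hz * seconds          # b. cycles executed in `seconds`
    instructions = cycles / cpi          # b. instructions executed in `seconds`
    # c. time = IC * CPI / f; with IC fixed, time x0.7 and CPI x1.2 need f x (1.2 / 0.7)
    new_clock_hz = clock_hz * 1.2 / 0.7
    return ips, cycles, instructions, new_clock_hz


results = {name: analyze(f, cpi) for name, (f, cpi) in PROCESSORS.items()}
fastest = max(results, key=lambda name: results[name][0])

for name, (ips, cycles, instrs, f_new) in results.items():
    print(f"{name}: IPS={ips:.3g} cycles={cycles:.3g} "
          f"instructions={instrs:.4g} new clock={f_new / 1e9:.2f} GHz")
print("highest IPS:", fastest)  # highest IPS: P2
```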

6.

Consider two different implementations of the same instruction set architecture. The instructions can be divided into four classes according to their CPI (classes A, B, C, and D). P1 with a clock rate of 2.5 GHz and CPIs of 1, 2, 3, and 3, and P2 with a clock rate of 3 GHz and CPIs of 2, 2, 2, and 2.

Given a program with a dynamic instruction count of 1.0E6 instructions divided into classes as follows: 10% class A, 20% class B, 50% class C, and 20% class D, which is faster: P1 or P2?

a. What is the global CPI for each implementation?

b. Find the clock cycles required in both cases.

a.

$$\text{global } CPI_{P1} = 0.1 \times 1 + 0.2 \times 2 + 0.5 \times 3 + 0.2 \times 3 = 2.6$$

$$\text{global } CPI_{P2} = 2$$

b.

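$$\text{clock cycles}_{P1} = 2.6 \times 10^6 \rightarrow T_{P1} = \frac{2.6\times10^6}{2.5\times10^9} = 1.04\ \text{ms}$$

$$\text{clock cycles}_{P2} = 2 \times 10^6 \rightarrow T_{P2} = \frac{2\times10^6}{3\times10^9} \approx 0.67\ \text{ms}$$

P2 is faster (about 1.56×). Because the two clock rates differ, comparing cycle counts alone is not enough; the execution times decide it.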
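Here's a minimal Python sketch of the weighted-CPI calculation. `global_cpi` and `run` are hypothetical helper names. It compares the returned times, not just the cycle counts, to decide which processor is faster.

```python
# Exercise 6 (hypothetical helpers): weighted (global) CPI, cycle count, execution time.
MIX = (0.10, 0.20, 0.50, 0.20)   # share of classes A, B, C, D
INSTRUCTIONS = 1.0e6             # dynamic instruction count


def global_cpi(class_cpis, mix=MIX):
    """CPI weighted by how often each class occurs."""
    return sum(cpi * share for cpi, share in zip(class_cpis, mix))


def run(clock_hz, class_cpis):
    cpi = global_cpi(class_cpis)
    cycles = cpi * INSTRUCTIONS
    return cpi, cycles, cycles / clock_hz  # (CPI, cycles, seconds)


p1 = run(2.5e9, (1, 2, 3, 3))
p2 = run(3.0e9, (2, 2, 2, 2))
print(f"P1: CPI={p1[0]:.2f} cycles={p1[1]:.3g} time={p1[2] * 1e3:.3f} ms")
print(f"P2: CPI={p2[0]:.2f} cycles={p2[1]:.3g} time={p2[2] * 1e3:.3f} ms")
print("faster:", "P1" if p1[2] < p2[2] else "P2")
```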

7.

Compilers can have a profound impact on the performance of an application. Assume that for a program, compiler A results in a dynamic instruction count of 1.0E9 and has an execution time of 1.1 s, while compiler B results in a dynamic instruction count of 1.2E9 and an execution time of 1.5 s.

a. Find the average CPI for each program given that the processor has a clock cycle time of 1 ns.

b. Assume the compiled programs run on two different processors. If the execution times on the two processors are the same, how much faster is the clock of the processor running compiler A’s code versus the clock of the processor running compiler B’s code?

c. A new compiler is developed that uses only 6.0E8 instructions and has an average CPI of 1.1. What is the speedup of using this new compiler versus using compiler A or B on the original processor?

a.

$$CPI = \frac{\text{execution time}}{\text{cycle time} \times \text{instruction count}} \rightarrow A: \frac{1.1}{10^{-9} \times 10^9} = 1.1,\quad B: \frac{1.5}{10^{-9} \times 1.2\times10^9} = 1.25$$

b.

$$\text{clock rate} = \frac{\text{instruction count} \times CPI}{\text{execution time}} \Rightarrow \frac{f_A}{f_B} = \frac{1.0\times10^9 \times 1.1}{1.2\times10^9 \times 1.25} = \frac{1.1}{1.5} \approx 0.733$$

With equal execution times, the processor running compiler A's code needs only about 0.733× the clock rate of the one running compiler B's code. Put the other way, B's clock must be about 1.36× faster.

c.

$$\text{execution time} = \frac{\text{instruction count} \times CPI}{\text{clock rate}} = 6.0\times10^8 \times 1.1 \times 1\ \text{ns} = 0.66\ \text{s}$$

Speedup vs. compiler A: $\frac{1.1}{0.66} \approx 1.67$; vs. compiler B: $\frac{1.5}{0.66} \approx 2.27$
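Here's a short Python check of all three parts. It assumes the 1 ns cycle time applies to both compilers' runs in part a and to the new compiler in part c. Variable names are my own.

```python
# Exercise 7 (names are illustrative). Assumes a 1 ns cycle time throughout.
CYCLE_TIME = 1e-9  # seconds
IC_A, T_A = 1.0e9, 1.1
IC_B, T_B = 1.2e9, 1.5


def average_cpi(instructions, seconds, cycle_time=CYCLE_TIME):
    return seconds / (instructions * cycle_time)


cpi_a = average_cpi(IC_A, T_A)   # a.
cpi_b = average_cpi(IC_B, T_B)   # a.

# b. Equal execution time: f = IC * CPI / T, so f_A / f_B = (IC_A * CPI_A) / (IC_B * CPI_B)
clock_ratio_a_over_b = (IC_A * cpi_a) / (IC_B * cpi_b)

# c. New compiler on the original 1 ns processor
t_new = 6.0e8 * 1.1 * CYCLE_TIME
speedup_vs_a = T_A / t_new
speedup_vs_b = T_B / t_new
print(f"CPI A={cpi_a:.2f}, B={cpi_b:.2f}; f_A/f_B={clock_ratio_a_over_b:.3f}; "
      f"speedup vs A={speedup_vs_a:.2f}, vs B={speedup_vs_b:.2f}")
```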

8.

The Pentium 4 Prescott processor, released in 2004, had a clock rate of 3.6 GHz and voltage of 1.25 V. Assume that, on average, it consumed 10 W of static power and 90 W of dynamic power. The Core i5 Ivy Bridge, released in 2012, has a clock rate of 3.4 GHz and voltage of 0.9 V. Assume that, on average, it consumed 30 W of static power and 40 W of dynamic power.

1.8.1 [5] <§1.7> For each processor find the average capacitive loads.

1.8.2 [5] <§1.7> Find the percentage of the total dissipated power comprised by static power and the ratio of static power to dynamic power for each technology.

1.8.3 [15] <§1.7> If the total dissipated power is to be reduced by 10%, how much should the voltage be reduced to maintain the same leakage current? Note: power is defined as the product of voltage and current.

a.

$$P_{dynamic} = \frac{1}{2} \times C \times V^2 \times f \Rightarrow C = \frac{2 \times P_{dynamic}}{V^2 \times f}$$

Pentium 4: $C = \frac{2 \times 90}{1.25^2 \times 3.6\times10^9} = 3.2\times10^{-8}$ F; Core i5: $C = \frac{2 \times 40}{0.9^2 \times 3.4\times10^9} \approx 2.9\times10^{-8}$ F

b.

Static share of total power: Pentium 4 $\frac{10}{100} = 10\%$, Core i5 $\frac{30}{70} \approx 42.9\%$

Static-to-dynamic ratio: Pentium 4 $\frac{10}{90} \approx 0.11$, Core i5 $\frac{30}{40} = 0.75$

c.

Keep the leakage current at its old value and require the new total power to be 90% of the old total:

$$P_{new} = V_{new} \times I + \frac{1}{2} \times C \times V_{new}^2 \times f = 0.9 \times (P_{static} + P_{dynamic})$$

$$I = \frac{P_{static}}{V_{old}} \rightarrow \text{unchanged}$$

Pentium 4: $I = 8$ A, $\frac{1}{2} C f = 57.6$, so $57.6 V^2 + 8 V - 90 = 0 \Rightarrow V_{new} \approx 1.18$ V (about 5.4% lower)

Core i5: $I \approx 33.3$ A, $\frac{1}{2} C f \approx 49.4$, so $49.4 V^2 + 33.3 V - 63 = 0 \Rightarrow V_{new} \approx 0.84$ V (about 6.5% lower)
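Part c boils down to a quadratic in the new voltage. Here's a Python sketch that solves it. It assumes the leakage current stays at I = P_static / V_old and that the clock rate and capacitive load do not change. The function names are illustrative.

```python
import math

# Exercise 8 (helper names are my own). Assumes clock rate and capacitive load stay the same
# and the leakage current is held at I = P_static / V_old.


def cap_load(p_dynamic, volts, clock_hz):
    # P_dynamic = 1/2 * C * V^2 * f  =>  C = 2 * P_dynamic / (V^2 * f)
    return 2 * p_dynamic / (volts ** 2 * clock_hz)


def voltage_for_power_cut(p_static, p_dynamic, volts, clock_hz, cut=0.10):
    """Solve V*I + 1/2*C*V^2*f = (1 - cut) * (P_static + P_dynamic) for V."""
    c = cap_load(p_dynamic, volts, clock_hz)
    leak_current = p_static / volts          # held constant
    a = 0.5 * c * clock_hz                   # coefficient of V^2
    b = leak_current                         # coefficient of V
    target = (1 - cut) * (p_static + p_dynamic)
    # positive root of a*V^2 + b*V - target = 0
    return (-b + math.sqrt(b * b + 4 * a * target)) / (2 * a)


CHIPS = {  # name: (static W, dynamic W, volts, clock Hz)
    "Pentium 4 Prescott": (10.0, 90.0, 1.25, 3.6e9),
    "Core i5 Ivy Bridge": (30.0, 40.0, 0.90, 3.4e9),
}

for name, (ps, pd, v, f) in CHIPS.items():
    c = cap_load(pd, v, f)
    v_new = voltage_for_power_cut(ps, pd, v, f)
    print(f"{name}: C={c:.3e} F, static share={ps / (ps + pd):.1%}, "
          f"static/dynamic={ps / pd:.2f}, V_new={v_new:.3f} V ({(v - v_new) / v:.1%} lower)")
```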

9.

Assume for arithmetic, load/store, and branch instructions, a processor has CPIs of 1, 12, and 5, respectively. Also assume that on a single processor a program requires the execution of 2.56E9 arithmetic instructions, 1.28E9 load/store instructions, and 256 million branch instructions. Assume that each processor has a 2 GHz clock frequency. Assume that, as the program is parallelized to run over multiple cores, the number of arithmetic and load/store instructions per processor is divided by 0.7 × p (where p is the number of processors) but the number of branch instructions per processor remains the same.

1.9.1 [5] <§1.7> Find the total execution time for this program on 1, 2, 4, and 8 processors, and show the relative speedup of the 2, 4, and 8 processors result relative to the single processor result.

1.9.2 [10] <§§1.6, 1.8> If the CPI of the arithmetic instructions was doubled, what would the impact be on the execution time of the program on 1, 2, 4, or 8 processors?

1.9.3 [10] <§§1.6, 1.8> To what should the CPI of load/store instructions be reduced in order for a single processor to match the performance of four processors using the original CPI values?

1.9.1

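| Class | CPI | IC |
|---|---|---|
| Arithmetic | 1 | 2.56 × 10^9 |
| Load/store | 12 | 1.28 × 10^9 |
| Branch | 5 | 2.56 × 10^8 |

$$\text{execution time} = \frac{\text{instruction count} \times CPI}{\text{clock rate}}$$

(For p > 1, the arithmetic and load/store counts are first divided by 0.7 × p.)

| Processors | Execution time (s) | Speedup vs. 1 processor |
|---|---|---|
| 1 | 9.6 | 1 (baseline) |
| 2 | 7.04 | 1.36 |
| 4 | 3.84 | 2.5 |
| 8 | 2.24 | 4.29 |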

1.9.2

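| Class | CPI | IC |
|---|---|---|
| Arithmetic | 2 | 2.56 × 10^9 |
| Load/store | 12 | 1.28 × 10^9 |
| Branch | 5 | 2.56 × 10^8 |

| Processors | Execution time (s) | Change vs. original CPI |
|---|---|---|
| 1 | 10.88 | +13.3% |
| 2 | 7.954 | +13.0% |
| 4 | 4.297 | +11.9% |
| 8 | 2.469 | +10.2% |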

1.9.3

The clock rates are equal, so the single processor only needs the same total cycle count as each of the four processors:

$$CPI_{arith} \times IC_{arith} + CPI_{LS,new} \times IC_{LS} + CPI_{br} \times IC_{br} = \frac{CPI_{arith} \times IC_{arith} + CPI_{LS} \times IC_{LS}}{0.7 \times 4} + CPI_{br} \times IC_{br}$$

$$2.56\times10^9 + 1.28\times10^9 \times CPI_{LS,new} + 1.28\times10^9 = 6.4\times10^9 + 1.28\times10^9 \Rightarrow CPI_{LS,new} = 3$$
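Here's a Python sketch that reproduces the three parts. Like the tables, it assumes the 0.7 × p divisor only applies when p > 1, so the single-processor run uses the full counts. `exec_time` is a hypothetical helper.

```python
# Exercise 9 (hypothetical helper). Assumption: the 0.7 * p divisor applies only for p > 1.
CLOCK_HZ = 2e9
IC = {"arith": 2.56e9, "ls": 1.28e9, "branch": 2.56e8}
BASE_CPI = {"arith": 1, "ls": 12, "branch": 5}


def exec_time(p, cpi=BASE_CPI):
    """Seconds per processor. For p > 1 the arithmetic and load/store counts are
    divided by 0.7 * p; branch counts are not divided. p == 1 uses the full counts."""
    divisor = 1.0 if p == 1 else 0.7 * p
    parallel_cycles = (IC["arith"] * cpi["arith"] + IC["ls"] * cpi["ls"]) / divisor
    branch_cycles = IC["branch"] * cpi["branch"]
    return (parallel_cycles + branch_cycles) / CLOCK_HZ


doubled = dict(BASE_CPI, arith=2)   # 1.9.2: arithmetic CPI doubled
for p in (1, 2, 4, 8):
    t, t2 = exec_time(p), exec_time(p, doubled)
    print(f"p={p}: {t:.3f} s, speedup {exec_time(1) / t:.2f}; "
          f"doubled arith CPI: {t2:.3f} s (+{t2 / t - 1:.1%})")

# 1.9.3: load/store CPI for one core to match four cores (same clock, so equal cycle counts)
target_cycles = exec_time(4) * CLOCK_HZ
ls_cpi_needed = (target_cycles - IC["arith"] * BASE_CPI["arith"]
                 - IC["branch"] * BASE_CPI["branch"]) / IC["ls"]
print(f"required load/store CPI: {ls_cpi_needed:.2f}")
```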

10.

Assume a 15 cm diameter wafer has a cost of 12, contains 84 dies, and has 0.020 defects/cm². Assume a 20 cm diameter wafer has a cost of 15, contains 100 dies, and has 0.031 defects/cm².

1.10.1 [10] <§1.5> Find the yield for both wafers.

1.10.2 [5] <§1.5> Find the cost per die for both wafers.

1.10.3 [5] <§1.5> If the number of dies per wafer is increased by 10% and the defects per area unit increases by 15%, find the die area and yield.

1.10.4 [5] <§1.5> Assume a fabrication process improves the yield from 0.92 to 0.95. Find the defects per area unit for each version of the technology given a die area of 200 mm².

1.10.1

Die yield: $\text{Yield} = \frac{1}{(1 + \text{defects per area} \times \frac{\text{die area}}{2})^2}$

15 cm wafer: $\text{wafer area} = \pi r^2 = \pi \times 7.5^2 \approx 176.625\ \text{cm}^2$ (with π ≈ 3.14)

Dies per wafer: 84

Die area: $\frac{176.625}{84} \approx 2.103\ \text{cm}^2$

$$\text{Yield} = \frac{1}{(1 + 0.02 \times \frac{2.103}{2})^2} \approx 0.9592$$

20 cm wafer: $\text{wafer area} = \pi r^2 = \pi \times 10^2 \approx 314\ \text{cm}^2$

Dies per wafer: 100

Die area: $\frac{314}{100} = 3.14\ \text{cm}^2$

$$\text{Yield} = \frac{1}{(1 + 0.031 \times \frac{3.14}{2})^2} \approx 0.9093$$

1.10.2

Ugh, if I keep writing this I'm going to die.

$$\text{Cost per die} = \frac{\text{cost per wafer}}{\text{dies per wafer} \times \text{yield}} \rightarrow \frac{12}{84 \times 0.9592} \approx 0.149,\quad \frac{15}{100 \times 0.9093} \approx 0.165$$

1.10.3

$$\text{die area}_1 = \frac{2.103}{1.1} \approx 1.9118\ \text{cm}^2$$

$$\text{yield}_1 = \frac{1}{(1 + 0.02 \times 1.15 \times \frac{1.9118}{2})^2} \approx 0.9574$$

$$\text{die area}_2 = \frac{3.14}{1.1} \approx 2.855\ \text{cm}^2$$

$$\text{yield}_2 = \frac{1}{(1 + 0.031 \times 1.15 \times \frac{2.855}{2})^2} \approx 0.9055$$

1.10.4

With a die area of 200 mm² = 2 cm², and writing D for the defects per area:

$$0.92 = \frac{1}{(1 + D \times \frac{2}{2})^2} \Rightarrow D = \frac{1}{\sqrt{0.92}} - 1 \approx 0.043\ \text{defects/cm}^2$$

$$0.95 = \frac{1}{(1 + D \times \frac{2}{2})^2} \Rightarrow D = \frac{1}{\sqrt{0.95}} - 1 \approx 0.026\ \text{defects/cm}^2$$
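Here's a Python sketch of the yield and cost formulas. It uses `math.pi` instead of 3.14 and ignores dies lost at the wafer edge (die area = wafer area / dies per wafer), so the last digit may differ slightly from hand-calculated values. The helper names are hypothetical.

```python
import math

# Exercise 10 (hypothetical helper names). Assumptions: die area = wafer area / dies per wafer
# (edge loss ignored), and math.pi instead of 3.14.


def die_area_cm2(diameter_cm: float, dies_per_wafer: int) -> float:
    return math.pi * (diameter_cm / 2) ** 2 / dies_per_wafer


def die_yield(defects_per_cm2: float, area_cm2: float) -> float:
    # Yield = 1 / (1 + defects per area * die area / 2)^2
    return 1.0 / (1.0 + defects_per_cm2 * area_cm2 / 2.0) ** 2


def defects_for_yield(target_yield: float, area_cm2: float) -> float:
    # Invert the yield formula to get the defect density
    return (1.0 / math.sqrt(target_yield) - 1.0) * 2.0 / area_cm2


WAFERS = {  # name: (diameter cm, cost per wafer, dies per wafer, defects/cm^2)
    "15 cm": (15, 12.0, 84, 0.020),
    "20 cm": (20, 15.0, 100, 0.031),
}

for name, (diameter, cost, dies, defects) in WAFERS.items():
    area = die_area_cm2(diameter, dies)
    y = die_yield(defects, area)                       # 1.10.1
    cost_per_die = cost / (dies * y)                   # 1.10.2
    area_more = area / 1.1                             # 1.10.3: 10% more dies per wafer
    y_more = die_yield(defects * 1.15, area_more)      # ... and 15% more defects per area
    print(f"{name}: die area={area:.3f} cm^2 yield={y:.4f} cost/die={cost_per_die:.4f} "
          f"| new area={area_more:.3f} cm^2 new yield={y_more:.4f}")

# 1.10.4: die area 200 mm^2 = 2 cm^2
for target in (0.92, 0.95):
    print(f"yield {target}: {defects_for_yield(target, 2.0):.4f} defects/cm^2")
```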

11.

The results of the SPEC CPU2006 bzip2 benchmark running on an AMD Barcelona has an instruction count of 2.389E12, an execution time of 750 s, and a reference time of 9650 s.

1.11.1 [5] <§§1.6, 1.9> Find the CPI if the clock cycle time is 0.333 ns.

1.11.2 [5] <§1.9> Find the SPECratio.

1.11.3 [5] <§§1.6, 1.9> Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% without affecting the CPI.

1.11.4 [5] <§§1.6, 1.9> Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% and the CPI is increased by 5%.

1.11.5 [5] <§§1.6, 1.9> Find the change in the SPECratio for this change.

1.11.6 [10] <§1.6> Suppose that we are developing a new version of the AMD Barcelona processor with a 4 GHz clock rate. We have added some additional instructions to the instruction set in such a way that the number of instructions has been reduced by 15%. The execution time is reduced to 700 s and the new SPECratio is 13.7. Find the new CPI.

1.11.7 [10] <§1.6> This CPI value is larger than obtained in 1.11.1 as the clock rate was increased from 3 GHz to 4 GHz. Determine whether the increase in the CPI is similar to that of the clock rate. If they are dissimilar, why?

1.11.8 [5] <§1.6> By how much has the CPU time been reduced?

1.11.9 [10] <§1.6> For a second benchmark, libquantum, assume an execution time of 960 ns, CPI of 1.61, and clock rate of 3 GHz. If the execution time is reduced by an additional 10% without affecting the CPI and with a clock rate of 4 GHz, determine the number of instructions.

1.11.10 [10] <§1.6> Determine the clock rate required to give a further 10% reduction in CPU time while maintaining the number of instructions and with the CPI unchanged.

1.11.11 [10] <§1.6> Determine the clock rate if the CPI is reduced by 15% and the CPU time by 20% while the number of instructions is unchanged.

1.11.1

$$CPI = \frac{\text{execution time}}{\text{instruction count} \times \text{time per cycle}} = \frac{750}{2.389\times10^{12} \times 0.333\times10^{-9}} \approx 0.94$$

1.11.2

$$\text{SPECratio} = \frac{\text{reference time}}{\text{execution time}} = \frac{9650}{750} \approx 12.87$$

1.11.3

$$\Delta \text{time} = CPI \times (0.1 \times IC) \times \text{cycle time} = 0.1 \times 750 = 75\ \text{s}$$

So the CPU time increases by 10%.

1.11.4

$$\Delta \text{time} = \text{time} \times 1.1 \times 1.05 - \text{time} = 750 \times 0.155 = 116.25\ \text{s}$$

That is an increase of 15.5%.

1.11.5

$$\text{SPECratio} = \frac{9650}{750 \times 1.1 \times 1.05} \approx 11.14$$

This is down from 12.87, about 13.4% lower.

1.11.6

$$CPI = \frac{\text{execution time} \times \text{clock rate}}{\text{instruction count}} = \frac{700 \times 4\times10^9}{2.389\times10^{12} \times (1 - 0.15)} \approx 1.38$$

1.11.7

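$$CPI = \frac{\text{execution time} \times \text{clock rate}}{\text{instruction count}} \Rightarrow \frac{CPI_{new}}{CPI_{old}} = \frac{700}{750} \times \frac{4}{3} \times \frac{1}{0.85} \approx 1.46$$

The clock rate grew by 4/3 ≈ 1.33×, but the CPI grew by about 1.46× (0.94 → 1.38), so the two increases are not similar. The CPI would scale in proportion to the clock rate only if the instruction count and execution time stayed the same. Here the instruction count fell by 15% while the execution time fell by only about 6.7%, which pushes the CPI up further.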

1.11.8

$$1 - \frac{700}{750} \approx 0.0667$$

So the CPU time is reduced by about 6.7%.

Damn, I've written this much and still haven't lost my mind. Pretty impressive, if I say so myself.

1.11.9

$$\text{instruction count} = \frac{\text{execution time} \times \text{clock rate}}{CPI} = \frac{0.9 \times 960\ \text{ns} \times 4\ \text{GHz}}{1.61} \approx 2147$$

Taking the stated 960 ns literally, that is about 2.147 × 10^3 instructions. If the execution time is meant to be 960 s, the count is about 2.147 × 10^12.

1.11.10

Instruction count and CPI are unchanged, so the clock rate scales inversely with CPU time (starting from the original 3 GHz):

$$\text{clock rate} = \frac{CPI \times \text{instruction count}}{\text{execution time}} \Rightarrow f_{new} = \frac{3\ \text{GHz}}{0.9} \approx 3.33\ \text{GHz}$$

1.11.11

$$\text{clock rate} = \frac{CPI \times \text{instruction count}}{\text{execution time}} \Rightarrow f_{new} = 3\ \text{GHz} \times \frac{0.85}{0.80} \approx 3.19\ \text{GHz}$$
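Most of exercise 11 is plugging numbers into CPU time = IC × CPI / clock rate, so here's a Python sketch that computes each part. It takes the 960 ns in 1.11.9 literally and assumes 1.11.10 and 1.11.11 start from the original 3 GHz clock. The variable names are my own.

```python
# Exercise 11 (names are illustrative). Assumptions: 960 ns taken literally in 1.11.9;
# 1.11.10 and 1.11.11 start from the original 3 GHz clock.
IC = 2.389e12          # instructions
T = 750.0              # seconds
REF = 9650.0           # reference time, seconds
CYCLE = 0.333e-9       # seconds

cpi = T / (IC * CYCLE)                      # 1.11.1
spec = REF / T                              # 1.11.2
dt_ic = T * 1.10 - T                        # 1.11.3: IC +10%, CPI unchanged
dt_ic_cpi = T * 1.10 * 1.05 - T             # 1.11.4: IC +10%, CPI +5%
spec_new = REF / (T * 1.10 * 1.05)          # 1.11.5
cpi_4ghz = 700.0 * 4e9 / (IC * 0.85)        # 1.11.6
cpi_growth = cpi_4ghz / cpi                 # 1.11.7: compare with 4/3 for the clock
time_cut = 1 - 700.0 / T                    # 1.11.8

ic_libq = 0.9 * 960e-9 * 4e9 / 1.61         # 1.11.9
f_10 = 3e9 / 0.9                            # 1.11.10: T x0.9, IC and CPI unchanged
f_11 = 3e9 * 0.85 / 0.80                    # 1.11.11: CPI x0.85, T x0.8

print(f"1.11.1 CPI={cpi:.3f}  1.11.2 SPECratio={spec:.2f}")
print(f"1.11.3 +{dt_ic:.2f} s  1.11.4 +{dt_ic_cpi:.2f} s  1.11.5 SPECratio={spec_new:.2f}")
print(f"1.11.6 CPI={cpi_4ghz:.2f}  1.11.7 CPI x{cpi_growth:.2f} vs clock x{4 / 3:.2f}  "
      f"1.11.8 -{time_cut:.1%}")
print(f"1.11.9 IC={ic_libq:.0f}  1.11.10 {f_10 / 1e9:.2f} GHz  1.11.11 {f_11 / 1e9:.4f} GHz")
```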

12.

1.12 Section 1.10 cites as a pitfall the utilization of a subset of the performance equation as a performance metric. To illustrate this, consider the following two processors. P1 has a clock rate of 4 GHz, average CPI of 0.9, and requires the execution of 5.0E9 instructions. P2 has a clock rate of 3 GHz, an average CPI of 0.75, and requires the execution of 1.0E9 instructions.

1.12.1 [5] <§§1.6, 1.10> One usual fallacy is to consider the computer with the largest clock rate as having the highest performance. Check if this is true for P1 and P2.

1.12.2 [10] <§§1.6, 1.10> Another fallacy is to consider that the processor executing the largest number of instructions will need a larger CPU time. Considering that processor P1 is executing a sequence of 1.0E9 instructions and that the CPI of processors P1 and P2 do not change, determine the number of instructions that P2 can execute in the same time that P1 needs to execute 1.0E9 instructions.

1.12.3 [10] <§§1.6, 1.10> A common fallacy is to use MIPS (millions of instructions per second) to compare the performance of two different processors, and consider that the processor with the largest MIPS has the largest performance. Check if this is true for P1 and P2.

1.12.4 [10] <§1.10> Another common performance figure is MFLOPS (millions of floating-point operations per second), defined as
$$MFLOPS = \frac{\text{No. FP operations}}{\text{execution time} \times 10^6}$$
but this figure has the same problems as MIPS. Assume that 40% of the instructions executed on both P1 and P2 are floating-point instructions. Find the MFLOPS figures for the processors.

Ugh, I can't write any more, so I'll just give the answers.

1.12.1

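$$T_{P1} = \frac{5.0\times10^9 \times 0.9}{4\times10^9} = 1.125\ \text{s},\quad T_{P2} = \frac{1.0\times10^9 \times 0.75}{3\times10^9} = 0.25\ \text{s}$$

P1 has the higher clock rate, but P2 is faster, so the claim is false.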

1.12.2

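P1 needs $\frac{1.0\times10^9 \times 0.9}{4\times10^9} = 0.225$ s; in that time P2 executes $\frac{0.225 \times 3\times10^9}{0.75} = 9\times10^8$ instructions.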

1.12.3

This one I'll write out.

$$MIPS = \frac{\text{clock rate}}{CPI \times 10^6}$$

$$MIPS_1 = \frac{4\times10^9}{0.9 \times 10^6} \approx 4444$$

$$MIPS_2 = \frac{3\times10^9}{0.75 \times 10^6} = 4000$$

P1 has the higher MIPS rating, yet P2 runs its program faster (0.25 s vs. 1.125 s), so the claim is false.

1.12.4

With 40% of the instructions being FP, $MFLOPS = 0.4 \times MIPS$:

$$MFLOPS_1 = 0.4 \times 4444 \approx 1778$$

$$MFLOPS_2 = 0.4 \times 4000 = 1600$$
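Here's a Python sketch of 1.12.1 to 1.12.4. It assumes, as the exercise states, that 40% of the executed instructions on both processors are FP, which makes MFLOPS simply 0.4 × MIPS. The helper names are hypothetical.

```python
# Exercise 12 (hypothetical helper names). Assumes 40% of executed instructions are FP on both
# processors, so MFLOPS = FP ops / (time * 1e6) = 0.4 * MIPS.
P1 = (4e9, 0.9, 5.0e9)    # (clock Hz, CPI, instructions)
P2 = (3e9, 0.75, 1.0e9)
FP_SHARE = 0.40


def cpu_time(clock_hz, cpi, instructions):
    return instructions * cpi / clock_hz


def mips(clock_hz, cpi):
    return clock_hz / (cpi * 1e6)


t1, t2 = cpu_time(*P1), cpu_time(*P2)                  # 1.12.1
window = cpu_time(P1[0], P1[1], 1.0e9)                 # 1.12.2: time for P1 to run 1.0E9 instructions
p2_instructions = window * P2[0] / P2[1]               # what P2 finishes in that time
mips1, mips2 = mips(P1[0], P1[1]), mips(P2[0], P2[1])  # 1.12.3
mflops1, mflops2 = FP_SHARE * mips1, FP_SHARE * mips2  # 1.12.4

print(f"time P1={t1} s, P2={t2} s; P2 instructions in {window} s: {p2_instructions:.3g}")
print(f"MIPS P1={mips1:.0f}, P2={mips2:.0f}; MFLOPS P1={mflops1:.0f}, P2={mflops2:.0f}")
```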

13.

Another pitfall cited in Section 1.10 is expecting to improve the overall performance of a computer by improving only one aspect of the computer. Consider a computer running a program that requires 250 s, with 70 s spent executing FP instructions, 85 s spent executing L/S instructions, and 40 s spent executing branch instructions.

1.13.1 [5] <§1.10> By how much is the total time reduced if the time for FP operations is reduced by 20%?

1.13.2 [5] <§1.10> By how much is the time for INT operations reduced if the total time is reduced by 20%?

1.13.3 [5] <§1.10> Can the total time be reduced by 20% by reducing only the time for branch instructions?

1.13.1

$$\frac{70 \times 0.2}{250} = 0.056$$

So the total time drops by 5.6% (14 s).

1.13.2

$$T_{INT} = 250 - 85 - 40 - 70 = 55\ \text{s}$$

$$\frac{250 \times 0.2}{55} \approx 90.9\%$$

A 20% cut in total time means removing 50 s, so the INT time must fall by about 90.9% (from 55 s to 5 s).

1.13.3

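$$250 \times 0.2 = 50\ \text{s} > 40\ \text{s}$$

So even if the branch instructions took no time at all, the total time could not drop by 20%.

(You're welcome to blow me up instead, though.)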
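Exercise 13 is just Amdahl-style bookkeeping, so here's a tiny Python sketch. It assumes the INT time is whatever is left of the 250 s after FP, L/S, and branch time. The variable names are my own.

```python
# Exercise 13 (variable names are my own). INT time is assumed to be the remainder of 250 s.
TOTAL, FP, LS, BRANCH = 250.0, 70.0, 85.0, 40.0
INT = TOTAL - FP - LS - BRANCH            # 55 s

fp_saving = FP * 0.20 / TOTAL             # 1.13.1: fraction of total time saved
needed = TOTAL * 0.20                     # seconds to remove for a 20% cut
int_cut = needed / INT                    # 1.13.2: fraction of INT time to remove
branch_only_ok = needed <= BRANCH         # 1.13.3: is there enough branch time to remove?

print(f"1.13.1: total time drops by {fp_saving:.1%}")
print(f"1.13.2: INT time must drop by {int_cut:.1%}")
print(f"1.13.3: possible with branch only? {branch_only_ok}")
```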

14.

Assume a program requires the execution of 50 × 10^6 FP instructions, 110 × 10^6 INT instructions, 80 × 10^6 L/S instructions, and 16 × 10^6 branch instructions. The CPI for each type of instruction is 1, 1, 4, and 2, respectively. Assume that the processor has a 2 GHz clock rate.

1.14.1 [10] <§1.10> By how much must we improve the CPI of FP instructions if we want the program to run two times faster?

1.14.2 [10] <§1.10> By how much must we improve the CPI of L/S instructions if we want the program to run two times faster?

1.14.3 [5] <§1.10> By how much is the execution time of the program improved if the CPI of INT and FP instructions is reduced by 40% and the CPI of L/S and Branch is reduced by 30%?

1.14.1

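Forget it, let's just compute each class's share of the clock cycles.

$$\frac{50\times10^6 \times 1}{50\times10^6 \times 1 + 110\times10^6 \times 1 + 80\times10^6 \times 4 + 16\times10^6 \times 2} = \frac{50}{512} \approx 0.098$$

FP instructions account for only about 9.8% of the cycles.

Running twice as fast means removing 50% of the cycles, so even eliminating the FP instructions entirely could not do it.

(You're welcome to blow me up instead, though.)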

1.14.2

$$\frac{80\times10^6 \times 4}{50\times10^6 \times 1 + 110\times10^6 \times 1 + 80\times10^6 \times 4 + 16\times10^6 \times 2} = \frac{320}{512} = 0.625$$

Oh, this one really does take up that much.

To remove half of all cycles, the L/S share must drop from 0.625 to 0.625 - 0.5 = 0.125, i.e. to $\frac{0.125}{0.625} = \frac{1}{5}$ of its current value.

So the L/S CPI must go from 4 to $0.2 \times 4 = 0.8$ (a 5× improvement).

1.14.3

$$\frac{50\times10^6 \times 1 + 110\times10^6 \times 1 + 80\times10^6 \times 4 + 16\times10^6 \times 2}{0.6 \times 50\times10^6 \times 1 + 0.6 \times 110\times10^6 \times 1 + 0.7 \times 80\times10^6 \times 4 + 0.7 \times 16\times10^6 \times 2} = \frac{512}{342.4} \approx 1.495$$

So the program runs about 1.5× faster (0.256 s → about 0.171 s at 2 GHz).
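Here's a Python sketch for exercise 14. `cpi_needed` is a hypothetical helper that returns the CPI one class would need for a 2× speedup with every other class unchanged. A result of zero or below means it can't be done.

```python
# Exercise 14 (hypothetical helper names).
CLOCK_HZ = 2e9
IC = {"fp": 50e6, "int": 110e6, "ls": 80e6, "branch": 16e6}
CPI = {"fp": 1, "int": 1, "ls": 4, "branch": 2}


def total_cycles(cpi):
    return sum(IC[k] * cpi[k] for k in IC)


BASE = total_cycles(CPI)          # 512e6 cycles, i.e. 0.256 s at 2 GHz


def cpi_needed(cls, speedup=2.0):
    """CPI class `cls` would need for the whole program to run `speedup` times faster,
    with every other class unchanged. A value <= 0 means it cannot be done."""
    other = BASE - IC[cls] * CPI[cls]
    return (BASE / speedup - other) / IC[cls]


improved = {"fp": 0.6 * 1, "int": 0.6 * 1, "ls": 0.7 * 4, "branch": 0.7 * 2}
speedup_143 = BASE / total_cycles(improved)

print("1.14.1 FP CPI needed:", cpi_needed("fp"))
print("1.14.2 L/S CPI needed:", cpi_needed("ls"))
print(f"1.14.3 speedup: {speedup_143:.4f}, new time {total_cycles(improved) / CLOCK_HZ:.4f} s")
```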

15.

When a program is adapted to run on multiple processors in a multiprocessor system, the execution time on each processor is comprised of computing time and the overhead time required for locked critical sections and/or to send data from one processor to another. Assume a program requires t =100 s of execution time on one processor. When run p processors, each processor requires t/p s, as well as an additional 4 s of overhead, irrespective of the number of processors. Compute the per-processor execution time for 2, 4, 8, 16, 32, 64, and 128 processors. For each case, list the corresponding speedup relative to a single processor and the ratio between actual speedup versus ideal speedup (speedup if there was no overhead).

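$$T_1 = 100\ \text{s}$$

$$T_n = \frac{100}{n} + 4\ \text{s}\quad (n > 1)$$

| Processors | Execution time (s) | Speedup | Ideal speedup | Actual / ideal |
|---|---|---|---|---|
| 1 | 100 | 1 | 1 | 1 |
| 2 | 54 | 1.85 | 2 | 0.926 |
| 4 | 29 | 3.45 | 4 | 0.862 |
| 8 | 16.5 | 6.06 | 8 | 0.758 |
| 16 | 10.25 | 9.76 | 16 | 0.610 |
| 32 | 7.125 | 14.04 | 32 | 0.439 |
| 64 | 5.5625 | 17.98 | 64 | 0.281 |
| 128 | 4.78125 | 20.92 | 128 | 0.163 |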
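Here's a Python sketch that regenerates the table. It assumes the 4 s overhead only applies when more than one processor is used, which is how the first row treats p = 1. The helper names are my own.

```python
# Exercise 15 (helper names are my own). Assumes no overhead for a single processor.
T1 = 100.0       # seconds on one processor
OVERHEAD = 4.0   # seconds per processor when p > 1


def per_processor_time(p):
    return T1 if p == 1 else T1 / p + OVERHEAD


def rows(counts=(1, 2, 4, 8, 16, 32, 64, 128)):
    for p in counts:
        t = per_processor_time(p)
        speedup = T1 / t
        yield p, t, speedup, p, speedup / p   # ideal speedup is p


print(f"{'p':>4} {'time (s)':>9} {'speedup':>8} {'ideal':>6} {'ratio':>6}")
for p, t, s, ideal, ratio in rows():
    print(f"{p:>4} {t:>9.4f} {s:>8.3f} {ideal:>6} {ratio:>6.3f}")
```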

II. Homework

The homework problems are basically all taken from the content above, so make good use of Ctrl + F (*^_^*)


Final

Done! All written, and I'm still full of energy!

Blow up the whole world!
