Processor Architecture and the Memory Hierarchy: Exercises

Question_1

Suppose we analyze the combinational logic of Figure and determine that it can be separated into a sequence of six blocks, named A to F, having delays of 80, 30, 60, 50, 70, and 10 ps, respectively, illustrated as follows:
[Figure: the six blocks A to F connected in sequence, with their delays]

We can create pipelined versions of this design by inserting pipeline registers between pairs of these blocks. Different combinations of pipeline depth (how many stages) and maximum throughput arise, depending on where we insert the pipeline registers.
Assume that a pipeline register has a delay of 20 ps.

Questions
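A. Inserting a single register gives a two-stage pipeline. Where should the register be inserted to maximize throughput? What would be the throughput and latency?
Answer: put A, B, C in one stage and D, E, F in the other, i.e., insert the register after C.
ABC + register: 80 + 30 + 60 + 20 = 190 ps
DEF + register: 50 + 70 + 10 + 20 = 150 ps
The clock period must accommodate the slower of the two stages, so it is 190 ps.

So the cycle time is 190 ps, the throughput is 1000/190 ≈ 5.26 GIPS, and the latency is 2 × 190 = 380 ps.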

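B. Where should two registers be inserted to maximize the throughput of a three-stage pipeline? What would be the throughput and latency?
Same as A, but now with two registers.
Answer: make the three stages as equal as possible.
Insert one register after B and another after D:
AB + register = 80 + 30 + 20 = 130 ps, CD + register = 60 + 50 + 20 = 130 ps
EF + register = 70 + 10 + 20 = 100 ps

So the cycle time is 130 ps, the throughput is 1000/130 ≈ 7.69 GIPS, and the latency is 3 × 130 = 390 ps.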

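C. Where should three registers be inserted to maximize the throughput of a 4-stage pipeline? What would be the throughput and latency?
Insert three registers: one after A, one after C, and one after D.
A + register = 100 ps, BC + register = 110 ps, D + register = 70 ps
EF + register = 100 ps

So the cycle time is 110 ps, the throughput is 1000/110 ≈ 9.09 GIPS, and the latency is 4 × 110 = 440 ps.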

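D. What is the minimum number of stages that would yield a design with the maximum achievable throughput? Describe this design, its throughput, and its latency.
The goal is to reach the highest possible throughput with as few registers as possible.
Block A alone takes 80 ps, so no stage can be shorter than 80 + 20 = 100 ps. The best achievable cycle time is therefore 100 ps, and every stage must fit within 100 ps (at most 80 ps of logic).
A cannot share a stage with B (80 + 30 = 110), nor B with C (90), C with D (110), or D with E (120); only E and F fit together (80). That forces four registers, i.e., five stages: a register after each of A, B, C, and D, with E and F together in the last stage.
The five stage delays are:
100 ps, 50 ps, 80 ps, 70 ps, 100 ps

So the cycle time is 100 ps, the throughput is 10 GIPS, and the latency is 5 × 100 = 500 ps.
Adding more registers cannot shorten the cycle below 100 ps; it only increases the latency.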
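To double-check parts A to D, here is a small brute-force sketch in Python (the language is a choice of this write-up, not part of the exercise). It tries every way of cutting the six blocks into k contiguous stages, charges each stage its logic delay plus the 20 ps register, and keeps the split with the shortest clock period. The function name `best_split` is hypothetical, and ties are broken by keeping the first split found.

```python
from itertools import combinations

# Block delays (ps) for A..F and the register delay, from the problem statement.
DELAYS = [80, 30, 60, 50, 70, 10]
REG = 20
NAMES = "ABCDEF"


def best_split(delays, stages, reg=REG):
    """Try every way to cut `delays` into `stages` contiguous groups.

    Each stage costs sum(group) + reg; the clock period is the slowest stage.
    Returns (cycle_ps, cuts), where cut c means "register after block c-1".
    """
    n = len(delays)
    best = None
    for cuts in combinations(range(1, n), stages - 1):
        bounds = (0,) + cuts + (n,)
        stage_times = [sum(delays[a:b]) + reg for a, b in zip(bounds, bounds[1:])]
        cycle = max(stage_times)
        if best is None or cycle < best[0]:
            best = (cycle, cuts)
    return best


for k in range(2, 7):
    cycle, cuts = best_split(DELAYS, k)
    where = ", ".join(NAMES[c - 1] for c in cuts)
    print(f"{k} stages: registers after {where}; cycle {cycle} ps, "
          f"throughput {1000 / cycle:.2f} GIPS, latency {cycle * k} ps")
```

The k = 6 line shows the point made in part D: splitting every block apart still leaves a 100 ps cycle, only with a longer latency.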


Question_2

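Suppose we could take the system of Figure and divide it into an arbitrary number of pipeline stages k, each having a delay of 300/k ps, and with each pipeline register having a delay of 20 ps.
[Figure: the unpipelined system]
Questions
A. What would be the latency and the throughput of the system, as functions of k?
That is, if the logic is split into k stages, what are the overall latency and throughput?
Each of the k stages has 300/k ps of logic followed by a 20 ps register (k registers in total: the original one plus k - 1 inserted ones). The cycle time is therefore 300/k + 20 ps, and the latency is k × (300/k + 20) = (300 + 20k) ps.
Throughput = 1000 / (300/k + 20) = 1000k / (300 + 20k) GIPS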

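B. What would be the ultimate limit on the throughput?
As k approaches infinity, the per-stage logic delay 300/k shrinks toward 0 and the register delay dominates, so the cycle time approaches 20 ps.
The throughput therefore approaches (but never reaches) 1000/20 = 50 GIPS.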
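The two formulas can be evaluated numerically to watch the throughput creep toward the 50 GIPS limit while the latency keeps growing. A minimal sketch, assuming (as the exercise does) that the 300 ps of logic divides perfectly evenly; the function names are illustrative only.

```python
def latency_ps(k):
    # k stages, each with 300/k ps of logic plus a 20 ps register: 300 + 20k.
    return k * (300 / k + 20)


def throughput_gips(k):
    # One instruction completes per cycle of 300/k + 20 ps: 1000k / (300 + 20k).
    return 1000 / (300 / k + 20)


for k in (1, 2, 5, 15, 100, 10_000):
    print(f"k={k:>6}: latency {latency_ps(k):10.1f} ps, "
          f"throughput {throughput_gips(k):6.2f} GIPS")
```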


Question_3

What is the capacity of a disk with 2 platters, 10,000 cylinders(tracks), an average of 400 sectors per track, and 512 bytes per sector?

capacity
= 512 bytes/sector × 400 sectors/track × 10,000 tracks/surface × 2 surfaces/platter × 2 platters
= 8,192,000,000 bytes
= 8.192 GB ≈ 7.629 GiB
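The same product, written with one named factor per level of the disk geometry (the constant names are only for readability), also shows where the GB/GiB difference comes from: GB divides by 10^9, GiB by 2^30.

```python
BYTES_PER_SECTOR = 512
SECTORS_PER_TRACK = 400      # average
TRACKS_PER_SURFACE = 10_000  # = number of cylinders
SURFACES_PER_PLATTER = 2
PLATTERS = 2

capacity = (BYTES_PER_SECTOR * SECTORS_PER_TRACK * TRACKS_PER_SURFACE
            * SURFACES_PER_PLATTER * PLATTERS)

print(capacity, "bytes")
print(capacity / 10**9, "GB")            # decimal gigabytes
print(round(capacity / 2**30, 3), "GiB") # binary gibibytes
```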


Question_4

Estimate the average time (in ms) to access a sector on the following disk:
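[Figure: table of disk parameters]
The rotational rate is 15,000 RPM, so one revolution takes 60/15,000 s = 4 ms.
Average rotational latency = 0.5 × 4 ms = 2 ms

Average seek time = 8 ms
Average transfer time = 4 ms / 500 sectors per track = 0.008 ms
So the average time to access a sector is about 8 + 2 + 0.008 ≈ 10 ms.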
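A small helper applying the usual estimate T_access = T_avg seek + T_avg rotation + T_avg transfer. The function name is hypothetical, and the values plugged in (15,000 RPM, 8 ms average seek, 500 sectors per track) are taken from the numbers used in the solution, since the parameter table itself is not reproduced here.

```python
def avg_access_ms(rpm, avg_seek_ms, avg_sectors_per_track):
    t_rotation_ms = 60_000 / rpm          # one full revolution, in ms
    t_avg_rotation = 0.5 * t_rotation_ms  # on average, wait half a revolution
    # Time for one sector to pass under the head.
    t_avg_transfer = t_rotation_ms / avg_sectors_per_track
    return avg_seek_ms + t_avg_rotation + t_avg_transfer


print(avg_access_ms(rpm=15_000, avg_seek_ms=8, avg_sectors_per_track=500))
```

Seek and rotational delay dominate; the transfer of the sector itself contributes less than 0.1% of the total.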


Question_5

Suppose that a 1 MB file consisting of 512-byte logical blocks is stored on a disk drive with the following characteristics:
[Figure: table of disk parameters]
For each case below, suppose that a program reads the logical blocks of the file sequentially, one after the other, and that the time to position the head over the first block is T_avg seek + T_avg rotation.

First, note that the 1 MB file occupies about 2000 sectors of 512 bytes each (1 MB / 512 B ≈ 2000).

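A. Best case: Estimate the optimal time (in ms) required to read the file given the best possible mapping of logical blocks to disk sectors (i.e., sequential).
This is the best case: the file's 2000 sectors are stored contiguously on two tracks (1000 sectors each) of the same cylinder, so no further seek is needed after the first one and reading them takes exactly two full revolutions.
One revolution takes T_max rotation = 60/10,000 s = 6 ms, so T_avg rotation = 3 ms.
So the time
= average seek time + average rotational latency + transfer time for 2000 sectors
= 5 + 0.5 × 6 + 2 × 6 ms
= 20 ms

So the best case is 20 ms.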

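B. Random case: Estimate the time (in ms) required to read the file if blocks are mapped randomly to disk sectors.
This is the random case: every sector needs its own positioning, so the time to read each sector
= average rotational latency + average seek time + transfer time for one sector
= 3 + 5 + 6/1000 ms
So the total time for 2000 sectors
= 2000 × (3 + 5) + 2000 × 0.006 ms
= 16,000 + 12 ms
= 16,012 ms
≈ 16 s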
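Both cases as a short calculation. The disk parameters (10,000 RPM, 5 ms average seek, 1000 sectors per track) are inferred from the numbers used in the solution, because the parameter table is not reproduced here; N_BLOCKS = 2000 follows the rounding used in the text.

```python
RPM = 10_000
AVG_SEEK_MS = 5
SECTORS_PER_TRACK = 1_000
N_BLOCKS = 2_000  # 1 MB / 512 B, rounded as in the text

t_max_rotation = 60_000 / RPM                      # 6 ms per revolution
t_avg_rotation = t_max_rotation / 2                # 3 ms
t_transfer = t_max_rotation / SECTORS_PER_TRACK    # 0.006 ms per sector

# Best case: one seek and one rotational wait, then the contiguous sectors
# stream past the head in N_BLOCKS / SECTORS_PER_TRACK = 2 full revolutions.
best = AVG_SEEK_MS + t_avg_rotation + (N_BLOCKS / SECTORS_PER_TRACK) * t_max_rotation

# Random case: every block pays its own seek, rotational wait and transfer.
random_case = N_BLOCKS * (AVG_SEEK_MS + t_avg_rotation + t_transfer)

print(f"best case:   {best:.0f} ms")
print(f"random case: {random_case:.0f} ms (~{random_case / 1000:.0f} s)")
```

The roughly 800x gap between the two results is the cost of paying seek and rotational latency once per block instead of once per file.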


Question_6

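The three functions in Figure perform the same operation with varying degrees of spatial locality. Rank-order the functions with respect to the spatial locality enjoyed by each. Explain how you arrived at your ranking.
[Figure: the functions clear1, clear2, and clear3]
From best to worst spatial locality: clear1, clear2, clear3.
clear1 accesses each array with a stride-1 reference pattern.
clear2 also walks the p array with stride 1, but within each element it jumps back and forth between the vel array and the acc array instead of finishing one before starting the other, so its spatial locality is worse than clear1's.

clear3 does not access the arrays with stride 1, and it keeps jumping between elements of p as well as between vel and acc, so its spatial locality is worse than clear2's.
Therefore clear1 is better than clear2, which is better than clear3.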
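One way to make the ranking concrete is to list, for each function, the order in which it writes the words of the array, and measure how far apart consecutive writes are. The figure is not reproduced here, so the struct layout (`typedef struct { int vel[3]; int acc[3]; } point;`, as in the CS:APP textbook figure) and the loop orders below are assumptions reconstructed to match the descriptions above. The average stride is only a rough proxy for spatial locality.

```python
WORDS_PER_POINT = 6  # assumed layout: vel[0..2] followed by acc[0..2]


def vel(i, j):
    # Word offset of p[i].vel[j] under the assumed layout.
    return WORDS_PER_POINT * i + j


def acc(i, j):
    # Word offset of p[i].acc[j] under the assumed layout.
    return WORDS_PER_POINT * i + 3 + j


def clear1(n):
    # Per point: all of vel, then all of acc.
    order = []
    for i in range(n):
        order += [vel(i, j) for j in range(3)]
        order += [acc(i, j) for j in range(3)]
    return order


def clear2(n):
    # Per point: vel[j] and acc[j] interleaved.
    order = []
    for i in range(n):
        for j in range(3):
            order += [vel(i, j), acc(i, j)]
    return order


def clear3(n):
    # Outer loop over j: each pass sweeps the whole array of points.
    order = []
    for j in range(3):
        order += [vel(i, j) for i in range(n)]
        order += [acc(i, j) for i in range(n)]
    return order


def mean_stride(order):
    # Average distance (in 4-byte words) between consecutive writes.
    return sum(abs(b - a) for a, b in zip(order, order[1:])) / (len(order) - 1)


for f in (clear1, clear2, clear3):
    print(f.__name__, f(2), round(mean_stride(f(1000)), 2))
```

All three functions write exactly the same set of words; only the order differs, which is what the ranking is about.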


Question_7

Imagine a hypothetical cache of the form (S, E, B, m) = (512, 1, 32, 32) (m is address width) that uses the high-order s bits of an address as the set index. For such a cache, contiguous chunks of memory blocks are mapped to the same cache set.

A. How many blocks are in each of these contiguous array chunks?
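Let t be the number of tag bits. Within a chunk the high-order s set-index bits stay fixed while the t tag bits vary, so each contiguous array chunk contains 2^t blocks.
Here s = log2(512) = 9 and b = log2(32) = 5, so t = 32 - 9 - 5 = 18.
So each contiguous array chunk contains 2^18 blocks.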

B. Consider the following code that runs on this system. What is the maximum number of array blocks that are stored in the cache at any point in time?
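[Figure: code that walks through an int array of 4096 elements]
The address has a 9-bit set index and a 5-bit block offset, so the tag is 18 bits.
Hence the first 2^18 blocks of memory, each 32 bytes, all map to set 0.
Our array has only 4096 × 4 / 32 = 512 blocks < 2^18 blocks, so (assuming it starts at the beginning of a chunk, e.g., at address 0) all of its blocks map to the same set.
Since E = 1, that set holds a single line, so at most one array block can be in the cache at any point in time.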
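A quick way to see the effect of high-order indexing is to compute which sets the array's addresses fall into under both schemes. This sketch assumes 4-byte `int`s and, hypothetically, that the array starts at address 0; `set_high_order` and `set_middle` are illustrative names.

```python
S_BITS, B_BITS, M = 9, 5, 32        # S = 512 sets, B = 32-byte blocks, 32-bit addresses
T_BITS = M - S_BITS - B_BITS        # 18 tag bits


def set_high_order(addr):
    # Hypothetical design from the exercise: the top s bits select the set.
    return addr >> (M - S_BITS)


def set_middle(addr):
    # Conventional design: the s bits just above the block offset select the set.
    return (addr >> B_BITS) & ((1 << S_BITS) - 1)


BASE = 0                                        # assumption: array starts at address 0
addrs = [BASE + 4 * i for i in range(4096)]     # int array[4096]

print("blocks per chunk:", 2 ** T_BITS)
print("sets touched (high-order):", len({set_high_order(a) for a in addrs}))
print("sets touched (middle bits):", len({set_middle(a) for a in addrs}))
```

With E = 1, one set holds one block, so the high-order scheme keeps at most one array block cached, while middle-bit indexing would spread the 512 blocks over all 512 sets.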


Question_8

Assume the following:
• The memory is byte addressable.
• Memory accesses are to 1-byte words (not to 4-byte words).
• Addresses are 13 bits wide.
• The cache is two-way set associative (E = 2), with a 4-byte block size (B = 4) and eight sets (S = 8).

[Figure: contents of the cache]
The following figure shows the format of an address (1 bit per box). Indicate (by labeling the diagram) the fields that would be used to determine the following:
CO. The cache block offset
CI. The cache set index
CT. The cache tag

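[Figure: 13-bit address diagram, one box per bit]
**Answer** (bits 12 down to 0):
CT CT CT CT CT CT CT CT CI CI CI CO CO
(8 CT bits, 3 CI bits, 2 CO bits: b = log2(4) = 2, s = log2(8) = 3, t = 13 - 3 - 2 = 8)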

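**Suppose a program running on the machine references the 1-byte word at address 0x0E34. Indicate the cache entry accessed and the cache byte value returned in hexadecimal notation. Indicate whether a cache miss occurs. If there is a cache miss, enter “—” for “Cache byte returned.”**

[Figure: answer table]
Answer:
The address in binary is 0 1 1 1 0 0 0 1 1 0 1 0 0, which splits into CT = 01110001, CI = 101, CO = 00.
The entries of the table are, in order:
Cache block offset (CO): 0x0
Cache set index (CI): 0x5
Cache tag (CT): 0x71
Cache hit: Yes (a byte value is returned, so this is not a miss)
Cache byte returned: 0x0B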
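The field split can be computed mechanically: b = log2(B), s = log2(S), t = m - s - b, then mask and shift. A minimal sketch (the function name `split` is illustrative); it only decomposes the address, because deciding hit or miss needs the cache-contents table, which is not reproduced here.

```python
M, S, B = 13, 8, 4           # address bits, number of sets, block size in bytes
b = B.bit_length() - 1       # log2(B) = 2 block-offset bits
s = S.bit_length() - 1       # log2(S) = 3 set-index bits
t = M - s - b                # 8 tag bits


def split(addr):
    """Return (CO, CI, CT) for an M-bit address."""
    co = addr & ((1 << b) - 1)           # lowest b bits
    ci = (addr >> b) & ((1 << s) - 1)    # next s bits
    ct = addr >> (b + s)                 # remaining high-order t bits
    return co, ci, ct


addr = 0x0E34
co, ci, ct = split(addr)
print(f"{addr:0{M}b}")                   # 13-bit binary pattern
print(hex(co), hex(ci), hex(ct))
```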
