Computer Organization and Design Homework 07 (Fifth Edition)

5.1

In this exercise we look at memory locality properties of matrix computation. The following code is written in C, where elements within the same row are stored contiguously. Assume each word is a 32-bit integer.

```c
for (I=0; I<8; I++)
    for (J=0; J<8000; J++)
        A[I][J]=B[I][0]+A[J][I];
```

5.1.1 [5] <§5.1> How many 32-bit integers can be stored in a 16-byte cache block?

1 byte = 8 bits, so a 16-byte block holds $16 \times 8 / 32 = 4$ 32-bit integers.

5.1.2 [5] <§5.1> References to which variables exhibit temporal locality?

References to I, J, and B[I][0] exhibit temporal locality (each is accessed again and again within the loop).

5.1.3 [5] <§5.1> References to which variables exhibit spatial locality?

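A[I][J] exhibits spatial locality (each inner-loop iteration accesses the next word of the same row).
Successive A[J][I] accesses are a whole row (at least 8000 words) apart, so they do not exhibit spatial locality.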
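To make the locality argument concrete, here is a minimal sketch that measures how far apart, in words, the elements touched by two consecutive inner-loop iterations are in C's row-major layout. The 8000-column width and the helper names (`row_major_offset`, `stride_A_IJ`, `stride_A_JI`) are illustrative assumptions, not part of the exercise.

```c
#include <assert.h>

/* Assumption: A has NCOLS = 8000 columns (it needs at least 8000 rows and
   columns for both A[I][J] and A[J][I] to stay in bounds). */
#define NCOLS 8000L

/* Word offset of A[r][c] in C's row-major layout: one row is contiguous. */
static long row_major_offset(long r, long c) {
    assert(r >= 0 && c >= 0 && c < NCOLS);
    return r * NCOLS + c;
}

/* Words between the elements touched at inner-loop steps J and J + 1. */
long stride_A_IJ(long i, long j) {            /* A[I][J]: next word in the row */
    return row_major_offset(i, j + 1) - row_major_offset(i, j);
}

long stride_A_JI(long i, long j) {            /* A[J][I]: same column, next row */
    return row_major_offset(j + 1, i) - row_major_offset(j, i);
}
```

`stride_A_IJ` returns 1 and `stride_A_JI` returns 8000: with 4 words per 16-byte block, A[I][J] moves to a new block only every fourth iteration, while every A[J][I] access lands in a different block.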

Locality is affected by both the reference order and data layout. The same computation can also be written below in Matlab, which differs from C by storing matrix elements within the same column contiguously in memory.

```matlab
for I=1:8
    for J=1:8000
        A(I,J)=B(I,0)+A(J,I);
    end
end
```

5.1.4 [10] <§5.1> How many 16-byte cache blocks are needed to store all 32-bit matrix elements being referenced?

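This part refers to the Matlab (column-major) version. By 5.1.1, one 16-byte block holds 4 elements; assume every column starts on a block boundary.

  1. A(I,J), I = 1..8, J = 1..8000: rows 1-8 of each of 8000 columns, $8000 \times 8/4 = 16000$ blocks.
  2. A(J,I), J = 1..8000, I = 1..8: 8 full columns of 8000 elements, $8 \times 8000/4 = 16000$ blocks.
  3. Both references touch the $8 \times 8$ corner A(1..8, 1..8), so $8 \times 8/4 = 16$ blocks are counted twice.
  4. B(I,0), I = 1..8: 8 contiguous elements of one column, $8/4 = 2$ blocks.

In total:
$8 \times 8000/4 \times 2 - 8 \times 8/4 + 8/4 = 16000 + 16000 - 16 + 2 = 31986$ blocks.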
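The block count can be cross-checked by brute force. The sketch below marks every 16-byte block touched by the Matlab loop, assuming A is 8000 x 8000, B has 8 rows, every column starts on a block boundary, and A and B share no blocks. It uses 0-based indices, which does not change the count. `touch` and `count_distinct_blocks` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumptions (not stated in the exercise): A is 8000 x 8000, B has 8 rows,
   each column starts on a 16-byte block boundary, A and B share no blocks. */
#define A_ROWS 8000L
#define WORDS_PER_BLOCK 4L   /* 16-byte block / 4-byte word */

/* Mark the block holding column-major element (r, c); return 1 if it is new. */
static long touch(unsigned char *seen, long rows, long r, long c) {
    long block = (c * rows + r) / WORDS_PER_BLOCK;
    unsigned char bit = (unsigned char)(1u << (block % 8));
    if (seen[block / 8] & bit)
        return 0;
    seen[block / 8] |= bit;
    return 1;
}

/* Count distinct 16-byte blocks touched by the Matlab loop (0-based indices). */
long count_distinct_blocks(void) {
    long a_blocks = A_ROWS * A_ROWS / WORDS_PER_BLOCK;
    unsigned char *seen_a = calloc((size_t)(a_blocks / 8 + 1), 1);
    unsigned char seen_b[1] = {0};            /* B's column 0 spans only 2 blocks */
    long total = 0;
    assert(seen_a != NULL);
    for (long i = 0; i < 8; i++) {
        for (long j = 0; j < 8000; j++) {
            total += touch(seen_b, 8, i, 0);        /* B(I,0) */
            total += touch(seen_a, A_ROWS, j, i);   /* A(J,I), read */
            total += touch(seen_a, A_ROWS, i, j);   /* A(I,J), written */
        }
    }
    free(seen_a);
    return total;
}
```

Under these assumptions it returns 31986: 16000 blocks for each A reference, minus the 16 blocks both share, plus 2 blocks for B.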

5.1.5 [5] <§5.1> References to which variables exhibit temporal locality?

References to I, J, and B(I,0) exhibit temporal locality (each is accessed again and again within the loop).

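5.1.6 [5] <§5.1> References to which variables exhibit spatial locality?

A(J,I) exhibits spatial locality: Matlab stores each column contiguously, and as J increases the inner loop walks down column I, touching the next word each time. A(I,J) now moves along a row, jumping a whole column (at least 8000 words) per iteration, so it has no spatial locality.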
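The same stride measurement for Matlab's column-major layout shows why the answer flips relative to 5.1.3. As before, the 8000-row size and the function names (`col_major_offset`, `cm_stride_A_IJ`, `cm_stride_A_JI`) are assumptions for illustration.

```c
#include <assert.h>

/* Assumption: A has NROWS = 8000 rows, enough for A(J,I) with J up to 8000. */
#define NROWS 8000L

/* Word offset of A(r, c) in Matlab's column-major layout, using 0-based
   indices for simplicity (Matlab itself starts at 1). */
static long col_major_offset(long r, long c) {
    assert(r >= 0 && r < NROWS && c >= 0);
    return c * NROWS + r;
}

/* Words between the elements touched at inner-loop steps J and J + 1. */
long cm_stride_A_IJ(long i, long j) {         /* A(I,J): jumps to the next column */
    return col_major_offset(i, j + 1) - col_major_offset(i, j);
}

long cm_stride_A_JI(long i, long j) {         /* A(J,I): next word in column I */
    return col_major_offset(j + 1, i) - col_major_offset(j, i);
}
```

Here `cm_stride_A_JI` returns 1 and `cm_stride_A_IJ` returns 8000, the opposite of the row-major case.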

5.2

Caches are important to providing a high-performance memory hierarchy to processors. Below is a list of 32-bit memory address references, given as word addresses.

3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253

5.2.1 [10] <§5.3> For each of these references, identify the binary address, the tag, and the index given a direct-mapped cache with 16 one-word blocks. Also list if each reference is a hit or a miss, assuming the cache is initially empty.

The cache starts empty.

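  1. The cache has $16 = 2^4$ blocks, so the index field is $n = 4$ bits.
    The index is therefore a 4-bit binary number.

  2. Each block holds $1 = 2^0$ word, so $m = 0$ and there is no word offset.
    The remaining upper bits form the tag. Every address here is below 256, so only the low 8 bits are shown, leaving a 4-bit tag (all higher bits are zero).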

| Word address | Binary address | Tag | Index | Hit/Miss |
|---|---|---|---|---|
| 3 | 0000 0011 | 0000 = 0 | 0011 = 3 | Miss |
| 180 | 1011 0100 | 1011 = 11 | 0100 = 4 | Miss |
| 43 | 0010 1011 | 0010 = 2 | 1011 = 11 | Miss |
| 2 | 0000 0010 | 0000 = 0 | 0010 = 2 | Miss |
| 191 | 1011 1111 | 1011 = 11 | 1111 = 15 | Miss |
| 88 | 0101 1000 | 0101 = 5 | 1000 = 8 | Miss |
| 190 | 1011 1110 | 1011 = 11 | 1110 = 14 | Miss |
| 14 | 0000 1110 | 0000 = 0 | 1110 = 14 | Miss |
| 181 | 1011 0101 | 1011 = 11 | 0101 = 5 | Miss |
| 44 | 0010 1100 | 0010 = 2 | 1100 = 12 | Miss |
| 186 | 1011 1010 | 1011 = 11 | 1010 = 10 | Miss |
| 253 | 1111 1101 | 1111 = 15 | 1101 = 13 | Miss |

All 12 references miss.
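The tag and index columns can be generated mechanically: the index is the block address modulo the number of blocks, and the tag is whatever lies above it. This is a minimal sketch assuming power-of-two sizes; `Fields` and `split_word_address` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical helper: tag/index/word-offset fields of a word address for a
   direct-mapped cache with `blocks` blocks of `words_per_block` words. */
typedef struct {
    unsigned tag;
    unsigned index;
    unsigned word_offset;
} Fields;

/* log2 of a power of two. */
static unsigned log2_pow2(unsigned x) {
    unsigned n = 0;
    assert(x != 0 && (x & (x - 1)) == 0);
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

Fields split_word_address(unsigned addr, unsigned blocks, unsigned words_per_block) {
    unsigned m = log2_pow2(words_per_block);   /* word-offset bits */
    unsigned n = log2_pow2(blocks);            /* index bits */
    Fields f;
    f.word_offset = addr & (words_per_block - 1);
    f.index = (addr >> m) & (blocks - 1);
    f.tag = addr >> (m + n);                   /* everything above the index */
    return f;
}
```

For example, `split_word_address(186, 16, 1)` yields tag 11 and index 10, and the same call with 8 blocks of 2 words gives the 5.2.2 fields.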

5.2.2 [10] <§5.3> For each of these references, identify the binary address, the tag, and the index given a direct-mapped cache with two-word blocks and a total size of 8 blocks. Also list if each reference is a hit or a miss, assuming the cache is initially empty.

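  1. The cache has $8 = 2^3$ blocks, so the index field is $n = 3$ bits.
    The index is therefore a 3-bit binary number.

  2. Each block holds $2 = 2^1$ words, so the word offset is $m = 1$ bit.
    The remaining 4 of the 8 bits shown form the tag.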

| Word address | Binary address | Tag | Index | Hit/Miss |
|---|---|---|---|---|
| 3 | 0000 0011 | 0000 = 0 | 001 = 1 | Miss |
| 180 | 1011 0100 | 1011 = 11 | 010 = 2 | Miss |
| 43 | 0010 1011 | 0010 = 2 | 101 = 5 | Miss |
| 2 | 0000 0010 | 0000 = 0 | 001 = 1 | Hit (same block as 3) |
| 191 | 1011 1111 | 1011 = 11 | 111 = 7 | Miss |
| 88 | 0101 1000 | 0101 = 5 | 100 = 4 | Miss |
| 190 | 1011 1110 | 1011 = 11 | 111 = 7 | Hit (same block as 191) |
| 14 | 0000 1110 | 0000 = 0 | 111 = 7 | Miss |
| 181 | 1011 0101 | 1011 = 11 | 010 = 2 | Hit (same block as 180) |
| 44 | 0010 1100 | 0010 = 2 | 110 = 6 | Miss |
| 186 | 1011 1010 | 1011 = 11 | 101 = 5 | Miss |
| 253 | 1111 1101 | 1111 = 15 | 110 = 6 | Miss |

3 hits, 9 misses.
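The hit/miss column can be checked with a small direct-mapped simulator. This is an illustrative sketch: the name `simulate_direct_mapped` and the `MAX_BLOCKS` limit are assumptions, and a miss simply loads the whole block containing the address, evicting whatever occupied that index.

```c
#include <assert.h>

#define MAX_BLOCKS 64

/* Simulate a direct-mapped cache on word addresses. `blocks` and
   `words_per_block` are assumed to be powers of two, blocks <= MAX_BLOCKS.
   Writes 1 (hit) or 0 (miss) into hit[i] and returns the number of misses. */
int simulate_direct_mapped(const unsigned *addrs, int n, unsigned blocks,
                           unsigned words_per_block, int *hit) {
    int valid[MAX_BLOCKS] = {0};
    unsigned tags[MAX_BLOCKS] = {0};
    int misses = 0;
    assert(blocks <= MAX_BLOCKS);
    for (int i = 0; i < n; i++) {
        unsigned block_addr = addrs[i] / words_per_block;
        unsigned index = block_addr % blocks;
        unsigned tag = block_addr / blocks;
        hit[i] = valid[index] && tags[index] == tag;
        if (!hit[i]) {                 /* miss: fetch the block, evicting the old one */
            valid[index] = 1;
            tags[index] = tag;
            misses++;
        }
    }
    return misses;
}
```

On the 5.2 trace it reports 12 misses for 16 one-word blocks and 9 misses for 8 two-word blocks, with hits on 2, 190, and 181.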

5.2.3 [20] <§§5.3, 5.4> You are asked to optimize a cache design for the given references. There are three direct-mapped cache designs possible, all with a total of 8 words of data: C1 has 1-word blocks, C2 has 2-word blocks, and C3 has 4-word blocks. In terms of miss rate, which cache design is the best? If the miss stall time is 25 cycles, and C1 has an access time of 2 cycles, C2 takes 3 cycles, and C3 takes 5 cycles, which is the best cache design?

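C1: block size 1 word

  1. The cache has $8 = 2^3$ blocks, so the index field is $n = 3$ bits.
  2. Each block holds $1 = 2^0$ word, so $m = 0$.

Binary conversion steps are omitted; each binary address is split as tag | index.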

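| Word address | Binary address (tag index) | Tag | Index | Hit/Miss |
|---|---|---|---|---|
| 3 | 00000 011 | 0 | 3 | Miss |
| 180 | 10110 100 | 22 | 4 | Miss |
| 43 | 00101 011 | 5 | 3 | Miss |
| 2 | 00000 010 | 0 | 2 | Miss |
| 191 | 10111 111 | 23 | 7 | Miss |
| 88 | 01011 000 | 11 | 0 | Miss |
| 190 | 10111 110 | 23 | 6 | Miss |
| 14 | 00001 110 | 1 | 6 | Miss |
| 181 | 10110 101 | 22 | 5 | Miss |
| 44 | 00101 100 | 5 | 4 | Miss |
| 186 | 10111 010 | 23 | 2 | Miss |
| 253 | 11111 101 | 31 | 5 | Miss |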

Miss rate: $12/12 = 100\%$
Total time: $12 \times 25 + 12 \times 2 = 324$ cycles

C2: block size 2 words

Each block holds $2 = 2^1$ words, so $m = 1$; the cache has $8/2 = 4 = 2^2$ blocks, so $n = 2$.

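| Word address | Binary address (tag index offset) | Tag | Index | Hit/Miss |
|---|---|---|---|---|
| 3 | 00000 01 1 | 0 | 1 | Miss |
| 180 | 10110 10 0 | 22 | 2 | Miss |
| 43 | 00101 01 1 | 5 | 1 | Miss |
| 2 | 00000 01 0 | 0 | 1 | Miss (43 evicted the block holding 2 and 3) |
| 191 | 10111 11 1 | 23 | 3 | Miss |
| 88 | 01011 00 0 | 11 | 0 | Miss |
| 190 | 10111 11 0 | 23 | 3 | Hit |
| 14 | 00001 11 0 | 1 | 3 | Miss |
| 181 | 10110 10 1 | 22 | 2 | Hit |
| 44 | 00101 10 0 | 5 | 2 | Miss |
| 186 | 10111 01 0 | 23 | 1 | Miss |
| 253 | 11111 10 1 | 31 | 2 | Miss |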

Miss rate: $10/12 = 83.33\%$
Total time: $10 \times 25 + 12 \times 3 = 286$ cycles

C3: block size 4 words
Each block holds $4 = 2^2$ words, so $m = 2$; the cache has $8/4 = 2 = 2^1$ blocks, so $n = 1$.

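| Word address | Binary address (tag index offset) | Tag | Index | Hit/Miss |
|---|---|---|---|---|
| 3 | 00000 0 11 | 0 | 0 | Miss |
| 180 | 10110 1 00 | 22 | 1 | Miss |
| 43 | 00101 0 11 | 5 | 0 | Miss |
| 2 | 00000 0 10 | 0 | 0 | Miss |
| 191 | 10111 1 11 | 23 | 1 | Miss |
| 88 | 01011 0 00 | 11 | 0 | Miss |
| 190 | 10111 1 10 | 23 | 1 | Hit |
| 14 | 00001 1 10 | 1 | 1 | Miss |
| 181 | 10110 1 01 | 22 | 1 | Miss |
| 44 | 00101 1 00 | 5 | 1 | Miss |
| 186 | 10111 0 10 | 23 | 0 | Miss |
| 253 | 11111 1 01 | 31 | 1 | Miss |

Miss rate: $11/12 = 91.67\%$
Total time: $11 \times 25 + 12 \times 5 = 335$ cycles

C2 has both the lowest miss rate (83.33%) and the lowest total time (286 cycles), so C2 is the best design by either measure.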
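The three designs can be compared in one pass. The sketch below (hypothetical names `misses_8_words` and `total_cycles`, 8 words of data in every design) counts misses for a given block size, then charges every access its access time and every miss the 25-cycle stall, which is the cost model used above.

```c
#include <assert.h>

/* Misses of a direct-mapped cache holding 8 words in total, organised as
   8 / words_per_block blocks (power-of-two block sizes assumed). */
static int misses_8_words(const unsigned *addrs, int n, unsigned words_per_block) {
    unsigned blocks = 8 / words_per_block;
    int valid[8] = {0};
    unsigned tags[8] = {0};
    int misses = 0;
    assert(words_per_block >= 1 && words_per_block <= 8);
    for (int i = 0; i < n; i++) {
        unsigned block_addr = addrs[i] / words_per_block;
        unsigned index = block_addr % blocks;
        unsigned tag = block_addr / blocks;
        if (!(valid[index] && tags[index] == tag)) {   /* miss: replace the block */
            valid[index] = 1;
            tags[index] = tag;
            misses++;
        }
    }
    return misses;
}

/* Every access pays the access time; every miss also pays the stall. */
int total_cycles(const unsigned *addrs, int n, unsigned words_per_block,
                 int access_time, int miss_stall) {
    return misses_8_words(addrs, n, words_per_block) * miss_stall + n * access_time;
}
```

It gives 12, 10, and 11 misses and 324, 286, and 335 cycles for C1, C2, and C3.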

There are many different design parameters that are important to a cache's overall performance. Below are listed parameters for different direct-mapped cache designs.

Cache Data Size: 32 KiB
Cache Block Size: 2 words
Cache Access Time: 1 cycle

5.2.4 [15] <§5.3> Calculate the total number of bits required for the cache listed above, assuming a 32-bit address. Given that total size, find the total size of the closest direct-mapped cache with 16-word blocks of equal size or greater. Explain why the second cache, despite its larger data size, might provide slower performance than the first cache.

Background
The KiB unit counts bytes, whereas the question asks for the cache size in bits.

The unit B means byte; only lowercase b means bit.

So the total number of bits is
$2^n \times [1 + (32 - n - m - 2) + (2^m \times 32)]$
(per block: one valid bit, the tag, and $2^m$ 32-bit data words).

Given information

  1. 1 word = 4 bytes = 32 bits (so divide byte counts by 4 to get words)
  2. Cache data size: 32 KiB
  3. Each cache block holds two words

First compute the number of blocks:
$32\,\text{KiB} / 4 / 2 = 32768 / 4 / 2 = 4096 = 2^{12}$

so the index has $n = 12$ bits.

The word offset takes 1 bit and the byte offset 2 bits (see the figure on p. 270 of the RISC-V edition).

So the tag is
$32 - 12 - 1 - 2 = 17$ bits,
and with one valid bit each block carries
$17 + 1 = 18$ bits of overhead.

The overhead therefore totals
$18 \times 4096 = 73728$ bits,
i.e. 9216 bytes.

The total cache size is
$9216 + 32768 = 41984$ bytes, i.e. $41984 \times 8 = 335872$ bits.
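The bit count follows directly from the formula $2^n \times [1 + (32 - n - m - 2) + 2^m \times 32]$. A minimal sketch, assuming 4-byte words, byte addressing, one valid bit per block, and power-of-two sizes; `dm_total_bits` and `ilog2` are hypothetical names.

```c
#include <assert.h>

/* log2 of a power of two. */
static unsigned ilog2(unsigned long x) {
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Total bits of a direct-mapped cache: each block stores its data,
   a tag, and one valid bit. Assumes 4-byte words and byte addresses. */
unsigned long dm_total_bits(unsigned long data_bytes, unsigned words_per_block,
                            unsigned addr_bits) {
    unsigned long block_bytes = 4UL * words_per_block;
    unsigned long blocks = data_bytes / block_bytes;
    unsigned n = ilog2(blocks);                  /* index bits */
    unsigned m = ilog2(words_per_block);         /* word-offset bits */
    unsigned tag = addr_bits - n - m - 2;        /* 2 byte-offset bits */
    assert(blocks * block_bytes == data_bytes);  /* sizes must divide evenly */
    return blocks * (block_bytes * 8 + tag + 1);
}
```

For 32 KiB of data with 2-word blocks and 32-bit addresses it returns 335872 bits, i.e. 41984 bytes.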

The total cache size is computed as follows:
$\text{total size} = \text{data size} + (\text{valid bits} + \text{tag size}) \times \text{blocks}$
$\text{data size} = \text{blocks} \times \text{block size} \times \text{word size}$

  1. word size = 4
  2. $\text{tag size} = 32 - \log_2(\text{blocks}) - \log_2(\text{block size}) - \log_2(\text{word size})$
  3. valid bits = 1

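Changing each block from 2 words to 16 words (with the index kept at 12 bits) reduces the tag from 17 bits to 14 bits, so each block carries 14 + 1 = 15 bits of overhead and 64 bytes of data.
This gives the inequality
$41984 \le (64 + 15) \times \text{blocks}$

so $\text{blocks} \ge 41984/79 \approx 531.4$; the next power of two is 1024.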

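The above is translated from the answer key, but I think it has a problem:
15 is in bits, while 64 and 41984 are in bytes, so they cannot be added directly; the 15 should be divided by 8:

$41984 \le (64 + 15/8) \times \text{blocks}$

which gives $\text{blocks} \ge 41984 / 65.875 \approx 637.3$. The next power of two is still 1024, so the closest such cache has 1024 blocks, i.e. $1024 \times 64$ B = 64 KiB of data.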
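Working entirely in bits avoids the bytes-versus-bits mix-up. The sketch below (a hypothetical helper, `blocks_for_16_word_cache`) tries 16-word caches of 1, 2, 4, ... blocks, recomputing the tag width for each size, and returns the first whose total storage reaches the 41984 × 8 = 335872 bits of the 2-word cache. Note that with 1024 blocks the index is 10 bits, so the tag is 16 bits rather than 14.

```c
#include <assert.h>

/* Smallest power-of-two number of 16-word blocks such that a direct-mapped
   cache (data + tag + 1 valid bit per block, 32-bit byte addresses) needs at
   least `target_bits` bits of storage. */
unsigned long blocks_for_16_word_cache(unsigned long target_bits) {
    for (unsigned n = 0; n <= 26; n++) {          /* n = index bits */
        unsigned long blocks = 1UL << n;
        unsigned tag = 32 - n - 4 - 2;            /* 4 word-offset + 2 byte-offset bits */
        unsigned long bits = blocks * (16UL * 32 + tag + 1);
        if (bits >= target_bits)
            return blocks;
    }
    assert(!"target does not fit a 32-bit address space");
    return 0;
}
```

It returns 1024 blocks (512 blocks give only 512 × 530 = 271360 bits), i.e. 1024 × 64 B = 64 KiB of data.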

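Larger blocks can lengthen the hit time and increase the miss penalty (more data to transfer on each miss), and having fewer blocks can raise the miss rate through more conflict misses. So the second cache may be slower than the first even though it holds more data.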

5.2.5 [20] <§§5.3, 5.4> Generate a series of read requests that have a lower miss rate on a 2 KiB 2-way set associative cache than the cache listed above. Identify one possible solution that would make the cache listed have an equal or lower miss rate than the 2 KiB cache. Discuss the advantages and disadvantages of such a solution.

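Associativity is used to reduce conflict misses. A sequence of read requests whose addresses share the same 12-bit index field but have different tags therefore causes many misses in the direct-mapped cache above.

For the cache above, the sequence 0, 32768, 0, 32768, … misses on every access, whereas a 2-way set associative cache with LRU replacement, even one with a much smaller total capacity, hits on every access after the first two.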
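The conflict pattern can be demonstrated with two tiny simulators. This is a sketch under stated assumptions: both caches use 2-word (8-byte) blocks and byte addresses, so the 32 KiB direct-mapped cache has 4096 blocks and the 2 KiB 2-way cache has 128 sets; the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_BYTES 8u   /* assumed: 2-word blocks in both caches */

/* 32 KiB direct-mapped cache: 4096 blocks, index = block address mod 4096. */
int direct_mapped_misses(const unsigned *addrs, int n) {
    enum { BLOCKS = 4096 };
    unsigned tag[BLOCKS];
    char valid[BLOCKS];
    int misses = 0;
    assert(n >= 0);
    memset(valid, 0, sizeof valid);
    for (int i = 0; i < n; i++) {
        unsigned block = addrs[i] / BLOCK_BYTES;
        unsigned idx = block % BLOCKS;
        if (!valid[idx] || tag[idx] != block / BLOCKS) {
            valid[idx] = 1;                /* miss: replace the only candidate */
            tag[idx] = block / BLOCKS;
            misses++;
        }
    }
    return misses;
}

/* 2 KiB 2-way set associative cache with LRU: 256 blocks in 128 sets. */
int two_way_lru_misses(const unsigned *addrs, int n) {
    enum { SETS = 128 };
    unsigned tag[SETS][2];
    char valid[SETS][2];
    int lru[SETS];                         /* way to evict next in each set */
    int misses = 0;
    assert(n >= 0);
    memset(valid, 0, sizeof valid);
    memset(lru, 0, sizeof lru);
    for (int i = 0; i < n; i++) {
        unsigned block = addrs[i] / BLOCK_BYTES;
        unsigned set = block % SETS;
        unsigned t = block / SETS;
        int way = -1;
        for (int w = 0; w < 2; w++)
            if (valid[set][w] && tag[set][w] == t)
                way = w;
        if (way < 0) {                     /* miss: fill the least recently used way */
            way = lru[set];
            valid[set][way] = 1;
            tag[set][way] = t;
            misses++;
        }
        lru[set] = 1 - way;                /* the other way is now least recently used */
    }
    return misses;
}
```

On 0, 32768 repeated four times, the direct-mapped cache misses all 8 times, while the 2-way cache misses only on the first two accesses.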

5.2.6 [15] <§5.3> The formula shown in Section 5.3 shows the typical method to index a direct-mapped cache, specifically (Block address) modulo (Number of blocks in the cache). Assuming a 32-bit address and 1024 blocks in the cache, consider a different indexing function, specifically (Block address[31:27] XOR Block address[26:22]). Is it possible to use this to index a direct-mapped cache? If so, explain why and discuss any changes that might need to be made to the cache. If it is not possible, explain why.

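Yes, this function can be used to index a direct-mapped cache.
However, because the two 5-bit fields are XORed together, information about those bits is lost, so more tag bits must be stored: the tag has to keep enough of bits [31:22] (for example, all of [31:27]) to tell apart addresses that map to the same entry.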
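The loss of information is easy to see in code. A minimal sketch of the proposed index function (hypothetical names `xor_index` and `same_index_different_bits`; note that the XOR of two 5-bit fields is itself only 5 bits wide): two block addresses that differ in bits [31:22] can still produce the same index.

```c
#include <assert.h>
#include <stdint.h>

/* Index function from 5.2.6: bits [31:27] XOR bits [26:22] of the block
   address. The result is only 5 bits wide. */
static uint32_t xor_index(uint32_t block_addr) {
    return ((block_addr >> 27) ^ (block_addr >> 22)) & 0x1F;
}

/* The XOR folds ten address bits into five, so the index alone cannot
   recover either field; the tag must retain at least one of them. */
int same_index_different_bits(void) {
    uint32_t a = (1u << 27) | (1u << 22);   /* bit 27 and bit 22 set */
    uint32_t b = 0;                         /* both fields zero */
    assert(a != b);
    return xor_index(a) == xor_index(b);    /* 1 ^ 1 == 0 ^ 0 */
}
```

Here `same_index_different_bits()` returns 1: the two addresses collide on index 0 even though their upper bits differ.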

5.3 For a direct-mapped cache design with a 32-bit address, the following bits of the address are used to access the cache:

| Tag | Index | Offset |
|---|---|---|
| 31-10 | 9-5 | 4-0 |

5.3.1 [5] <§5.3> What is the cache block size (in words)?

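The offset is 5 bits, i.e. $2^5 = 32$ bytes per block. Removing the 2 byte-offset bits leaves 3 word-offset bits:
$2^3 = 8$
so each block holds 8 words.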

5.3.2 [5] <§5.3> How many entries does the cache have?

The index occupies 5 bits (bits 9-5):
$2^5 = 32$
so the cache has 32 entries.

5.3.3 [5] <§5.3> What is the ratio between total bits required for such a cache implementation over the data storage bits?

$2^n \times (\text{block size} + \text{tag size} + \text{valid field size})$

  1. Each block holds $2^3 = 8$ words, i.e. $2^3 \times 32$ bits (32 bytes), so $m = 3$.
  2. The index is 5 bits, so $n = 5$.
  3. Assume one valid bit.

Total cache bits:

$2^5 \times (2^3 \times 32 + (32 - 5 - 3 - 2) + 1) = 32 \times (8 \times 32 + 23) = 8928$

Data storage bits: each block is 32 bytes and there are 32 blocks:
$32 \times 32 \times 8 = 8192$

Ratio of the two:
$8928 / (32 \times 8 \times 32) = 8928 / 8192 \approx 1.0898$

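However, the problem does not mention a valid bit, so the calculation above may not be what is intended. Counting only the tag as overhead:

$(32 \times 8 + 22) / (32 \times 8) = 278/256 \approx 1.086$
Each block carries a 22-bit tag in addition to its data; add the two and divide by the data bits.

That is, $(\text{tag} + \text{data}) / \text{data}$.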
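The ratio can be computed from the field widths alone. A minimal sketch (hypothetical function name `storage_ratio`) that optionally counts a valid bit, so both variants above can be compared.

```c
#include <assert.h>

/* Ratio of total storage bits to data bits for a direct-mapped cache whose
   32-bit byte address is split into tag | index | offset (widths in bits).
   valid_bits is 0 or 1, depending on whether a valid bit is counted. */
double storage_ratio(int tag_bits, int index_bits, int offset_bits, int valid_bits) {
    long entries = 1L << index_bits;              /* 2^5 = 32 entries in 5.3 */
    long data_bits = (1L << offset_bits) * 8;     /* 32-byte block = 256 bits */
    assert(tag_bits + index_bits + offset_bits == 32);
    return (double)(entries * (data_bits + tag_bits + valid_bits)) /
           (double)(entries * data_bits);
}
```

`storage_ratio(22, 5, 5, 0)` gives 278/256 ≈ 1.086, and `storage_ratio(22, 5, 5, 1)` gives 279/256 ≈ 1.090.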

Starting from power on, the following byte-addressed cache references are recorded:

0 4 16 132 232 160 1024 30 140 3100 180 2180

5.3.4 [10] <§5.3> How many blocks are replaced?

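The largest address is 3100, which is 1100 0001 1100 in binary.

$n = 5$, $m = 3$:
the index is 5 bits and the word offset 3 bits.

These are byte addresses, so the offset field is 2 bits wider than for word addresses: $3 + 2 = 5$ bits.
3100 therefore splits as 11 | 00000 | 11100 (tag | index | offset).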

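Build the table below:

| Byte address | Binary address (tag index offset) | Tag | Index | Hit/Miss |
|---|---|---|---|---|
| 0 | 000 00000 00000 | 0 | 0 | Miss |
| 4 | 000 00000 00100 | 0 | 0 | Hit |
| 16 | 000 00000 10000 | 0 | 0 | Hit |
| 132 | 000 00100 00100 | 0 | 4 | Miss |
| 232 | 000 00111 01000 | 0 | 7 | Miss |
| 160 | 000 00101 00000 | 0 | 5 | Miss |
| 1024 | 001 00000 00000 | 1 | 0 | Miss |
| 30 | 000 00000 11110 | 0 | 0 | Miss |
| 140 | 000 00100 01100 | 0 | 4 | Hit |
| 3100 | 011 00000 11100 | 3 | 0 | Miss |
| 180 | 000 00101 10100 | 0 | 5 | Hit (same block as 160) |
| 2180 | 010 00100 00100 | 2 | 4 | Miss |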

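Look at the index first; if the entry there holds a different tag, the old block is replaced. The replacements, written as (tag, index), are:

  1. 1024 (1, 0) replaces (0, 0)
  2. 30 (0, 0) replaces (1, 0)
  3. 3100 (3, 0) replaces (0, 0)
  4. 2180 (2, 4) replaces (0, 4)

In total there are 4 replacements.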
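The whole trace can be replayed with a small simulator. This sketch (hypothetical names `CacheState` and `run_trace`) uses the field layout from 5.3, index = bits 9..5 and tag = bits 31..10, and counts a replacement whenever a miss evicts a valid block.

```c
#include <assert.h>

#define ENTRIES 32   /* 5-bit index (bits 9..5) */

/* Result of replaying a trace through the cache. */
typedef struct {
    int hits;
    int replacements;          /* misses that evict a valid block */
    int valid[ENTRIES];
    unsigned tag[ENTRIES];
} CacheState;

/* Replay byte addresses through the 5.3 cache: 32-byte blocks (offset = bits 4..0),
   index = bits 9..5, tag = bits 31..10. */
CacheState run_trace(const unsigned *addrs, int n) {
    CacheState s = {0};
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        unsigned index = (addrs[i] >> 5) & (ENTRIES - 1);
        unsigned tag = addrs[i] >> 10;
        if (s.valid[index] && s.tag[index] == tag) {
            s.hits++;
        } else {
            if (s.valid[index])
                s.replacements++;
            s.valid[index] = 1;
            s.tag[index] = tag;
        }
    }
    return s;
}
```

On this trace it reports 4 replacements and 4 hits (4, 16, 140, 180), and leaves valid entries only at indices 0, 4, 5, and 7, with tags 3, 2, 0, and 0.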

5.3.5 [10] <§5.3> What is the hit ratio?

$4/12 \approx 33.3\%$ (the hits are 4, 16, 140, and 180)

5.3.6 [20] <§5.3> List the final state of the cache, with each valid entry represented as a record of <index, tag, data>.

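| Index | Tag | Data |
|---|---|---|
| 0 | 3 | mem[3072..3103] (block containing 3100) |
| 4 | 2 | mem[2176..2207] (block containing 2180) |
| 5 | 0 | mem[160..191] (block containing 160 and 180) |
| 7 | 0 | mem[224..255] (block containing 232) |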

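The answer key gives different results for the last three parts, which confused me: why would the index be 6 bits and the tag 4 bits? And 1024 has only a single 1 in binary, so why do two 1s appear? So I worked the last three parts according to my understanding of the textbook's main text.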
