[Computer Organization and Design] Chapter 5: Memory Hierarchy (Part 2)

甲六乙

Last modified: 2022-09-01 12:01:20

Category: Computer Organization and Design, 5th Edition, ARM Edition. Tags: Computer Organization and Design, cache basics

First published: 2022-09-01 11:19:09

Article link: https://blog.csdn.net/m0_38037810/article/details/126639393

5.3 The basics of caches

Direct-mapped cache

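There are three common cache mapping schemes: direct-mapped, set-associative, and fully associative. This section covers direct mapping.

In a direct-mapped cache, each memory location can be stored in exactly one fixed location in the cache.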

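The location in the cache is computed from the address:

(block address) modulo (number of blocks in the cache)

The cache stores data at block granularity (a block is also called a cache line), which is why the formula uses the block address. When the number of blocks is a power of two, the low-order bits of the block address select the cache line.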
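As a minimal sketch of this mapping, the two helper functions below (hypothetical names, not from the textbook) compute the cache line for a block address, once with the modulo operation and once with a bit mask. The mask version assumes the number of blocks is a power of two.

```python
def cache_index(block_address: int, num_blocks: int) -> int:
    """Return the cache line a block maps to in a direct-mapped cache."""
    return block_address % num_blocks


def cache_index_low_bits(block_address: int, num_blocks: int) -> int:
    """Same mapping using the low-order bits; requires a power-of-two size."""
    assert num_blocks > 0 and num_blocks & (num_blocks - 1) == 0
    return block_address & (num_blocks - 1)


# Illustrative 8-block cache: block addresses 1, 9, 17 and 25 share line 1.
for addr in (1, 9, 17, 25):
    print(addr, cache_index(addr, 8), cache_index_low_bits(addr, 8))
```

Because several block addresses land on the same line, the index alone cannot tell which block is currently stored there.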

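Several memory locations can map to the same cache location, so the cache has to check which memory location is currently stored there. The value used for this comparison is the tag, which is normally the high-order bits of the address.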

Valid bit

Every cache line has a valid bit that indicates whether the line holds valid data.

Cache hit rates on modern computers are often above 95%.

Each cache line stores:

Data（block）
Tag
Valid bit
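The three fields can be put together in a small lookup sketch. The `CacheLine` class and the `lookup` and `fill` helpers are hypothetical names chosen for illustration, and the cache is addressed by block address to keep the example short. A hit requires both a set valid bit and a matching tag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheLine:
    valid: bool = False           # valid bit: does this line hold real data?
    tag: int = 0                  # high-order bits of the block address
    data: Optional[bytes] = None  # the cached block


def lookup(lines: List[CacheLine], block_address: int) -> bool:
    """Return True on a hit: the selected line is valid and its tag matches."""
    index = block_address % len(lines)
    tag = block_address // len(lines)
    line = lines[index]
    return line.valid and line.tag == tag


def fill(lines: List[CacheLine], block_address: int, data: bytes) -> None:
    """Install a block, overwriting whatever was in its line."""
    index = block_address % len(lines)
    lines[index] = CacheLine(True, block_address // len(lines), data)


lines = [CacheLine() for _ in range(8)]
print(lookup(lines, 0))   # miss: tag 0 matches by accident, but valid is False
fill(lines, 9, b"block 9")
print(lookup(lines, 9))   # hit
print(lookup(lines, 17))  # miss: same line as block 9, different tag
```

The first lookup shows why the valid bit is needed: an empty line may contain a tag value that happens to match.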

Address mapping in a direct-mapped cache works as follows. The address is divided into three parts:

Tag: a tag field, which is compared with the value of the tag field of the cache
Index: a cache index, which is used to select the block
Offset: selects the word and byte within the block
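The split can be written as a few shifts and masks. This is a sketch with a hypothetical function name; `index_bits` and `offset_bits` are parameters of the example, and the concrete widths used in the call are illustrative only.

```python
def split_address(address: int, index_bits: int, offset_bits: int):
    """Split an address into (tag, index, offset) for a direct-mapped cache."""
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


# Illustrative widths: 10 index bits (1024 lines), 6 offset bits (64-byte blocks).
tag, index, offset = split_address(0x12345678, index_bits=10, offset_bits=6)
print(hex(tag), index, offset)  # 0x1234 345 56
```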

The cache considered here has the following parameters:

■ 64-bit addresses

■ A direct-mapped cache

■ The cache size is 2^n blocks, so n bits are used for the index

■ The block size is 2^m words (2^(m+2) bytes), so m bits are used for the word within the block, and two bits are used for the byte part of the address

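The size of the tag field is

64 - (n+m+2) .

The total number of bits in a direct-mapped cache is

2^n × (block size + tag size + valid field size)

With 32-bit words and a 1-bit valid field, this is 2^n × (2^m × 32 + (64 - n - m - 2) + 1) = 2^n × (2^m × 32 + 63 - n - m).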
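The bit accounting for this cache can be checked numerically. The sketch below assumes 32-bit words and one valid bit per line; the function names and the values n = 10, m = 2 are chosen here for illustration.

```python
def tag_bits(n: int, m: int, address_bits: int = 64) -> int:
    """Tag width: address bits minus index (n), word (m) and byte (2) bits."""
    return address_bits - (n + m + 2)


def total_cache_bits(n: int, m: int, address_bits: int = 64) -> int:
    """Total storage: 2^n lines, each holding data, tag and one valid bit."""
    data_bits = (2 ** m) * 32  # 2^m words of 32 bits each (assumed word size)
    per_line = data_bits + tag_bits(n, m, address_bits) + 1
    return (2 ** n) * per_line


# Illustrative cache: 1024 blocks of 4 words, i.e. 16 KiB of data.
print(tag_bits(10, 2))          # 50
print(total_cache_bits(10, 2))  # 183296 bits, versus 131072 bits of data
```

The gap between the two numbers is the overhead of tags and valid bits, which is why a cache is usually named by its data capacity only.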

Hit rate and miss rate

hit rate The fraction of memory accesses found in a level of the memory hierarchy.

miss rate The fraction of memory accesses not found in a level of the memory hierarchy.

Miss penalty

miss penalty The time required to fetch a block into a level of the memory hierarchy from the lower level, including the time to access the block, transmit it from one level to the other, insert it in the level that experienced the miss, and then pass the block to the requestor.

Hit time

hit time The time required to access a level of the memory hierarchy, including the time needed to determine whether the access is a hit or a miss.

Relationship of hit rate, penalty and block size

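The larger the cache block (cache line), the higher the hit rate tends to be, because each miss brings in more neighbouring data. The cost is a larger miss penalty: moving a bigger block from the lower level of the memory hierarchy to the higher level takes longer. If blocks become too large relative to the cache size, the hit rate can also drop, since fewer blocks fit in the cache.

Techniques for reducing the miss penalty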
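The trade-off can be made concrete with the standard relation: average memory access time = hit time + miss rate × miss penalty. The sketch below is only a toy model. It assumes the miss penalty is a fixed access latency plus a per-word transfer time, and every number in it is hypothetical rather than measured.

```python
def miss_penalty(block_words: int, latency: int, cycles_per_word: int) -> int:
    """Cycles to fetch one block: fixed latency plus per-word transfer time."""
    return latency + block_words * cycles_per_word


def average_access_time(hit_time: float, miss_rate: float, penalty: float) -> float:
    """Average memory access time in cycles."""
    return hit_time + miss_rate * penalty


# Hypothetical figures: a larger block lowers the miss rate but raises the penalty.
small = average_access_time(1, 0.05, miss_penalty(4, 100, 2))
large = average_access_time(1, 0.04, miss_penalty(16, 100, 2))
print(small, large)
```

Whether the larger block wins depends entirely on how much the miss rate actually improves, which is a property of the workload.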

Early restart

resume execution as soon as the requested word of the block is returned, rather than wait for the entire block

Requested word first or critical word first

the requested word is transferred from the memory to the cache first. The remainder of the block is then transferred, starting with the address after the requested word and wrapping around to the beginning of the block.
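The wrap-around transfer order described above can be sketched in one line. The function name is hypothetical, and words are numbered from 0 within the block.

```python
def critical_word_first_order(block_words: int, requested: int) -> list:
    """Order in which the words of a block are transferred, requested word first."""
    return [(requested + i) % block_words for i in range(block_words)]


# An 8-word block where the processor asked for word 5.
print(critical_word_first_order(8, 5))  # [5, 6, 7, 0, 1, 2, 3, 4]
```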

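Cache miss

On a cache miss, an in-order processor stalls its pipeline and waits until the miss has been handled, that is, until the corresponding block has been brought from memory into the cache.

An out-of-order processor can keep executing instructions while the miss is outstanding, as long as they do not depend on the missing data.

An instruction cache miss is handled as follows; a data cache miss is handled in much the same way: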

1. Send the original PC value to the memory.

2. Instruct main memory to perform a read and wait for the memory to complete its access.

3. Write the cache entry, putting the data from memory in the data portion of the entry, writing the upper bits of the address (from the ALU) into the tag field, and turning the valid bit on.

4. Restart the instruction execution at the first step, which will refetch the instruction, this time finding it in the cache.
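The four steps can be traced in a small software model. This is a sketch under simplifying assumptions: blocks are one word, `pc` is a word address, memory is a plain dictionary, and the names `fetch` and `NUM_LINES` are invented for the example. Real hardware does this in the cache controller, not in software.

```python
NUM_LINES = 4  # hypothetical cache size, in one-word blocks


def fetch(cache, memory, pc):
    """Fetch the instruction at word address pc; return (instruction, missed)."""
    index, tag = pc % NUM_LINES, pc // NUM_LINES
    line = cache[index]
    if line["valid"] and line["tag"] == tag:
        return line["data"], False
    # Steps 1-2: send the address to memory and wait for the read to finish.
    data = memory[pc]
    # Step 3: write the cache entry: data, tag (upper address bits), valid bit.
    cache[index] = {"valid": True, "tag": tag, "data": data}
    # Step 4: restart the fetch, which now finds the instruction in the cache.
    instruction, _ = fetch(cache, memory, pc)
    return instruction, True


cache = [{"valid": False, "tag": 0, "data": None} for _ in range(NUM_LINES)]
memory = {addr: f"instr{addr}" for addr in range(16)}
print(fetch(cache, memory, 5))  # first access misses
print(fetch(cache, memory, 5))  # second access hits
```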

Write through and write back

Write-through and write-back are two common cache write policies.

Write through

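With write-through, every time the CPU modifies a word in the cache, the same word is also written to memory, so the cache and memory stay consistent.

Only the modified word is written to memory, not the whole cache line.

Under write-through, every store generates a memory write access, which is slow and hurts performance.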
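A toy model makes the cost visible. The class below is a sketch, not a real cache: it uses a dictionary keyed by word address in place of cache lines, ignores tags, indexing and write misses, and all names in it are hypothetical. It only counts how many memory writes the policy generates.

```python
class WriteThroughCache:
    """Toy write-through model: every store updates both cache and memory."""

    def __init__(self):
        self.cache = {}         # word address -> value (stand-in for cache lines)
        self.memory = {}        # word address -> value
        self.memory_writes = 0  # number of write accesses sent to memory

    def store(self, address: int, value: int) -> None:
        self.cache[address] = value
        self.memory[address] = value  # only the written word goes to memory
        self.memory_writes += 1


wt = WriteThroughCache()
for i in range(10):
    wt.store(0, i)  # ten stores to the same word
print(wt.memory_writes, wt.cache[0] == wt.memory[0])  # 10 True
```

Even though all ten stores hit the same word, memory is written ten times. Cache and memory never disagree, which is the benefit bought at that price.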

Write buffer

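A write buffer addresses the problem that, under write-through, every store would otherwise have to wait for the memory access to finish. The CPU writes the data into both the cache and the write buffer and then continues executing. Once an entry in the write buffer has been written to memory, that entry is freed. If the write buffer is full when the CPU reaches a store, the CPU must stall until an entry frees up, place the data in the write buffer, and only then continue.

The write buffer can fill up in two situations:

The CPU generates stores faster than the write buffer can drain to memory. The write buffer then stays full, and no amount of buffering helps.
The write buffer fills up during a long burst of writes. This case can be mitigated by making the write buffer deeper than a single entry.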
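Both situations can be reproduced with a small timing model. The sketch below is an assumption-laden simplification: memory retires one buffered write every `drain_cycles` cycles, stores are described by the number of cycles of CPU work before each one, and the function name and all numbers are hypothetical.

```python
from collections import deque


def simulate_write_buffer(store_gaps, depth, drain_cycles):
    """Return total CPU stall cycles caused by a full write buffer.

    store_gaps: cycles of CPU work before each store
    depth: number of entries in the write buffer
    drain_cycles: cycles memory needs to retire one entry
    """
    pending = deque()  # completion times of buffered writes, oldest first
    now = 0
    last_done = 0
    stall = 0
    for gap in store_gaps:
        now += gap
        while pending and pending[0] <= now:
            pending.popleft()            # entries already written to memory
        if len(pending) == depth:        # buffer full: wait for the oldest entry
            stall += pending[0] - now
            now = pending.popleft()
        last_done = max(now, last_done) + drain_cycles
        pending.append(last_done)
    return stall


# A short burst of 4 back-to-back stores: a deeper buffer removes the stalls.
print(simulate_write_buffer([1] * 4, depth=1, drain_cycles=4))
print(simulate_write_buffer([1] * 4, depth=4, drain_cycles=4))
# A sustained store rate above the drain rate: extra depth only delays the stalls.
print(simulate_write_buffer([1] * 100, depth=8, drain_cycles=4))
```

In this model a deeper buffer absorbs the burst completely, while the sustained case keeps stalling no matter how deep the buffer is made.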