5.3 The basics of caches
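Direct-mapped cache
There are three common cache mapping schemes: direct-mapped, set-associative, and fully associative. This section covers the direct-mapped scheme.
In a direct-mapped cache, each memory location can be placed in exactly one fixed position in the cache.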
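The position in the cache is computed from the address:
(Block address) modulo (Number of blocks in the cache)
The unit of storage in a cache is the block (that is, the cache line), which is why the formula uses the block address. When the number of blocks is a power of two, the low-order bits of the block address select the cache line.
Several memory locations can map to the same position in the cache, so the cache must record which memory location it currently holds. The field used for this comparison is the tag, which is normally taken from the high-order bits of the address.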
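The mapping rule and the role of the tag can be checked with a few lines of Python. This is a minimal sketch: the function names and the 8-block cache size are assumptions chosen for illustration, not part of the original notes.

```python
def cache_index(block_address: int, num_blocks: int) -> int:
    """Cache line that a block maps to: block address modulo number of blocks."""
    return block_address % num_blocks


def cache_tag(block_address: int, num_blocks: int) -> int:
    """Remaining high-order part of the block address, stored as the tag."""
    return block_address // num_blocks


NUM_BLOCKS = 8  # hypothetical cache with 8 lines (a power of two)

# Blocks 1, 9, 17 and 25 all land in line 1; only the tag (0, 1, 2, 3) tells them apart.
for block in (1, 9, 17, 25):
    print(block, cache_index(block, NUM_BLOCKS), cache_tag(block, NUM_BLOCKS))
```

Because the number of blocks is a power of two, the modulo is the same as keeping the low 3 bits of the block address (`block & 0b111`).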
Valid bit
Every cache line has a valid bit, which indicates whether the line holds valid data.
The hit rates of the cache prediction on modern computers are often above 95%.
Each cache line stores:
- Data(block)
- Tag
- Valid bit
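The address mapping of a direct-mapped cache works as follows. The address is divided into three parts:
- Tag: a tag field, which is compared with the value of the tag field of the cache entry
- Index: a cache index, which is used to select the block
- Offset: selects the word and byte within the block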
This cache has the following parameters:
■ 64-bit addresses
■ A direct-mapped cache
■ The cache size is 2^n blocks, so n bits are used for the index
■ The block size is 2^m words (2^(m+2) bytes), so m bits are used for the word within the block, and two bits are used for the byte part of the address
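The size of the tag field is
64 - (n + m + 2).
The total number of bits in a direct-mapped cache is
2^n × (block size + tag size + valid field size)
= 2^n × (2^m × 32 + (64 - n - m - 2) + 1)
= 2^n × (2^m × 32 + 63 - n - m).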
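As a numeric check of the bit count, the sketch below assumes that each of the 2^n entries stores the data block (2^m words of 32 bits), the tag, and one valid bit. The function name and the example configuration (16 KiB of data with 4-word blocks, so n = 10 and m = 2) are illustrative assumptions.

```python
def direct_mapped_bits(n: int, m: int, address_bits: int = 64) -> dict:
    """Storage needed by a direct-mapped cache with 2^n blocks of 2^m words."""
    tag_bits = address_bits - (n + m + 2)     # index, word and byte bits removed
    data_bits = (2 ** m) * 32                 # 32-bit words per block
    bits_per_entry = data_bits + tag_bits + 1  # +1 for the valid bit
    return {
        "tag_bits": tag_bits,
        "bits_per_entry": bits_per_entry,
        "total_bits": (2 ** n) * bits_per_entry,
    }


# 16 KiB of data = 4096 words = 1024 blocks of 4 words: n = 10, m = 2.
sizes = direct_mapped_bits(n=10, m=2)
print(sizes)  # tag_bits 50, bits_per_entry 179, total_bits 183296
```

The 183296 bits are about 22.4 KiB, so the tags and valid bits add roughly 40% on top of the 16 KiB of data in this configuration.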
Hit rate and miss rate
Hit rate: the fraction of memory accesses found in a level of the memory hierarchy.
Miss rate: the fraction of memory accesses not found in a level of the memory hierarchy.
Miss penalty
Miss penalty: the time required to fetch a block into a level of the memory hierarchy from the lower level, including the time to access the block, transmit it from one level to the other, insert it in the level that experienced the miss, and then pass the block to the requestor.
Hit time
Hit time: the time required to access a level of the memory hierarchy, including the time needed to determine whether the access is a hit or a miss.
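Relationship between hit rate, miss penalty, and block size
A larger cache line (block) size generally raises the hit rate, because each miss brings in more neighboring data and so exploits spatial locality. The cost is a larger miss penalty, since more time is needed to move the block from the lower level of the memory hierarchy to the higher one. If the block becomes too large a fraction of the cache, the hit rate can drop again, because fewer blocks fit in the cache.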
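One way to weigh the two effects against each other is the average memory access time, hit time + miss rate × miss penalty. The formula is the standard textbook one but is not stated in these notes, and the cycle counts and miss rates below are made-up numbers used only to illustrate the trade-off.

```python
def average_access_time(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers: the larger block lowers the miss rate from 5% to 3%,
# but doubles the miss penalty from 20 to 40 cycles.
small_block = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=20)
large_block = average_access_time(hit_time=1, miss_rate=0.03, miss_penalty=40)

print(f"small block: {small_block:.2f} cycles")
print(f"large block: {large_block:.2f} cycles")
```

With these particular numbers the larger block is slower on average even though its hit rate is higher, which is why the miss penalty has to be considered together with the hit rate.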
Techniques for reducing the miss penalty
Early restart
Resume execution as soon as the requested word of the block is returned, rather than waiting for the entire block.
Requested word first or critical word first
The requested word is transferred from the memory to the cache first. The remainder of the block is then transferred, starting with the address after the requested word and wrapping around to the beginning of the block.
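The wrap-around transfer order used by critical word first can be written down directly. This is a small sketch; the function name and the 8-word block are assumptions for illustration.

```python
def critical_word_first_order(requested_word: int, words_per_block: int) -> list:
    """Order in which the words of a block are transferred:
    requested word first, then the following words, wrapping to the start."""
    return [(requested_word + i) % words_per_block for i in range(words_per_block)]


# The CPU asked for word 5 of an 8-word block.
print(critical_word_first_order(5, 8))  # [5, 6, 7, 0, 1, 2, 3, 4]
```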
Cache miss
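On a cache miss, an in-order processor stalls its pipeline and waits until the miss has been serviced, that is, until the corresponding block has been brought from memory into the cache.
An out-of-order processor can continue executing other instructions while the miss is outstanding.
An instruction cache miss is handled as follows; a data cache miss is handled similarly: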
1. Send the original PC value to the memory.
2. Instruct main memory to perform a read and wait for the memory to complete its access.
3. Write the cache entry, putting the data from memory in the data portion of the entry, writing the upper bits of the address (from the ALU) into the tag field, and turning the valid bit on.
4. Restart the instruction execution at the first step, which will refetch the instruction, this time finding it in the cache.
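The four steps above can be sketched as a tiny software model of a direct-mapped cache. This is a simplification under stated assumptions: each block holds a single value, memory is a plain dictionary keyed by block address, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    valid: bool = False
    tag: int = 0
    data: int = 0  # simplification: one value per block


class DirectMappedCache:
    def __init__(self, num_blocks: int, memory: dict):
        self.num_blocks = num_blocks
        self.memory = memory  # block address -> value
        self.entries = [Entry() for _ in range(num_blocks)]
        self.misses = 0

    def read(self, block_address: int) -> int:
        index = block_address % self.num_blocks
        tag = block_address // self.num_blocks
        entry = self.entries[index]
        if entry.valid and entry.tag == tag:
            return entry.data  # hit
        # Steps 1-2: send the address to memory and wait for the read.
        self.misses += 1
        data = self.memory[block_address]
        # Step 3: fill the entry, set the tag, turn the valid bit on.
        entry.data, entry.tag, entry.valid = data, tag, True
        # Step 4: restart the access, which now hits.
        return self.read(block_address)


memory = {addr: addr * 10 for addr in range(32)}
cache = DirectMappedCache(num_blocks=8, memory=memory)
cache.read(3)    # miss: line 3 is invalid
cache.read(3)    # hit
cache.read(11)   # miss: block 11 also maps to line 3 and replaces block 3
print(cache.misses)  # 2
```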
Write through and write back
Write-through and write-back are the two common cache write policies.
Write through
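With write-through, every time the CPU modifies a word in the cache, the word is also written to memory, so the cache and memory stay consistent.
Only the modified word is written to memory, not the whole cache line.
Under write-through, every store generates a memory write access, which is slow and reduces performance.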
Write buffer
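A write buffer addresses the problem that, under write-through, every store has to wait for the memory access to finish. The CPU writes the data into both the cache and the write buffer and then continues executing. Once an entry in the write buffer has been written to memory, that entry is freed. If the write buffer is full, the CPU must stall until an entry becomes free, write the data into the buffer, and only then continue.
The write buffer can fill up in two situations:
- The CPU's store rate exceeds the rate at which the write buffer drains to memory. The buffer is then always full, and no amount of buffering helps.
- The buffer fills during a long burst of writes. This case can be mitigated by increasing the buffer depth beyond a single entry.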
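Both situations can be reproduced with a small cycle-by-cycle model. The sketch assumes a deliberately crude timing model: the CPU tries to issue one store per cycle, and memory retires the oldest buffered entry once every `drain_interval` cycles. The function and parameter names are hypothetical.

```python
from collections import deque


def simulate_write_buffer(num_stores: int, depth: int, drain_interval: int) -> int:
    """Return how many cycles the CPU stalls while issuing back-to-back stores."""
    buffer = deque()
    stalls = 0
    cycle = 0
    remaining = num_stores
    while remaining > 0:
        # Memory retires the oldest entry once every drain_interval cycles.
        if buffer and cycle % drain_interval == 0:
            buffer.popleft()
        if len(buffer) < depth:
            buffer.append(cycle)  # store accepted, CPU continues
            remaining -= 1
        else:
            stalls += 1           # buffer full, CPU stalls this cycle
        cycle += 1
    return stalls


print(simulate_write_buffer(num_stores=4, depth=1, drain_interval=4))  # 9
print(simulate_write_buffer(num_stores=4, depth=4, drain_interval=4))  # 0
print(simulate_write_buffer(num_stores=8, depth=4, drain_interval=4))  # 9
```

In this model a 4-entry buffer absorbs a burst of 4 stores without stalling, but once the burst is longer than the buffer the CPU is limited by the drain rate again, which matches the first situation in the list.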
Write back
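With write-back, a store updates only the cache line. A modified cache line is written back to memory only when it is about to be replaced by another block.
Write-back is harder to implement than write-through, especially on multicore processors, where all cores must see a consistent view of memory.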
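Write allocate and no write allocate
Write allocate:
On a write miss, the block is first read from memory into the cache, and the write is then performed on the cached block. With a write-through policy, the written data is also written to memory.
No write allocate:
On a write miss, the data is written directly to memory and the block is not brought into the cache.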
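The difference between the two policies shows up only on a write miss, as the sketch below illustrates. It is a simplified model: the cache is an unbounded dictionary (so there is no replacement), blocks are 4 words, and all names are assumptions introduced for this example.

```python
WORDS_PER_BLOCK = 4  # hypothetical block size


def handle_write(cache: dict, memory: dict, address: int, value: int,
                 write_allocate: bool, write_through: bool = True) -> None:
    """Perform one word write under the chosen miss and write policies."""
    block, offset = divmod(address, WORDS_PER_BLOCK)
    if block not in cache:                      # write miss
        if not write_allocate:
            memory[block][offset] = value       # bypass the cache entirely
            return
        cache[block] = list(memory[block])      # fetch the block into the cache
    cache[block][offset] = value                # update the cached block
    if write_through:
        memory[block][offset] = value           # keep memory consistent


memory = {b: [0] * WORDS_PER_BLOCK for b in range(4)}

allocating = {}
handle_write(allocating, memory, address=5, value=99, write_allocate=True)
print(allocating)      # {1: [0, 99, 0, 0]}

non_allocating = {}
handle_write(non_allocating, memory, address=10, value=7, write_allocate=False)
print(non_allocating)  # {}
print(memory[2])       # [0, 0, 7, 0]
```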
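Replacing a cache line
In a write-through cache, the line can simply be replaced, because the cache and the memory block hold the same data.
In a write-back cache, the cache must first check whether the line is dirty. If it is, the line has to be written back to memory before it is replaced.
A write-back cache can also use a write buffer: the line being replaced is moved into a write buffer (the size of one cache line), the new data is then read from memory and written into the cache, and the buffered line is written back to memory afterwards.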
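The dirty-bit check on replacement can be sketched as follows. The model is a simplification: it is direct-mapped with one value per block, uses write allocate, writes the victim back immediately instead of going through a write buffer, and all class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False
    dirty: bool = False
    tag: int = 0
    data: int = 0  # simplification: one value per block


class WriteBackCache:
    def __init__(self, num_blocks: int, memory: dict):
        self.num_blocks = num_blocks
        self.memory = memory  # block address -> value
        self.lines = [Line() for _ in range(num_blocks)]
        self.writebacks = 0

    def _line_for(self, block_address: int) -> Line:
        index = block_address % self.num_blocks
        tag = block_address // self.num_blocks
        line = self.lines[index]
        if not (line.valid and line.tag == tag):       # miss: replace the line
            if line.valid and line.dirty:
                # Dirty victim: write it back to memory before replacing it.
                victim_address = line.tag * self.num_blocks + index
                self.memory[victim_address] = line.data
                self.writebacks += 1
            line.valid, line.dirty = True, False
            line.tag, line.data = tag, self.memory[block_address]
        return line

    def read(self, block_address: int) -> int:
        return self._line_for(block_address).data

    def write(self, block_address: int, value: int) -> None:
        line = self._line_for(block_address)           # write allocate
        line.data = value
        line.dirty = True                              # memory is now stale


memory = {addr: 0 for addr in range(16)}
cache = WriteBackCache(num_blocks=4, memory=memory)
cache.write(1, 7)
print(memory[1])         # 0: the store has not reached memory yet
cache.read(5)            # block 5 maps to the same line and evicts dirty block 1
print(memory[1])         # 7: written back on replacement
cache.read(9)            # evicts block 5, which is clean, so no write-back
print(cache.writebacks)  # 1
```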
Cache Example
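Consider a cache whose line size is 16 words, that is, 64 bytes, and whose total size is 16 KB.
Offset: 6 bits. The low 2 bits are the byte offset within a word and are ignored for word-aligned accesses; bits 5 to 2 select the word within the line.
Index: selects the cache line. 16 KB / 64 bytes = 2^8 lines, so the index is 8 bits.
Tag: the upper 18 bits are the tag used for comparison (this corresponds to 32-bit addresses: 32 - 8 - 6 = 18).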
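The field split for this example can be tried on a concrete address. The sketch assumes 32-bit addresses (inferred from 18 + 8 + 6 = 32); the function name and the sample address are illustrative only.

```python
OFFSET_BITS = 6   # 64-byte line: 4 word-select bits + 2 byte bits
INDEX_BITS = 8    # 16 KB / 64 bytes = 256 lines


def split_address(addr: int) -> tuple:
    """Split a 32-bit address into (tag, index, word, byte_in_word)."""
    byte_in_word = addr & 0x3                        # bits 1..0
    word = (addr >> 2) & 0xF                         # bits 5..2
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)  # bits 13..6
    tag = addr >> (OFFSET_BITS + INDEX_BITS)         # bits 31..14
    return tag, index, word, byte_in_word


tag, index, word, byte_in_word = split_address(0x12345678)
print(hex(tag), hex(index), word, byte_in_word)  # 0x48d1 0x59 14 0
```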