CPU Cache Study Notes

1. Cache mechanism overview
1.1 What are direct mapped, fully associative, and N-way set associative caches?

The cache is subdivided into sets of lines.
A cache line is the minimum unit of data transferred between slow
off-chip DRAM and the fast on-chip CPU cache, usually moved in burst mode.

1) At one extreme, the cache can be direct mapped, in which case a line
in main memory is always stored at the exact same location in the cache.

2) At the other extreme, the cache is fully associative, meaning that
any line in memory can be stored at any location in the cache.

3) In between, most caches are to some degree N-way set associative, where
any line of main memory can be stored in any one of N lines of the cache.
For instance, a line of memory can be stored in two different lines of a
two-way set associative cache.
(The whole cache consists of multiple sets; each set contains N cache
lines, which are the N "ways".)


A direct-mapped cache is prone to replacement of cached lines, because a
memory line can be stored in only one location in the cache. A fully
associative cache is theoretically optimal, but costly to implement,
since it needs a comparator for every cache line.

Practical caches are mostly N-way set associative: only N parallel
comparators are needed, for the N cache lines of the set selected by the
index.

1.2 How is a memory address mapped into the cache?
1.2.1 Basic mapping
   A memory address is divided into the following fields: tag + index +
offset_in_line. The index selects which set of the cache to look in,
typically by a modulo calculation:
   set_no = index MOD (number of sets in the cache)

   Once the set is located, the tag field of the address is compared
against the tag of each way in that set (this comparison is done in
parallel in hardware). If a match is found, it is a cache hit and the
cache line holding the address has been found; otherwise it is a cache
miss.

   On a hit, the offset_in_line field locates the data within the cache
line; on a miss, the data at that memory address must be fetched from
DRAM into the cache.


1.2.2 Implementation considerations
1) Physical address vs. virtual address
  Should the cache be accessed with physical addresses or with virtual
addresses?

  Virtual address --- not unique:
      . Multiple processes can use the same address space, so we'll need
to include a field identifying the address space in the cache tag to
make sure we don't mix them up.
      . The same physical location may be described by different
addresses in different tasks. In turn, that might lead to the same
memory location being cached in two different cache entries (cache
aliases).

  Physical address:
      . A cache that works purely on physical addresses is easier to
manage (we'll explain why below), but raw program (virtual) addresses
are available to start the cache lookup earlier, letting the system run
that little bit faster.
   (The physical address is only available after the MMU has translated
the virtual address, so a physically indexed lookup starts later.)
      
2) Choice of line size:
  When a cache miss occurs, the whole line must be filled from memory.
The larger the line, the longer the fill and write-back latency (though
larger lines exploit spatial locality better).


3),Split/unified:
   The I-cache / D-cache question.
   the selection is done purely by function, in that instruction
fetches look in the I-cache and data loads/stores in the D-cache. (This
means, by the way, that if you try to execute code which the CPU just
copied into memory you must both flush those instructions out of the
D-cache and ensure they get loaded into the I-cache.)

1.3 Multi-level caches
   Many CPUs now use L1/L2/... caches.
   The main purpose of multi-level caching is to reduce the penalty of a
cache miss: an L1 miss that hits in L2 costs far less than a trip to
main memory.
 

2. Cache considerations in programming

2.1 dma 操作
2.1.1 Before DMA out of memory 
   If a device is taking data out of memory, it’s
vital that it gets the right data. If the data cache is write back and a
program has recently written some data, some of the correct data may
still be held in the D-cache but not yet be written back to main memory.
The CPU can’t see this problem, of course; if it looks at the memory
locations it will get the correct data back from its cache.
So before the DMA device starts reading data from memory, any data
for that range of locations that is currently held in the D-cache must be
written back to memory if necessary.

2.1.2 DMA into memory
   If a device is loading data into memory, it’s important
to invalidate any cache entries purporting to hold copies of the memory
locations concerned; otherwise, the CPU reading these locations will obtain
stale cached data. The cache entries should be invalidated before
the CPU uses any data from the DMA input stream.


2.2 Writing instructions
   When the CPU itself is storing instructions into
memory for subsequent execution, you must first ensure 
that the instructions are written back to memory and 
then make sure that the corresponding I-cache locations 
are invalidated: The MIPS CPU has no connection between 
the D-cache and the I-cache.

2.3 Linux slab allocator
  A Linux slab cache contains multiple slabs, and those slabs are used
to allocate and free objects of a single type (typically all defined by
the same data type).
  Linux backs each slab with one or more physically contiguous page
frames.
  Objects located at the same offset in different slabs of the same slab
cache are therefore very likely to map to the same cache line, or at
least to the same set of the CPU cache.
  The Linux slab allocator uses a so-called colour offset to avoid this
problem: as far as possible, each slab of a slab cache is given a
different colour offset, which determines where the first object in that
slab is stored. This greatly reduces the conflicts described above.
  
2.4 Other
  Many data structure definitions in Linux carry comments like this one:
 /*
 * Keep related fields in common cachelines.  The most commonly accessed
 * field (b_state) goes at the start so the compiler does not generate
 * indexed addressing for it.
 */
struct buffer_head {
    /* First cache line: */
    unsigned long b_state;        /* buffer state bitmap (see above) */
    struct buffer_head *b_this_page;/* circular list of page's buffers */
    struct page *b_page;        /* the page this bh is mapped to */
    atomic_t b_count;        /* users using this block */
    u32 b_size;            /* block size */

    sector_t b_blocknr;        /* block number */
    char *b_data;            /* pointer to data block */

    struct block_device *b_bdev;
    bh_end_io_t *b_end_io;        /* I/O completion */
    void *b_private;        /* reserved for b_end_io */
    struct list_head b_assoc_buffers; /* associated with another mapping */
};
Keeping related fields in the same cache line makes accessing them efficient.

