Cache Coherence with Multi-Processor

Author: BNN
Reposted from the digest of the CPU & Compilers board on linuxforum

[Reposter's note] I had just finished writing an article on cache coherence when I discovered that BNN wrote a good one two years ago. Had I known earlier, I would have saved myself the trouble. :)

I have recently been working on the kernel for a dual-CPU system. For a dual-CPU, or more generally a multi-processor system, the big challenge for the kernel is how to handle cache coherence.

Conceptually, there are two choices: Write Invalidate and Write Update.

We will talk about Write Invalidate today.

Typically, two protocols fall under the Write Invalidate category, namely the Write-Through Write-Invalidate protocol and Write Once (also called the Write-Back Write-Invalidate protocol). Note that the well-known MESI protocol is derived from Write Once. That is why we will focus on Write Once here.
--------------------------
Write Once:

The Write Once protocol was designed to offset the shortcoming of the Write-Through Write-Invalidate protocol, which introduces extra traffic onto the shared system bus.

Write Once works basically as follows:

(Assume:
* hardware snooping is enabled on the shared system bus;
* the cache is write-back.
)
The Write Once protocol has four states: Valid, Reserved, Dirty and Invalid.
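In code, the four states can be sketched as a simple enumeration (Python is used here purely for illustration; a real snooping controller implements this in hardware):

```python
from enum import Enum

# The four line states of the Write Once protocol described above.
class LineState(Enum):
    INVALID = "Invalid"    # the line holds no usable data
    VALID = "Valid"        # clean copy, consistent with main memory
    RESERVED = "Reserved"  # written exactly once; memory is still up to date
    DIRTY = "Dirty"        # written again after the first write; memory is stale
```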

Initial State:

* On a LOAD MISS, the data is loaded into the cache line and the state goes to VALID.

// Please note: the Write Once protocol guarantees that the data you load from memory is the latest. Why? If, at the moment you try to load a cache line, a modified copy exists in another CPU, the snoop protocol will abort the load bus transaction, flush the other CPU's data to main memory, and then resume the aborted transaction, so that the requesting CPU gets the up-to-date data.

Now, let's investigate the state machine of Write Once Protocol.

***************
VALID State:
***************

On a LOAD HIT, we do nothing. That's right: the cache line is already here, and the CPU is happy to find the data in the cache.

On a LOAD MISS, we restart the initial procedure to load the latest data into the cache line.

On a CPU STORE HIT (a store hit from the current processor), we come to the key part of the Write Once protocol. On a write/store in a UP (uniprocessor) system, we all understand that the cache state goes to DIRTY and the data is ****not written back to main memory****. The Write Once protocol, however, behaves as follows in order to achieve multi-processor cache coherence.

The stored data is written through to main memory (why? Because we need a bus transaction that the other caches can snoop!), and the cache state moves to the Reserved state.

This is exactly why the protocol is named "Write Once": the *first* write to a write-back cache line is written through to main memory, so that the cache controllers of the other processors become aware of it and invalidate their corresponding cache lines; thus the whole system holds only one copy of that cache line.

After that first write-once, subsequent write accesses only change the state to DIRTY; the data stays in the cache line and is not flushed to main memory, just as in the UP write-back approach.

On a SNOOP STORE HIT (we found another CPU trying to do a store on that cached address), the write-invalidate semantics dictate that the current processor invalidate its own copy, so that only one legal copy of the cache line remains in the system. In other words, the state goes from Valid to Invalid. Note that we do not have to flush anything. The reason is simple: this processor has not performed any write yet, so we only invalidate our own copy. Later, if we want to read this particular data, we will have to load it from main memory again.
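The VALID-state transitions discussed so far can be sketched as a small function (the event names and the boolean "does this write reach main memory" flag are my own illustrative shorthand; a real controller reacts to bus signals, not strings):

```python
from enum import Enum

class LineState(Enum):
    INVALID = 0
    VALID = 1
    RESERVED = 2
    DIRTY = 3

def valid_next(event):
    """Next state from VALID, plus whether data is written to main memory."""
    if event == "cpu_load_hit":
        return LineState.VALID, False     # data is already cached; nothing to do
    if event == "cpu_load_miss":
        return LineState.VALID, False     # line was replaced; reload the latest data
    if event == "cpu_store_hit":
        return LineState.RESERVED, True   # the "write once": first write goes to memory
    if event == "snoop_store_hit":
        return LineState.INVALID, False   # another CPU is writing; drop our clean copy
    raise ValueError(f"unhandled event: {event}")
```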

For the VALID state, there are other inputs to consider, such as:

snoop load hit
cpu store miss

How will the CPU react to these two events?
I will leave these questions to you guys......


----------------
Valid State
----------------

SNOOP LOAD HIT:
This means the current CPU's cache controller **sees** a bus transaction, issued by another device (for instance, another CPU), accessing data that is currently cached. When this happens, the current CPU's cache does nothing and stays in the VALID state. The reason is simple: the other device/CPU will fetch the data from memory, which STILL holds the latest data.

CPU STORE MISS:

How and when can this happen, after the cache line was loaded earlier? Yeah, you are right: the cache line could have been **replaced** by other data (remember the LRU replacement algorithm of cache management, set-associative mechanisms, and so on).

When a CPU store miss happens, the data is written into main memory and a copy is also placed in the cache (that is "write allocate" handling of a write miss). After that, the cache snoop protocol moves its state from VALID to Reserved. Why? Because a write, and more importantly the first write, has just completed!
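With snoop load hit and CPU store miss now covered, the full VALID-state behavior can be summarized in one transition table (event names and the memory-write flag are my own illustrative shorthand, not part of any hardware specification):

```python
from enum import Enum

class LineState(Enum):
    INVALID = 0
    VALID = 1
    RESERVED = 2
    DIRTY = 3

# VALID state: event -> (next state, does this event write data to main memory?)
VALID_TRANSITIONS = {
    "cpu_load_hit":    (LineState.VALID,    False),  # serve the data from the cache
    "cpu_load_miss":   (LineState.VALID,    False),  # line replaced; reload latest data
    "cpu_store_hit":   (LineState.RESERVED, True),   # the first write goes to memory
    "cpu_store_miss":  (LineState.RESERVED, True),   # write-allocate; write goes to memory
    "snoop_load_hit":  (LineState.VALID,    False),  # memory is current; do nothing
    "snoop_store_hit": (LineState.INVALID,  False),  # another CPU wrote; invalidate copy
}
```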

--------------
Reserved
--------------

As I said before, the Reserved state was introduced to reflect the write-once semantics: the first write flushes data to memory, so that the other devices/CPUs are able to observe a bus transaction!

For this state, as with the VALID state, the inputs can vary as follows:

* CPU LOAD HIT
Simply feed the data back to the CPU and leave the state unchanged.

* CPU WRITE HIT
Since the cache lines are WB (write-back), we change the state to DIRTY. In other words, we do not flush the data to memory, in contrast with the WT (write-through) approach.

* CPU LOAD MISS

The system simply loads the data from main memory and keeps a copy in the cache (assuming it is not a read-through cache), and sets the state back to VALID.

* CPU WRITE MISS

The cache line got replaced some time after it was loaded into the cache. The system simply writes the data through to main memory, keeps the latest copy in the cache, and keeps the state as Reserved.

* SNOOP LOAD HIT

We **see** that another device is about to read the cached data. We do not have to do any flush or invalidation here; all we need to do is change the coherence state to VALID. Question: what would happen if we stayed in the Reserved state, for example if a CPU store hit came next? :-)

* SNOOP WRITE HIT

We **see** a write access issued by another device/CPU. We invalidate our own cache line, which is about to become stale, and move our state to INVALID. The reason we do not have to do any flushing is simple: our local cached value is the same as the one in main memory, so the only thing we need to do is invalidate our private copy.
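The six RESERVED-state inputs above can likewise be collected into one transition table (again, the event names and memory-write flag are my own shorthand for illustration):

```python
from enum import Enum

class LineState(Enum):
    INVALID = 0
    VALID = 1
    RESERVED = 2
    DIRTY = 3

# RESERVED state: event -> (next state, does this event write data to main memory?)
RESERVED_TRANSITIONS = {
    "cpu_load_hit":    (LineState.RESERVED, False),  # serve from cache, state unchanged
    "cpu_write_hit":   (LineState.DIRTY,    False),  # second write stays local (write-back)
    "cpu_load_miss":   (LineState.VALID,    False),  # line was replaced; reload from memory
    "cpu_write_miss":  (LineState.RESERVED, True),   # write-allocate; write goes to memory
    "snoop_load_hit":  (LineState.VALID,    False),  # another CPU read; demote to Valid
    "snoop_write_hit": (LineState.INVALID,  False),  # another CPU wrote; drop our copy
}
```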

--------------------------
Multi-core cache hierarchy:

A multi-core cache hierarchy refers to the layered cache structure inside a multi-core processor. Multi-core processors have become the mainstream architecture in modern computer systems; they contain multiple cores that execute tasks in parallel. To improve performance, a cache hierarchy is introduced, which helps reduce memory access latency and improve data locality.

A multi-core cache hierarchy usually consists of multiple levels of cache, each with a different size, latency, and access frequency. Closest to the core is the level-1 (L1) cache, which is usually split into an instruction cache and a data cache. Next is the level-2 (L2) cache, which has a larger capacity but higher access latency. There may be higher levels as well, such as an L3 cache or LLC (last-level cache), larger still but slower again.

The main goal of a multi-core cache hierarchy is to provide faster data access and reduce memory bandwidth pressure. When a core accesses memory, it first checks the nearest cache level. If the data is in the cache, this is a cache hit, and the data can be read directly from the cache without going to memory. If the data is not in the cache, this is a cache miss, and the data must be fetched from memory and stored into the cache. Every cache miss adds access latency.

A multi-core cache hierarchy also exploits data locality. When a core accesses some data, it typically also accesses nearby data soon afterwards, which is known as spatial locality. If data touched by one core is also accessed by other cores, they can read it from a shared cache rather than from memory. This reduces memory bandwidth pressure and improves the performance of the whole system.

In summary, the multi-core cache hierarchy is a technique widely adopted in multi-core processors that improves system performance by providing faster data access and relieving memory bandwidth pressure. Through multiple cache levels and data locality it raises system efficiency, and it plays an important role in modern computer systems.
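The hit/miss arithmetic above can be made concrete with the standard average-memory-access-time (AMAT) recurrence. The latencies and miss rates below are made-up illustrative numbers, not measurements from any real processor:

```python
def amat(levels, mem_latency):
    """Average memory access time in cycles.
    levels: list of (hit_time, miss_rate) pairs, ordered from L1 outward."""
    t = mem_latency
    for hit_time, miss_rate in reversed(levels):
        # cost at this level = hit time + (miss rate * cost of going one level down)
        t = hit_time + miss_rate * t
    return t

# Hypothetical two-level hierarchy: L1 (4 cycles, 10% miss),
# L2 (12 cycles, 40% miss), main memory 200 cycles.
print(round(amat([(4, 0.10), (12, 0.40)], 200), 3))  # -> 13.2
```

Even with a 40% L2 miss rate, the hierarchy cuts the average access cost from 200 cycles to about 13, which is exactly the latency-hiding effect described above.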