Taking some time to write down my understanding of a few basic cache concepts. What a cache is for is omitted here.
1. Cache Consistency
The term cache consistency is used everywhere, but everyone explains it a bit differently, which makes it fuzzy. Here, borrowing the definition from reference 【1】, we simplify the concept:
eventually the value of key k should be the same as the underlying data store, if k exists in cache.
That is, as long as the value in the cache agrees with the value in the back-end, the system is consistent. Here the back-end can simply be taken to be the DB.
2. Look-aside Cache vs Inline Cache
There are two main ways people use a distributed cache:
- Cache-aside: the application is responsible for reading from and writing to the database, and the cache doesn't interact with the database at all. The cache is "kept aside" as a faster and more scalable in-memory data store. The application checks the cache before reading anything from the database, and updates the cache after making any updates to the database. This way, the application ensures that the cache is kept synchronized with the database.
- Read-through/write-through (RT/WT): the application treats the cache as the main data store, reading data from it and writing data to it. The cache is responsible for reading and writing this data to the database, thereby relieving the application of this responsibility.
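The cache-aside pattern above can be sketched in a few lines. This is a minimal illustration, not a real client API: `db` and `cache` are plain dicts standing in for the database and the distributed cache.

```python
# Cache-aside sketch: the application itself talks to both stores;
# the cache never touches the database.
db = {"x": "A"}   # stand-in for the database
cache = {}        # stand-in for the distributed cache

def read(key):
    # Check the cache first; on a miss, go to the DB and fill the cache.
    if key in cache:
        return cache[key]
    value = db[key]
    cache[key] = value
    return value

def write(key, value):
    # The application keeps the cache synchronized with the DB itself.
    db[key] = value
    cache[key] = value

assert read("x") == "A"     # miss: fetched from db, filled into cache
assert read("x") == "A"     # hit: served from cache
```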
The figure in reference 【2】 makes this clearer:

3. Read-Through, Write-Through, Write-Behind
The concepts in this section mainly apply to inline caches.
Write-behind and write-back are two names for the same thing.
3.1 Read-Through
When the application requests data from the cache (say, with key=x): if the data is not in the cache (no value exists for key=x), the cache itself reads the data from the underlying data source; if the data is in the cache (a hit on key=x), the cached value is returned directly. This is what is called read-through. Reference 【3】 has Oracle's diagram:
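The read-through flow can be sketched as a small class. The `loader` callback standing in for the underlying data source is an assumption for illustration:

```python
# Read-through sketch: the cache sits in front of the data source and
# loads misses itself; the application only ever talks to the cache.
class ReadThroughCache:
    def __init__(self, loader):
        self.loader = loader   # called on a miss to fetch from the source
        self.data = {}

    def get(self, key):
        if key not in self.data:           # miss: cache loads it itself
            self.data[key] = self.loader(key)
        return self.data[key]              # hit or freshly loaded

db = {"x": "A"}
cache = ReadThroughCache(loader=lambda k: db[k])
assert cache.get("x") == "A"   # miss, loaded from db by the cache
assert cache.get("x") == "A"   # hit, served from cache
```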

3.2 Write-Through
When the application updates data through the cache (e.g. calls put to update or add an entry), the cache synchronously updates both the cached copy and the underlying data source.
Note that write-through only says something about mutation: the client writes to the cache directly, and the cache is responsible for synchronously writing to the data store. It says nothing about reads; clients can do look-aside reads or read-through.
The following figure shows the flow:
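A minimal write-through sketch, with a plain dict standing in for the underlying store (an assumption for illustration):

```python
# Write-through sketch: put() updates the cache and synchronously
# writes to the underlying store before returning.
class WriteThroughCache:
    def __init__(self, store):
        self.store = store   # stand-in for the database
        self.data = {}

    def put(self, key, value):
        self.data[key] = value
        self.store[key] = value   # synchronous write to the data source

db = {}
cache = WriteThroughCache(db)
cache.put("x", "A")
assert db["x"] == "A"   # store is up to date as soon as put() returns
```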

3.3 Write-Behind
When the application updates data through the cache (e.g. calls put to update or add an entry), the cache writes the update to the underlying data source only after a specified delay.
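A write-behind sketch to contrast with write-through. Here the delayed write is modeled by an explicit `flush()`; a real implementation would flush on a timer or on eviction. The dict `store` is again a stand-in:

```python
# Write-behind sketch: put() only updates the cache and marks the key
# dirty; the store is updated later, when flush() runs.
class WriteBehindCache:
    def __init__(self, store):
        self.store = store
        self.data = {}
        self.dirty = set()

    def put(self, key, value):
        self.data[key] = value
        self.dirty.add(key)       # store is NOT updated yet

    def flush(self):
        # Stands in for the periodic background write-back.
        for key in self.dirty:
            self.store[key] = self.data[key]
        self.dirty.clear()

db = {}
cache = WriteBehindCache(db)
cache.put("x", "A")
assert "x" not in db       # store is stale until the delayed flush
cache.flush()
assert db["x"] == "A"
```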

3.4 Write-Behind vs Write-Through caching
The benefit of write-through to main memory is that it simplifies the design of the computer system. With write-through, the main memory always has an up-to-date copy of the line. So when a read is done, main memory can always reply with the requested data.
If write-back is used, sometimes the up-to-date data is in a processor cache, and sometimes it is in main memory. If the data is in a processor cache, then that processor must stop main memory from replying to the read request, because the main memory might have a stale copy of the data. This is more complicated than write-through.
Also, write-through can simplify the cache coherency protocol because it doesn’t need the Modify state. The Modify state records that the cache must write back the cache line before it invalidates or evicts the line. In write-through a cache line can always be invalidated without writing back since memory already has an up-to-date copy of the line.
One more thing: on a write-back architecture, software that writes to memory-mapped I/O registers must take extra steps to make sure that writes are immediately sent out of the cache. Otherwise writes are not visible outside the core until the line is read by another processor or the line is evicted.
4. Existing Caching Systems
First, one more concept: demand-fill. Demand-fill means that on a MISS, the client not only uses the value fetched from the data store but also puts that value into the cache.
4.1 memcache
memcache's cache policy can be summarized in two phrases:
- demand-filled look-aside (read)
- write-invalidate (write)
Demand-filled look-aside means that on a read, the web server first tries to read from memcache and, on a miss, fetches the data from persistent storage and fills it into memcache. On a write, it first updates the database and then deletes the corresponding entry from memcache.
As shown below:
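The memcache policy above can be sketched as demand-filled look-aside reads plus write-invalidate writes. `db` and `cache` are plain dicts standing in for the persistent store and memcached, not real client APIs:

```python
# memcache-style policy sketch:
#   read  = demand-filled look-aside
#   write = update the DB, then invalidate (delete) the cache entry
db = {"d": "A"}
cache = {}

def read(key):
    if key in cache:          # look-aside: try the cache first
        return cache[key]
    value = db[key]
    cache[key] = value        # demand-fill on a miss
    return value

def write(key, value):
    db[key] = value           # update the database first...
    cache.pop(key, None)      # ...then delete (invalidate) the entry

read("d")
assert cache["d"] == "A"
write("d", "B")
assert "d" not in cache       # next read will demand-fill d = B
assert read("d") == "B"
```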

Note:
The cache policy in our paper is actually similar to memcache's.
On a read, we first read from the cache; on a miss we read from the DB and also store the value into the cache.
On a write, we write to the DB and then perform different kinds of cache invalidation, not just a delete.
Lease
Using memcache reduces the number of requests that hit the DB directly, but on a cache miss the DB still takes the load, and a single hot key can cause a momentary spike.
FB introduced leases into memcache to solve two problems:
- stale set: a stale value being written into the cache
- thundering herds: a momentary load spike on the DB
Stale Set
Under the look-aside cache policy, data can become inconsistent.
Suppose two web servers, x and y, both need to read the same datum d, and events unfold in this order:
- x reads d from memcache, gets a cache miss, and reads d = A from the database
- another memcache client updates d in the DB to B
- y reads d from memcache, gets a cache miss, and reads d = B from the database
- y writes d = B into memcache
- x writes d = A into memcache
Now, until d expires or is deleted, the database and the cache stay inconsistent. Introducing leases solves this:
- every cache miss returns a lease id, and each lease id is bound to a single key
- when an entry is deleted (write-invalidate), previously issued lease ids for that key become invalid
- when writing data back, the sdk attaches the lease id it last received; if the memcached server sees that the lease id is no longer valid, it rejects the write
Or, put another way:
- client gets a MISS with lease L0
- client reads the DB and gets value A
- someone updates the DB to value B and invalidates the cache entry, which sets the lease to L1
- client puts value A into the cache and fails due to the lease mismatch
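The lease walkthrough can be sketched as follows. Class and method names here are illustrative, not memcached's actual protocol commands:

```python
import itertools

# Lease sketch for the stale-set race: the cache hands out a lease id
# on every miss; invalidation bumps the lease, so a put carrying a
# stale lease id is rejected.
class LeasedCache:
    def __init__(self):
        self.data = {}
        self.lease = {}                  # key -> current lease id
        self._ids = itertools.count()

    def get(self, key):
        if key in self.data:
            return self.data[key], None
        lid = next(self._ids)            # miss: issue a fresh lease
        self.lease[key] = lid
        return None, lid

    def invalidate(self, key):
        self.data.pop(key, None)
        self.lease[key] = next(self._ids)   # voids earlier leases

    def put(self, key, value, lid):
        if self.lease.get(key) != lid:
            return False                 # stale lease: reject the set
        self.data[key] = value
        return True

cache = LeasedCache()
_, l0 = cache.get("d")        # client misses, gets lease L0
# client reads d = A from the DB; meanwhile a writer updates d = B...
cache.invalidate("d")         # ...and invalidates, bumping the lease
assert cache.put("d", "A", l0) is False   # the stale set is rejected
```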
Thundering Herds
When a key becomes a hot spot, tens of thousands of requests can miss simultaneously and hammer the DB. Extending the lease mechanism solves this: each memcached server rate-limits lease issuance per key. Under the default configuration, each key gets at most one lease every 10 seconds; other requests for the same key are told either to wait a short while and retry, or to take the stale value and go. Usually the web server holding the lease fills in the data within a few milliseconds, so the other clients succeed when they retry, and only a single request goes through to the DB.
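The rate-limited lease idea can be sketched as below. The 10-second window mirrors the default mentioned above; the clock is passed in explicitly to keep the illustration deterministic, which a real server would not do:

```python
# Thundering-herd sketch: during a hot miss, only one client per key
# gets a lease (and goes to the DB) per window; the rest are told to
# retry shortly or use the stale value.
LEASE_WINDOW = 10.0                    # seconds between leases per key

class HerdControl:
    def __init__(self):
        self.last_issued = {}          # key -> time a lease was granted

    def try_lease(self, key, now):
        last = self.last_issued.get(key)
        if last is None or now - last >= LEASE_WINDOW:
            self.last_issued[key] = now
            return True                # this client may query the DB
        return False                   # wait briefly and retry

hc = HerdControl()
grants = [hc.try_lease("hot", now=0.0) for _ in range(1000)]
assert grants.count(True) == 1         # only one request reaches the DB
assert hc.try_lease("hot", now=10.0)   # a new lease after the window
```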
4.2 TAO
TAO (from the paper "TAO: Facebook's Distributed Data Store for the Social Graph"), for example, is a read-through & write-through cache.
PS
References 【4】 and 【6】 are actually quite direct and concise; worth a look!
References
【1】 Different ways of caching and maintaining cache consistency
【2】 Notes: Scaling Memcached at Facebook
【3】 Coherence Getting Started Guide
【4】 Meituan interview question: how to keep Redis and MySQL consistent under dual writes
【5】 Scaling Memcache at Facebook (2013)
【6】 (Discussion) Cache synchronization, guaranteeing cache consistency, and common cache misuse
