6.824 Spring 2020 Paper Reading: FaRM

0. Prelude

This article accompanies Lecture 14. The paper devotes a lot of space to fault recovery, but I will skip those parts because they are not the focus of the lecture.

1. Overview

  • FaRM is a distributed in-memory database made up of many machines in a single datacenter (clients included). It relies on RDMA and on DRAM that is made non-volatile by a UPS.
  • A configuration is a tuple $\langle i, S, F, CM \rangle$: a unique configuration identifier $i$, the set $S$ of machines that make up FaRM, a mapping $F$ from machines to failure domains, and the configuration manager $CM$. The configuration is stored in ZooKeeper.
  • FaRM exposes a global address space to applications, divided into 2 GB regions.
  • The configuration manager maintains the mapping from each region identifier to the primary and backup machines that store that region (this mapping is not stored in ZooKeeper; other machines cache it), and it is responsible for allocating new regions.
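The address space and the cached region mapping can be pictured as a small lookup. This is a minimal sketch, assuming for illustration that a global address is simply `region_id * 2 GB + offset`; `RegionInfo`, `RegionMap`, `locate` and the machine names are hypothetical and are not FaRM's real data structures.

```python
# Hypothetical sketch: locating an object via a cached region map.
# Assumes a global address is region_id * REGION_SIZE + offset; the class
# and machine names are illustrative, not FaRM's actual structures.
from __future__ import annotations

from dataclasses import dataclass, field

REGION_SIZE = 2 * 1024**3  # each region is 2 GB


@dataclass
class RegionInfo:
    primary: str        # machine holding the primary copy
    backups: list[str]  # machines holding backup copies


@dataclass
class RegionMap:
    # Local cache of the CM's region-id -> replicas mapping.
    regions: dict[int, RegionInfo] = field(default_factory=dict)

    def locate(self, addr: int) -> tuple[str, int]:
        """Return (primary machine, offset within region) for a global address."""
        region_id, offset = divmod(addr, REGION_SIZE)
        info = self.regions.get(region_id)
        if info is None:
            # Assumed: a real machine would fetch the mapping from the CM;
            # this sketch just raises.
            raise KeyError(f"region {region_id} not in local cache")
        return info.primary, offset


if __name__ == "__main__":
    cache = RegionMap({0: RegionInfo("m1", ["m2", "m3"]),
                       1: RegionInfo("m2", ["m3", "m1"])})
    print(cache.locate(REGION_SIZE + 4096))  # ('m2', 4096)
```

Caching the mapping locally means a client only needs the CM on a miss or after a configuration change, which is why the mapping does not have to live in ZooKeeper.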

2. Transaction process

As in Spanner, an atomic distributed transaction is driven by the client, and it can be split into the following steps.

  1. Read every object the transaction needs from the primary server that stores it, and remember each object's version number.
  2. Buffer all writes in local memory.
  3. Using RDMA, send a LOCK log record to each primary server that stores written objects; the record carries the versions and new values of those objects. (In fact, each machine sets aside a separate log queue and message queue for every other machine.)
  4. Each primary tries to lock every written object and checks whether its version number has changed. If any lock attempt fails or any version has changed, the transaction must abort; otherwise the primary replies with a LOCK-REPLY message to the client, which acts as the coordinator of two-phase commit.
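  5. Once the client has received all LOCK-REPLY messages, it validates every object that was read but not written, re-reading its version number from the primary to check that it is unchanged.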
  6. The client sends a COMMIT-BACKUP log record to each backup server.
  7. After receiving hardware ACKs for all COMMIT-BACKUP records, the client sends a COMMIT-PRIMARY record to every primary server. As soon as it receives one ACK for a COMMIT-PRIMARY record, the client API can report a successful commit to the caller. (The recovery protocol guarantees that, with f+1 replicas per object, such a transaction still commits even if up to f replicas of each object fail after that first COMMIT-PRIMARY ACK.)
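The steps above can be condensed into a single-process sketch. It is hypothetical and heavily simplified: one primary `Server` holds every object, RDMA, logs and hardware ACKs are plain Python calls and lists, backups apply new values immediately, and no failures are modeled. It also assumes every written object was read first. Validation checks the lock as well as the version, following the correctness discussion in Section 3; `Obj`, `Server` and `Txn` are illustrative names, not FaRM's API.

```python
# Hypothetical single-process sketch of FaRM-style commit (steps 1-7).
# One primary holds every object; RDMA, logs and hardware ACKs are modeled
# as plain Python calls and lists. No failures or recovery are modeled.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Obj:
    value: int = 0
    version: int = 0
    locked: bool = False


@dataclass
class Server:
    objects: dict[str, Obj] = field(default_factory=dict)
    log: list[tuple[str, dict[str, int]]] = field(default_factory=list)


class Txn:
    def __init__(self, primary: Server, backups: list[Server]) -> None:
        self.primary, self.backups = primary, backups
        self.read_versions: dict[str, int] = {}
        self.writes: dict[str, int] = {}

    def read(self, key: str) -> int:
        # Step 1: read from the primary and remember the version.
        obj = self.primary.objects[key]
        self.read_versions[key] = obj.version
        return obj.value

    def write(self, key: str, value: int) -> None:
        # Step 2: buffer the write locally (assumes `key` was read first).
        self.writes[key] = value

    def commit(self) -> bool:
        p = self.primary
        # Step 3: LOCK record carrying the new values of written objects.
        p.log.append(("LOCK", dict(self.writes)))
        # Step 4: lock each written object; fail if locked or version changed.
        held: list[str] = []
        for key in self.writes:
            obj = p.objects[key]
            if obj.locked or obj.version != self.read_versions[key]:
                return self._abort(held)
            obj.locked = True
            held.append(key)
        # Step 5: validate objects read but not written (version and lock).
        for key, seen in self.read_versions.items():
            obj = p.objects[key]
            if key not in self.writes and (obj.version != seen or obj.locked):
                return self._abort(held)
        # Step 6: COMMIT-BACKUP to every backup (applied immediately here).
        for b in self.backups:
            b.log.append(("COMMIT-BACKUP", dict(self.writes)))
            for key, value in self.writes.items():
                b.objects[key] = Obj(value, p.objects[key].version + 1)
        # Step 7: COMMIT-PRIMARY installs values, bumps versions, unlocks.
        p.log.append(("COMMIT-PRIMARY", dict(self.writes)))
        for key, value in self.writes.items():
            obj = p.objects[key]
            obj.value, obj.version, obj.locked = value, obj.version + 1, False
        return True

    def _abort(self, held: list[str]) -> bool:
        # Release locks taken in step 4 and record the abort.
        for key in held:
            self.primary.objects[key].locked = False
        self.primary.log.append(("ABORT", {}))
        return False


if __name__ == "__main__":
    primary = Server({"x": Obj(), "y": Obj()})
    backup = Server({"x": Obj(), "y": Obj()})
    t = Txn(primary, [backup])
    if t.read("x") == 0:
        t.read("y")
        t.write("y", 1)
    print(t.commit(), primary.objects["y"])  # True Obj(value=1, version=1, locked=False)
```

Keeping the version and the lock on the same object is what lets step 4 check "unchanged and unlocked" in one place; in this sketch a stale version or a held lock both lead to `_abort`, which releases any locks the transaction already took.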

3. Discussion about correctness

Actually, the transaction process I described above has a fatal error (I did not notice it while reading the paper until the instructor pointed it out in lecture).

Before locating the error, let us first discuss the serialization point, which I define as the moment at which all operations of a transaction can be regarded as taking effect instantaneously.

Under two-phase locking, it is easy to see that the moment a transaction has acquired all the locks it needs can be taken as its serialization point.

We will see that the locking scheme in this paper is essentially the same as two-phase locking, so it has the same serialization point. Consider the following example.

T1 begin 
read x
if x = 0
  set y = 1
T1 end

T2 begin
read y
if y = 0
  set x = 1
T2 end

And the following execution order:

--------> time
T1: read x  lock y validate x                          commit
T2:                           read y lock x validate y commit

If the validation stage only checks version numbers, T1 and T2 both commit and leave x = 1 and y = 1. This is not a serializable result: running T1 then T2 gives x = 0, y = 1, and running T2 then T1 gives x = 1, y = 0.

So the validation stage does need an adjustment: validating object y must check not only its version number but also whether another transaction currently holds the lock on y.

This adjustment essentially guarantees that no other transaction can successfully read y once we have locked y.
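To make the anomaly concrete, here is a hypothetical replay of the T1/T2 schedule. The `check_lock` flag switches between version-only validation and validation that also fails when another transaction holds the lock on a read object. As in the example, writes are blind (T1 never reads y, T2 never reads x); `Obj`, `replay` and its helpers are illustrative names, not code from the paper.

```python
# Hypothetical replay of the T1/T2 schedule; not FaRM code.
# check_lock=False: validation compares versions only.
# check_lock=True:  validation also fails if another txn holds the lock.
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Obj:
    value: int = 0
    version: int = 0
    lock_owner: Optional[str] = None


def replay(check_lock: bool) -> tuple[int, int, bool, bool]:
    """Run the schedule; return (x, y, T1 committed, T2 committed)."""
    db = {"x": Obj(), "y": Obj()}

    def lock(txn: str, key: str) -> bool:
        if db[key].lock_owner is not None:
            return False
        db[key].lock_owner = txn
        return True

    def validate(txn: str, key: str, seen_version: int) -> bool:
        obj = db[key]
        if obj.version != seen_version:
            return False
        locked_by_other = obj.lock_owner not in (None, txn)
        return not (check_lock and locked_by_other)

    def finish(txn: str, key: str, value: int, ok: bool) -> bool:
        obj = db[key]
        if ok:  # commit: install the buffered write and bump the version
            obj.value, obj.version = value, obj.version + 1
        if obj.lock_owner == txn:  # release our lock on commit or abort
            obj.lock_owner = None
        return ok

    # T1: read x (sees 0, so T1 will set y), lock y, validate x.
    x_seen = db["x"].version
    t1_ok = lock("T1", "y") and validate("T1", "x", x_seen)
    # T2: read y (sees 0, T1 has not committed), lock x, validate y, commit.
    y_seen = db["y"].version
    t2_ok = lock("T2", "x") and validate("T2", "y", y_seen)
    t2_done = finish("T2", "x", 1, t2_ok)
    # T1 commits at the end of the timeline.
    t1_done = finish("T1", "y", 1, t1_ok)
    return db["x"].value, db["y"].value, t1_done, t2_done


if __name__ == "__main__":
    print(replay(check_lock=False))  # (1, 1, True, True): not serializable
    print(replay(check_lock=True))   # (0, 1, True, False): T2 aborts
```

With version-only validation both transactions commit and leave x = 1, y = 1. With the lock check, T2's validation of y fails because T1 holds the lock on y, so T2 aborts and the result matches the serial order T1 then T2. T1's own validation of x still passes, because it happens before T2 locks x.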

If we view reading object y as acquiring a read lock on it, and a successful validation of y as releasing that read lock, the paper's optimistic concurrency control reduces to two-phase locking!

4. Limitations

  • Application code can see inconsistencies while executing transactions that will eventually abort. For example, a transaction may read a large object at the same time that a committing transaction is overwriting it, and so observe a mix of old and new data.
  • Scalability?