6.824 Spring 2020 Paper Reading: Spanner

0. Prelude

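After writing two posts, I noticed my procrastination kicking in again, so I'm going to try writing the rest of this course's notes in English, as a kind of incentive to keep going… orz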

1. Design overview

Spanner is a semi-relational, globally distributed database that provides strong consistency and supports distributed transactions.
[Figure 1: Spanner overview]
Figure 1 shows a Spanner deployment. Location proxies help clients find the spanservers in a zone that serve their data. The paper describes in detail how transactions are processed inside a spanserver.

Each spanserver is responsible for between 100 and 1000 instances of a data structure called a tablet, which implements a mapping (key:string, timestamp:int64) -> string. Each tablet is replicated with the Paxos protocol.
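To make the tablet mapping concrete, here is a minimal in-memory sketch, assuming a single process with no persistence or replication. The class `Tablet` and its methods `write` and `read_at` are hypothetical names for illustration, not Spanner's API. Each key keeps its versions sorted by timestamp, so a read at timestamp t returns the newest version whose timestamp is at most t.

```python
import bisect
from typing import Dict, List, Optional


class Tablet:
    """Hypothetical multi-version map: (key: str, timestamp: int) -> str."""

    def __init__(self) -> None:
        # For every key: ascending timestamps and a parallel list of values
        # (values[i] was written at timestamps[i]).
        self._timestamps: Dict[str, List[int]] = {}
        self._values: Dict[str, List[str]] = {}

    def write(self, key: str, timestamp: int, value: str) -> None:
        """Add a new version of `key` at `timestamp`; older versions are kept."""
        ts_list = self._timestamps.setdefault(key, [])
        vals = self._values.setdefault(key, [])
        i = bisect.bisect_left(ts_list, timestamp)
        if i < len(ts_list) and ts_list[i] == timestamp:
            vals[i] = value  # same (key, timestamp) pair: overwrite that version
        else:
            ts_list.insert(i, timestamp)
            vals.insert(i, value)

    def read_at(self, key: str, timestamp: int) -> Optional[str]:
        """Return the newest version of `key` with timestamp <= `timestamp`."""
        ts_list = self._timestamps.get(key, [])
        i = bisect.bisect_right(ts_list, timestamp)
        if i == 0:
            return None  # no version existed at that timestamp
        return self._values[key][i - 1]


t = Tablet()
t.write("user/1", 10, "alice")
t.write("user/1", 20, "alice@new")
print(t.read_at("user/1", 15))  # alice
print(t.read_at("user/1", 20))  # alice@new
print(t.read_at("user/1", 5))   # None
```

Keeping old versions instead of overwriting them is what later allows reads at a past timestamp without taking locks.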

So here is the overall picture: a large number of spanservers, each storing many tablets. For a single tablet, all the spanservers that store a replica of it form a Paxos group, and one of them is elected group leader.

2. TrueTime API

In each datacenter there is a set of time master machines, plus a timeslave daemon on every machine. Each time master is equipped with either GPS receivers with dedicated antennas or an atomic clock. Every timeslave daemon periodically polls a subset of the time masters and corrects its local clock using a variant of Marzullo's algorithm.
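To show what "a variant of Marzullo's algorithm" builds on, here is the classic textbook form of Marzullo's algorithm. This is not Spanner's exact variant, whose details these notes do not cover, and the function name `marzullo` is hypothetical. Each time master reports an interval that should contain the true time. The algorithm returns the smallest interval that the largest number of sources agree on, and sources that do not overlap it can be treated as liars.

```python
from typing import List, Tuple


def marzullo(intervals: List[Tuple[float, float]]) -> Tuple[int, float, float]:
    """Return (count, low, high): the smallest interval agreed on by the most sources."""
    edges = []
    for low, high in intervals:
        edges.append((low, -1))   # -1 marks the start of a source's interval
        edges.append((high, +1))  # +1 marks the end of a source's interval
    # Tuple sorting puts starts (-1) before ends (+1) at equal offsets,
    # so intervals that merely touch still count as overlapping.
    edges.sort()
    best, count = 0, 0
    best_low = best_high = 0.0
    for i, (offset, kind) in enumerate(edges):
        count -= kind  # entering an interval adds 1, leaving one subtracts 1
        if count > best:
            best = count
            best_low = offset
            best_high = edges[i + 1][0]  # a start is always followed by another edge
    return best, best_low, best_high


# Three masters roughly agree; the fourth is a "liar" far off to the side.
print(marzullo([(8, 12), (11, 13), (10, 12), (20, 21)]))  # (3, 11, 12)
```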

There are three functions in the TrueTime API:

  • TT.now() returns an interval [earliest, latest] that is guaranteed to contain the absolute time at which TT.now() was invoked.
  • TT.after(t) returns true if time t has definitely passed
  • TT.before(t) returns true if time t has definitely not arrived.
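The three functions can be sketched on top of a local clock plus an uncertainty bound. This is a toy model that assumes a fixed bound `epsilon`. In real TrueTime the bound varies over time, and the names `TrueTime`, `TTInterval` and the injectable `clock` parameter are hypothetical. As in the paper, after(t) means t < TT.now().earliest and before(t) means t > TT.now().latest.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class TrueTime:
    """Toy TrueTime: local clock reading +/- a fixed uncertainty epsilon (seconds)."""

    def __init__(self, epsilon: float, clock: Callable[[], float] = time.time) -> None:
        self.epsilon = epsilon
        self.clock = clock  # injectable so the example is deterministic

    def now(self) -> TTInterval:
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        # t has definitely passed: even the earliest possible current time is later.
        return t < self.now().earliest

    def before(self, t: float) -> bool:
        # t has definitely not arrived: even the latest possible current time is earlier.
        return t > self.now().latest


tt = TrueTime(epsilon=0.5, clock=lambda: 100.0)
print(tt.now())                            # TTInterval(earliest=99.5, latest=100.5)
print(tt.after(99.0), tt.after(100.0))     # True False
print(tt.before(101.0), tt.before(100.0))  # True False
```

Inside the uncertainty window (for example t = 100.0 here) both after and before return False, because TrueTime cannot tell which side of t the real time is on.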

3. Transaction processing

Design goal: support distributed transactions and lock-free read-only transactions, both with strong consistency.

  • A read-write transaction uses standard two-phase commit, driven by the client.
  1. The client first reads data from the leader of the relevant Paxos group, which acquires read locks at the same time, and buffers all writes in local memory.
  2. When it is ready to commit, the client chooses the leader of one Paxos group as the transaction coordinator and sends the buffered writes to the leaders of the corresponding participant Paxos groups.
  3. Once a participant leader has locked all the objects it must write, it chooses a prepare timestamp, logs a prepare record through Paxos, and sends a prepare message carrying that timestamp to the coordinator.
  4. After receiving prepare messages from all participants, the coordinator chooses an appropriate commit timestamp (I will skip the details of how TrueTime is used to choose it so that strong consistency holds), logs a commit record through Paxos, goes through commit wait, and then sends the commit message to the other Paxos groups and to the client.
  • Lock-free read-only transactions are implemented with multi-versioning.
  1. Each read-only transaction is assigned a read timestamp, and each spanserver returns, for every requested key, the version with the largest timestamp less than or equal to the read timestamp.
  2. A tricky point is that the transaction may need to read data across multiple Paxos groups, which must all read at one consistent timestamp. Spanner's solution is for the client to simply use TT.now().latest as the read timestamp, which avoids a negotiation round among the groups.
  3. Any replica, not only the leader, can serve the read once its safe time (the timestamp up to which it is up to date, so every future write will get a larger timestamp) is greater than or equal to the read timestamp.
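For readers who want the skipped detail in step 4, here is a sketch of the commit-timestamp rule and commit wait as stated in the Spanner paper. The commit timestamp s must be at least every prepare timestamp, greater than TT.now().latest when the coordinator received the commit request, and greater than any timestamp the leader assigned before. The coordinator then waits until TT.after(s) before the commit becomes visible. The simulated clock (integer microseconds, fixed uncertainty, and a `sleep` that only advances the clock) and all names here are hypothetical.

```python
from typing import List


class SimulatedTrueTime:
    """Hypothetical TrueTime stand-in: integer microseconds, fixed uncertainty.
    sleep() only advances the simulated clock, so the example runs instantly."""

    def __init__(self, start_us: int, epsilon_us: int) -> None:
        self.now_us = start_us
        self.epsilon_us = epsilon_us

    def earliest(self) -> int:
        return self.now_us - self.epsilon_us

    def latest(self) -> int:
        return self.now_us + self.epsilon_us

    def after(self, t: int) -> bool:
        return t < self.earliest()  # TT.after(t): t has definitely passed

    def sleep(self, us: int) -> None:
        self.now_us += us


def choose_commit_timestamp(prepare_ts: List[int], latest_at_commit_request: int,
                            last_assigned: int) -> int:
    """Smallest integer s such that
    s >= every participant's prepare timestamp,
    s >  TT.now().latest when the coordinator received the commit request,
    s >  every timestamp this leader assigned to earlier transactions."""
    return max(max(prepare_ts, default=0), latest_at_commit_request + 1, last_assigned + 1)


def commit_wait(s: int, tt: SimulatedTrueTime) -> None:
    """Block until TT.after(s); only then may the commit become visible."""
    while not tt.after(s):
        tt.sleep(s - tt.earliest() + 1)  # sleep at least the remaining gap


tt = SimulatedTrueTime(start_us=1_000_000, epsilon_us=5_000)
s = choose_commit_timestamp([1_003_000, 1_004_500], tt.latest(), last_assigned=1_002_000)
commit_wait(s, tt)
print(s, tt.now_us - 1_000_000)  # 1005001 10002
```

Because s was chosen just above TT.now().latest, the wait here comes out at 2ε plus a couple of microseconds. The uncertainty bound is what makes commit wait take time.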

4. Takeaway

Read-only transactions can be made lock-free with multi-versioning (a snapshot-isolation-style trick): reading a snapshot at a fixed timestamp needs no locks.
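The read-only path from section 3 can be sketched as a replica that answers snapshot reads without locks, but only when the read timestamp is at most its safe time. Following the paper, t_safe = min(t_safe^Paxos, t_safe^TM). Here t_safe^Paxos is the timestamp of the highest-applied Paxos write. t_safe^TM is infinite when no transaction is prepared, and otherwise the smallest prepare timestamp minus one. The class `Replica`, its methods, and the simplification that a prepare record advances t_safe^Paxos to its prepare timestamp are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


class Replica:
    """Hypothetical replica of one Paxos group that serves lock-free snapshot reads."""

    def __init__(self) -> None:
        # key -> [(timestamp, value)] in ascending order; Paxos write timestamps
        # increase monotonically within a group, so appending keeps the order.
        self.versions: Dict[str, List[Tuple[int, str]]] = {}
        self.paxos_safe = 0  # t_safe^Paxos: timestamp of the highest-applied Paxos write
        self.prepared: Dict[str, int] = {}  # txn id -> prepare ts (prepared, not committed)

    def apply_write(self, key: str, ts: int, value: str) -> None:
        self.versions.setdefault(key, []).append((ts, value))
        self.paxos_safe = max(self.paxos_safe, ts)

    def prepare(self, txn_id: str, prepare_ts: int) -> None:
        # Simplification: treat the Paxos-logged prepare record as a write at prepare_ts.
        self.prepared[txn_id] = prepare_ts
        self.paxos_safe = max(self.paxos_safe, prepare_ts)

    def commit(self, txn_id: str, commit_ts: int, writes: Dict[str, str]) -> None:
        del self.prepared[txn_id]
        for key, value in writes.items():
            self.apply_write(key, commit_ts, value)

    def safe_time(self) -> float:
        # A prepared transaction may still commit at any timestamp >= its prepare ts,
        # so t_safe^TM is (smallest prepare ts) - 1, or infinity if nothing is prepared.
        tm_safe = min(self.prepared.values()) - 1 if self.prepared else math.inf
        return min(self.paxos_safe, tm_safe)

    def try_read(self, key: str, t_read: int) -> Tuple[bool, Optional[str]]:
        """(False, None) means t_read > t_safe: wait for safe time to advance, then retry."""
        if t_read > self.safe_time():
            return False, None
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= t_read:
                return True, value  # newest version at or before t_read
        return True, None  # no version of key existed at t_read


r = Replica()
r.apply_write("x", 10, "a")
r.apply_write("x", 20, "b")
r.prepare("txn-7", 25)
print(r.try_read("x", 24))  # (True, 'b')
print(r.try_read("x", 25))  # (False, None): txn-7 might commit at 25
r.commit("txn-7", 26, {"x": "c"})
print(r.try_read("x", 25), r.try_read("x", 26))  # (True, 'b') (True, 'c')
```

A client reading at t_read = TT.now().latest may therefore briefly wait for safe time to advance, but it never takes a lock and never blocks writers.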
