Spanner
Spanner's architecture is shown in the figure below.
Benefits of this design:
- Sharding allows huge total throughput via parallelism.
- Datacenters fail independently – different cities.
- Clients can read local replica – fast!
- Can place replicas near relevant customers.
- Paxos requires only a majority – tolerate slow/distant replicas.
- The leader of any Paxos group can act as the transaction coordinator. Because that group is replicated, this addresses the 2PC problem of the coordinator crashing.
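The "majority only" point can be illustrated with a small sketch. It is not Spanner code: the function names and the latency figures are made up for illustration, and it assumes the leader's own acknowledgement is included in the list.

```python
def majority(n_replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return n_replicas // 2 + 1


def commit_latency(ack_latencies_ms: list[float]) -> float:
    """Time at which a leader has heard from a majority of replicas.

    Only the fastest majority of acknowledgements matters, so the
    slowest or most distant replicas do not delay the write.
    """
    needed = majority(len(ack_latencies_ms))
    return sorted(ack_latencies_ms)[needed - 1]


# Hypothetical acknowledgement times for five replicas; the last is far away.
latencies = [2.0, 5.0, 8.0, 40.0, 300.0]
print(majority(len(latencies)))   # 3
print(commit_latency(latencies))  # 8.0
```

With five replicas the write waits for the third-fastest acknowledgement, so the 300 ms replica has no effect on latency.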
Problems that need to be solved:
- Read of local replica must yield fresh data.
- But local replica may not reflect latest Paxos writes!
- A transaction may involve multiple shards -> multiple Paxos groups.
- Transactions that read multiple records must be serializable.
- But local shards may reflect different subsets of committed transactions!
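The last two points can be made concrete with a toy example. This is a hand-built sketch, not Spanner's data model: the dictionaries, the starting balances of 10, and the `read_total` helper are assumptions chosen for illustration.

```python
# Both records start at 10. Transaction T1 moves one unit from y to x
# and has committed on the leaders of both shards.
leader = {"x": 11, "y": 9}

# Hypothetical local replicas: x's replica has applied T1, y's has not yet.
local_replica = {"x": 11, "y": 10}


def read_total(store: dict[str, int]) -> int:
    """A read-only transaction that sums both records."""
    return store["x"] + store["y"]


print(read_total(leader))         # 20: the total is preserved
print(read_total(local_replica))  # 21: a state that no serial order produces
```

Each replica on its own holds a value that was valid at some point, but together they reflect different subsets of committed transactions, so the combined read is not serializable.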
Read-Write Transaction
Suppose we need to execute the following bank transaction:
BEGIN
x = x + 1
y = y - 1
END
Spanner commits this transaction as follows:
- Client picks a unique transaction id (TID).
- Client sends each read to the Paxos leader of the relevant shard.
  - Each shard first acquires a lock on the relevant record.
  - Separate lock table per shard, in shard leader.
  - Read locks are not replicated via Paxos, so leader failure -> abort.
- Client keeps writes private until commit.
- When client commits:
  - Chooses a Paxos group to act as 2PC Transaction Coordinator (TC).
  - Sends writes to relevant shard leaders.
  - Each written shard leader:
    - Acquires lock(s) on the written record(s).
    - Logs a “prepare” record via Paxos, to replicate lock and new value.
    - Tells TC it is prepared, or tells TC “no” if