课程链接:http://nil.csail.mit.edu/6.824/2020
GitHub链接:https://github.com/fkuner/6.824
文章目录
- Lecture
- Papers
- MapReduce: Simplified Data Processing on Large Clusters
- The Google File System
- The Design of a Practical System for Fault-Tolerant Virtual Machines
- In Search of an Understandable Consensus Algorithm (Extended Version)
- ZooKeeper: Wait-free coordination for Internet-scale systems
- Object Storage on CRAQ
- Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
- Lab
Lecture
Lecture1 MapReduce
Lecture2 RPC and Threads
- I/O Concurrency
- Parallelism
- Convenience
race、coordination
web crawler execrise
Lecture3 GFS
Master Data:
- filename -> array of chunk handles
- handle ->list of chunk servers
->version
->primary
->lease expiration - log, checkpoint
READ:
- name, offset-master
- M sends the chunk handle, list of servers which is cached by the client
- client- CS and CS return data to the client
WRITE:
NO PRIMARY -> find up-to-date replicas -> pick primary, secondary -> increment v -> tells p, s -> master writes VA to disk
Lecture4 Primary-Backup Replication
Failures:
- fail-stop faults √
- bugs ×
State transfer (memory)
Replicated State Machine (operation)
Non-def events:
- Inputs-packet-data + interrupt
- Weird instructions
- Multicore
Log entry: instruction #, type, data
Lecture5 Go Threads and Raft
condition variables:
mu.Lock()
// do something that might affect the condition
cond.Braodcast()
mu.Unlock()
---
mu.Lock()
while condition == false {
cond.Wait()
}
// now condition is true, and we have the lock
mu.Unlock()
死锁:
s0.CallRequestVote, acquire the lock
s0.CallRequestVote, send RPC to s1
s1.CallRequestVote, acquire the lock
s1.CallRequestVote, send RPC to s0
s0.Handler, s1.Handler trying to acquire lock
Lecture6 Fault Tolerance Raft (1)
Leader -> 高效
ELECTION TIMER -> START ELECTION
term++, requestVote
Lecture7 Fault Tolerance Raft (2)
linearizable system ? ? ?
Lecture8 ZooKeeper
WHY ZK?
- API general-purpose coord service, not library as raft
- N Servers can get N performance?
ZK GUARANTEES
- Linearizable writes: all requests that update the state of ZooKeeper are serializable and respect precedence;
- FIFO client order: all requests from a given client are executed in the order that they were sent by the client.
底层基于ZAB
Lecture9 More Replication CRAQ
Chain Replication
Lecture10 Cloud Replicated DB Aurora
EC2、EBS、RDS
Quorum
Papers
MapReduce: Simplified Data Processing on Large Clusters
The Google File System
The Design of a Practical System for Fault-Tolerant Virtual Machines
通过主从复制,在不同的物理服务器上运行一个备份虚拟机,通过和主机执行相同的操作来保持同步。备份虚拟机被建模为确定性状态机,主从都执行相同的动作,对于不确定性操作,还必须记录事件发生的确切指令。primary VM 接收的所有输入都会通过logging channel传输给backup VM,backup VM根据日志信息进行replay,因此可以和primary VM有相同的执行,从而达到备份的目的。
In Search of an Understandable Consensus Algorithm (Extended Version)
ZooKeeper: Wait-free coordination for Internet-scale systems
the abstraction of a set of data nodes (znodes), organized according to a hierarchical name space.
znode types:
- Regular: Clients manipulate regular znodes by creating and deleting them explicitly;
- Clients create such znodes, and they either delete them explicitly, or let the system remove them automatically when the session that creates them terminates (deliberately or due to a failure).
Client API:
create(path, data, flags)
delete(path, version)
exists(path, watch)
getData(path, watch)
setData(path, data, version)
getChildren(path, watch)
sync(path)
ZooKeeper guarantees
- Linearizable writes: all requests that update the state of ZooKeeper are serializable and respect prece-
dence - FIFO client order: all requests from a given client are executed in the order that they were sent by the
client.
ZooKeeper Implementation
- Request Processor
- Atomic Broadcast
- Replicated Database
- Client-Server Interactions
Object Storage on CRAQ
基于对象存储的系统提供两个原语:
- write(objID, V): The write (update) operation stores the value V associated with object identifier objID
- V := read(objID): The read (query) operation retrieves the value V associated with object id objID.
普通的Chain Replication(CR)维护一个链表,头结点处理写请求,尾结点处理读请求,头结点收到写请求后,会传播到下一个结点,以此类推,直到到达尾节点,此时该操作为committed,所有只有committed的值才能被读到,因此保证了强一致性
CR存在一个问题,只能在尾结点读,吞吐性不高
CRAQ相比于CR拓展了:
- 每一个结点包含一个自增的版本号和额外的属性,指示版本号是clean还是dirty,所有版本号属性初始为clean
- 当一个结点收到一个对象的新版本号(写操作)时,插入最新的版本号到列表中,如果该结点不是尾结点,标记这个版本号为dirty,然后向后传播;否则,标记为clean,这时这个写操作为committed,然后尾结点发送ACK给所有之前的结点
- 当一个结点收到一个对象的ACK,该结点标记这个对象的版本号为clean,并且删除所有之前的版本号
- 当一个结点收到一个对象的读请求时,如果对象的最新版本号是clean, 返回值;否则,该结点向尾结点请求last committed版本号,并且返回这个版本号对应的值
在Read-Mostly Workloads和Write-Heavy Workloads场景都能提供比CR更高的吞吐
CRAQ提供了3种一致性:
- Strong Consistency
- Eventual Consistency
- Eventual Consistency with Maximum-Bounded Inconsistency
CRAQ采取跟CR一样的容错
除此之外,为了提升效率,Node再给后续的节点发送消息时可以用多播协议,来并发处理,缩短了线性处理的时间
Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
为了减少网络压力,提升数据库性能,Aurora将redo log下推到分布式存储层,采用Quorum协议,提供可扩展的高可用存储
Lab
Lab1、Lab2、Lab3代码在GitHub上,Lab4肝不动了,有机会再写吧!