6.824 lab2

2A implements leader election (voting) and heartbeats; just follow Figure 2.

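Most of the logic lives in the ticker function, which handles two states: leader and non-leader.
Leader: 1. sleep 50 ms; 2. check that it is still the leader; 3. send heartbeats.
Non-leader: 1. sleep a random 150-300 ms; 2. check whether a heartbeat has arrived; 3. if not, start an election; 4. sleep 100 ms; 5. check whether it is still a candidate and how many votes it has received.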

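Every election must happen in a new term. Be careful about where the lock is acquired and released, especially around sleeps and function calls. Debugging with print statements works well. Heartbeats are empty AppendEntries. Choose the sleep times carefully.
Two cases must be handled correctly: if the leader crashes, a follower must take over; if the leader is still alive, followers must not usurp it.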
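A minimal sketch of the "new term" rule and the lock discipline, assuming a hypothetical `Node` type and a `send` callback standing in for the lab's `sendRequestVote`. For brevity the votes are requested sequentially; lab code would normally send each RPC in its own goroutine. The point is that the lock is released before the RPC and the term and state are re-checked after it is re-acquired.

```go
package main

import "sync"

type State int

const (
	Follower State = iota
	Candidate
	Leader
)

// Node is a hypothetical, stripped-down Raft peer used only to show locking.
type Node struct {
	mu          sync.Mutex
	me          int // immutable after creation
	peers       int // immutable after creation
	state       State
	currentTerm int
	votedFor    int // -1 means no vote
	votes       int
}

type VoteReply struct {
	Term        int
	VoteGranted bool
}

// startElection enters a new term and votes for itself while holding the
// lock, then releases the lock before any (slow) RPC is sent.
func (n *Node) startElection(send func(peer, term, candidate int) VoteReply) {
	n.mu.Lock()
	n.state = Candidate
	n.currentTerm++ // every election happens in a new term
	n.votedFor = n.me
	n.votes = 1
	term := n.currentTerm // copy what the RPC needs while still locked
	n.mu.Unlock()

	for p := 0; p < n.peers; p++ {
		if p == n.me {
			continue
		}
		reply := send(p, term, n.me) // no lock held during the RPC
		n.handleVoteReply(term, reply)
	}
}

func (n *Node) handleVoteReply(term int, reply VoteReply) {
	n.mu.Lock()
	defer n.mu.Unlock()
	if reply.Term > n.currentTerm { // someone is ahead: step down
		n.currentTerm = reply.Term
		n.state = Follower
		n.votedFor = -1
		return
	}
	// Re-check after re-acquiring the lock: the term or state may have
	// changed while the RPC was in flight, making this reply stale.
	if n.state != Candidate || n.currentTerm != term || !reply.VoteGranted {
		return
	}
	n.votes++
	if n.votes > n.peers/2 {
		n.state = Leader
	}
}
```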

go test -run 2A -race
Test (2A): initial election ...
  ... Passed --   3.0  3  106   27448    0
Test (2A): election after network failure ...
  ... Passed --   4.9  3  240   48758    0
Test (2A): multiple elections ...
  ... Passed --   6.5  7 1032  201534    0
PASS
ok      6.824/raft      14.522s

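2B is much harder than 2A. Besides implementing log replication and commit, you also have to refine the election code, which may cause 2A to start failing again. Everything in Figure 2 needs to be implemented. Correctness is checked through the Start function and ApplyMsg.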
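Since the tester judges correctness by what arrives on `applyCh`, a small applier helps. In this sketch `ApplyMsg` is redeclared with the command fields of the lab skeleton so the code compiles on its own; `Entry` and `applyCommitted` are hypothetical names, and `entries[0]` is a dummy so that 1-based indexes map directly to slice positions. In real code, copy the entries under `rf.mu` and send on the channel after unlocking, since blocking on `applyCh` while holding the lock can deadlock.

```go
package main

// ApplyMsg mirrors the command fields of the lab's ApplyMsg; redeclared here
// so the sketch is self-contained.
type ApplyMsg struct {
	CommandValid bool
	Command      interface{}
	CommandIndex int
}

// Entry is a hypothetical log entry.
type Entry struct {
	Term    int
	Command interface{}
}

// applyCommitted delivers every entry in (lastApplied, commitIndex] to the
// service in index order and returns the new lastApplied.
func applyCommitted(entries []Entry, lastApplied, commitIndex int, applyCh chan<- ApplyMsg) int {
	for lastApplied < commitIndex {
		lastApplied++
		applyCh <- ApplyMsg{
			CommandValid: true,
			Command:      entries[lastApplied].Command,
			CommandIndex: lastApplied,
		}
	}
	return lastApplied
}
```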

Leader: update commitIndex and commit entries. When sending heartbeats, check whether followers need to be brought up to date.
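A sketch of the leader's commit rule from Figure 2, assuming `matchIndex` includes the leader's own last index and `logTerm` is a hypothetical accessor for an entry's term. The term check is what stops a leader from committing an earlier-term entry just by counting replicas (the Figure 8 problem).

```go
package main

import "sort"

// advanceCommitIndex returns the largest N > commitIndex such that a majority
// of matchIndex[i] >= N and the entry at N is from currentTerm; otherwise it
// returns commitIndex unchanged.
func advanceCommitIndex(matchIndex []int, commitIndex, currentTerm int, logTerm func(int) int) int {
	sorted := append([]int(nil), matchIndex...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	// In descending order, sorted[len/2] is reached by len/2+1 servers: a majority.
	n := sorted[len(sorted)/2]
	// Log terms never decrease along the log, so if entry n is from an older
	// term, every smaller index is too; there is no need to search further.
	if n > commitIndex && logTerm(n) == currentTerm {
		return n
	}
	return commitIndex
}
```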

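There is at most one leader per term. When checking whether logs match, comparing index and term is enough. On receiving AppendEntries, compare terms to decide whether to step down to follower. Log indexes start at 1. Release the lock before sleeping, and re-check the state after waking up. Track how nextIndex and matchIndex change, and handle the cases where nextIndex is too large or a follower is unreachable.
A leader only commits entries from its current term, and it does not proactively replicate entries from earlier terms on their own; it waits until there is an entry from the current term and then replicates them. Pay attention to the commitIndex carried in heartbeats.
When voting, to decide whether log a or log b is more up to date, first compare the terms of their last entries, then compare their last indexes.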
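The up-to-date comparison and the rest of the vote-granting rule, as a minimal sketch with a hypothetical `Voter` type:

```go
package main

// candidateUpToDate is the election restriction: the candidate's log must be at
// least as up to date as the voter's. Compare last terms first, then last indexes.
func candidateUpToDate(candLastTerm, candLastIndex, myLastTerm, myLastIndex int) bool {
	if candLastTerm != myLastTerm {
		return candLastTerm > myLastTerm
	}
	return candLastIndex >= myLastIndex
}

// Voter is a hypothetical stand-in for the parts of a Raft peer used when voting.
type Voter struct {
	currentTerm int
	votedFor    int // -1 means "no vote in this term"
	lastTerm    int // term of the voter's last log entry
	lastIndex   int // index of the voter's last log entry
}

// handleRequestVote returns (currentTerm, voteGranted) following Figure 2.
func (v *Voter) handleRequestVote(term, candidate, candLastTerm, candLastIndex int) (int, bool) {
	if term < v.currentTerm {
		return v.currentTerm, false // stale candidate
	}
	if term > v.currentTerm {
		// Newer term: adopt it and forget any vote from the old term.
		// A real peer would also convert to follower here.
		v.currentTerm = term
		v.votedFor = -1
	}
	if (v.votedFor == -1 || v.votedFor == candidate) &&
		candidateUpToDate(candLastTerm, candLastIndex, v.lastTerm, v.lastIndex) {
		v.votedFor = candidate
		return v.currentTerm, true
	}
	return v.currentTerm, false
}
```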

go test -run 2B -race
Test (2B): basic agreement ...
  ... Passed --   0.8  3   16    3828    3
Test (2B): RPC byte count ...
  ... Passed --   1.8  3   46  112212   11
Test (2B): agreement despite follower disconnection ...
  ... Passed --   6.0  3  186   45330    8
Test (2B): no agreement if too many followers disconnect ...
  ... Passed --   3.5  5  276   53898    4
Test (2B): concurrent Start()s ...
  ... Passed --   0.7  3   14    3344    6
Test (2B): rejoin of partitioned leader ...
  ... Passed --   6.2  3  278   64214    4
Test (2B): leader backs up quickly over incorrect follower logs ...
  ... Passed --  26.0  5 3048 2313497  104
Test (2B): RPC counts aren't too high ...
  ... Passed --   2.3  3   64   16584   12
PASS
ok      6.824/raft      48.405s

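2C implements persistence. It is much simpler than the previous two parts.
Implement the save and read functions, and call the save function everywhere the persistent state changes.
Figure 8 (unreliable) failed at first. The DPrintf output showed that nextIndex never found a matching entry, so I fixed it following the last hint in the lab description.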
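Two sketches for 2C, with hypothetical names throughout. `encodeState`/`decodeState` use `encoding/gob`, which `labgob` wraps; decoding into fresh zero-valued variables avoids the labgob warning seen in the logs, which is printed when decoding into a non-default variable. `conflictInfo` and `nextIndexAfterReject` show one way to back up `nextIndex` a whole term at a time, modeled on the XTerm/XIndex/XLen scheme from the 6.824 lecture notes; it is an assumption about the approach, not the author's exact fix.

```go
package main

import (
	"bytes"
	"encoding/gob"
)

// LogEntry is a hypothetical log entry type.
type LogEntry struct {
	Term    int
	Command interface{}
}

// encodeState serializes the state Figure 2 marks as persistent.
func encodeState(currentTerm, votedFor int, entries []LogEntry) ([]byte, error) {
	w := new(bytes.Buffer)
	e := gob.NewEncoder(w)
	for _, v := range []interface{}{currentTerm, votedFor, entries} {
		if err := e.Encode(v); err != nil {
			return nil, err
		}
	}
	return w.Bytes(), nil
}

// decodeState reverses encodeState, decoding into fresh zero-valued results.
func decodeState(data []byte) (currentTerm, votedFor int, entries []LogEntry, err error) {
	d := gob.NewDecoder(bytes.NewBuffer(data))
	if err = d.Decode(&currentTerm); err != nil {
		return
	}
	if err = d.Decode(&votedFor); err != nil {
		return
	}
	err = d.Decode(&entries)
	return
}

// conflictInfo is computed by a follower that rejects AppendEntries.
// terms[i] is the term of entry i, with terms[0] a dummy.
func conflictInfo(terms []int, prevLogIndex int) (xTerm, xIndex, xLen int) {
	xLen = len(terms)
	if prevLogIndex >= len(terms) {
		return -1, -1, xLen // log too short: no entry at prevLogIndex
	}
	xTerm = terms[prevLogIndex]
	xIndex = prevLogIndex
	for xIndex > 1 && terms[xIndex-1] == xTerm {
		xIndex-- // walk back to the first entry of the conflicting term
	}
	return xTerm, xIndex, xLen
}

// nextIndexAfterReject is the leader's new nextIndex for that follower.
func nextIndexAfterReject(leaderTerms []int, xTerm, xIndex, xLen int) int {
	if xTerm == -1 {
		return xLen // follower's log is too short
	}
	for i := len(leaderTerms) - 1; i > 0; i-- {
		if leaderTerms[i] == xTerm {
			return i + 1 // leader has xTerm: resume just after its last xTerm entry
		}
	}
	return xIndex // leader lacks xTerm: skip the follower's whole term
}
```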

go test -run 2C -race
Test (2C): basic persistence ...
labgob warning: Decoding into a non-default variable/field int may not work
  ... Passed --   3.8  3  118   25653    6
Test (2C): more persistence ...
  ... Passed --  16.4  5 1384  267718   16
Test (2C): partitioned leader and one follower crash, leader restarts ...
  ... Passed --   1.8  3   44    9794    4
Test (2C): Figure 8 ...
  ... Passed --  43.3  5 1476  293394   25
Test (2C): unreliable agreement ...
  ... Passed --   6.3  5  412  114070  246
Test (2C): Figure 8 (unreliable) ...
  ... Passed --  49.0  5 6684 11634487  583
Test (2C): churn ...
  ... Passed --  16.4  5 1080  483804  255
Test (2C): unreliable churn ...
  ... Passed --  16.6  5  768  192555   96
PASS
ok      6.824/raft      155.078s

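2D implements snapshots. This is the most complex part, and all the earlier code has to be modified.
First handle the index offset: introduce an offset X and make tests 2A, 2B and 2C pass again.
Implement the InstallSnapshot RPC according to Figure 13. If the installation succeeds, update nextIndex and matchIndex.
The AppendEntries RPC, the accelerated nextIndex backup from 2C, heartbeats, commit and voting all need to be redesigned; this is mostly tedious work.
When the log no longer holds the entries that need to be committed: send the snapshot through applyCh, which delivers everything up to the snapshot point.
When the log no longer holds the entries a follower needs appended: call the InstallSnapshot RPC.
When the log is empty and an election is needed: compare using lastIncludedIndex and lastIncludedTerm.
In 2C I changed how nextIndex jumps back; here it has to be changed again to take both InstallSnapshot and AppendEntries RPCs into account.
The snapshot itself must also be persisted.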
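A sketch of the index bookkeeping after snapshotting, assuming a hypothetical `RaftLog` type in which `Entries[0]` is the entry at global index `LastIncludedIndex+1`. It covers the offset translation, the fallback to `LastIncludedTerm` for an empty log during elections, the leader's choice between InstallSnapshot and AppendEntries, Figure 13's rules for trimming a follower's log, and the leader's nextIndex/matchIndex update after a successful install.

```go
package main

// Entry is a hypothetical log entry.
type Entry struct {
	Term    int
	Command interface{}
}

// RaftLog is a hypothetical compacted log: everything up to LastIncludedIndex
// lives in the snapshot; this offset is what every index must account for.
type RaftLog struct {
	LastIncludedIndex int
	LastIncludedTerm  int
	Entries           []Entry
}

// LastIndex is the global index of the last entry (the snapshot boundary if empty).
func (l *RaftLog) LastIndex() int { return l.LastIncludedIndex + len(l.Entries) }

// LastTerm is what RequestVote reports; an empty log falls back to LastIncludedTerm.
func (l *RaftLog) LastTerm() int {
	if len(l.Entries) == 0 {
		return l.LastIncludedTerm
	}
	return l.Entries[len(l.Entries)-1].Term
}

// Term returns the term at global index i, for LastIncludedIndex <= i <= LastIndex().
func (l *RaftLog) Term(i int) int {
	if i == l.LastIncludedIndex {
		return l.LastIncludedTerm
	}
	return l.Entries[i-l.LastIncludedIndex-1].Term
}

// NeedsSnapshot reports whether prevLogIndex (nextIndex-1) lies before the
// snapshot boundary, so the leader must send InstallSnapshot instead.
func (l *RaftLog) NeedsSnapshot(nextIndex int) bool {
	return nextIndex <= l.LastIncludedIndex
}

// InstallSnapshot trims a follower's log per Figure 13: keep the entries after
// lastIncludedIndex if that entry matches lastIncludedTerm, else discard all.
func (l *RaftLog) InstallSnapshot(lastIncludedIndex, lastIncludedTerm int) {
	if lastIncludedIndex <= l.LastIncludedIndex {
		return // stale snapshot: we already cover at least this much
	}
	var kept []Entry
	if lastIncludedIndex <= l.LastIndex() && l.Term(lastIncludedIndex) == lastIncludedTerm {
		kept = append([]Entry(nil), l.Entries[lastIncludedIndex-l.LastIncludedIndex:]...)
	}
	l.LastIncludedIndex = lastIncludedIndex
	l.LastIncludedTerm = lastIncludedTerm
	l.Entries = kept
}

// onSnapshotAccepted is the leader's update once InstallSnapshot succeeds;
// matchIndex never moves backwards.
func onSnapshotAccepted(lastIncludedIndex, matchIndex int) (nextIndex, newMatch int) {
	newMatch = matchIndex
	if lastIncludedIndex > newMatch {
		newMatch = lastIncludedIndex
	}
	return newMatch + 1, newMatch
}
```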

go test -run 2D -race
Test (2D): snapshots basic ...
  ... Passed --   4.6  3  142   44750  251
Test (2D): install snapshots (disconnect) ...
  ... Passed --  61.2  3 2066  491612  343
Test (2D): install snapshots (disconnect+unreliable) ...
  ... Passed --  92.2  3 3134  705353  386
Test (2D): install snapshots (crash) ...
labgob warning: Decoding into a non-default variable/field int may not work
  ... Passed --  31.3  3  926  236012  344
Test (2D): install snapshots (unreliable+crash) ...
  ... Passed --  42.1  3 1194  285557  346
PASS
ok      6.824/raft      232.538s

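I found 2A and 2B the most interesting; they are the core of Raft. 2C is fairly simple and worth doing. 2D is very tedious; it is an extension to Raft and does not involve many clever designs.
Raft indexes start at 1, so you constantly have to think about whether to add 1 or subtract 1.
While debugging, fixing X would often break Y, or Z would fail occasionally in a way that was hard to reproduce and whose cause was hard to guess.
Overall, the lab is quite difficult.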

go test -race
Test (2A): initial election ...
  ... Passed --   3.6  3  104   25004    0   
Test (2A): election after network failure ...
  ... Passed --   5.0  3  166   30642    0
Test (2A): multiple elections ...
  ... Passed --   6.3  7  690  130308    0
Test (2B): basic agreement ...
  ... Passed --   0.8  3   14    3344    3
Test (2B): RPC byte count ...
  ... Passed --   1.8  3   46  111596   11
Test (2B): agreement despite follower disconnection ...
  ... Passed --   6.0  3  172   39892    8
Test (2B): no agreement if too many followers disconnect ...
  ... Passed --   3.8  5  252   50384    3
Test (2B): concurrent Start()s ...        
  ... Passed --   1.2  3   22    5328    6 
Test (2B): rejoin of partitioned leader ...
  ... Passed --   4.7  3  208   43486    4
Test (2B): leader backs up quickly over incorrect follower logs ...
  ... Passed --  19.9  5 2344 1800172  102
Test (2B): RPC counts aren't too high ... 
  ... Passed --   2.7  3   70   17168   12
Test (2C): basic persistence ...
labgob warning: Decoding into a non-default variable/field int may not work
  ... Passed --   4.5  3  114   25071    6
Test (2C): more persistence ...
  ... Passed --  17.3  5 1360  271270   16
Test (2C): partitioned leader and one follower crash, leader restarts ...
  ... Passed --   2.1  3   42    9234    4
Test (2C): Figure 8 ...
  ... Passed --  48.2  5 1168  221631   17
Test (2C): unreliable agreement ...
  ... Passed --   7.4  5  428  116286  246
Test (2C): Figure 8 (unreliable) ...
  ... Passed --  54.7  5 7148 24082212  884
Test (2C): churn ...
  ... Passed --  16.5  5 1460 1470495  158
Test (2C): unreliable churn ...
  ... Passed --  16.4  5 1068  357124  279
Test (2D): snapshots basic ...
  ... Passed --   5.2  3  144   45522  251
Test (2D): install snapshots (disconnect) ...
  ... Passed --  47.5  3 1612  384681  377
Test (2D): install snapshots (disconnect+unreliable) ...
  ... Passed --  60.2  3 1996  457602  348
Test (2D): install snapshots (crash) ...
  ... Passed --  34.5  3  928  238636  377
Test (2D): install snapshots (unreliable+crash) ...
  ... Passed --  40.2  3 1104  274032  399
PASS
ok      6.824/raft      411.847s