DS-lab2

Latest recommended article published 2024-09-16 11:04:46

lab2

Part A: The Viewservice

Points to note

Hint: you’ll want to add field(s) to ViewServer in server.go in order to keep track of the most recent time at which the viewservice has heard a Ping from each server. Perhaps a map from server names to time.Time. You can find the current time with time.Now().

Hint: add field(s) to ViewServer to keep track of the current view.

Hint: you’ll need to keep track of whether the primary for the current view has acknowledged it (in PingArgs.Viewnum).

Hint: your viewservice needs to make periodic decisions, for example to promote the backup if the viewservice has missed DeadPings pings from the primary. Add this code to the tick() function, which is called once per PingInterval.

Hint: there may be more than two servers sending Pings. The extra ones (beyond primary and backup) are volunteering to be backup if needed.

Hint: the viewservice needs a way to detect that a primary or backup has failed and re-started. For example, the primary may crash and quickly restart without missing sending a single Ping.

Hint: study the test cases before you start programming. If you fail a test, you may have to look at the test code in test_test.go to figure out what the failure scenario is.

Practical notes

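Implement the view-change logic in tick().
In Ping, record the relevant state: the ping time, idle servers, whether the primary has acked, and so on. The view returned to the caller may be stale; it need not be the latest one, because the caller will obtain the new view when it pings again.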

Part B: The primary/backup key/value service

Steps

You should start by modifying pbservice/server.go to Ping the viewservice to find the current view. Do this in the tick() function. Once a server knows the current view, it knows if it is the primary, the backup, or neither.
Implement Get, Put, and Append handlers in pbservice/server.go; store keys and values in a map[string]string. If a key does not exist, Append should use an empty string for the previous value. Implement the client.go RPC stubs.
Modify your handlers so that the primary forwards updates to the backup.
When a server becomes the backup in a new view, the primary should send it the primary’s complete key/value database.
Modify client.go so that clients keep re-trying until they get an answer. Make sure that you include enough information in PutAppendArgs, and GetArgs (see common.go) so that the key/value service can detect duplicates. Modify the key/value service to handle duplicates correctly.
Modify client.go to cope with a failed primary. If the current primary doesn’t respond, or doesn’t think it’s the primary, have the client consult the viewservice (in case the primary has changed) and try again. Sleep for viewservice.PingInterval between re-tries to avoid burning up too much CPU time.
Original text
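
The client-side retry described in the last two steps might look roughly like this. It is a sketch only: the function fields `lookupPrimary`, `send`, and `newID` are hypothetical stand-ins for the viewservice lookup, the RPC call, and a unique-number generator, and `retryDelay` stands in for viewservice.PingInterval. The `Err` type and `ID` field are illustrative and may differ from what common.go ends up containing.

```go
package main

import "time"

type Err string

const (
	OK             Err = "OK"
	ErrWrongServer Err = "ErrWrongServer"
)

// PutAppendArgs carries an ID so the server can recognise retries of one request.
type PutAppendArgs struct {
	Key, Value, Op string
	ID             int64
}

// Clerk uses function fields as hypothetical stand-ins for the real plumbing.
type Clerk struct {
	primary       string
	lookupPrimary func() string
	send          func(server string, args PutAppendArgs) (Err, bool)
	newID         func() int64
	retryDelay    time.Duration
}

// PutAppend keeps re-trying until some primary answers OK.
func (ck *Clerk) PutAppend(key, value, op string) {
	// The ID is chosen once, so every retry carries the same one.
	args := PutAppendArgs{Key: key, Value: value, Op: op, ID: ck.newID()}
	for {
		if ck.primary == "" {
			ck.primary = ck.lookupPrimary()
		}
		if ck.primary != "" {
			if err, ok := ck.send(ck.primary, args); ok && err == OK {
				return
			}
		}
		// No reply, or the server is not the primary: consult the viewservice again.
		ck.primary = ""
		time.Sleep(ck.retryDelay)
	}
}
```

The important detail is that the ID is generated outside the loop. If each retry generated a fresh ID, the server could not tell a retry from a new request.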

Points to note

Hint: you’ll probably need to create new RPCs to forward client requests from primary to backup, since the backup should reject a direct client request but should accept a forwarded request.

Hint: you’ll probably need to create new RPCs to handle the transfer of the complete key/value database from the primary to a new backup. You can send the whole database in one RPC (for example, include a map[string]string in the RPC arguments).

Hint: the state to filter duplicates must be replicated along with the key/value state.
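
One way to read the last two hints together is that the transfer RPC carries the duplicate filter alongside the data. The sketch below assumes a hypothetical `TransferArgs` type and a simplified `store`; a real RPC would serialise the maps, so the explicit copy in `install` is only needed in this in-process illustration.

```go
package main

// TransferArgs is a hypothetical argument type for a whole-database RPC.
// It carries the duplicate-filter state together with the key/value data.
type TransferArgs struct {
	Data map[string]string
	Seen map[int64]bool // request IDs that have already been applied
}

// store is a simplified stand-in for the server's replicated state.
type store struct {
	data map[string]string
	seen map[int64]bool
}

func newStore() *store {
	return &store{data: make(map[string]string), seen: make(map[int64]bool)}
}

// snapshot is what the primary would send to a new backup.
func (s *store) snapshot() TransferArgs {
	return TransferArgs{Data: s.data, Seen: s.seen}
}

// install is what the new backup would run on receipt. It copies the maps
// so that later changes on the sender do not leak into the receiver.
func (s *store) install(args TransferArgs) {
	s.data = make(map[string]string, len(args.Data))
	for k, v := range args.Data {
		s.data[k] = v
	}
	s.seen = make(map[int64]bool, len(args.Seen))
	for id := range args.Seen {
		s.seen[id] = true
	}
}
```

If only the data were transferred, a backup promoted to primary would re-apply a retried Append that the old primary had already executed.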

Hint: the tester arranges for RPC replies to be lost in tests whose description includes “unreliable”. This will cause RPCs to be executed by the receiver, but since the sender sees no reply, it cannot tell whether the server executed the RPC.

Hint: you may need to generate numbers that have a high probability of being unique. Try this:

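import "crypto/rand"
import "math/big"

// nrand returns a random int64 in [0, 2^62) drawn from the
// cryptographic random source, so collisions are very unlikely.
func nrand() int64 {
  max := big.NewInt(int64(1) << 62)
  bigx, _ := rand.Int(rand.Reader, max)
  x := bigx.Int64()
  return x
}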

Hint: the tests kill a server by setting its dead flag. You must make sure that your server terminates correctly when that flag is set, otherwise you may fail to complete the test cases.
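
A common shape for honouring the dead flag is to check it in the condition of every long-lived loop. The following is a sketch with hypothetical names (`server`, `run`, `done`), not the lab's skeleton; it only shows that a loop guarded this way exits soon after the flag is set.

```go
package main

import (
	"sync/atomic"
	"time"
)

type server struct {
	dead  int32         // set to 1 by kill, read atomically
	ticks int32         // counts loop iterations, for illustration only
	done  chan struct{} // closed when run returns
}

func newServer() *server { return &server{done: make(chan struct{})} }

func (s *server) kill()        { atomic.StoreInt32(&s.dead, 1) }
func (s *server) isdead() bool { return atomic.LoadInt32(&s.dead) != 0 }

// run stands in for any long-lived loop, such as the ticker or a retry loop.
// Because it re-checks the flag on every iteration, it stops after kill.
func (s *server) run() {
	defer close(s.done)
	for !s.isdead() {
		atomic.AddInt32(&s.ticks, 1)
		time.Sleep(time.Millisecond)
	}
}
```

The same check belongs in any loop that retries an RPC; otherwise a killed server can keep a goroutine spinning for the rest of the test run.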

Hint: even if your viewserver passed all the tests in Part A, it may still have bugs that cause failures in Part B.

Hint: study the test cases before you start programming.

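Practical notes

1. Choosing when to commit

The simple approach: on every update, the primary keeps retrying the backup and returns to the client only after the backup succeeds.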

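As the figure shows, the following cases need to be considered:
1. The primary drops the packet from the client. The client simply retries.
2. The backup drops the packet from the primary. The primary retries until (1) the backup succeeds, or (2) the backup has died (which the primary can learn from the viewservice, that is, from the latest view).
3. The backup completes the update, but its reply to the primary is lost. The primary retries as in case 2. (Because the backup has already applied the update, it needs duplicate-detection logic so that the same update is not applied twice when it arrives again.)
4. The primary's reply to the client is lost. The client retries and the primary follows its normal path, but it must recognise that this is the same update operation.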
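
Cases 2 and 3 can be sketched from the primary's side as a loop that forwards first and applies locally only afterwards. This is an illustration under assumptions: `backup` and `forward` are hypothetical function fields standing in for "the backup in the latest view" and the forwarding RPC, and a real server would also sleep between attempts and check its dead flag.

```go
package main

type Err string

const (
	OK             Err = "OK"
	ErrWrongServer Err = "ErrWrongServer"
)

// ForwardArgs is a hypothetical argument type for the primary-to-backup RPC.
type ForwardArgs struct {
	Key, Value string
	ID         int64
}

type primary struct {
	data    map[string]string
	backup  func() string                                     // "" when the view has no backup
	forward func(backup string, args ForwardArgs) (Err, bool) // false means no reply arrived
}

// put commits an update only after the backup has it, or the backup is gone.
func (p *primary) put(args ForwardArgs) Err {
	for {
		b := p.backup()
		if b == "" {
			break // case 2(2): the backup has left the view
		}
		if err, ok := p.forward(b, args); ok {
			if err != OK {
				return err // the backup refused; let the client sort it out
			}
			break // case 2(1): the backup applied the update
		}
		// Cases 2 and 3: the request or the reply was lost, so send again.
	}
	p.data[args.Key] = args.Value
	return OK
}
```

From the primary's point of view a lost request and a lost reply look identical, which is why the backup, not the primary, has to filter the duplicate in case 3.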

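For duplicate detection, the simple approach is to attach an id to every request and have the server record every id it has seen, so it can tell when the same update arrives again. Id generation should make collisions as unlikely as possible; the nrand() hint in the notes above shows one way.
A more elaborate approach is a two-phase commit protocol.
A complete solution requires a consensus protocol such as Paxos.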
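
The simple id-based filter could be sketched as follows. `kvStore` and `apply` are hypothetical names, and the sketch remembers ids forever, which is fine for the lab's tests but would need pruning in a long-running service.

```go
package main

// kvStore pairs the key/value data with the set of request ids already applied.
type kvStore struct {
	data map[string]string
	seen map[int64]bool
}

func newKVStore() *kvStore {
	return &kvStore{data: make(map[string]string), seen: make(map[int64]bool)}
}

// apply executes an update at most once per request id.
func (s *kvStore) apply(op, key, value string, id int64) {
	if s.seen[id] {
		return // a retry of a request that was already executed
	}
	switch op {
	case "Put":
		s.data[key] = value
	case "Append":
		s.data[key] += value // a missing key reads as the empty string
	}
	s.seen[id] = true
}
```

Append is the operation that makes this necessary: repeating a Put is harmless, but repeating an Append changes the stored value.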

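2. Guarding against partitions

Prevent an old primary from continuing to serve reads or updates.
1. Both Get and update operations must be forwarded by the primary to the backup. Only after the backup returns success and the local operation succeeds may the primary reply to the client.
2. When the backup believes it is the primary, it returns ErrWrongServer to the old primary. The primary can pass this error back to the client, so that the client fetches the new view and reissues the request to the new primary.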
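
The two rules can be sketched with a Get that is checked against the backup before it is answered. The `peer` pointer below is a hypothetical stand-in for an RPC to the backup, `forwardedGet` is an illustrative name, and lost messages are not modelled.

```go
package main

type Err string

const (
	OK             Err = "OK"
	ErrWrongServer Err = "ErrWrongServer"
)

type View struct {
	Viewnum uint
	Primary string
	Backup  string
}

type server struct {
	me   string
	view View // the latest view this server has learned
	data map[string]string
	peer *server // hypothetical direct pointer standing in for an RPC to the backup
}

// forwardedGet is what the backup runs when a primary forwards a Get.
// It refuses unless it still considers itself the backup of that primary.
func (s *server) forwardedGet(from string) Err {
	if s.view.Backup != s.me || s.view.Primary != from {
		return ErrWrongServer
	}
	return OK
}

// get answers a client only after the backup has confirmed the view.
func (s *server) get(key string) (string, Err) {
	if s.view.Primary != s.me {
		return "", ErrWrongServer
	}
	if s.view.Backup != "" {
		if err := s.peer.forwardedGet(s.me); err != OK {
			return "", err // pass the refusal on so the client re-reads the view
		}
	}
	return s.data[key], OK
}
```

An old primary cut off from the viewservice still believes its stale view, so its own check passes; it is the backup's refusal that stops it from returning stale data.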