6.824 Notes

Project setup

The directory structure is:

6.824
    src
        kvraft
        labgo

Set the project's GOPATH to the 6.824 directory so that the packages under src can import one another.

MapReduce workflow


Input files stored in the DFS are split into M pieces. The master assigns tasks (map or reduce) to the workers.
A map worker parses key/value pairs out of its assigned piece. It passes each pair to the user-defined map function, which produces intermediate key/value pairs that are first buffered in main memory.

The buffered pairs are periodically flushed to local disk and partitioned into R regions (one per reduce task) based on the intermediate key. The locations of these regions are forwarded to the master, which is responsible for notifying the reduce workers; the reduce workers then fetch the files using remote procedure calls. Note that intermediate pairs with the same key may be scattered across the disks of different map workers.

Once a reduce worker has read all of its intermediate data, it sorts the data by intermediate key. Whenever it encounters a new intermediate key, it passes that key and all of the values associated with it to the reduce function. The output is appended to the output file for this reduce partition in the DFS.
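The partition and group-by-key steps described above can be sketched in Go as follows. This is a minimal in-memory illustration, not the lab's code: the names KeyValue, partition, splitByRegion and reduceRegion are hypothetical, the FNV hash is only one possible partitioning function, and real workers would read and write files rather than slices.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// KeyValue is an intermediate pair emitted by a map function (hypothetical name).
type KeyValue struct {
	Key   string
	Value string
}

// partition picks the reduce region (0..nReduce-1) for a key.
// Assumption: hash(key) mod R, with FNV-1a as the hash.
func partition(key string, nReduce int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()&0x7fffffff) % nReduce
}

// splitByRegion distributes a map worker's buffered pairs into nReduce regions.
func splitByRegion(pairs []KeyValue, nReduce int) [][]KeyValue {
	regions := make([][]KeyValue, nReduce)
	for _, kv := range pairs {
		r := partition(kv.Key, nReduce)
		regions[r] = append(regions[r], kv)
	}
	return regions
}

// reduceRegion sorts one region by key and calls reduceF once per distinct key
// with all values collected for that key.
func reduceRegion(pairs []KeyValue, reduceF func(key string, values []string) string) []KeyValue {
	sorted := append([]KeyValue(nil), pairs...) // copy so the input is not reordered
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Key < sorted[j].Key })
	var out []KeyValue
	for i := 0; i < len(sorted); {
		j := i
		var values []string
		for j < len(sorted) && sorted[j].Key == sorted[i].Key {
			values = append(values, sorted[j].Value)
			j++
		}
		out = append(out, KeyValue{Key: sorted[i].Key, Value: reduceF(sorted[i].Key, values)})
		i = j
	}
	return out
}

func main() {
	pairs := []KeyValue{{"a", "1"}, {"b", "1"}, {"a", "1"}}
	count := func(key string, values []string) string { return fmt.Sprint(len(values)) }
	for r, region := range splitByRegion(pairs, 2) {
		fmt.Println("region", r, reduceRegion(region, count))
	}
}
```

Because the region depends only on the key, every pair with the same key ends up in the same reduce task no matter which map worker produced it.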

  • RPC implementation
    A local channel is used here to simulate a network, and RPC is implemented on top of this virtual network (Network).

    type Network struct {
    	mu             sync.Mutex
    	reliable       bool
    	longDelays     bool                        // pause a long time on send on disabled connection
    	longReordering bool                        // sometimes delay replies a long time
    	ends           map[interface{}]*ClientEnd  // ends, by name
    	enabled        map[interface{}]bool        // by end name
    	servers        map[interface{}]*Server     // servers, by name
    	connections    map[interface{}]interface{} // endname -> servername
    	endCh          chan reqMsg
    	done           chan struct{} // closed when Network is cleaned up
    	count          int32         // total RPC count, for statistics
    }
    

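ends records all client ends and servers records all servers; both maps are keyed by name (a string). connections records which server each client end is connected to, and enabled records whether that connection is currently usable.
Every request packed by a client is handed to the Network through endCh; in other words, every client end holds a reference to the same endCh.
Once started, the Network loops on a select to check endCh for new requests. After receiving a req, it identifies the sender from endname and then looks up the receiver in connections.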
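The routing described above can be sketched as a stripped-down network. This is an illustration under assumptions, not the labrpc source: miniNetwork, handler, connect and send are hypothetical names, and a plain function stands in for the Server type.

```go
package main

import (
	"fmt"
	"sync"
)

type replyMsg struct {
	ok    bool
	reply []byte
}

type reqMsg struct {
	endname interface{} // name of the sending client end
	args    []byte
	replyCh chan replyMsg
}

// handler is a hypothetical stand-in for a Server: request bytes in, reply bytes out.
type handler func(args []byte) []byte

type miniNetwork struct {
	mu          sync.Mutex
	enabled     map[interface{}]bool        // by end name
	servers     map[interface{}]handler     // by server name
	connections map[interface{}]interface{} // endname -> servername
	endCh       chan reqMsg                 // shared by all client ends
	done        chan struct{}
}

func newMiniNetwork() *miniNetwork {
	n := &miniNetwork{
		enabled:     map[interface{}]bool{},
		servers:     map[interface{}]handler{},
		connections: map[interface{}]interface{}{},
		endCh:       make(chan reqMsg),
		done:        make(chan struct{}),
	}
	go n.loop()
	return n
}

// loop waits for requests on endCh and handles each one in its own goroutine.
func (n *miniNetwork) loop() {
	for {
		select {
		case req := <-n.endCh:
			go n.process(req)
		case <-n.done:
			return
		}
	}
}

// process finds the receiver via endname -> servername -> server.
func (n *miniNetwork) process(req reqMsg) {
	n.mu.Lock()
	up := n.enabled[req.endname]
	h, found := n.servers[n.connections[req.endname]]
	n.mu.Unlock()
	if !up || !found {
		req.replyCh <- replyMsg{} // empty reply: no usable connection
		return
	}
	req.replyCh <- replyMsg{ok: true, reply: h(req.args)}
}

// connect registers a server and enables the connection from endname to it.
func (n *miniNetwork) connect(endname, servername interface{}, h handler) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.servers[servername] = h
	n.connections[endname] = servername
	n.enabled[endname] = true
}

// send plays the role of a client end: push a request, wait for the reply.
func (n *miniNetwork) send(endname interface{}, args []byte) replyMsg {
	req := reqMsg{endname: endname, args: args, replyCh: make(chan replyMsg)}
	n.endCh <- req
	return <-req.replyCh
}

func main() {
	n := newMiniNetwork()
	defer close(n.done)
	n.connect("end0", "server0", func(args []byte) []byte {
		return append([]byte("echo:"), args...)
	})
	fmt.Println(string(n.send("end0", []byte("hi")).reply))
	fmt.Println(n.send("nobody", nil).ok)
}
```

Handling each request in its own goroutine keeps one slow server from blocking the dispatch loop.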

type reqMsg struct {
	endname  interface{} // name of sending ClientEnd
	svcMeth  string      // e.g. "Raft.AppendEntries"
	argsType reflect.Type
	args     []byte
	replyCh  chan replyMsg
}

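When Call is invoked, the client end builds a reqMsg and sends it to the Network through endCh. The Network returns the serialized result through the replyCh channel in the reqMsg, and Call then deserializes the bytes into the reply value passed in by the caller and returns.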
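The Call round trip can be sketched as follows. This is a simplified illustration: it assumes encoding/gob for serialization, and clientEnd and serveLen are hypothetical names, with serveLen standing in for the Network plus a server that returns the length of a string argument.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type replyMsg struct {
	ok    bool
	reply []byte
}

type reqMsg struct {
	svcMeth string // e.g. "Demo.Len"
	args    []byte
	replyCh chan replyMsg
}

// clientEnd is a hypothetical, reduced client end holding the shared endCh.
type clientEnd struct {
	endCh chan reqMsg
}

// Call serializes args, sends the request, waits for the reply bytes,
// and deserializes them into reply (which must be a pointer).
func (e *clientEnd) Call(svcMeth string, args interface{}, reply interface{}) bool {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(args); err != nil {
		return false
	}
	req := reqMsg{svcMeth: svcMeth, args: buf.Bytes(), replyCh: make(chan replyMsg)}
	e.endCh <- req
	rep := <-req.replyCh
	if !rep.ok {
		return false
	}
	return gob.NewDecoder(bytes.NewReader(rep.reply)).Decode(reply) == nil
}

// serveLen stands in for Network plus server: it decodes a string argument
// and replies with its length, encoded with gob.
func serveLen(endCh chan reqMsg) {
	for req := range endCh {
		var s string
		if err := gob.NewDecoder(bytes.NewReader(req.args)).Decode(&s); err != nil {
			req.replyCh <- replyMsg{ok: false}
			continue
		}
		var buf bytes.Buffer
		if err := gob.NewEncoder(&buf).Encode(len(s)); err != nil {
			req.replyCh <- replyMsg{ok: false}
			continue
		}
		req.replyCh <- replyMsg{ok: true, reply: buf.Bytes()}
	}
}

func main() {
	endCh := make(chan reqMsg)
	go serveLen(endCh)
	end := &clientEnd{endCh: endCh}

	var n int
	ok := end.Call("Demo.Len", "hello", &n)
	fmt.Println(ok, n)
}
```

Because the request and reply travel as byte slices, the caller and the server never share memory, which is what makes the channel behave like a network.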
MakeService(): turns an object (a struct with methods) into a service.
MakeServer(): creates a server.
AddService(svc): adds a service to a server.
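Network has two controllable properties. If reliable is false, the network randomly drops some requests and returns an empty reply. If longReordering is true, replies are delayed before being returned, so they can arrive out of order.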
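The effect of these two flags can be sketched as below. The drop probability and maximum delay are hypothetical parameters chosen for illustration, not the values used by the lab code, and flakyNet, newFlakyNet and deliver are made-up names.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

type replyMsg struct {
	ok    bool
	reply []byte
}

// flakyNet models only the two knobs discussed above.
type flakyNet struct {
	reliable       bool
	longReordering bool
	dropProb       float64       // hypothetical: chance of dropping a request
	maxDelay       time.Duration // hypothetical: upper bound on reply delay
	rng            *rand.Rand    // not safe for concurrent use
}

func newFlakyNet(reliable, longReordering bool, dropProb float64, maxDelay time.Duration, seed int64) *flakyNet {
	return &flakyNet{
		reliable:       reliable,
		longReordering: longReordering,
		dropProb:       dropProb,
		maxDelay:       maxDelay,
		rng:            rand.New(rand.NewSource(seed)),
	}
}

// deliver runs handle unless the request is dropped, and may delay the reply.
func (n *flakyNet) deliver(handle func() []byte) replyMsg {
	if !n.reliable && n.rng.Float64() < n.dropProb {
		return replyMsg{ok: false} // dropped: the caller sees an empty reply
	}
	reply := handle()
	if n.longReordering && n.maxDelay > 0 {
		// A random delay lets a later reply overtake an earlier one
		// when several requests are in flight concurrently.
		time.Sleep(time.Duration(n.rng.Int63n(int64(n.maxDelay))))
	}
	return replyMsg{ok: true, reply: reply}
}

func main() {
	n := newFlakyNet(false, false, 0.1, 0, 1)
	dropped := 0
	for i := 0; i < 1000; i++ {
		if !n.deliver(func() []byte { return []byte("pong") }).ok {
			dropped++
		}
	}
	fmt.Println("dropped", dropped, "of 1000 requests")
}
```

A caller cannot tell a dropped request from a dead server, since both produce an empty reply, so code built on this network has to retry.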

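In the main loop of each goroutine that handles a req in Network, case <-time.After(100 * time.Millisecond) is used to check every 100 milliseconds whether the current connection is dead; if it is, an empty reply is returned immediately.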
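The polling pattern can be sketched as follows. waitReply and isDead are hypothetical names; in this sketch the liveness check is passed in as a function instead of being read from the Network.

```go
package main

import (
	"fmt"
	"time"
)

const pollInterval = 100 * time.Millisecond

// waitReply blocks until the handler replies, or until a poll finds the
// connection dead, in which case it gives up with an empty reply.
func waitReply(replyCh <-chan []byte, isDead func() bool) ([]byte, bool) {
	for {
		select {
		case reply := <-replyCh:
			return reply, true
		case <-time.After(pollInterval):
			// No reply within this interval: check liveness, then keep waiting.
			if isDead() {
				return nil, false
			}
		}
	}
}

func main() {
	// A slow but live server: the reply arrives after a couple of polls.
	replyCh := make(chan []byte, 1)
	go func() {
		time.Sleep(250 * time.Millisecond)
		replyCh <- []byte("done")
	}()
	reply, ok := waitReply(replyCh, func() bool { return false })
	fmt.Println(string(reply), ok)

	// A dead server: the first poll gives up.
	_, ok = waitReply(make(chan []byte), func() bool { return true })
	fmt.Println(ok)
}
```

The time.After case acts as a per-iteration timeout, so a handler that never returns cannot block the caller forever once the server is marked dead.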

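  • Sequential MapReduce
    The map tasks run one after another, and the reduce tasks run serially only after all map tasks have finished. This is implemented by defining a schedule function that assigns tasks sequentially and passing that function to mr.run(), that is: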

    go mr.run(jobName, files, nreduce, func(phase jobPhase) {
    	switch phase {
    	case mapPhase:
    		for i, f := range mr.files {
    			doMap(mr.jobName, i, f, mr.nReduce, mapF)
    		}
    	case reducePhase:
    		for i := 0; i < mr.nReduce; i++ {
    			doReduce(mr.jobName, i, mergeName(mr.jobName, i), len(mr.files), reduceF)
    		}
    	}
    }, func() {
    	mr.stats = []int{len(files) + nreduce}
    })
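To make the control flow concrete, here is a self-contained sketch of how such a schedule callback might be driven. It assumes, without the source confirming it, that run calls schedule once per phase and then calls finish; run and sequentialTrace are hypothetical stand-ins, and the tasks only record their names instead of doing real work.

```go
package main

import "fmt"

type jobPhase string

const (
	mapPhase    jobPhase = "mapPhase"
	reducePhase jobPhase = "reducePhase"
)

// run is a hypothetical stand-in for mr.run: map phase, reduce phase, finish.
func run(schedule func(phase jobPhase), finish func()) {
	schedule(mapPhase)
	schedule(reducePhase)
	finish()
}

// sequentialTrace runs nMap map tasks and then nReduce reduce tasks serially
// and returns the order in which they executed.
func sequentialTrace(nMap, nReduce int) []string {
	var trace []string
	run(func(phase jobPhase) {
		switch phase {
		case mapPhase:
			for i := 0; i < nMap; i++ {
				trace = append(trace, fmt.Sprintf("map-%d", i))
			}
		case reducePhase:
			for i := 0; i < nReduce; i++ {
				trace = append(trace, fmt.Sprintf("reduce-%d", i))
			}
		}
	}, func() {
		trace = append(trace, "finish")
	})
	return trace
}

func main() {
	fmt.Println(sequentialTrace(2, 1)) // prints [map-0 map-1 reduce-0 finish]
}
```

Passing the scheduler in as a function means a parallel, distributed version can reuse the same run function with a different schedule.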
    