MIT 6.824 Lab 1: Code and Approach

eternalex

Category: Mit6.824. Tags: go, ubuntu, linux, bash

Article link: https://blog.csdn.net/qq_41703198/article/details/125919449

1. Recommended resources
2. Lab files
3. Lab workflow and approach
4. Summary

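I recently had some free time and decided to work through MIT 6.824, which a senior labmate recommended. This is my first contact with both Go and distributed systems, so I am learning slowly and writing things down as I go.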

1. Recommended resources

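Bilibili, 爱学习的阿噜's Lab 1 walkthrough: worth watching if you have never used Go or bash on Ubuntu and do not know how to run the sample programs.
Bilibili course videos: a translated version of the lectures, useful if your English is weak. Shared materials are also posted under the videos.
Original code post by the blogger 东东儿: I borrowed many ideas from it. While doing this lab I reviewed that code several times a day; it is concise, clean, and clearly organized.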

2. Lab files

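This was my first time writing Go and programming on Ubuntu, and understanding the course source files took me quite a while. If the lab is unclear to you, this section should make it much easier.
main/mrsequential.go: the official single-machine map-reduce program. If you are new to Go, its main() function is a good reference for creating, reading, and writing files for map and reduce, and for calling the map and reduce functions.
main/mrworker.go and main/mrcoordinator.go: the official driver programs. The rules say they must not be modified, but reading them makes it much clearer how the whole program is invoked.
mrapps: this folder holds the mapf and reducef functions for the different tests. Note that the worker does not need its own mapf and reducef (and is not allowed to define them); just call them the way main/mrsequential.go does.
mr/coordinator.go: the struct and methods of the master. At first I thought the master talked to workers through a coordinator, but it turned out the lab versions simply differ: the master is the coordinator. This post uses the name coordinator.
mr/rpc.go: defines the RPC structs that workers and the coordinator use to communicate.
mr/worker.go: defines the worker's methods. Note that several things in main/mrsequential.go look like library code (ByKey, for example) but are actually interface implementations you have to write yourself.
main/test-mr.sh: the test script. Its sections are separated by # comment lines; if one section fails you can comment out the others and run it alone. The script also deletes the intermediate files it produces, so when debugging you can comment out the cleanup lines (such as rm -f mr-out*).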

3. Lab workflow and approach

3.1 Approach

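The shell script runs mrcoordinator.go to create the coordinator, then runs mrworker.go to create several workers. Each worker asks the coordinator for a task over RPC; the coordinator hands out a map task and the worker reports back when it is done. Once all map tasks are complete, workers that ask for work receive reduce tasks and report to the coordinator when they finish. After that the script moves on to the next test.
Key points:

After handing out all map tasks, the coordinator has to wait until every map task has been reported before handing out reduce tasks. Otherwise a reduce task might read an intermediate file that a map task has not finished writing, and in the official design each reduce task reads across the output of every map task. If you have a reliable way to mark completed map tasks you can skip the wait, but waiting is simpler.
Name the output and intermediate files as the official handout suggests, unless you are comfortable enough with bash to work out from the test script exactly what it requires.
Design data structures that record the coordinator's current state and the progress of the map and reduce tasks.
Write the overall skeleton first, get it running, then add the details.
Define constants (const) to make the code more concise.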
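The "wait for every map task before starting reduce" rule from the first key point can be sketched as a single check over the task table. This is a minimal illustration, not the code used later in this post: the function name allFinished and the file names are made up for the example.

```go
package main

import "fmt"

// Task states, mirroring the NotStarted/Processing/Finished idea used in this post.
const (
	NotStarted = iota
	Processing
	Finished
)

// allFinished (hypothetical helper) reports whether every task in the
// table has been completed. The coordinator would only move from the map
// phase to the reduce phase once this returns true.
func allFinished(tasks map[string]int) bool {
	for _, state := range tasks {
		if state != Finished {
			return false
		}
	}
	return true
}

func main() {
	// The file names are placeholders for illustration.
	mapTasks := map[string]int{"pg-a.txt": Finished, "pg-b.txt": Processing}
	fmt.Println(allFinished(mapTasks)) // one map task is still running

	mapTasks["pg-b.txt"] = Finished
	fmt.Println(allFinished(mapTasks)) // now reduce tasks could be handed out
}
```

Running the check every time a worker reports a finished task keeps the phase switch in one place.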

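3.2 RPC structs

These are used to talk to the coordinator. I suggest separating the request for a task from the report that a task has finished, which keeps the overall flow clearer. I reused some fields for different purposes, which is a bit rough and not how this should really be designed. Requests for map and reduce tasks, and the report that a reduce task has finished, need to carry almost no information.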

package mr
// RPC definitions.
// remember to capitalize all names.
import "os"
import "strconv"
// example to show how to declare the arguments
// and reply for an RPC.
//get task rpc 
type GetTaskRequest struct {
	Index int
}
type GetTaskReply struct {
	//State 0 sleep 1 map 2 reduce 3 finish
	TaskType int
	// file name to store Map result,used both in map and reduce
	TaskName string
	// reduce worker num
	ReduceNum int 
	// num of this worker
	Index int
	InputName []string
	//OutputName []string
}
//response task rpc
type FinishTaskRequest struct {
	TaskType int
	TaskName string
	FileName  []string
}
type FinishTaskReply struct {
	Repltype int
}
// Cook up a unique-ish UNIX-domain socket name
// in /var/tmp, for the coordinator.
// Can't use the current directory since
// Athena AFS doesn't support UNIX-domain sockets.
func coordinatorSock() string {
	s := "/var/tmp/824-mr-"
	s += strconv.Itoa(os.Getuid())
	return s
}
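To see how a request struct and a reply struct travel through Go's net/rpc package, here is a small in-process sketch. It is an assumption-laden illustration: the lab itself talks over a UNIX-domain socket, while this example swaps in net.Pipe so it can run in one process, and DemoCoordinator, askForTask, and the file name are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// Trimmed-down versions of the request/reply structs from this post.
type GetTaskRequest struct{ Index int }
type GetTaskReply struct {
	TaskType int
	TaskName string
}

// DemoCoordinator is a hypothetical stand-in for the real Coordinator.
type DemoCoordinator struct{}

// GetTask has the shape net/rpc requires: an exported method with two
// arguments (the second a pointer) and an error return value.
func (c *DemoCoordinator) GetTask(args *GetTaskRequest, reply *GetTaskReply) error {
	reply.TaskType = 1 // 1 means "map" in this post's numbering
	reply.TaskName = "input.txt"
	return nil
}

// askForTask performs one request/reply exchange over an in-memory
// connection (an assumption made for the demo, not the lab's transport).
func askForTask() (GetTaskReply, error) {
	serverConn, clientConn := net.Pipe()
	server := rpc.NewServer()
	if err := server.Register(&DemoCoordinator{}); err != nil {
		return GetTaskReply{}, err
	}
	go server.ServeConn(serverConn)

	client := rpc.NewClient(clientConn)
	defer client.Close()

	reply := GetTaskReply{}
	err := client.Call("DemoCoordinator.GetTask", &GetTaskRequest{Index: -1}, &reply)
	return reply, err
}

func main() {
	reply, err := askForTask()
	if err != nil {
		fmt.Println("rpc failed:", err)
		return
	}
	fmt.Println(reply.TaskType, reply.TaskName)
}
```

The method is addressed as "TypeName.MethodName", which is why field and method names have to be capitalized.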

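3.3 Coordinator struct

Defining constants, or an enum-style iota block, tidies up the code. Some fields in Coordinator ended up unused, but I did not bother to remove them.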

const (
	Sleep=iota
	Map
	Reduce
	Finish
)
const (
	Working=iota
	Timeout
)
const (
	NotStarted=iota
	Processing
	Finished
)
type Coordinator struct {
	// Your definitions here.
	// Number of workers for the reduce process (number of reduce tasks)
	NumReduceWorker int
	// Number of input file parts
	NumMapFile int
	// State: 0 sleep, 1 map, 2 reduce, 3 finish
	State int
	// Intermediate files produced by map tasks, keyed by reduce index
	ReduceRecord map[int]string
	// Task state (NotStarted/Processing/Finished) per input file and per reduce index
	Mapfiles map[string]int
	Reducefiles map[string]int
	Mux sync.Mutex
	// Number of map task indices handed out so far
	MapTaskNum int
}
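The post does not show how these tables are filled in when the coordinator starts, so the following is only a guess at one workable initialization. The type taskTables and the function newTaskTables are hypothetical; the one detail taken from the code in this post is that Reducefiles is keyed by the reduce index written as a string.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	NotStarted = iota
	Processing
	Finished
)

// taskTables is a hypothetical, trimmed-down view of the Coordinator
// fields that track task progress.
type taskTables struct {
	Mapfiles     map[string]int // input file name -> task state
	Reducefiles  map[string]int // reduce index as a string -> task state
	ReduceRecord map[int]string // reduce index -> space-separated intermediate files
}

// newTaskTables (assumed helper) marks every map and reduce task as not started.
func newTaskTables(files []string, nReduce int) *taskTables {
	t := &taskTables{
		Mapfiles:     make(map[string]int),
		Reducefiles:  make(map[string]int),
		ReduceRecord: make(map[int]string),
	}
	for _, f := range files {
		t.Mapfiles[f] = NotStarted
	}
	for i := 0; i < nReduce; i++ {
		t.Reducefiles[strconv.Itoa(i)] = NotStarted
	}
	return t
}

func main() {
	// Placeholder input names for illustration.
	t := newTaskTables([]string{"a.txt", "b.txt"}, 3)
	fmt.Println(len(t.Mapfiles), len(t.Reducefiles))
}
```

All maps are created with make up front because writing to a nil map panics in Go.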

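3.4 Coordinator implementation

Everything in the coordinator that touches shared code, structures, files, or data needs a lock. Because the coordinator's methods run very quickly, a lazy approach is to lock every method, except for code that has to wait for a while.
Task request code: take the lock, check the current state, return a task that matches that state, record it, and start the timeout wait.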

func (c *Coordinator) GetTask(args *GetTaskRequest, reply *GetTaskReply) error {
	c.Mux.Lock()
	defer c.Mux.Unlock()
	if c.State == Sleep {
		reply.TaskType = Sleep
		return nil
	} else if c.State == Finish {
		reply.TaskType = Finish
		return nil
	} else if c.State == Map {
		reply.TaskType = Map
		// Give the worker an index
		if args.Index == -1 {
			reply.Index = c.MapTaskNum
			c.MapTaskNum++
		}
		// Map logic: find an input file that has not been started and hand it to the worker
		for k, v := range c.Mapfiles {
			if v == NotStarted {
				reply.TaskName = k
				reply.ReduceNum = c.NumReduceWorker
				// Mark the file as in progress and start the timeout wait
				c.Mapfiles[k] = Processing
				go c.HandleTimeOut(Map, k)
				return nil
			}
		}
		// No map task left to hand out: tell the worker to sleep
		reply.TaskType = Sleep
		return nil
	} else if c.State == Reduce {
		reply.TaskType = Reduce
		for k := range c.Reducefiles {
			if c.Reducefiles[k] == NotStarted {
				// Reduce logic: look up the intermediate files for this reduce index
				tempname, _ := strconv.Atoi(k)
				files := strings.Split(c.ReduceRecord[tempname], " ")
				// Mark the reduce task as in progress and start the timeout wait
				c.Reducefiles[k] = Processing
				reply.InputName = files
				go c.HandleTimeOut(Reduce, k)
				return nil
			}
		}
		// No reduce task left to hand out: tell the worker to sleep
		reply.TaskType = Sleep
		return nil
	} else {
		log.Fatal("wrong state")
	}
	return nil
}

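Task report code: the flow is the same as for a task request, with one extra step that checks whether the current map or reduce phase is complete and, if so, moves to the next phase.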

func (c *Coordinator) ResponseTask(args *FinishTaskRequest, reply *FinishTaskReply) error {
	c.Mux.Lock()
	defer c.Mux.Unlock()
	if args.TaskType == Map {
		// Ignore reports for tasks that are not currently in progress
		if c.Mapfiles[args.TaskName] != Processing {
			return nil
		}
		c.Mapfiles[args.TaskName] = Finished
		// Record the intermediate file names under their reduce index
		for _, v := range args.FileName {
			tempstring := strings.Split(v, "-")
			index := tempstring[len(tempstring)-1]
			temp, err := strconv.Atoi(index)
			if err != nil {
				return nil
			}
			if c.ReduceRecord[temp] != "" {
				c.ReduceRecord[temp] = c.ReduceRecord[temp] + " " + v
			} else {
				c.ReduceRecord[temp] = v
			}
		}
		// Check whether the map phase is over
		flag := true
		for _, v := range c.Mapfiles {
			if v == NotStarted || v == Processing {
				flag = false
			}
		}
		if flag {
			c.State = Reduce
		}
	} else if args.TaskType == Reduce {
		// The reduce index is the suffix of the reported output file name
		index := args.FileName[0][strings.LastIndex(args.FileName[0], "-")+1:]
		if c.Reducefiles[index] != Processing {
			return nil
		}
		c.Reducefiles[index] = Finished
		// Check whether the reduce phase is over
		flag := true
		for _, v := range c.Reducefiles {
			if v == NotStarted || v == Processing {
				flag = false
			}
		}
		if flag {
			c.State = Finish
		}
	}
	return nil
}

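Timeout code: this goroutine is started whenever a task is handed to a worker. It waits 10 s and then checks whether the worker has reported success; if not, the task is set back to not started. After reading crash.go I suspect file-write conflicts are possible here, but even if a file gets overwritten the final result should not change; there may just be conflicting writes. In short, the timeout code is not rigorous, but adding another lock for the files would be too complicated, so I left it as is.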

func (c *Coordinator) HandleTimeOut(TaskType int,TaskName string) error{
	time.Sleep(time.Second*10)
	c.Mux.Lock()
	defer c.Mux.Unlock()
	if TaskType == Map{
		if c.Mapfiles[TaskName] != Finished{
			c.Mapfiles[TaskName] = NotStarted
		}
	}else if TaskType == Reduce{
		if c.Reducefiles[TaskName] != Finished{
			c.Reducefiles[TaskName] = NotStarted
		}
	}
	return nil
}

The Done method is polled continuously for the coordinator's state and returns true once all tasks have finished.

func (c *Coordinator) Done() bool {
	c.Mux.Lock()
	defer c.Mux.Unlock()
	if c.State == Finish{
		return true
	}
	return false

	// Your code here.
}

3.5 Worker implementation

Helper code. mrsequential.go uses these too; they are not built in and have to be written yourself. ByKey implements the sort package's sort.Interface.

// Map functions return a slice of KeyValue.
type KeyValue struct {
	Key   string
	Value string
}
type ByKey []KeyValue
func (a ByKey) Len() int           { return len(a) }
func (a ByKey) Swap(i, j int)      { a[i], a[j] = a[j], a[i] }
func (a ByKey) Less(i, j int) bool { return a[i].Key < a[j].Key }
// use ihash(key) % NReduce to choose the reduce
// task number for each KeyValue emitted by Map.
func ihash(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() & 0x7fffffff)
}

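Worker code. I was not sure whether workers are reused, so it loops forever: it requests a task with call, invokes the matching handler based on the reply, and reports back when the work is done.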

func Worker(mapf func(string, string) []KeyValue,
	reducef func(string, []string) string) {
	for {
		args := GetTaskRequest{}
		args.Index = -1
		reply := GetTaskReply{}
		call("Coordinator.GetTask",&args,&reply)
		if reply.TaskType == Sleep{
			time.Sleep(time.Millisecond*10)
		}else if reply.TaskType == Finish{
			return
		}else if reply.TaskType == Map{
			// map function
			filenames := HandleMap(mapf, reply.TaskName, reply.ReduceNum, reply.Index)
			// report to coordinator
			report := FinishTaskRequest{}
			reportReply := FinishTaskReply{}
			report.FileName = filenames
			report.TaskType = reply.TaskType
			report.TaskName = reply.TaskName
			call("Coordinator.ResponseTask",&report,&reportReply)

		}else if reply.TaskType == Reduce{
			// reduce function
			oname := HandleReduce(reducef,reply.InputName)

			// report to coordinator
			report := FinishTaskRequest{}
			reportReply := FinishTaskReply{}
			report.FileName = append(report.FileName,oname)
			report.TaskType = reply.TaskType
			call("Coordinator.ResponseTask",&report,&reportReply)
		}else{
			log.Fatal("error: unknown TaskType")
		}
	}
}

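Map code, which is mostly the official code with some rework. Pay attention to the names and ordering of the intermediate files so that they match what reduce expects later.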

func HandleMap(mapf func(string,string)[]KeyValue,filename string,reducenum int,mapnum int)[]string{

	//read input file and do map process
	intermediate := []KeyValue{}

	file, err := os.Open(filename)
	if err != nil{
		log.Fatalf("cannot open %v", filename)
	}
	content, err := ioutil.ReadAll(file)
	if err != nil {
		log.Fatalf("cannot read %v", filename)
	}
	file.Close()
	kav := mapf(filename,string(content))
	intermediate = append(intermediate,kav...)

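	// create one intermediate file and one JSON encoder per reduce task
	filenames := make([]string, reducenum)
	basicname := "mr-" + strconv.Itoa(mapnum) + "-"
	encoders := make([]*json.Encoder, reducenum)
	for i := 0; i < reducenum; i++ {
		filenames[i] = basicname + strconv.Itoa(i)
		f, err := os.Create(filenames[i])
		if err != nil {
			log.Fatalf("cannot create %v", filenames[i])
		}
		defer f.Close()
		encoders[i] = json.NewEncoder(f)
	}
	// write each key/value pair to the file chosen by its hashed key
	for _, kv := range intermediate {
		index := ihash(kv.Key) % reducenum
		if err := encoders[index].Encode(&kv); err != nil {
			log.Fatalf("cannot write %v", filenames[index])
		}
	}
	return filenames
}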

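Reduce code, also adapted from the official code. When copying the official code, note that there is an extra step of reading the intermediate files. The file names can be passed over RPC or hard-coded according to some convention; either way, just make sure they match.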

func HandleReduce(reducef func(string, []string) string, filenames []string) string {
	// read every intermediate file and collect all key/value pairs
	intermediate := []KeyValue{}
	for _, name := range filenames {
		file, err := os.Open(name)
		if err != nil {
			log.Fatalf("cannot open %v", name)
		}
		dec := json.NewDecoder(file)
		for {
			kv := KeyValue{}
			if err := dec.Decode(&kv); err != nil {
				break
			}
			intermediate = append(intermediate, kv)
		}
		file.Close()
	}
	sort.Sort(ByKey(intermediate))
	// create the output file, named after the reduce index in the input file names
	index := filenames[0][strings.LastIndex(filenames[0], "-")+1:]
	oname := "mr-out-" + index
	ofile, err := os.Create(oname)
	if err != nil {
		log.Fatalf("cannot create %v", oname)
	}
	defer ofile.Close()
	// call reducef once per distinct key and write the result
	i := 0
	for i < len(intermediate) {
		j := i + 1
		for j < len(intermediate) && intermediate[j].Key == intermediate[i].Key {
			j++
		}
		values := []string{}
		for k := i; k < j; k++ {
			values = append(values, intermediate[k].Value)
		}
		output := reducef(intermediate[i].Key, values)
		// this is the correct format for each line of Reduce output.
		fmt.Fprintf(ofile, "%v %v\n", intermediate[i].Key, output)
		i = j
	}
	return oname
}

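4. Summary

Looking back, the lab is actually fairly simple, but getting started can be painful if you are not familiar with Go, bash, and Ubuntu. There is no shortcut for that other than spending the time.
I debugged with the test-mr.sh script, mostly by printing messages at key points to find where things went wrong, since I have not yet learned how to use a Go debugger. If you can, find a better way to debug; if not, print debugging works quite well.
You may see an error message when running the tests; it is caused by the Go version and can be ignored.
The course staff ask that solutions not be posted, so I am not publishing the full source. The main code is all here anyway, and my own code is honestly not pretty. If you need more, look at 东东儿's code. Many parts are hard to understand without seeing the full source; this is one of those things that is easy once you know how and hard when you do not.
Finally, a small celebration for getting it to pass. Starting from setting up the environment, it took about a week; at first I was completely stuck, but in the end it worked.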