6.824 Lab 1: Detailed Notes

Original article: https://blog.csdn.net/qq_40686526/article/details/126308562

Tools used

- The Go language
  - goroutines
  - channels (chan)
  - sync.WaitGroup
  - sync.Mutex (mutual-exclusion lock)
- The MapReduce programming model

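Goal

Implement a distributed MapReduce framework made of two programs: the coordinator and the worker. A MapReduce job runs with exactly one coordinator process and several worker processes, and the coordinator and workers communicate via RPC. Each worker asks the coordinator for a task via RPC, reads the task's input from local files, executes the task, and writes the task's output to local files. If a worker does not finish a task within a fixed time (10 s), the coordinator should notice and hand the same task to another worker.
The implementation goes into mr/coordinator.go, mr/worker.go, and mr/rpc.go.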

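Framework design

First, design the overall structure. Following the provided code skeleton, the framework consists of two components: the coordinator and the worker. The coordinator dispatches tasks and coordinates the workers; each worker executes the tasks it requests, reading the relevant input files and writing its results to the corresponding output files.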

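RPC design

From these requirements, the framework needs two RPC calls: requesting a task, and notifying the coordinator that a task is done.

The request and response of each call need to carry:

Request a task
- Request:
  - (empty)
- Response:
  - Task type (Map or Reduce)
  - Names of the files to process
  - For a Map task: the number of Reduce tasks, which determines the names of the intermediate output files (the shuffle).
  - For a Map task: the index of the input file, which the worker sends back on completion so the coordinator can keep track of it.
  - For a Reduce task: the index of the Reduce task, which determines the name of the output file.
Notify task done
- Request:
  - Type of the finished task (Map or Reduce)
  - Index of the finished task
- Response:
  - (empty)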
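On the Map side, the number of Reduce tasks (the ReduceNum field in the code) is what drives the shuffle: each key/value pair goes to bucket ihash(key) % ReduceNum, and bucket r becomes the intermediate file read by Reduce task r. A minimal self-contained sketch; `partition` is a hypothetical helper name, and `ihash` is assumed to match the helper in the lab's mr/worker.go skeleton (FNV-1a, masked to a non-negative int):

```go
package main

import "hash/fnv"

type KeyValue struct {
	Key   string
	Value string
}

// ihash: assumed to match the lab skeleton's helper (FNV-1a, top bit cleared).
func ihash(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() & 0x7fffffff)
}

// partition (hypothetical helper) splits one map task's output into nReduce
// buckets. Bucket r is what would be written to the intermediate file for
// reduce task r.
func partition(kva []KeyValue, nReduce int) map[int][]KeyValue {
	buckets := make(map[int][]KeyValue)
	for _, kv := range kva {
		r := ihash(kv.Key) % nReduce
		buckets[r] = append(buckets[r], kv) // append on a nil slice allocates it
	}
	return buckets
}
```

Because the bucket depends only on the key, every map task sends a given word to the same reduce task, which is what lets a single reduce call see all of that word's values.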

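Coordinator

The coordinator is responsible for:

- Dispatching unfinished tasks, and re-dispatching a task when it times out.
- Receiving and recording task-completion notifications; once all Map tasks are done, dispatching the Reduce tasks; once all Reduce tasks are done, returning true from the provided Done method.

The coordinator therefore needs to keep the following state:

- The names of the input files
- The status of every map task (done or not)
- The status of every reduce task (done or not)
- Whether the map phase has finished
- Whether the reduce phase has finished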

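The task-dispatch flow: hand out every Map task in turn; wait until all Map tasks have finished; then hand out every Reduce task.

The timeout re-dispatch flow: after handing out a task, wait 10 s; if the task is still not marked done, put it back on the dispatch queue so another worker can take it.

The task-completion flow: when a notification arrives, mark the task done unless it already is; once every Map task is done, switch to the Reduce phase; once every Reduce task is done, mark the whole job as finished.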

Worker

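A worker processes the tasks it receives and writes the results to the corresponding files. It is given implementations of the map and reduce functions; after receiving a task it checks the task type, calls the matching function, and writes the corresponding output files.
The worker's flow: request a task; for a Map task, read the input file, call map, split the pairs into ReduceNum buckets by ihash(key) % ReduceNum, write one intermediate file per bucket, and report completion; for a Reduce task, read every intermediate file for its index, sort the pairs by key, call reduce once per key, write mr-out-<index>, and report completion. Repeat until an RPC to the coordinator fails.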

Implementation

RPC

Following the RPC design, define the RPC argument and reply types:

// Add your RPC definitions here.
// Request a task
type GetTaskRequest struct{}

type GetTaskResponse struct {
	TaskType  int      // task type: 0 = map, 1 = reduce
	Filenames []string // files to process: an input file (map) or intermediate files (reduce)

	// for map
	MapIndex  int // map task index
	ReduceNum int // number of reduce tasks

	// for reduce
	ReduceIndex int // reduce task index
}

// Task-completion notification
type NotifyTaskDoneRequest struct {
	TaskType  int // task type: 0 = map, 1 = reduce
	TaskIndex int // index of the finished map task or reduce task
}

type NotifyTaskDoneResponse struct{}
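net/rpc only exposes exported methods of the form `func (t *T) Name(args *A, reply *R) error`, and gob only transmits exported (capitalized) fields, which is why every field above starts with a capital letter. The sketch below checks a round trip over an in-memory `net.Pipe` instead of the lab's Unix socket; `StubCoordinator` and `roundTrip` are hypothetical, and the stub always answers with map task 0:

```go
package main

import (
	"net"
	"net/rpc"
)

// Same shapes as the RPC definitions above.
type GetTaskRequest struct{}

type GetTaskResponse struct {
	TaskType    int
	Filenames   []string
	MapIndex    int
	ReduceNum   int
	ReduceIndex int
}

// StubCoordinator is a hypothetical stand-in that always hands out map task 0.
type StubCoordinator struct{}

func (s *StubCoordinator) GetTask(req *GetTaskRequest, resp *GetTaskResponse) error {
	resp.TaskType = 0
	resp.MapIndex = 0
	resp.ReduceNum = 10
	resp.Filenames = []string{"pg-example.txt"}
	return nil
}

// roundTrip serves the stub on one end of an in-memory pipe and calls
// "Coordinator.GetTask" (the method name the worker uses) from the other end.
func roundTrip() (GetTaskResponse, error) {
	server := rpc.NewServer()
	if err := server.RegisterName("Coordinator", &StubCoordinator{}); err != nil {
		return GetTaskResponse{}, err
	}
	serverConn, clientConn := net.Pipe()
	go server.ServeConn(serverConn)

	client := rpc.NewClient(clientConn)
	defer client.Close()

	var resp GetTaskResponse
	err := client.Call("Coordinator.GetTask", &GetTaskRequest{}, &resp)
	return resp, err
}
```

An unexported field (say `taskType`) would compile but never reach the worker, which would silently see its zero value.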

Coordinator

Fields

Based on the design, give the Coordinator struct the following fields:

type Coordinator struct {
	// Your definitions here.
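	filenames []string // input files
	mapNum    int      // number of map tasks
	reduceNum int      // number of reduce tasks

	isMapPhaseDone    bool // whether the map phase has finished
	isReducePhaseDone bool // whether the reduce phase has finished

	// masks recording each task's completion status
	mapDoneMask     []int // 0 = not done, 1 = done
	reduceDoneMask  []int // 0 = not done, 1 = done
	mapDoneCount    int   // number of finished map tasks
	reduceDoneCount int   // number of finished reduce tasks

	// channels used as events
	mapPhaseDone chan struct{} // signals that the map phase has finished
	taskDispatch chan int      // carries the index of the next task to hand out
	taskDone     chan int      // carries the index of a finished task

	// mutex guarding the phase flags, masks, and counters
	lock sync.Mutex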
}

Task dispatch

The main dispatch goroutine hands out every task in order:

func (c *Coordinator) dispatchTasks() {
	for i := range c.filenames {
		c.taskDispatch <- i
	}

	<-c.mapPhaseDone // wait for the map phase to finish

	for i := 0; i < c.reduceNum; i++ {
		c.taskDispatch <- i
	}
}
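The channels used by dispatchTasks have to be created with make (for example in MakeCoordinator, which the post does not show) before the goroutine starts: a send on a nil channel blocks forever. Both channels are unbuffered, so each send in dispatchTasks waits until a GetTask call receives it. A self-contained sketch with a hypothetical trimmed-down `dispatcher` type that keeps only the fields dispatchTasks uses:

```go
package main

// dispatcher is a hypothetical, trimmed-down copy of the Coordinator fields
// that dispatchTasks touches.
type dispatcher struct {
	filenames    []string
	reduceNum    int
	mapPhaseDone chan struct{}
	taskDispatch chan int
}

// newDispatcher creates the channels; leaving them nil would make every send
// and receive below block forever.
func newDispatcher(filenames []string, reduceNum int) *dispatcher {
	return &dispatcher{
		filenames:    filenames,
		reduceNum:    reduceNum,
		mapPhaseDone: make(chan struct{}),
		taskDispatch: make(chan int),
	}
}

// dispatchTasks: every map index, then wait for the map phase to be reported
// done, then every reduce index.
func (d *dispatcher) dispatchTasks() {
	for i := range d.filenames {
		d.taskDispatch <- i
	}
	<-d.mapPhaseDone
	for i := 0; i < d.reduceNum; i++ {
		d.taskDispatch <- i
	}
}
```

In the test, the receiving side plays the role of the GetTask handler and the single send on mapPhaseDone plays the role of updateTaskStatus.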

Next, the timeout goroutine:

// Wait out the timeout, then re-dispatch the task if it is still unfinished
func (c *Coordinator) timeout(taskType int, taskIndex int) {
	time.Sleep(10 * time.Second) // 10s
	var done bool
	c.lock.Lock()
	if taskType == 0 {
		done = c.mapDoneMask[taskIndex] == 1
	} else {
		done = c.reduceDoneMask[taskIndex] == 1
	}
	c.lock.Unlock()

	// The task is still unfinished: hand it out again
	if !done {
		c.taskDispatch <- taskIndex
	}
}

Then the RPC handler for task requests. It first waits for a task index on the dispatch channel, then fills in the reply for the current phase:

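func (c *Coordinator) GetTask(req *GetTaskRequest, resp *GetTaskResponse) error {
	// Wait for a dispatch signal carrying the index of the task to hand out.
	taskIndex := <-c.taskDispatch

	// isMapPhaseDone is written by updateTaskStatus, so read it under the lock.
	c.lock.Lock()
	isMapPhaseDone := c.isMapPhaseDone
	c.lock.Unlock()

	if !isMapPhaseDone {
		// Map phase
		resp.TaskType = 0
		resp.MapIndex = taskIndex
		resp.ReduceNum = c.reduceNum
		resp.Filenames = []string{c.filenames[taskIndex]}

		go c.timeout(0, taskIndex)
	} else {
		// Reduce phase
		resp.TaskType = 1
		resp.ReduceIndex = taskIndex
		resp.Filenames = make([]string, 0, 10)

		// Start the timer first so the task is re-dispatched even if the scan below fails.
		go c.timeout(1, taskIndex)

		// Scan the current directory.
		dir, err := ioutil.ReadDir(".")
		if err != nil {
			return err
		}

		// Intermediate files are named "intermedia-<mapIndex>-<reduceIndex>"; matching on
		// "-<reduceIndex>" keeps task 1 from also picking up "intermedia-0-11".
		suffix := "-" + strconv.Itoa(taskIndex)
		for _, fi := range dir {
			if fi.IsDir() { // skip directories
				continue
			}
			if strings.HasPrefix(fi.Name(), "intermedia") && strings.HasSuffix(fi.Name(), suffix) {
				resp.Filenames = append(resp.Filenames, fi.Name())
			}
		}
	}

	// log.Printf("dispatch task: type: %v, filenames: %v", resp.TaskType, resp.Filenames)

	return nil
}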
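The file-name filter in the reduce branch can be pulled out into a pure function and checked without touching the disk. A sketch with a hypothetical `reduceInputs` helper, assuming the `intermedia-<mapIndex>-<reduceIndex>` naming the worker uses; matching on the suffix `-<reduceIndex>` rather than the bare number keeps task 1 from also selecting `intermedia-0-11` when there are more than ten reduce tasks:

```go
package main

import (
	"strconv"
	"strings"
)

// reduceInputs (hypothetical helper) picks the intermediate files that belong
// to one reduce task out of a directory listing, preserving listing order.
func reduceInputs(names []string, reduceIndex int) []string {
	suffix := "-" + strconv.Itoa(reduceIndex)
	var picked []string
	for _, name := range names {
		if strings.HasPrefix(name, "intermedia-") && strings.HasSuffix(name, suffix) {
			picked = append(picked, name)
		}
	}
	return picked
}
```

Temp files (`temp-...`) and final outputs (`mr-out-...`) never match the prefix, so a worker that crashed mid-write cannot leak a partial file into a reduce task.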

Task-completion notification

First, the RPC handler for completion notifications:

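func (c *Coordinator) NotifyTaskDone(req *NotifyTaskDoneRequest, resp *NotifyTaskDoneResponse) error {
	// Read the phase flag under the lock, but release it before sending on
	// taskDone: updateTaskStatus takes the same lock after each receive.
	c.lock.Lock()
	isMapPhaseDone := c.isMapPhaseDone
	c.lock.Unlock()

	if req.TaskType == 0 && isMapPhaseDone {
		// Ignore redundant map notifications that arrive after the map phase.
		return nil
	}

	// log.Printf("task done: type: %v, taskIndex: %v", req.TaskType, req.TaskIndex)
	c.taskDone <- req.TaskIndex

	return nil
}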

Then the goroutine that processes completion signals, which mainly updates the state variables:

func (c *Coordinator) updateTaskStatus() {
	for taskIndex := range c.taskDone {
		c.lock.Lock()
		if !c.isMapPhaseDone && c.mapDoneMask[taskIndex] == 0 {
			// Handle a map task.
			c.mapDoneMask[taskIndex] = 1
			c.mapDoneCount++
			if c.mapDoneCount == c.mapNum {
				// All map tasks are done: signal the dispatch goroutine so it hands out the reduce tasks.
				c.isMapPhaseDone = true
				c.mapPhaseDone <- struct{}{}
			}
			// log.Printf("map mask: %v", c.mapDoneMask)
		} else if c.isMapPhaseDone && c.reduceDoneMask[taskIndex] == 0 {
			// Handle a reduce task.
			c.reduceDoneMask[taskIndex] = 1
			c.reduceDoneCount++
			if c.reduceDoneCount == c.reduceNum {
				c.isReducePhaseDone = true

				// Delete all intermediate files.
				files, err := ioutil.ReadDir(".")
				if err != nil {
					log.Fatal("cannot open current dir")
				}
				for _, f := range files {
					if strings.HasPrefix(f.Name(), "intermedia") {
						if err := os.Remove(f.Name()); err != nil {
							log.Fatal("cannot remove intermedia files")
						}

					}
				}

			}
		}
		c.lock.Unlock()
	}
}
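updateTaskStatus infers whether an index is a map or a reduce index from isMapPhaseDone, so a duplicate map notification that passes the check in NotifyTaskDone just before the phase switches could be counted as a reduce completion. One way to rule that out is to pass the task type along with the index. A sketch of that bookkeeping with a hypothetical `progress` type (the coordinator would then send the whole NotifyTaskDoneRequest through its channel instead of only the index):

```go
package main

import "sync"

const (
	mapTask    = 0
	reduceTask = 1
)

// progress is a hypothetical version of the completion bookkeeping that is
// told the task type explicitly instead of inferring it from the phase.
type progress struct {
	lock            sync.Mutex
	mapDone         []bool
	reduceDone      []bool
	mapDoneCount    int
	reduceDoneCount int
}

func newProgress(mapNum, reduceNum int) *progress {
	return &progress{mapDone: make([]bool, mapNum), reduceDone: make([]bool, reduceNum)}
}

// markDone records one completion and reports whether this call finished the
// map phase and whether the whole job is now done. Duplicates are ignored.
func (p *progress) markDone(taskType, index int) (mapPhaseFinished, jobFinished bool) {
	p.lock.Lock()
	defer p.lock.Unlock()
	switch taskType {
	case mapTask:
		if !p.mapDone[index] {
			p.mapDone[index] = true
			p.mapDoneCount++
			mapPhaseFinished = p.mapDoneCount == len(p.mapDone)
		}
	case reduceTask:
		if !p.reduceDone[index] {
			p.reduceDone[index] = true
			p.reduceDoneCount++
		}
	}
	jobFinished = p.mapDoneCount == len(p.mapDone) && p.reduceDoneCount == len(p.reduceDone)
	return
}
```

A late map notification now only touches the map mask, so it can neither mark a reduce task done nor index past the end of the reduce mask when there are more map tasks than reduce tasks.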

The Done function

The lab requires this function; main/mrcoordinator.go calls it periodically to check whether the job has finished:

// main/mrcoordinator.go calls Done() periodically to find out
// if the entire job has finished.
func (c *Coordinator) Done() bool {
	c.lock.Lock()
	res := c.isReducePhaseDone
	c.lock.Unlock()
	return res
}

Worker

The worker is essentially one function. Note that the lab expects a worker to keep requesting tasks from the coordinator rather than exit after finishing one.

func run(mapf func(string, string) []KeyValue,
	reducef func(string, []string) string) bool {

	// Your worker implementation here.
	// Request a task.
	req := GetTaskRequest{}
	resp := GetTaskResponse{}

	if success := call("Coordinator.GetTask", &req, &resp); !success {
		// If the RPC fails, assume the coordinator has exited because the job is
		// finished, so the worker exits as well.
		return false
	}

	if resp.TaskType == 0 {
		// Map
		filename := resp.Filenames[0]

		file, err := os.Open(filename)
		if err != nil {
			log.Fatalf("cannot open %v", filename)
		}
		content, err := ioutil.ReadAll(file)
		if err != nil {
			log.Fatalf("cannot read %v", filename)
		}
		file.Close()
		kva := mapf(filename, string(content))

		// Bucket the pairs by ihash(key) % ReduceNum; bucket k goes to reduce task k.
		kvmap := make(map[int][]KeyValue)
		for _, kv := range kva {
			hash := ihash(kv.Key) % resp.ReduceNum
			kvmap[hash] = append(kvmap[hash], kv) // append on a nil slice allocates it
		}

		// Write each bucket to a temp file, then rename it to
		// intermedia-<mapIndex>-<reduceIndex>, so a crashed worker never leaves a
		// half-written intermediate file behind.
		for k, v := range kvmap {
			tempFile, err := os.CreateTemp(".", "temp-"+strconv.Itoa(resp.MapIndex)+"-")
			if err != nil {
				log.Fatalf("cannot create temp file: %v", err) // tempFile is nil here
			}
			enc := json.NewEncoder(tempFile)
			for _, kv := range v {
				if err := enc.Encode(&kv); err != nil {
					log.Fatalf("cannot write data %v", tempFile.Name())
				}
			}
			tempFile.Close()

			name := "intermedia-" + strconv.Itoa(resp.MapIndex) + "-" + strconv.Itoa(k)
			if err := os.Rename(tempFile.Name(), name); err != nil {
				log.Fatalf("cannot rename %v to %v", tempFile.Name(), name)
			}
		}

		// Report completion.
		notifyReq := NotifyTaskDoneRequest{TaskType: 0, TaskIndex: resp.MapIndex}
		notifyResp := NotifyTaskDoneResponse{}
		call("Coordinator.NotifyTaskDone", &notifyReq, &notifyResp)
	} else {
		// Reduce
		kva := make([]KeyValue, 0)

		for _, filename := range resp.Filenames {
			file, err := os.Open(filename)
			if err != nil {
				log.Fatalf("cannot open file %v", filename)
			}
			dec := json.NewDecoder(file)

			for {
				var kv KeyValue
				if err := dec.Decode(&kv); err != nil {
					break
				}
				kva = append(kva, kv)
			}
			file.Close()
		}

		// Sort by key so equal keys are adjacent.
		sort.Slice(kva, func(i, j int) bool { return kva[i].Key < kva[j].Key })

		tempFile, err := os.CreateTemp(".", "temp-")
		if err != nil {
			log.Fatalf("cannot create temp file: %v", err) // tempFile is nil here
		}

		// Call reducef once per distinct key.
		for i := 0; i < len(kva); {
			j := i + 1
			for j < len(kva) && kva[j].Key == kva[i].Key {
				j++
			}
			values := []string{}
			for k := i; k < j; k++ {
				values = append(values, kva[k].Value)
			}
			output := reducef(kva[i].Key, values)

			// this is the correct format for each line of Reduce output.
			fmt.Fprintf(tempFile, "%v %v\n", kva[i].Key, output)

			i = j
		}
		tempFile.Close()

		outName := "mr-out-" + strconv.Itoa(resp.ReduceIndex)
		if err := os.Rename(tempFile.Name(), outName); err != nil {
			log.Fatalf("cannot rename %v to %v", tempFile.Name(), outName)
		}
		notifyReq := NotifyTaskDoneRequest{TaskType: 1, TaskIndex: resp.ReduceIndex}
		notifyResp := NotifyTaskDoneResponse{}
		call("Coordinator.NotifyTaskDone", &notifyReq, &notifyResp)
	}
	return true
}

// main/mrworker.go calls this function.
func Worker(mapf func(string, string) []KeyValue,
	reducef func(string, []string) string) {

	for {
		if !run(mapf, reducef) {
			break
		}
	}
}
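To see the map/reduce contract without files or RPC, the steps in run can be chained in memory. The sketch uses word-count functions modeled on the lab's mrapps/wc.go (`wcMap`, `wcReduce`, and `runInMemory` are hypothetical names); it returns the `key value` lines that would be written to mr-out-0 if there were a single reduce task:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"unicode"
)

type KeyValue struct {
	Key   string
	Value string
}

// wcMap emits ("word", "1") for every run of letters in contents.
func wcMap(filename string, contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool { return !unicode.IsLetter(r) })
	kva := make([]KeyValue, 0, len(words))
	for _, w := range words {
		kva = append(kva, KeyValue{Key: w, Value: "1"})
	}
	return kva
}

// wcReduce counts the values collected for one word.
func wcReduce(key string, values []string) string {
	return strconv.Itoa(len(values))
}

// runInMemory: map every input, sort by key, group equal keys, reduce each
// group, and format each result as "key value".
func runInMemory(inputs map[string]string,
	mapf func(string, string) []KeyValue,
	reducef func(string, []string) string) []string {
	var kva []KeyValue
	for name, contents := range inputs {
		kva = append(kva, mapf(name, contents)...)
	}
	sort.Slice(kva, func(i, j int) bool { return kva[i].Key < kva[j].Key })

	var lines []string
	for i := 0; i < len(kva); {
		j := i + 1
		for j < len(kva) && kva[j].Key == kva[i].Key {
			j++
		}
		values := make([]string, 0, j-i)
		for k := i; k < j; k++ {
			values = append(values, kva[k].Value)
		}
		lines = append(lines, fmt.Sprintf("%v %v", kva[i].Key, reducef(kva[i].Key, values)))
		i = j
	}
	return lines
}
```

For inputs "the cat" and "the dog, the end" this yields cat 1, dog 1, end 1, the 3. The distributed version does the same work, but split across map tasks, intermediate files, and ReduceNum reduce tasks.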