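Introduction
Build a MapReduce system following the MapReduce paper. The system consists of a master and workers. The master dispatches tasks and handles worker failures; the workers read and write files as directed by the map and reduce functions.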
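Approach
- Task dispatch: the master puts pending tasks into a channel. Each worker takes a task from the channel and performs the operation that matches the task type.
- Fault tolerance: the master tracks the progress of every task. If a task is still unfinished after a timeout (ten seconds in master.go), the master re-issues it.
- Completion check: the master decides whether a task is done by checking whether its target file exists in the current directory: the intermediate file mr-X for a map task, or the output file mr-out-X for a reduce task. Before starting, the master deletes any existing file that shares a name with an intermediate file, so that leftovers are not mistaken for completed tasks at run time.
- Program exit: once the master has verified that all tasks are complete, it sets done = true under mutual exclusion. mrmaster then sees this on its next call to Done() and exits cleanly. After the master has exited, a worker that can no longer reach it over RPC can conclude that all tasks are finished.
- Avoiding concurrency errors: use ioutil.TempFile to create a temporary file with a unique name, and os.Rename to rename it atomically.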
Implementation
The three files rpc.go, master.go, and worker.go are given below.
rpc.go
rpc.go defines the data structures the master and workers use to communicate:
package mr

//
// RPC definitions.
//
// remember to capitalize all names.
//

import "os"
import "strconv"

type TaskRequest struct {
}

type TaskType int

const (
	MapTask    = 1
	ReduceTask = 2
)

type TaskResponse struct {
	// for a map task, Filename is the file that needs to be mapped; otherwise it is an empty string
	Filename string
	// the task type is either map or reduce
	TypeOfTask TaskType
	// the serial number of the task
	Serial int
	// NReduce is used to divide intermediate results into buckets
	NReduce int
}

// Cook up a unique-ish UNIX-domain socket name
// in /var/tmp, for the master.
// Can't use the current directory since
// Athena AFS doesn't support UNIX-domain sockets.
func masterSock() string {
	s := "/var/tmp/824-mr-"
	s += strconv.Itoa(os.Getuid())
	return s
}
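To illustrate how these structures travel between the two sides, here is a minimal, self-contained sketch of one request/response round trip with net/rpc. Several things in it are assumptions made for the demo: it serves over a local TCP port with rpc.NewServer and rpc.Dial rather than the UNIX-domain socket named by masterSock, the Master type is a stripped-down stand-in that holds only the task channel, and startMaster, requestTask, and the file name input.txt are hypothetical.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
)

type TaskRequest struct{}

type TaskType int

const (
	MapTask    TaskType = 1
	ReduceTask TaskType = 2
)

type TaskResponse struct {
	Filename   string
	TypeOfTask TaskType
	Serial     int
	NReduce    int
}

// Master is a stand-in for the real master: it only holds the task channel.
type Master struct {
	TaskChannel chan TaskResponse
}

// DispatchTask hands the next queued task to the calling worker.
// It blocks while the channel is empty.
func (m *Master) DispatchTask(req *TaskRequest, resp *TaskResponse) error {
	*resp = <-m.TaskChannel
	return nil
}

// startMaster (hypothetical) serves m on an ephemeral local TCP port
// and returns the address workers should dial.
func startMaster(m *Master) (string, error) {
	srv := rpc.NewServer()
	if err := srv.Register(m); err != nil {
		return "", err
	}
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	go srv.Accept(l)
	return l.Addr().String(), nil
}

// requestTask (hypothetical) plays the worker: it asks the master for one task.
func requestTask(addr string) (TaskResponse, error) {
	var resp TaskResponse
	c, err := rpc.Dial("tcp", addr)
	if err != nil {
		return resp, err
	}
	defer c.Close()
	err = c.Call("Master.DispatchTask", &TaskRequest{}, &resp)
	return resp, err
}

func main() {
	m := &Master{TaskChannel: make(chan TaskResponse, 1)}
	m.TaskChannel <- TaskResponse{Filename: "input.txt", TypeOfTask: MapTask, Serial: 0, NReduce: 10}

	addr, err := startMaster(m)
	if err != nil {
		log.Fatal(err)
	}
	task, err := requestTask(addr)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", task)
}
```

If the dial or the call fails in a setup like this, the worker gets a non-nil error, which is the signal the approach above relies on to detect that the master has exited.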
master.go
The master is implemented as follows:
package mr

import (
	"log"
	"net"
	"net/http"
	"net/rpc"
	"os"
	"strconv"
	"time"
)

type Master struct {
	// use TaskChannel to deliver tasks to workers
	TaskChannel chan TaskResponse
	// done is true once all tasks are done
	done bool
	// sem protects done from concurrent reads/writes
	sem chan struct{}
}

// TaskTrack keeps track of a dispatched task
type TaskTrack struct {
	taskResp  TaskResponse
	startTime time.Time
}

func (m *Master) DispatchTask(request *TaskRequest, response *TaskResponse) error {
	// extract a task from the channel;
	// if no task is available, the goroutine handling this call blocks until one arrives
	temp := <-m.TaskChannel
	response.Filename = temp.Filename
	response.TypeOfTask = temp.TypeOfTask
	response.Serial = temp.Serial
	response.NReduce = temp.NReduce
	return nil
}

//
// main/mrmaster.go calls Done() periodically to find out
// if the entire job has finished.
//
func (m *Master) Done() bool {
	ret := false
	// read m.done exclusively
	<-m.sem
	ret = m.done
	m.sem <- struct{}{}
	return ret
}

// a task expires after ten seconds
func isExpired(task TaskTrack) bool {
	return time.Since(task.startTime).Seconds() > 10
}

func dispatcher(files []string, nReduce int, m *Master) {
	// remove stale intermediate files so they are not mistaken for finished tasks
	for i := 0; i < len(files); i++ {
		filename := "mr-" + strconv.Itoa(i)
		err := os.Remove(filename)
		if err != nil && !os.IsNotExist(err) {
			log.Fatalf("error occurs while removing file %v: %v", filename, err)
		}
	}
	var unfinishedTasks []TaskTrack
	//-------------------------------------------- dispatch map task --------------------------------------------
for i, file := range</