MIT 6.824 MapReduce Lab Notes, Part 2 (Implementing the Map Phase)

  • One way to get started is to modify mr/worker.go's Worker() to send an RPC to the master asking for a task. Then modify the master to respond with the file name of an as-yet-unstarted map task. Then modify the worker to read that file and call the application Map function, as in mrsequential.go.

 

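The worker first registers with the master, which assigns it an ID. The worker then repeatedly requests tasks from the master over RPC, and the master hands out tasks for the worker to execute.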

func Worker(mapf func(string, string) []KeyValue,
	reducef func(string, []string) string) {

	// Your worker implementation here.
	w := worker{}
	w.mapf, w.reducef = mapf, reducef
	w.callRegisterWorker()
	w.running()

}

The worker struct stores an ID plus the map and reduce functions to execute.

type worker struct {
	workerID int
	mapf func(string, string) []KeyValue
	reducef func(string, []string) string
}

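The worker obtains its ID from the master via RPC. Note the pointer receiver: with a value receiver, the assignment to w.workerID would modify a copy, and the ID would be lost as soon as the method returned.

func (w *worker) callRegisterWorker() {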
	args := &RegisterArgs{}
	reply := &RegisterReply{}
	if ok := call("Master.RegisterWorker", args, reply); !ok {
		log.Fatal("error: register worker failed.")
	}
	w.workerID = reply.ID

	fmt.Printf("worker %v registered success.\n", w.workerID)

}
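The master-side handler for this RPC is not shown in the post. A minimal sketch of what it could look like follows, assuming RegisterArgs is empty and RegisterReply carries only an ID (both inferred from how the worker uses them). Handing out sequential IDs from the numWorkers counter is also an assumption, not necessarily the author's scheme.

```go
package main

import (
	"fmt"
	"sync"
)

// RegisterArgs and RegisterReply are assumed shapes, inferred from how
// callRegisterWorker uses them; the post does not show their definitions.
type RegisterArgs struct{}

type RegisterReply struct {
	ID int
}

// Master holds only the fields this sketch needs.
type Master struct {
	mutex      sync.Mutex
	numWorkers int
}

// RegisterWorker hands out sequential worker IDs (0, 1, 2, ...).
// The lock keeps concurrent registrations from receiving the same ID.
func (m *Master) RegisterWorker(args *RegisterArgs, reply *RegisterReply) error {
	m.mutex.Lock()
	defer m.mutex.Unlock()
	reply.ID = m.numWorkers
	m.numWorkers++
	return nil
}

func main() {
	m := &Master{}
	for i := 0; i < 3; i++ {
		reply := &RegisterReply{}
		if err := m.RegisterWorker(&RegisterArgs{}, reply); err != nil {
			fmt.Println("register failed:", err)
			return
		}
		fmt.Println("assigned worker ID", reply.ID)
	}
}
```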

The worker's running method loops forever, requesting a task from the master and then executing it.

func (w *worker) running() {
	for {
		args := &ReqTaskArgs{w.workerID}
		reply := &ReqTaskReply{}
		if ok := call("Master.ReqTask", args, reply); !ok {
			fmt.Printf("worker %v - ", w.workerID)
			log.Fatal("error: request tasks failed")
		}

		// Dispatch on the task type
		switch reply.ReceivedTask.TaskType {
		case MAPTASK:
			fmt.Println("received a map task")
			w.doMapTask(*reply)
		case REDUCETASK:
			fmt.Println("received a reduce task")
			w.doReduceTask(*reply)
		case NONETASK:
			time.Sleep(time.Second)
			fmt.Println("received a None task")
		case EXITTASK:
			fmt.Println("received an EXIT task")
			os.Exit(0)
		}

	}
}

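The request arguments only need to carry the worker ID. The reply carries the assigned task together with its MapTaskID and the number of reduce tasks, NReduce.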

type ReqTaskReply struct{
	ReceivedTask *Task
	NReduce      int
	MapTaskID    int
}

The Task struct:

type Task struct {
	TaskType      int    // MapTask-0 ReduceTask-1 NoneTask-2 ExitTask-3
	Filename      string // one filename for one maptask
	FailedWorkers []int  // record workerID who failed to process this task in 10 seconds
	TaskState     int    // Idle-0 Running-1 Finished-2
}

Task types are defined as constants:

const (
	// MAPTASK : map task
	MAPTASK int = iota
	// REDUCETASK : reduce task
	REDUCETASK
	// NONETASK : no task available; tells the worker to wait and request again later
	NONETASK
	// EXITTASK : a pseudo-task, tell worker to exit
	EXITTASK
)
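The master code further down also refers to IDLE, RUNNING, FINISHED and to MAPPHASE, REDUCEPHASE, EXITPHASE, whose definitions are not shown. A plausible sketch is below. The state values follow the "Idle-0 Running-1 Finished-2" comment on Task.TaskState; the numeric values of the phase constants are an assumption.

```go
package main

import "fmt"

// Task states, following the "Idle-0 Running-1 Finished-2" comment
// on Task.TaskState.
const (
	IDLE int = iota
	RUNNING
	FINISHED
)

// Scheduling phases of the master. The names appear in getTask;
// the numeric values here are an assumption.
const (
	MAPPHASE int = iota
	REDUCEPHASE
	EXITPHASE
)

func main() {
	fmt.Println("states:", IDLE, RUNNING, FINISHED)
	fmt.Println("phases:", MAPPHASE, REDUCEPHASE, EXITPHASE)
}
```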

Next comes the worker's doMap step: the worker turns one input file into key/value pairs, then splits those pairs across the n reduce tasks.

A reasonable naming convention for intermediate files is mr-X-Y, where X is the Map task number, and Y is the reduce task number.

 

 

The worker's map task code will need a way to store intermediate key/value pairs in files in a way that can be correctly read back during reduce tasks. One possibility is to use Go's encoding/json package. To write key/value pairs to a JSON file:

 

The map part of your worker can use the ihash(key) function (in worker.go) to pick the reduce task for a given key.
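To make the hint concrete, here is a small sketch of key partitioning. The ihash body is reproduced from memory of the lab skeleton, so treat it as an assumption and check it against your own worker.go; partition is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type KeyValue struct {
	Key   string
	Value string
}

// ihash maps a key to a non-negative int (assumed to match the helper
// shipped in the lab's mr/worker.go).
func ihash(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() & 0x7fffffff)
}

// partition splits kvs into nReduce buckets. Every pair with the same
// key lands in the same bucket, so a single reduce task sees all the
// values for that key.
func partition(kvs []KeyValue, nReduce int) [][]KeyValue {
	buckets := make([][]KeyValue, nReduce)
	for _, kv := range kvs {
		r := ihash(kv.Key) % nReduce
		buckets[r] = append(buckets[r], kv)
	}
	return buckets
}

func main() {
	kvs := []KeyValue{{"apple", "1"}, {"banana", "1"}, {"apple", "1"}}
	for r, bucket := range partition(kvs, 3) {
		fmt.Printf("reduce task %d gets %d pairs\n", r, len(bucket))
	}
}
```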

 

You can steal some code from mrsequential.go for reading Map input files, for sorting intermediate key/value pairs between the Map and Reduce, and for storing Reduce output in files.

 

  • To test crash recovery, you can use the mrapps/crash.go application plugin. It randomly exits in the Map and Reduce functions.
  • To ensure that nobody observes partially written files in the presence of crashes, the MapReduce paper mentions the trick of using a temporary file and atomically renaming it once it is completely written. You can use ioutil.TempFile to create a temporary file and os.Rename to atomically rename it.

 

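The worker's doMap function is implemented below.

doMap opens the assigned file, runs mapf on it to obtain a slice of key/value pairs, partitions that slice with the hash function into nReduce slices, and writes each one to a file.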

func (w *worker) doMapTask(reply ReqTaskReply) {
	toDoTask := reply.ReceivedTask
	nReduce := reply.NReduce
	mapTaskID := reply.MapTaskID

	// Read the input file.
	content, err := ioutil.ReadFile(toDoTask.Filename)
	if err != nil {
		log.Fatalf("error: failed to read file %v when doing map task: %v", toDoTask.Filename, err)
	}

	// Run the application Map function on (filename, content) to get a slice of key/value pairs.
	kvs := w.mapf(toDoTask.Filename, string(content))

	// Partition the pairs by key into nReduce buckets, one per reduce task.
	interResults := make([][]KeyValue, nReduce)
	for _, kv := range kvs {
		reducerID := ihash(kv.Key) % nReduce
		interResults[reducerID] = append(interResults[reducerID], kv)
	}

	// Write each bucket to a local file as JSON (as the lab hints suggest).
	// The data goes to temporary files first; once the master has checked
	// the task, they are renamed to the real intermediate file names.
	tmpFileName := make([]string, nReduce)
	interFileName := make([]string, nReduce)
	for i, interkvs := range interResults {
		interFileName[i] = fmt.Sprintf("mr-%d-%d", mapTaskID, i)

		// Create the temporary file and write to it directly; there is no
		// need to close and reopen it.
		file, err := ioutil.TempFile("./", "tmp_map_")
		if err != nil {
			log.Fatalf("error: create TempFile failed: %v", err)
		}
		tmpFileName[i] = file.Name()

		encoder := json.NewEncoder(file) // one JSON object per line
		for _, kv := range interkvs {
			if err := encoder.Encode(kv); err != nil {
				log.Fatalf("error: encoding kv failed: %v", err)
			}
		}
		if err := file.Close(); err != nil {
			log.Fatalf("error: close TempFile failed: %v", err)
		}
	}
	// All intermediate results of this map task are now written. Report to
	// the master, which decides whether to rename the files.
	w.CallNotifyMasterTaskDone(MAPTASK, mapTaskID, w.workerID, tmpFileName, interFileName)
}

When a task is done, the worker reports to the master via RPC, and the master decides whether to rename the files.

func (w *worker) CallNotifyMasterTaskDone(taskType int, taskID int, workerID int, tmpFileName []string, interFileName []string) {
	args := &NotifyMasterTaskDoneArgs{taskType, taskID, workerID, tmpFileName, interFileName}
	reply := &NotifyMasterTaskDoneReply{}
	if ok := call("Master.NotifyMasterTaskDone", args, reply); !ok {
		log.Fatal("error: notify master failed.")
	}
	fmt.Printf(":) Worker %v DONE %v type task: %v\n", w.workerID, taskType, taskID)
}

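Now for the master's logic for assigning Map tasks. The master needs a queue of Map tasks. In this lab each input file corresponds to one map task, so we can start by initializing the master's state.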

type Master struct {
	// Your definitions here.
	numWorkers  int
	mutex       sync.Mutex
	mapTasks    []Task     // map tasks
	reduceTasks []Task     // reduce tasks
	nReduce     int        // number of reduce tasks
	taskPhase   int        // current phase, so the master can quickly hand out the right type of task
}

Initializing the Master:

func (m *Master) initMaster(files []string, nReduce int) {
	m.numWorkers = 0
	m.mutex = sync.Mutex{}
	m.nReduce = nReduce
	m.taskPhase = MAPPHASE // start in the map phase; getTask switches on this value
	// One map task per input file.
	for _, filename := range files {
		mapTask := Task{MAPTASK, filename, nil, IDLE}
		m.mapTasks = append(m.mapTasks, mapTask)
	}

	// nReduce reduce tasks, which have no input file name.
	for i := 0; i < nReduce; i++ {
		reduceTask := Task{REDUCETASK, "", nil, IDLE}
		m.reduceTasks = append(m.reduceTasks, reduceTask)
	}
}

MakeMaster creates the master and starts serving RPC requests:

func MakeMaster(files []string, nReduce int) *Master {
	m := Master{}
	// Your code here.
	m.initMaster(files,nReduce)
	m.server()
	return &m
}

The master assigns tasks:

func (m *Master) getTask(id int) (*Task, int, int) {
	m.mutex.Lock()
	defer m.mutex.Unlock()
	switch m.taskPhase {
	case MAPPHASE:
		allMapFinished := true
		for i := 0; i < len(m.mapTasks); i++ {
			if m.mapTasks[i].TaskState != FINISHED {
				allMapFinished = false
			}
			// Hand out the first idle map task and mark it as running.
			if m.mapTasks[i].TaskState == IDLE {
				m.mapTasks[i].TaskState = RUNNING
				return &m.mapTasks[i], m.nReduce, i
			}
		}
		if allMapFinished {
			m.taskPhase = REDUCEPHASE
		} else {
			// No idle map task is left, but some are still running: tell the worker to wait.
			return &Task{NONETASK, "", nil, IDLE}, m.nReduce, -1
		}
		fallthrough
	case REDUCEPHASE:
		// Reduce scheduling is not implemented in this part.
	case EXITPHASE:
		// Not implemented in this part.
	}
	return &Task{NONETASK, "", nil, IDLE}, 0, -1
}
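The worker calls "Master.ReqTask", but the post does not show that handler. The sketch below is one way it could wrap getTask. The ReqTaskArgs field name WorkerID and the input file names are hypothetical, and getTask is reduced to a map-only stand-in. One deliberate difference: the stand-in returns a copy of the task, because a pointer into m.mapTasks would be read by the RPC layer after the lock is released.

```go
package main

import (
	"fmt"
	"sync"
)

// Task types and states, matching the constants used in the post.
const (
	MAPTASK = iota
	REDUCETASK
	NONETASK
	EXITTASK
)

const (
	IDLE = iota
	RUNNING
	FINISHED
)

// Task is trimmed to the fields this sketch needs.
type Task struct {
	TaskType  int
	Filename  string
	TaskState int
}

// ReqTaskArgs is an assumed shape: the worker builds it from a single
// worker ID, but the field name is hypothetical.
type ReqTaskArgs struct {
	WorkerID int
}

type ReqTaskReply struct {
	ReceivedTask *Task
	NReduce      int
	MapTaskID    int
}

type Master struct {
	mutex    sync.Mutex
	mapTasks []Task
	nReduce  int
}

// getTask is a map-only stand-in for the fuller getTask above. It returns
// a copy of the task so the caller never reads m.mapTasks without the lock.
func (m *Master) getTask(id int) (*Task, int, int) {
	m.mutex.Lock()
	defer m.mutex.Unlock()
	for i := range m.mapTasks {
		if m.mapTasks[i].TaskState == IDLE {
			m.mapTasks[i].TaskState = RUNNING
			t := m.mapTasks[i]
			return &t, m.nReduce, i
		}
	}
	return &Task{TaskType: NONETASK}, m.nReduce, -1
}

// ReqTask is the RPC entry point the worker calls as "Master.ReqTask".
func (m *Master) ReqTask(args *ReqTaskArgs, reply *ReqTaskReply) error {
	reply.ReceivedTask, reply.NReduce, reply.MapTaskID = m.getTask(args.WorkerID)
	return nil
}

func main() {
	m := &Master{nReduce: 10, mapTasks: []Task{
		{TaskType: MAPTASK, Filename: "pg-a.txt"},
		{TaskType: MAPTASK, Filename: "pg-b.txt"},
	}}
	for i := 0; i < 3; i++ {
		reply := &ReqTaskReply{}
		if err := m.ReqTask(&ReqTaskArgs{WorkerID: 0}, reply); err != nil {
			fmt.Println("ReqTask failed:", err)
			return
		}
		fmt.Printf("type=%d file=%q mapTaskID=%d\n",
			reply.ReceivedTask.TaskType, reply.ReceivedTask.Filename, reply.MapTaskID)
	}
}
```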

The master handles the worker's completion report. In this part it only marks the map task as finished.

func (m *Master) NotifyMasterTaskDone(args *NotifyMasterTaskDoneArgs, reply *NotifyMasterTaskDoneReply) error {
	m.mutex.Lock()
	defer m.mutex.Unlock()
	switch args.TaskType {
	case MAPTASK:
		m.mapTasks[args.TaskID].TaskState = FINISHED
	}
	return nil
}

Testing and debugging

  The Map tasks run successfully.

 

 

 

 

 
