MIT 6.5840/6.824 MapReduce Lab Notes

Overview

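  • MapReduce borrows ideas from functional programming to simplify certain computations over large datasets. For a problem that fits this programming model, the user supplies simple Map and Reduce functions, and the MapReduce library runs the computation in parallel across a large cluster. The library encapsulates parallel execution, fault recovery, data partitioning, and network communication.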

example:

Word count

Map function: splits the contents of a file into key-value pairs

func Map(filename string, contents string) []mr.KeyValue {
    // function to detect word separators.
    ff := func(r rune) bool { return !unicode.IsLetter(r) }

    // split contents into an array of words.
    words := strings.FieldsFunc(contents, ff)

    kva := []mr.KeyValue{}
    for _, w := range words {
       kv := mr.KeyValue{w, "1"}
       kva = append(kva, kv)
    }
    return kva
}

The library partitions, merges, and sorts the key-value pairs, then applies the Reduce function

func Reduce(key string, values []string) string {
    // return the number of occurrences of this word.
    return strconv.Itoa(len(values))
}
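
To make the flow above concrete, here is a minimal single-process sketch of the same word count: map, sort, group equal keys, reduce. It is not the lab's distributed implementation; `runSequential` is a hypothetical helper name, and `KeyValue` is declared locally to mirror `mr.KeyValue`.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"unicode"
)

// KeyValue mirrors the lab's mr.KeyValue type.
type KeyValue struct {
	Key   string
	Value string
}

// mapF splits contents into words and emits one ("word", "1") pair per word.
func mapF(contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool { return !unicode.IsLetter(r) })
	kva := make([]KeyValue, 0, len(words))
	for _, w := range words {
		kva = append(kva, KeyValue{Key: w, Value: "1"})
	}
	return kva
}

// reduceF returns the number of occurrences of a word.
func reduceF(key string, values []string) string {
	return strconv.Itoa(len(values))
}

// runSequential (hypothetical helper) sorts the pairs, groups equal keys,
// and applies reduceF to each group, all in one process.
func runSequential(contents string) map[string]string {
	kva := mapF(contents)
	sort.Slice(kva, func(i, j int) bool { return kva[i].Key < kva[j].Key })
	out := make(map[string]string)
	for i := 0; i < len(kva); {
		j := i
		values := []string{}
		for j < len(kva) && kva[j].Key == kva[i].Key {
			values = append(values, kva[j].Value)
			j++
		}
		out[kva[i].Key] = reduceF(kva[i].Key, values)
		i = j
	}
	return out
}

func main() {
	fmt.Println(runSequential("the cat and the hat"))
}
```

The distributed version performs the same steps, except that the grouping is split across reduce tasks and the pairs travel through intermediate files.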

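Implementation

Overall approach

worker

Idea: workers are not specialized; each one can run both map tasks and reduce tasks. A worker polls the coordinator for a task. If it gets one, it executes it; if none is available, it sleeps for 1 second and asks again.

  1. For a map task, the worker calls the map function, hashes each key-value pair by its key, writes the results into n buckets (n files, one per reduce task), and sends an RPC message reporting that the task is done.
  2. For a reduce task, the worker reads all relevant intermediate files into memory. For the i-th reduce task with M map tasks, these are mr-0-i, mr-1-i, …, mr-(M-1)-i. It sorts the pairs, groups them by key, calls the reduce function on each group, writes the output to mr-out-i, and sends an RPC message reporting that the task is done.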
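
The bucket selection and file naming described in the two steps above can be sketched as follows. `ihash` is taken from the lab skeleton; `partition`, `intermediateName`, and `reduceInputs` are hypothetical helper names used only for illustration, and the sketch keeps buckets in memory instead of writing files.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type KeyValue struct{ Key, Value string }

// ihash is the hash function provided by the lab skeleton.
func ihash(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() & 0x7fffffff)
}

// intermediateName builds the mr-X-Y name for map task X and reduce task Y.
func intermediateName(mapTask, reduceTask int) string {
	return fmt.Sprintf("mr-%d-%d", mapTask, reduceTask)
}

// partition (hypothetical helper) places each pair in bucket ihash(key) % nReduce,
// so all pairs with the same key end up in the same reduce task.
func partition(kva []KeyValue, nReduce int) [][]KeyValue {
	buckets := make([][]KeyValue, nReduce)
	for _, kv := range kva {
		b := ihash(kv.Key) % nReduce
		buckets[b] = append(buckets[b], kv)
	}
	return buckets
}

// reduceInputs lists the files that reduce task r reads when there are nMap map tasks.
func reduceInputs(nMap, r int) []string {
	names := make([]string, 0, nMap)
	for m := 0; m < nMap; m++ {
		names = append(names, intermediateName(m, r))
	}
	return names
}

func main() {
	kva := []KeyValue{{"apple", "1"}, {"banana", "1"}, {"apple", "1"}}
	for b, kvs := range partition(kva, 3) {
		fmt.Println(intermediateName(0, b), kvs)
	}
	fmt.Println(reduceInputs(3, 1))
}
```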
Coordinator

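Core idea: assign map tasks and reduce tasks. The coordinator periodically checks which tasks still need to run. If every task has finished, it returns; any task that can run but is not yet in the queue is added to the queue.

It provides two RPC services:

  1. Dispatch a task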

    func (c *Coordinator) AskForATask(request *AskTaskRequest, reply *AskTaskReply) error {
        var taskNum, taskType int
        select {
        case taskNum = <-c.reduceCh:
           taskType = 1
        case taskNum = <-c.mapCh:
           taskType = 0
           reply.FileName = c.FileName[taskNum]
        default:
           taskType = -1
        }
        reply.TaskType = taskType
        reply.TaskNum = taskNum
        reply.ReduceNum = c.reduceTaskNum
        reply.MapNum = c.mapTaskNum
        return nil
    }
    
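  2. Receive the task-completion message and mark the task as done so that it is not dispatched again. Once all tasks are complete, the program exits.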

    func (c *Coordinator) JobDoneResponse(msg *JobDoneMsg, re *JobDoneMsgRE) error {
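        if msg.TaskType == 0 {
           c.mapMu.Lock()
           // A re-dispatched task may report completion more than once; count it only the first time.
           if !c.mapDone[msg.TaskNum] {
              c.mapDone[msg.TaskNum] = true
              c.mapCount++
              if c.mapCount >= c.mapTaskNum && c.reduceCount < c.reduceTaskNum {
                 for i, value := range c.reduceDone {
                    if !value {
                       c.reduceCh <- i
                    }
                 }
              }
           }
           c.mapMu.Unlock()
        } else {
           c.reduceMu.Lock()
           // Same rule for reduce tasks: a duplicate report must not inflate the counter.
           if !c.reduceDone[msg.TaskNum] {
              c.reduceDone[msg.TaskNum] = true
              c.reduceCount++
           }
           c.reduceMu.Unlock()
        }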
        return nil
    }
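
The two services above boil down to a non-blocking receive from a channel and a completion mark that tolerates duplicate reports. Below is a stripped-down sketch of just those two ideas; `tracker`, `next`, and `markDone` are hypothetical names, not part of the lab code.

```go
package main

import (
	"fmt"
	"sync"
)

// tracker is a hypothetical, stripped-down stand-in for the Coordinator.
type tracker struct {
	mu    sync.Mutex
	queue chan int // pending task numbers
	done  []bool   // done[i] is true once task i has been reported complete
	count int      // number of distinct completed tasks
}

func newTracker(n int) *tracker {
	t := &tracker{queue: make(chan int, n), done: make([]bool, n)}
	for i := 0; i < n; i++ {
		t.queue <- i
	}
	return t
}

// next returns a queued task number, or -1 without blocking when the queue is empty.
func (t *tracker) next() int {
	select {
	case n := <-t.queue:
		return n
	default:
		return -1
	}
}

// markDone records a completion and reports whether every task is now done.
// A second report for the same task does not change the counter.
func (t *tracker) markDone(n int) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if !t.done[n] {
		t.done[n] = true
		t.count++
	}
	return t.count == len(t.done)
}

func main() {
	t := newTracker(2)
	fmt.Println(t.next(), t.next(), t.next())                // 0 1 -1
	fmt.Println(t.markDone(0), t.markDone(0), t.markDone(1)) // false false true
}
```

The `default` branch is what lets the RPC handler answer "no task right now" immediately instead of holding the worker's request open.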
    

Notes

Parallel computation and network communication:

The coordinator starts an RPC server; workers connect to it and send requests

coordinator:

func (c *Coordinator) server() {
    rpc.Register(c)
    rpc.HandleHTTP()
    //l, e := net.Listen("tcp", ":1234")
    sockname := coordinatorSock()
    os.Remove(sockname)
    l, e := net.Listen("unix", sockname)
    if e != nil {
       log.Fatal("listen error:", e)
    }
    go http.Serve(l, nil)
}

worker:

func call(rpcname string, args interface{}, reply interface{}) bool {
    // c, err := rpc.DialHTTP("tcp", "127.0.0.1"+":1234")
    sockname := coordinatorSock()
    c, err := rpc.DialHTTP("unix", sockname)
    // if the connection fails, assume the job has already finished
    if err != nil {
       // log.Fatal("dialing:", err)
       os.Exit(0)
    }
    defer c.Close()

    err = c.Call(rpcname, args, reply)
    if err == nil {
       return true
    }
    return false
}
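
For readers who want to try the request/reply round trip without a socket file, here is a small sketch that runs a `net/rpc` server and client in one process over `net.Pipe`. The `Coord` type and the fixed reply are assumptions made for the demo; the lab itself uses `rpc.HandleHTTP` over a UNIX-domain socket as shown above.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

type AskTaskRequest struct{ Status int }
type AskTaskReply struct{ TaskType, TaskNum int }

// Coord is a hypothetical miniature coordinator exposing a single RPC method.
type Coord struct{}

// AskForATask always hands out map task 7 in this sketch.
func (c *Coord) AskForATask(req *AskTaskRequest, reply *AskTaskReply) error {
	reply.TaskType = 0
	reply.TaskNum = 7
	return nil
}

// roundTrip registers Coord on a private RPC server, connects a client through
// an in-memory pipe, and performs one call.
func roundTrip() (AskTaskReply, error) {
	srv := rpc.NewServer()
	if err := srv.Register(&Coord{}); err != nil {
		return AskTaskReply{}, err
	}
	serverConn, clientConn := net.Pipe()
	go srv.ServeConn(serverConn)

	client := rpc.NewClient(clientConn)
	defer client.Close()

	var reply AskTaskReply
	err := client.Call("Coord.AskForATask", &AskTaskRequest{Status: 1}, &reply)
	return reply, err
}

func main() {
	reply, err := roundTrip()
	fmt.Println(reply.TaskType, reply.TaskNum, err)
}
```

The method name passed to `Call` is the registered type name plus the method name, which is why the lab code calls "Coordinator.AskForATask".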

Multiple workers can be started to communicate with the coordinator; each worker polls it for tasks

for {
    flag, reply := askForATask()
    // no task received, or the request failed
    if !flag {
       // fmt.Println("problem while requesting a task")
    } else {
       flag = handleTask(mapf, reducef, reply)
       if !flag {
          // fmt.Println("problem while handling the task") // could be a failure, or there may be no task to run
          time.Sleep(time.Second)
       }
    }
}
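Fault recovery:
  1. Scan every 20 s and add unfinished tasks back to the blocking queue

Concurrency is not handled here. Within the scope of the provided tests, a task that is still unfinished after 20 s is almost never just slow; its worker must have failed, so the task is simply added back to the queue.

For larger workloads, add a lock to control concurrency and use a set to record which tasks are already queued.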

func (c *Coordinator) findNeededExTask() {
    for {
       for i, value := range c.mapDone {
          if !value {
             c.mapCh <- i
          }
       }
       // all map tasks have finished, so reduce tasks can start
       if c.mapCount == c.mapTaskNum && c.reduceCount < c.reduceTaskNum {
          for i, value := range c.reduceDone {
             if !value {
                c.reduceCh <- i
             }
          }
       }
       time.Sleep(20 * time.Second)
    }
}
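  2. Atomic file rename

When creating intermediate and final files, write to a temporary name first and call os.Rename only after the file is completely written. That way, any file that exists under its final name is guaranteed to be complete, which avoids race conditions and partial writes. For example, with direct naming, suppose two parallel workers receive the same task: worker2 is writing fileN when worker1 starts, sees that fileN already exists (possibly partially written, possibly complete), and deletes it in order to rewrite it; worker2 then hits an error.

Because everything runs on the same machine here, the temporary file name is simply built by appending the pid.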
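
The write-then-rename pattern can be sketched in isolation as follows. This version uses `os.CreateTemp` to pick a unique temporary name instead of the pid suffix used in this post; `writeFileAtomically` and `demo` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomically writes data to a temporary file in the target's directory
// and renames it into place only after the write and close have succeeded.
func writeFileAtomically(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+"-tmp-*")
	if err != nil {
		return err
	}
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		os.Remove(tmp.Name())
		return err
	}
	if err := tmp.Close(); err != nil {
		os.Remove(tmp.Name())
		return err
	}
	// Readers see either no file at path or the complete file, never a partial one.
	return os.Rename(tmp.Name(), path)
}

// demo writes a file inside a fresh temporary directory and reads it back.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "mr-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "mr-0-0")
	if err := writeFileAtomically(target, []byte("hello")); err != nil {
		return "", err
	}
	b, err := os.ReadFile(target)
	return string(b), err
}

func main() {
	fmt.Println(demo())
}
```

Creating the temporary file in the same directory as the target keeps both names on one filesystem, which is what allows the rename to be atomic.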

Atomic rename in the map task

for i := 0; i < reply.ReduceNum; i++ {
    // fmt.Println(i)
    oname := "mr-" + strconv.Itoa(reply.TaskNum) + "-" + strconv.Itoa(i) + "-tmp" + strconv.Itoa(os.Getpid())
    f, err := os.Create(oname)
    if err != nil {
       fmt.Println("failed to create file")
       return false
    }
    f.Close() // only the empty file is needed here; it is reopened in append mode below
}
for _, value := range intermediate {
    // fmt.Println(value)
    i := ihash(value.Key) % reply.ReduceNum
    oname := "mr-" + strconv.Itoa(reply.TaskNum) + "-" + strconv.Itoa(i) + "-tmp" + strconv.Itoa(os.Getpid())
    ofile, err := os.OpenFile(oname, os.O_APPEND|os.O_WRONLY, os.ModeAppend)
    if err != nil {
       fmt.Println("map task: error while writing to file")
    }
    // fmt.Println(ofile)
    enc := json.NewEncoder(ofile)
    err2 := enc.Encode(&value)
    if err2 != nil {
       fmt.Println(err2)
       fmt.Println("encode failed")
    }
    err3 := ofile.Close()
    if err3 != nil {
       fmt.Println(err3)
    }
}

Atomic rename in the reduce task

func reduceIntermediateKv(reply AskTaskReply, intermediate []KeyValue, reducef func(string, []string) string) string {
	oname := "mr-out-" + strconv.Itoa(reply.TaskNum) + strconv.Itoa(os.Getpid())
	ofile, _ := os.Create(oname)
	defer ofile.Close() // close the temporary file before the caller renames it
	for i := 0; i < len(intermediate); {
		j := i + 1
		for j < len(intermediate) && intermediate[j].Key == intermediate[i].Key {
			j++
		}
		values := []string{}
		for k := i; k < j; k++ {
			values = append(values, intermediate[k].Value)
		}
		output := reducef(intermediate[i].Key, values)

		// this is the correct format for each line of Reduce output.
		fmt.Fprintf(ofile, "%v %v\n", intermediate[i].Key, output)
		i = j
	}
	return oname
}
func writeIntermediateKVToFinalFile(oname string, reply AskTaskReply) bool {
    err := os.Rename(oname, "mr-out-"+strconv.Itoa(reply.TaskNum))
    if err != nil {
       fmt.Println(err)
       return false
    }
    for i := 0; i < reply.MapNum; i++ {
       oname := "mr-" + strconv.Itoa(i) + "-" + strconv.Itoa(reply.TaskNum)
       err := os.Remove(oname)
       // fmt.Println(oname)
       if err != nil {
          fmt.Println(err)
       }
    }
    return true
}
Task dispatch:

Relevant data structures:

type Coordinator struct {
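    // total number of map tasks and reduce tasks
    mapTaskNum    int
    reduceTaskNum int

    // number of completed map and reduce tasks; helps decide quickly whether reduce tasks still need dispatching
    // note: the counters must be protected by mutexes
    mapCount    int
    reduceCount int
    mapMu       sync.Mutex
    reduceMu    sync.Mutex

    // producer-consumer model: task dispatch channels
    reduceCh chan int
    mapCh    chan int

    // marks completed reduce and map tasks to prevent duplicate dispatch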
    mapDone    []bool
    reduceDone []bool

    FileName map[int]string
}

// Server-side task dispatch. -1: no task available right now, 0: map task, 1: reduce task
func (c *Coordinator) AskForATask(request *AskTaskRequest, reply *AskTaskReply) error {
	var taskNum, taskType int
	select {
	case taskNum = <-c.reduceCh:
		taskType = 1
	case taskNum = <-c.mapCh:
		taskType = 0
		reply.FileName = c.FileName[taskNum]
	default:
		taskType = -1
	}
	reply.TaskType = taskType
	reply.TaskNum = taskNum
	reply.ReduceNum = c.reduceTaskNum
	reply.MapNum = c.mapTaskNum
	return nil
}


// Find dispatchable tasks and queue them; repeat every 20 s, because a task's worker may have crashed, in which case the task must be dispatched again
func (c *Coordinator) findNeededExTask() {
	for {
		for i, value := range c.mapDone {
			if !value {
				c.mapCh <- i
			}
		}
		// all map tasks have finished, so reduce tasks can start
		if c.mapCount == c.mapTaskNum && c.reduceCount < c.reduceTaskNum {
			for i, value := range c.reduceDone {
				if !value {
					c.reduceCh <- i
				}
			}
		}
		time.Sleep(20 * time.Second)
	}
}

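This relies on Go's concurrency tools: channels act as blocking queues that implement the producer-consumer model, and the mapDone and reduceDone slices record whether each task has finished. Once every task is done the coordinator exits; workers then fail to connect and exit as well.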
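
The producer-consumer use of a channel described here can be tried in a single process. In the sketch below, goroutines stand in for worker processes and a buffered channel stands in for mapCh; `runWorkers` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers queues nTasks task numbers on a buffered channel (producer side)
// and lets nWorkers goroutines drain it (consumer side). It returns the
// per-task completion flags.
func runWorkers(nTasks, nWorkers int) []bool {
	tasks := make(chan int, nTasks)
	done := make([]bool, nTasks)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// range blocks until a task arrives and ends when the channel is closed.
			for n := range tasks {
				mu.Lock()
				done[n] = true
				mu.Unlock()
			}
		}()
	}

	for i := 0; i < nTasks; i++ {
		tasks <- i
	}
	close(tasks)
	wg.Wait()
	return done
}

func main() {
	fmt.Println(runWorkers(5, 3))
}
```

Each task number is received by exactly one consumer, which is the property the coordinator relies on when it hands out tasks from a channel.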

Task handling:
func Worker(mapf func(string, string) []KeyValue,
    reducef func(string, []string) string) {
    // Your worker implementation here.
    for {
       flag, reply := askForATask()
       // no task received, or the request failed
       if !flag {
          // fmt.Println("problem while requesting a task")
       } else {
          flag = handleTask(mapf, reducef, reply)
          if !flag {
             // fmt.Println("problem while handling the task")
             time.Sleep(time.Second)
          }
       }
    }
}

The core part: find a task and execute it.

func askForATask() (bool, AskTaskReply) {
    request := AskTaskRequest{1}
    reply := AskTaskReply{}
    ok := call("Coordinator.AskForATask", &request, &reply)
    return ok, reply
}

Use an RPC call to ask for a task.

func handleTask(mapf func(string, string) []KeyValue,
    reducef func(string, []string) string,
    reply AskTaskReply) bool {
    if reply.TaskType == MapTask {
       done := handleMapTask(mapf, reply)
       if !done {
          return done
       }
    } else if reply.TaskType == ReduceTask { // reduce Task
       done := handleReduceTask(reply, reducef)
       if !done {
          return done
       }
    } else {
       // fmt.Println("no task available for now")
       return false
    }
    return true
}

Execution branches on whether the task is a reduce task or a map task.

func handleMapTask(mapf func(string, string) []KeyValue, reply AskTaskReply) bool {
    intermediate := dealWithFileContentToKV(mapf, reply.FileName)
    flag := createTempFileForNMapTask(reply)
    if !flag {
       fmt.Println("map task: failed to create files")
       return false
    }
    writeIntermediateKVToTempFileThroughHash(intermediate, reply)
    modifyTempToIntermediateFile(reply)
    JobDoneNotify(0, reply.TaskNum, 1)
    return true
}
func handleReduceTask(reply AskTaskReply, reducef func(string, []string) string) bool {
	intermediate := readIntermediateFileToKV(reply.MapNum, reply.TaskNum)
	sort.Sort(ByKey(intermediate))
	oname := reduceIntermediateKv(reply, intermediate, reducef)
	flag := writeIntermediateKVToFinalFile(oname, reply)
	if !flag {
		fmt.Println("reduce task: failed to write the final file")
		return flag
	}
	JobDoneNotify(1, reply.TaskNum, 1)
	return true
}

The core code for executing reduce tasks and map tasks.

Code appendix:

mr/worker.go

package mr

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
    "os"
    "sort"
    "strconv"
    "time"
)
import "log"
import "net/rpc"
import "hash/fnv"

// Map functions return a slice of KeyValue.
type KeyValue struct {
    Key   string
    Value string
}

type ByKey []KeyValue

// for sorting by key.
func (a ByKey) Len() int           { return len(a) }
func (a ByKey) Swap(i, j int)      { a[i], a[j] = a[j], a[i] }
func (a ByKey) Less(i, j int) bool { return a[i].Key < a[j].Key }

// use ihash(key) % NReduce to choose the reduce
// task number for each KeyValue emitted by Map.
func ihash(key string) int {
    h := fnv.New32a()
    h.Write([]byte(key))
    return int(h.Sum32() & 0x7fffffff)
}

// main/mrworker.go calls this function.
func Worker(mapf func(string, string) []KeyValue,
    reducef func(string, []string) string) {
    // Your worker implementation here.
    for {
       flag, reply := askForATask()
       // no task received, or the request failed
       if !flag {
          // fmt.Println("problem while requesting a task")
       } else {
          flag = handleTask(mapf, reducef, reply)
          if !flag {
             // fmt.Println("problem while handling the task")
             time.Sleep(time.Second)
          }
       }
    }
}

func askForATask() (bool, AskTaskReply) {
    request := AskTaskRequest{1}
    reply := AskTaskReply{}
    ok := call("Coordinator.AskForATask", &request, &reply)
    return ok, reply
}

/*
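Create temporary files first; once a file is fully written, atomically rename it to the intermediate file name, so that a crash in the middle of writing cannot leave a partial file behind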
*/
func handleTask(mapf func(string, string) []KeyValue,
    reducef func(string, []string) string,
    reply AskTaskReply) bool {
    if reply.TaskType == MapTask {
       done := handleMapTask(mapf, reply)
       if !done {
          return done
       }
    } else if reply.TaskType == ReduceTask { // reduce Task
       done := handleReduceTask(reply, reducef)
       if !done {
          return done
       }
    } else {
       // fmt.Println("no task available for now")
       return false
    }
    return true
}

func handleReduceTask(reply AskTaskReply, reducef func(string, []string) string) bool {
    intermediate := readIntermediateFileToKV(reply.MapNum, reply.TaskNum)
    sort.Sort(ByKey(intermediate))
    oname := reduceIntermediateKv(reply, intermediate, reducef)
    flag := writeIntermediateKVToFinalFile(oname, reply)
    if !flag {
       fmt.Println("reduce task: failed to write the final file")
       return flag
    }
    JobDoneNotify(1, reply.TaskNum, 1)
    return true
}

func handleMapTask(mapf func(string, string) []KeyValue, reply AskTaskReply) bool {
    intermediate := dealWithFileContentToKV(mapf, reply.FileName)
    flag := createTempFileForNMapTask(reply)
    if !flag {
       fmt.Println("map task: failed to create files")
       return false
    }
    writeIntermediateKVToTempFileThroughHash(intermediate, reply)
    modifyTempToIntermediateFile(reply)
    JobDoneNotify(0, reply.TaskNum, 1)
    return true
}

func writeIntermediateKVToFinalFile(oname string, reply AskTaskReply) bool {
    err := os.Rename(oname, "mr-out-"+strconv.Itoa(reply.TaskNum))
    if err != nil {
       fmt.Println(err)
       return false
    }
    for i := 0; i < reply.MapNum; i++ {
       oname := "mr-" + strconv.Itoa(i) + "-" + strconv.Itoa(reply.TaskNum)
       err := os.Remove(oname)
       // fmt.Println(oname)
       if err != nil {
          fmt.Println(err)
       }
    }
    return true
}

func reduceIntermediateKv(reply AskTaskReply, intermediate []KeyValue, reducef func(string, []string) string) string {
    oname := "mr-out-" + strconv.Itoa(reply.TaskNum) + strconv.Itoa(os.Getpid())
    ofile, _ := os.Create(oname)
    defer ofile.Close() // close the temporary file before the caller renames it
    for i := 0; i < len(intermediate); {
       j := i + 1
       for j < len(intermediate) && intermediate[j].Key == intermediate[i].Key {
          j++
       }
       values := []string{}
       for k := i; k < j; k++ {
          values = append(values, intermediate[k].Value)
       }
       output := reducef(intermediate[i].Key, values)

       // this is the correct format for each line of Reduce output.
       fmt.Fprintf(ofile, "%v %v\n", intermediate[i].Key, output)
       i = j
    }
    return oname
}

func readIntermediateFileToKV(MapNum, TaskNum int) []KeyValue {
    intermediate := []KeyValue{}
    for i := 0; i < MapNum; i++ {
       oname := "mr-" + strconv.Itoa(i) + "-" + strconv.Itoa(TaskNum)
       ofile, err := os.Open(oname)
       if err != nil {
          fmt.Println("cannot Open reduce file")
       }
       dec := json.NewDecoder(ofile)
       for {
          var kv KeyValue
          if err := dec.Decode(&kv); err != nil {
             break
          }
          intermediate = append(intermediate, kv)
       }
       err2 := ofile.Close()
       if err2 != nil {
          fmt.Println(err2)
       }
    }
    return intermediate
}

func modifyTempToIntermediateFile(reply AskTaskReply) {
    for i := 0; i < reply.ReduceNum; i++ {
       oname := "mr-" + strconv.Itoa(reply.TaskNum) + "-" + strconv.Itoa(i)
       if err := os.Rename(oname+"-tmp"+strconv.Itoa(os.Getpid()), oname); err != nil {
          fmt.Println("failed to rename temporary file " + oname)
       }
    }
}

func writeIntermediateKVToTempFileThroughHash(intermediate []KeyValue, reply AskTaskReply) {
    // write each key-value pair into one of reduceNum files, chosen by the hash function
    for _, value := range intermediate {
       // fmt.Println(value)
       i := ihash(value.Key) % reply.ReduceNum
       oname := "mr-" + strconv.Itoa(reply.TaskNum) + "-" + strconv.Itoa(i) + "-tmp" + strconv.Itoa(os.Getpid())
       ofile, err := os.OpenFile(oname, os.O_APPEND|os.O_WRONLY, os.ModeAppend)
       if err != nil {
          fmt.Println("map task: error while writing to file")
       }
       // fmt.Println(ofile)
       enc := json.NewEncoder(ofile)
       err2 := enc.Encode(&value)
       if err2 != nil {
          fmt.Println(err2)
          fmt.Println("encode failed")
       }
       err3 := ofile.Close()
       if err3 != nil {
          fmt.Println(err3)
       }
    }
}

func createTempFileForNMapTask(reply AskTaskReply) bool {
    for i := 0; i < reply.ReduceNum; i++ {
       // fmt.Println(i)
       oname := "mr-" + strconv.Itoa(reply.TaskNum) + "-" + strconv.Itoa(i) + "-tmp" + strconv.Itoa(os.Getpid())
       f, err := os.Create(oname)
       if err != nil {
          fmt.Println("failed to create file")
          return false
       }
       f.Close() // only the empty file is needed here; it is reopened in append mode later
    }
    return true
}

func dealWithFileContentToKV(mapf func(string, string) []KeyValue, filename string) []KeyValue {
    intermediate := []KeyValue{}
    file, err := os.Open(filename)
    if err != nil {
       log.Fatalf("worker 64 : cannot open %v", filename)
    }
    content, err := ioutil.ReadAll(file)
    if err != nil {
       log.Fatalf("worker 68 : cannot read %v", filename)
    }
    file.Close()
    kva := mapf(filename, string(content))
    intermediate = append(intermediate, kva...)
    return intermediate
}

func JobDoneNotify(TaskType, TaskNum, status int) {
    msg := JobDoneMsg{TaskType, TaskNum, status}
    re := JobDoneMsgRE{}
    // todo: act on the information in re
    call("Coordinator.JobDoneResponse", &msg, &re)
}

// example function to show how to make an RPC call to the coordinator.
//
// the RPC argument and reply types are defined in rpc.go.
func CallExample() {

    // declare an argument structure.
    args := ExampleArgs{}

    // fill in the argument(s).
    args.X = 99

    // declare a reply structure.
    reply := ExampleReply{}

    // send the RPC request, wait for the reply.
    // the "Coordinator.Example" tells the
    // receiving server that we'd like to call
    // the Example() method of struct Coordinator.
    ok := call("Coordinator.Example", &args, &reply)
    if ok {
       // reply.Y should be 100.
       fmt.Printf("reply.Y %v\n", reply.Y)
    } else {
       fmt.Printf("call failed!\n")
    }
}

// send an RPC request to the coordinator, wait for the response.
// usually returns true.
// returns false if something goes wrong.
func call(rpcname string, args interface{}, reply interface{}) bool {
    // c, err := rpc.DialHTTP("tcp", "127.0.0.1"+":1234")
    sockname := coordinatorSock()
    c, err := rpc.DialHTTP("unix", sockname)
    // if the connection fails, assume the job has already finished
    if err != nil {
       // log.Fatal("dialing:", err)
       os.Exit(0)
    }
    defer c.Close()

    err = c.Call(rpcname, args, reply)
    if err == nil {
       return true
    }
    return false
}

coordinator.go

package mr

import (
    "log"
    "sync"
    "time"
)
import "net"
import "os"
import "net/rpc"
import "net/http"

type Coordinator struct {
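    // total number of map tasks and reduce tasks
    mapTaskNum    int
    reduceTaskNum int

    // number of completed map and reduce tasks; helps decide quickly whether reduce tasks still need dispatching
    // note: the counters must be protected by mutexes
    mapCount    int
    reduceCount int
    mapMu       sync.Mutex
    reduceMu    sync.Mutex

    // producer-consumer model: task dispatch channels
    reduceCh chan int
    mapCh    chan int

    // marks completed reduce and map tasks to prevent duplicate dispatch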
    mapDone    []bool
    reduceDone []bool

    FileName map[int]string
}

func NewCoordinator(mapNum, reduceNum int) *Coordinator {
    return &Coordinator{
       mapTaskNum:    mapNum,
       reduceTaskNum: reduceNum,
       mapDone:       make([]bool, mapNum),
       reduceDone:    make([]bool, reduceNum),
       reduceCount:   0,
       reduceCh:      make(chan int, reduceNum),
       mapCh:         make(chan int, mapNum),
       FileName:      make(map[int]string),
    }
}

// Your code here -- RPC handlers for the worker to call.

// Server-side task dispatch. -1: no task available right now, 0: map task, 1: reduce task
func (c *Coordinator) AskForATask(request *AskTaskRequest, reply *AskTaskReply) error {
    var taskNum, taskType int
    select {
    case taskNum = <-c.reduceCh:
       taskType = 1
    case taskNum = <-c.mapCh:
       taskType = 0
       reply.FileName = c.FileName[taskNum]
    default:
       taskType = -1
    }
    reply.TaskType = taskType
    reply.TaskNum = taskNum
    reply.ReduceNum = c.reduceTaskNum
    reply.MapNum = c.mapTaskNum
    return nil
}

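// Server-side task marking: after finishing a task the client sends a message to the server, which marks the task as done to prevent duplicate dispatch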
func (c *Coordinator) JobDoneResponse(msg *JobDoneMsg, re *JobDoneMsgRE) error {
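    if msg.TaskType == 0 {
       c.mapMu.Lock()
       // A re-dispatched task may report completion more than once; count it only the first time.
       if !c.mapDone[msg.TaskNum] {
          c.mapDone[msg.TaskNum] = true
          c.mapCount++
          if c.mapCount >= c.mapTaskNum && c.reduceCount < c.reduceTaskNum {
             for i, value := range c.reduceDone {
                if !value {
                   c.reduceCh <- i
                }
             }
          }
       }
       c.mapMu.Unlock()
    } else {
       c.reduceMu.Lock()
       // Same rule for reduce tasks: a duplicate report must not inflate the counter.
       if !c.reduceDone[msg.TaskNum] {
          c.reduceDone[msg.TaskNum] = true
          c.reduceCount++
       }
       c.reduceMu.Unlock()
    }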
    return nil
}

// Find dispatchable tasks and queue them; repeat every 20 s, because a task's worker may have crashed, in which case the task must be dispatched again
func (c *Coordinator) findNeededExTask() {
    for {
       for i, value := range c.mapDone {
          if !value {
             c.mapCh <- i
          }
       }
       // all map tasks have finished, so reduce tasks can start
       if c.mapCount == c.mapTaskNum && c.reduceCount < c.reduceTaskNum {
          for i, value := range c.reduceDone {
             if !value {
                c.reduceCh <- i
             }
          }
       }
       time.Sleep(20 * time.Second)
    }
}

// an example RPC handler.
//
// the RPC argument and reply types are defined in rpc.go.
func (c *Coordinator) Example(args *ExampleArgs, reply *ExampleReply) error {
    reply.Y = args.X + 1
    return nil
}

// start a thread that listens for RPCs from worker.go
func (c *Coordinator) server() {
    rpc.Register(c)
    rpc.HandleHTTP()
    //l, e := net.Listen("tcp", ":1234")
    sockname := coordinatorSock()
    os.Remove(sockname)
    l, e := net.Listen("unix", sockname)
    if e != nil {
       log.Fatal("listen error:", e)
    }
    go http.Serve(l, nil)
}

// main/mrcoordinator.go calls Done() periodically to find out
// if the entire job has finished.
func (c *Coordinator) Done() bool {
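    // Your code here.
    // Read the counter under its mutex, since RPC handlers update it concurrently.
    c.reduceMu.Lock()
    defer c.reduceMu.Unlock()
    return c.reduceCount == c.reduceTaskNum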
}

// create a Coordinator.
// main/mrcoordinator.go calls this function.
// nReduce is the number of reduce tasks to use.
func MakeCoordinator(files []string, nReduce int) *Coordinator {
    c := NewCoordinator(len(files), nReduce)
    for i, filename := range files[0:] {
       c.FileName[i] = filename
    }
    // Your code here.
    go c.findNeededExTask()
    c.server()
    return c
}

rpc.go

package mr

//
// RPC definitions.
//
// remember to capitalize all names.
//

import "os"
import "strconv"

//
// example to show how to declare the arguments
// and reply for an RPC.
//

type ExampleArgs struct {
    X int
}

type ExampleReply struct {
    Y int
}

type AskTaskRequest struct {
    Status int // machine Status
}

type AskTaskReply struct {
    TaskType  int // -1: no Task 0: map task 1: reduce task
    TaskNum   int //
    FileName  string
    ReduceNum int
    MapNum    int
}

type JobDoneMsg struct {
    TaskType int
    TaskNum  int
    Status   int
}

type JobDoneMsgRE struct {
    Status int
}

const (
    MapTask    = 0
    ReduceTask = 1
)

// Add your RPC definitions here.

// Cook up a unique-ish UNIX-domain socket name
// in /var/tmp, for the coordinator.
// Can't use the current directory since
// Athena AFS doesn't support UNIX-domain sockets.
func coordinatorSock() string {
    s := "/var/tmp/5840-mr-"
    s += strconv.Itoa(os.Getuid())
    return s
}