MIT 6.824 Lab 1


tailuzhecom




Article link: https://blog.csdn.net/tailuzhecom/article/details/93376646


Part I: Map/Reduce input and output

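How the lab code runs:

Start the master's RPC server.
Wait for workers to register.
In schedule(), decide how to assign tasks to workers and how to handle worker failures.
For each map task, doMap() is called: it reads the task's input file, calls the map function on the file's contents, and writes the resulting key/value pairs to the corresponding intermediate files.
The master calls doReduce() for each reduce task, which produces nReduce result files.
The master calls mr.merge() to merge the nReduce files produced in the previous step.
The master sends a Shutdown RPC to the workers and then shuts down its RPC server.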
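The map, reduce and merge steps above can be imitated in a single process to see how data flows between them. The following is a minimal in-memory sketch under stated assumptions, not the lab's code: runSequential, wordMap and countReduce are hypothetical names, slices stand in for the mrtmp.* intermediate files, there are no RPCs or workers, and a final sort by key stands in for mr.merge().

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strconv"
	"strings"
)

// KeyValue is a local copy of the lab's pair type so the sketch is self-contained.
type KeyValue struct {
	Key   string
	Value string
}

// ihash returns a non-negative FNV-1a hash of s (top bit masked off).
func ihash(s string) int {
	h := fnv.New32a()
	h.Write([]byte(s))
	return int(h.Sum32() & 0x7fffffff)
}

// runSequential treats each input as one map task; buckets[r] plays the role of
// the intermediate files of reduce task r; the final sort plays the role of merge.
func runSequential(inputs []string, nReduce int,
	mapF func(string) []KeyValue, reduceF func(string, []string) string) []KeyValue {
	buckets := make([][]KeyValue, nReduce)
	for _, in := range inputs { // map phase
		for _, kv := range mapF(in) {
			r := ihash(kv.Key) % nReduce
			buckets[r] = append(buckets[r], kv)
		}
	}
	var out []KeyValue
	for _, bucket := range buckets { // reduce phase: one task per bucket
		groups := make(map[string][]string)
		for _, kv := range bucket {
			groups[kv.Key] = append(groups[kv.Key], kv.Value)
		}
		for k, vs := range groups {
			out = append(out, KeyValue{k, reduceF(k, vs)})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key }) // merge
	return out
}

// wordMap and countReduce are a tiny word count used to exercise the pipeline.
func wordMap(contents string) []KeyValue {
	var kvs []KeyValue
	for _, w := range strings.Fields(contents) {
		kvs = append(kvs, KeyValue{w, "1"})
	}
	return kvs
}

func countReduce(key string, values []string) string {
	return strconv.Itoa(len(values))
}

func main() {
	for _, kv := range runSequential([]string{"a b a", "b c"}, 3, wordMap, countReduce) {
		fmt.Printf("%s: %s\n", kv.Key, kv.Value) // a: 2, b: 2, c: 1 (one per line)
	}
}
```

Because every key hashes to exactly one bucket, no key shows up in two reduce outputs, which is why the merge step only has to concatenate and sort.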

MapReduce paper

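Section 3.1 describes how a MapReduce execution works in detail.

There are M map tasks and R reduce tasks; the master picks idle workers and assigns each one a map task or a reduce task.
How are the keys of the intermediate data split into R partitions?
With the partitioning function hash(key) mod R.
The pairs produced by the map function are buffered in memory. Periodically, the buffered pairs are written to local disk, partitioned into R regions, and the locations of these regions are passed back to the master.
When a reduce worker is told these locations by the master, it uses RPCs to read the buffered data from the map workers' local disks. Once it has read all of its intermediate data, it sorts the data by key.
The reduce worker iterates over the sorted intermediate data and, for each distinct key, passes the key and the corresponding set of values to the user-defined reduce function. The output of the reduce function is appended to the final output file for this reduce partition.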
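The partitioning function and the reduce-side sort described above can be sketched in Go as follows. This is an illustration only, assuming an FNV-1a hash in the spirit of the lab's ihash helper; partition and sortAndGroup are hypothetical helper names, and in-memory slices replace the R on-disk regions.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// KeyValue is a local copy of the lab's pair type.
type KeyValue struct {
	Key   string
	Value string
}

// ihash returns a non-negative FNV-1a hash of key (top bit masked off).
func ihash(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() & 0x7fffffff)
}

// partition implements hash(key) mod R: the reduce task that owns key.
// Equal keys always map to the same task.
func partition(key string, R int) int {
	return ihash(key) % R
}

// sortAndGroup sorts kvs by key in place (stable, so values keep their
// order), then calls emit once per distinct key with all of its values.
func sortAndGroup(kvs []KeyValue, emit func(key string, values []string)) {
	sort.SliceStable(kvs, func(i, j int) bool { return kvs[i].Key < kvs[j].Key })
	for i := 0; i < len(kvs); {
		var values []string
		j := i
		for j < len(kvs) && kvs[j].Key == kvs[i].Key {
			values = append(values, kvs[j].Value)
			j++
		}
		emit(kvs[i].Key, values)
		i = j
	}
}

func main() {
	const R = 2
	// Map side: split the emitted pairs into R regions.
	regions := make([][]KeyValue, R)
	for _, kv := range []KeyValue{{"b", "1"}, {"a", "1"}, {"b", "1"}} {
		r := partition(kv.Key, R)
		regions[r] = append(regions[r], kv)
	}
	// Reduce side: each reduce task sorts its own region and walks it key by key.
	for r, region := range regions {
		sortAndGroup(region, func(key string, values []string) {
			fmt.Printf("reduce task %d: %s -> %d values\n", r, key, len(values))
		})
	}
}
```

Which reduce task prints which key depends on the hash values, but both copies of "b" always land in the same task.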

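Section 3.2: the master's data structures

For each map task and reduce task, the master stores its state (idle, in-progress, or completed) and, for non-idle tasks, the identity of the worker machine running it.
In addition, for each completed map task the master records the locations and sizes of the R intermediate file regions that task produced.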
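A minimal sketch of this bookkeeping follows. Every type, field and method name in it (TaskState, Task, Region, Master, AssignMap, MapDone) is hypothetical and not part of the lab skeleton; it only records what section 3.2 lists: a state per task, the worker running it, and the location and size of each of the R regions a completed map task produced.

```go
package main

import "fmt"

// TaskState holds the three task states named in section 3.2.
type TaskState int

const (
	Idle TaskState = iota
	InProgress
	Completed
)

// Task records one map or reduce task (hypothetical layout).
type Task struct {
	State  TaskState
	Worker string // worker machine running the task; empty while Idle
}

// Region is where one intermediate file region lives and how large it is.
type Region struct {
	Location string
	Size     int64
}

// Master holds the per-task bookkeeping (hypothetical names).
type Master struct {
	MapTasks     []Task
	ReduceTasks  []Task
	Intermediate [][]Region // Intermediate[m] = the R regions of completed map task m
}

func NewMaster(M, R int) *Master {
	return &Master{
		MapTasks:     make([]Task, M), // zero value of Task is Idle with no worker
		ReduceTasks:  make([]Task, R),
		Intermediate: make([][]Region, M),
	}
}

// AssignMap gives the first idle map task to worker; -1 means none is idle.
func (m *Master) AssignMap(worker string) int {
	for i := range m.MapTasks {
		if m.MapTasks[i].State == Idle {
			m.MapTasks[i] = Task{State: InProgress, Worker: worker}
			return i
		}
	}
	return -1
}

// MapDone marks map task i completed and records its R regions.
func (m *Master) MapDone(i int, regions []Region) {
	m.MapTasks[i].State = Completed
	m.Intermediate[i] = regions
}

func main() {
	m := NewMaster(2, 3)
	t := m.AssignMap("worker-1")
	m.MapDone(t, []Region{{"worker-1:/tmp/r0", 10}, {"worker-1:/tmp/r1", 0}, {"worker-1:/tmp/r2", 7}})
	fmt.Println(t, m.MapTasks[t].State == Completed, len(m.Intermediate[t])) // 0 true 3
}
```

The sketch has no locking; a master that serves RPCs from several workers at once would have to guard these fields with a sync.Mutex.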

Implementation

common_map.go

Read the contents of inFile into a string.
Pass that string to mapF, which returns a []KeyValue.
Group the pairs by ihash(key) % nReduce.
Write each group to its own intermediate file, named mrtmp.test-<mapTask>-<reduceTask> (built by reduceName()), encoded as JSON.

	// Fragment of doMap(); needs imports encoding/json, io/ioutil, log, os.
	content, err := ioutil.ReadFile(inFile)
	if err != nil {
		log.Fatal("doMap: ", err)
	}
	kvs := mapF(inFile, string(content)) // run the user's Map function on the whole file

	// Partition the pairs: bucket r holds every pair with ihash(key) % nReduce == r
	partitions := make([][]KeyValue, nReduce)
	for _, kv := range kvs {
		r := ihash(kv.Key) % nReduce
		partitions[r] = append(partitions[r], kv)
	}

	// Write all nReduce files, even empty ones, so every reduce task finds an
	// input from every map task; each file holds one JSON array of KeyValue.
	for r := 0; r < nReduce; r++ {
		file, err := os.Create(reduceName(jobName, mapTask, r)) // mrtmp.<job>-<mapTask>-<r>
		if err != nil {
			log.Fatal("doMap: ", err)
		}
		if err := json.NewEncoder(file).Encode(partitions[r]); err != nil {
			log.Fatal("doMap: ", err)
		}
		file.Close()
	}

Part of one file produced by the map phase (mrtmp.test-0-0):

[{"Key":"0","Value":""},{"Key":"7","Value":""},{"Key":"10","Value":""},{"Key":"13","Value":""},{"Key":"15","Value":""},{"Key":"23","Value":""},{"Key":"25","Value":""},{"Key":"26","Value":""},{"Key":"29","Value":""},{"Key":"33","Value":""},
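The intermediate format above comes from encoding a whole partition slice with a single json.Encoder.Encode call. The sketch below, using hypothetical helpers encodePartition and decodePartition and a local KeyValue type, shows the round trip the reduce side relies on, including what happens to an empty (nil) partition.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// KeyValue is a local copy of the lab's pair type.
type KeyValue struct {
	Key   string
	Value string
}

// encodePartition writes the whole slice with one Encode call: a single
// JSON array followed by a newline, the shape of the file shown above.
func encodePartition(kvs []KeyValue) (string, error) {
	var buf bytes.Buffer
	if err := json.NewEncoder(&buf).Encode(kvs); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// decodePartition reads such an array back with one Decode into a slice.
func decodePartition(s string) ([]KeyValue, error) {
	var kvs []KeyValue
	err := json.NewDecoder(bytes.NewBufferString(s)).Decode(&kvs)
	return kvs, err
}

func main() {
	s, err := encodePartition([]KeyValue{{"0", ""}, {"7", ""}})
	if err != nil {
		panic(err)
	}
	fmt.Print(s) // [{"Key":"0","Value":""},{"Key":"7","Value":""}]
	back, err := decodePartition(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(back)) // 2
	empty, _ := encodePartition(nil)
	fmt.Print(empty) // null
}
```

An empty partition is written as null and decodes back to a nil slice without error, so a reduce task can safely read a file from a map task that emitted no keys for it.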

common_reduce.go

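Find the files the map phase produced for this reduceTask, one per map task. For example, with nMap = 3 and nReduce = 3, reduceTask 1 must read mrtmp.test-0-1, mrtmp.test-1-1 and mrtmp.test-2-1.
Read these files and decode the JSON back into the KeyValue pairs for this reduceTask. Note that one reduceTask may be responsible for many keys.
Use a map to collect the values of each key into a slice.
Call reduceF once per key, passing the key and its slice of values; it returns one result string.
Write each key and its result as JSON to the reduce-phase output file, mrtmp.test-res-<reduceTask> (built by mergeName()).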

Code:

	// Fragment of doReduce(); needs imports encoding/json, log, os.
	// Group values by key: one reduceTask may own many keys.
	reduceMap := make(map[string][]string)
	for i := 0; i < nMap; i++ {
		// The file map task i produced for this reduceTask: mrtmp.<job>-<i>-<reduceTask>
		tmpFile, err := os.Open(reduceName(jobName, i, reduceTask))
		if err != nil {
			log.Println("doReduce:", err) // skip a missing partition instead of decoding a nil file
			continue
		}
		var kvs []KeyValue
		err = json.NewDecoder(tmpFile).Decode(&kvs)
		tmpFile.Close()
		if err != nil {
			log.Println("doReduce: decode:", err)
			continue
		}
		for _, kv := range kvs {
			reduceMap[kv.Key] = append(reduceMap[kv.Key], kv.Value)
		}
	}

	// Call reduceF once per key; write one JSON object per line to mrtmp.<job>-res-<reduceTask>
	reduceFile, err := os.Create(mergeName(jobName, reduceTask))
	if err != nil {
		log.Fatal("doReduce: ", err)
	}
	defer reduceFile.Close()
	encoder := json.NewEncoder(reduceFile) // one encoder serves the whole file
	for k, v := range reduceMap {
		if err := encoder.Encode(KeyValue{k, reduceF(k, v)}); err != nil {
			log.Fatal("doReduce: ", err)
		}
	}

Part of mrtmp.test-res-0:

{"Key":"36738","Value":""}
{"Key":"71426","Value":""}
{"Key":"76408","Value":""}
{"Key":"80710","Value":""}
{"Key":"86095","Value":""}
{"Key":"64817","Value":""}
{"Key":"72951","Value":""}
{"Key":"97907","Value":""}
{"Key":"91706","Value":""}
{"Key":"97475","Value":""}

The final merged output file is mrtmp.test.

Part II: Single-worker word count

Implement word count in mapF() and reduceF(): the input is text, and the output is the number of times each word occurs.

wc.go

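mapF()
Split the contents into words and map each word to {word, "1"}. The lab's word-count test treats a word as a maximal run of letters (as determined by unicode.IsLetter), so splitting on whitespace alone would leave punctuation attached ("dog." instead of "dog"). strings.FieldsFunc splits on every non-letter instead; wc.go then also needs to import unicode.

func mapF(filename string, contents string) []mapreduce.KeyValue {
	// Your code here (Part II).
	// A word is a maximal run of letters, so "dog," and "dog" count as the same word.
	words := strings.FieldsFunc(contents, func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	res := make([]mapreduce.KeyValue, 0, len(words))
	for _, w := range words {
		res = append(res, mapreduce.KeyValue{Key: w, Value: "1"})
	}
	return res
}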

reduceF()
Convert each value to an int and add them up. The sum, returned as a string, is the word's count, so each output pair is {word, count}.

func reduceF(key string, values []string) string {
	// Your code here (Part II).
	res := 0
	for _, e := range values {
		val, err := strconv.Atoi(e)
		if err != nil {
			log.Println(err)
		}
		res += val
	}
	return strconv.Itoa(res)
}
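To try mapF() and reduceF() without the lab framework, the sketch below copies them into one self-contained program with a local KeyValue type and a hypothetical wordCount driver that does the grouping the framework would normally do. It assumes the letter-based word definition (strings.FieldsFunc with unicode.IsLetter) rather than whitespace splitting; "input.txt" is a placeholder filename.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// KeyValue is a local copy of mapreduce.KeyValue.
type KeyValue struct {
	Key   string
	Value string
}

// mapF emits {word, "1"} for every maximal run of letters.
func mapF(filename string, contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool { return !unicode.IsLetter(r) })
	res := make([]KeyValue, 0, len(words))
	for _, w := range words {
		res = append(res, KeyValue{Key: w, Value: "1"})
	}
	return res
}

// reduceF sums the counts for one word and returns the total as a string.
func reduceF(key string, values []string) string {
	total := 0
	for _, v := range values {
		n, err := strconv.Atoi(v)
		if err != nil {
			continue // skip a malformed count
		}
		total += n
	}
	return strconv.Itoa(total)
}

// wordCount is a hypothetical driver: it groups mapF's output by key
// (normally the framework's job) and calls reduceF once per word.
func wordCount(contents string) map[string]string {
	groups := make(map[string][]string)
	for _, kv := range mapF("input.txt", contents) {
		groups[kv.Key] = append(groups[kv.Key], kv.Value)
	}
	out := make(map[string]string, len(groups))
	for word, values := range groups {
		out[word] = reduceF(word, values)
	}
	return out
}

func main() {
	fmt.Println(wordCount("the cat, the hat.")) // map[cat:1 hat:1 the:2]
}
```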

Part III: Distributing MapReduce tasks

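This part fills in schedule() in schedule.go. One MapReduce job calls schedule() twice: once for the map phase and once for the reduce phase. schedule() hands tasks to available workers and must not return until every task has completed.
schedule() gets worker addresses from registerChan and tells a worker to run a task by sending it a Worker.DoTask RPC, whose arguments are defined by the DoTaskArgs struct.
Send the RPC with call() from common_rpc.go: the first argument is the worker's address (read from registerChan), the second is "Worker.DoTask", the third is a DoTaskArgs struct, and the last is nil.

Points to watch: call() does not return until the remote DoTask has finished, so each call must run in its own goroutine. A call may fail, so the task has to be retried until it succeeds. When a task finishes, its worker becomes available again, so its address must be sent back into registerChan. Variables used inside a goroutine should be passed to it as function arguments; otherwise (before Go 1.22) every goroutine could read the loop variables' latest values instead of the ones from its own iteration. A sync.WaitGroup is used to wait until all tasks have finished.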
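The advice about passing variables into a goroutine as arguments can be seen in isolation in the sketch below; squares is a hypothetical helper unrelated to the lab. Each goroutine receives its own copy of the index and writes its own slot, and a sync.WaitGroup waits for all of them. Since Go 1.22 each loop iteration has its own variable, so capturing i directly would also work there, but passing it explicitly is correct on every Go version.

```go
package main

import (
	"fmt"
	"sync"
)

// squares starts one goroutine per index and waits for all of them.
func squares(n int) []int {
	results := make([]int, n)
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func(idx int) { // idx is this goroutine's own copy of i
			defer wg.Done()
			results[idx] = idx * idx // distinct slots, so no data race
		}(i)
	}
	wg.Wait() // block until every goroutine has called Done
	return results
}

func main() {
	fmt.Println(squares(5)) // [0 1 4 9 16]
}
```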

func schedule(jobName string, mapFiles []string, nReduce int, phase jobPhase, registerChan chan string) {
	var ntasks int
	var n_other int // number of inputs (for reduce) or outputs (for map)
	switch phase {
	case mapPhase:
		ntasks = len(mapFiles)
		n_other = nReduce

	case reducePhase:
		ntasks = nReduce
		n_other = len(mapFiles)
	}

	fmt.Printf("Schedule: %v %v tasks (%d I/Os)\n", ntasks, phase, n_other)

	// All ntasks tasks have to be scheduled on workers. Once all tasks
	// have completed successfully, schedule() should return.
	//
	// Your code here (Part III, Part IV).
	//
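	var wg sync.WaitGroup
	wg.Add(ntasks)
	for i := 0; i < ntasks; i++ {
		file := ""
		if phase == mapPhase {
			file = mapFiles[i] // only map tasks have an input file; nReduce may exceed len(mapFiles)
		}
		taskArgs := DoTaskArgs{jobName, file, phase, i, n_other}

		go func(taskArgs DoTaskArgs) {
			defer wg.Done()
			for {
				workerAddr := <-registerChan // wait for an available worker
				if call(workerAddr, "Worker.DoTask", taskArgs, nil) {
					// The worker is free again. Send from a new goroutine: after the last
					// task nobody receives from registerChan, so a direct send could block.
					go func() { registerChan <- workerAddr }()
					return
				}
				// call() failed (worker crashed or RPC timed out): drop this worker
				// and retry the task on the next one from registerChan.
			}
		}(taskArgs)
	}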
	wg.Wait()
	fmt.Printf("Schedule: %v done\n", phase)
}

Part IV: Handling worker failures

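This part makes the master handle failed workers. For a failed worker, call() returns false after a timeout, and the master should then reassign that task to another available worker.
An RPC failure does not necessarily mean the worker did not execute the task: the worker may have finished it while the reply was lost before reaching the master, or the task may simply have run longer than the RPC timeout.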

Go notes

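Scanning a file line by line; scanner.Text() returns the current line.

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
)

func main() {
	file, err := os.Open("temp.txt")
	if err != nil {
		log.Fatal(err) // do not go on with a nil *os.File
	}
	defer file.Close()
	scanner := bufio.NewScanner(file) // splits on lines; lines over bufio.MaxScanTokenSize (64 KB) make Scan fail
	i := 0
	for scanner.Scan() {
		fmt.Print(scanner.Text()) // current line without its trailing newline
		fmt.Println(i)            // followed by the line's index
		i++
	}
	if err := scanner.Err(); err != nil { // Scan also returns false on a read error
		log.Fatal(err)
	}
}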

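WaitGroup usage

package main

import (
	"sync"
	"time"
)

func go_test() {
	time.Sleep(time.Second * 2)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1) // Add before the go statement so Wait cannot run ahead of it
		go func() {
			defer wg.Done() // mark this goroutine finished when it returns
			go_test()
		}()
	}
	wg.Wait() // the five sleeps overlap, so this returns after about 2 s, not 10 s
}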