MIT 6.824-Lab 1 MapReduce


LLLSoul



Article link: https://blog.csdn.net/qq_42553836/article/details/122745641


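1 Preface

Lab environment setup: https://pdos.csail.mit.edu/6.824/labs/lab-mr.html

This note is mainly a record of my own learning process. It only covers the rough steps I took to complete the lab and how I approached the problems I ran into; it does not describe my implementation (which is quite simple anyway, without the optimizations that many experts add). So I hope experienced readers will not judge it too strictly, and if it helps someone else, all the better.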

2 Reading the code

Let's look at the Map function:

// The map function is called once for each file of input. The first
// argument is the name of the input file, and the second is the
// file's complete contents. You should ignore the input file name,
// and look only at the contents argument. The return value is a slice
// of key/value pairs.
//
func Map(filename string, contents string) []mr.KeyValue {
	// function to detect word separators.
	ff := func(r rune) bool { return !unicode.IsLetter(r) }

	// split contents into an array of words.
	words := strings.FieldsFunc(contents, ff)

	kva := []mr.KeyValue{}
	for _, w := range words {
		kv := mr.KeyValue{w, "1"}
		kva = append(kva, kv)
	}
	return kva
}

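It relies on strings.FieldsFunc to split the entire contents of a file. Because the separator function ff returns true for every rune that is not a letter, the split happens not only at spaces and newlines but also at punctuation and digits (so digits never end up in the slice). The result is one huge slice, words, in which each word is one element.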
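To make the splitting rule concrete, here is a minimal sketch that applies the same separator function to a short string. The helper name splitWords and the sample input are made up for illustration; only strings.FieldsFunc and unicode.IsLetter come from the lab code.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitWords is a hypothetical helper that mirrors the splitting step in Map:
// every rune that is not a letter is treated as a separator.
func splitWords(contents string) []string {
	isSeparator := func(r rune) bool { return !unicode.IsLetter(r) }
	return strings.FieldsFunc(contents, isSeparator)
}

func main() {
	// Digits, punctuation, the apostrophe and the newline all act as separators.
	words := splitWords("MapReduce in 2004: simple, isn't it?\nYes")
	fmt.Printf("%q\n", words)
	// prints ["MapReduce" "in" "simple" "isn" "t" "it" "Yes"]
}
```

Note that "isn't" becomes two words, "isn" and "t", because the apostrophe is not a letter.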

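kva stands for "kv array", i.e. a slice of key/value pairs. The function iterates over words, builds a {word, "1"} pair for each word, appends it to kva, and finally returns kva.

kva looks something like: [{sheng 1} {jun 1} {a 1} ...]

mrsequential.go then appends kva to intermediate.

So kva holds the key/value pairs for the words of a single file, while intermediate holds those of every file whose map task has finished.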
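The relationship between kva and intermediate can be sketched as follows. This is a self-contained approximation, not the code from mrsequential.go: KeyValue is a local stand-in for mr.KeyValue, mapWords and collectIntermediate are hypothetical names, and the inputs are in-memory strings rather than files read from disk.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// KeyValue is a local stand-in for mr.KeyValue so the sketch compiles on its own.
type KeyValue struct {
	Key   string
	Value string
}

// mapWords follows the same logic as the Map function: one {word, "1"} pair per word.
func mapWords(contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool { return !unicode.IsLetter(r) })
	kva := []KeyValue{}
	for _, w := range words {
		kva = append(kva, KeyValue{Key: w, Value: "1"})
	}
	return kva
}

// collectIntermediate runs mapWords on each input and appends every
// per-input kva to one shared slice, in input order.
func collectIntermediate(inputs []string) []KeyValue {
	intermediate := []KeyValue{}
	for _, contents := range inputs {
		kva := mapWords(contents)
		intermediate = append(intermediate, kva...)
	}
	return intermediate
}

func main() {
	// Each string plays the role of one input file's contents.
	inputs := []string{"sheng jun a", "a b"}
	fmt.Println(collectIntermediate(inputs))
	// prints [{sheng 1} {jun 1} {a 1} {a 1} {b 1}]
}
```

Duplicate keys such as "a" are kept as separate pairs at this stage; they are only combined later, in the reduce step.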

This differs from a real MapReduce: there the intermediate data cannot all be kept in one place, so it is partitioned into buckets to save memory.
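One common way to do this partitioning is to hash each key and take the result modulo the number of reduce tasks. The sketch below assumes that approach: ihash is modeled on the helper of the same name in the lab's worker.go, while partition, the local KeyValue type, and the choice of 3 buckets are illustrative assumptions, not the author's implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// KeyValue is a local stand-in for mr.KeyValue so the sketch compiles on its own.
type KeyValue struct {
	Key   string
	Value string
}

// ihash maps a key to a non-negative int using the FNV-1a hash.
func ihash(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() & 0x7fffffff)
}

// partition (hypothetical helper) spreads the pairs over nReduce buckets.
// All pairs with the same key land in the same bucket.
func partition(kva []KeyValue, nReduce int) [][]KeyValue {
	buckets := make([][]KeyValue, nReduce)
	for _, kv := range kva {
		b := ihash(kv.Key) % nReduce
		buckets[b] = append(buckets[b], kv)
	}
	return buckets
}

func main() {
	kva := []KeyValue{{"a", "1"}, {"b", "1"}, {"a", "1"}, {"c", "1"}}
	for i, bucket := range partition(kva, 3) {
		fmt.Printf("bucket %d: %v\n", i, bucket)
	}
}
```

Because the bucket depends only on the key, a reduce task that reads one bucket sees every value for the keys assigned to it.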

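Next, the elements of intermediate are sorted by key. Once sorted, identical keys sit next to each other, so counting only requires checking whether neighbouring keys are equal: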

// mrsequential.go
i := 0
for i < len(intermediate) {
    j := i + 1
    for j < len(intermediate) && intermediate[j].Key == intermediate[i].Key {
        j++
    }
    values := []string{}
    for k := i; k < j; k++ {
        values = append(values, intermediate[k].Value)
    }
    output := reducef(intermediate