[MIT 6.824 Distributed System] Lab 1: MapReduce (2016)

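This blog post walks through Lab 1 of MIT's distributed systems course, which covers implementing MapReduce: Map/Reduce input and output, word count on a single worker, distributing tasks, and handling worker failures. The author shares their implementation and discusses handling intermediate files, using hashing to distribute data, and the strategy for dealing with worker failures.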


MIT Distributed Systems course, Lab 1: MapReduce

Below is my own implementation, shared here for everyone. Questions are welcome, and please point out any mistakes!

Debugging can be turned on in common.go:

// Debugging enabled?
const debugEnabled = true

Overview

[Figure: MapReduce overview]

Part I: Map/Reduce input and output

Part I is mainly about implementing file reading and writing; what gets read and written is, of course, key/value pairs.

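Suppose:

  • M: the number of Map tasks, i.e. the data set is split into M pieces that are assigned to M Mappers. See step (3) read in the figure above.
  • R: the number of Reduce tasks, i.e. there are R Reducers and, in the end, R output files. Each Mapper writes the key/value data it reads into R intermediate files, one for each of the R Reducers.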
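To make the M and R bookkeeping concrete, the following sketch enumerates the M×R intermediate files and shows which of them a given Reducer reads. `intermediateName` is a hypothetical stand-in for the lab's `reduceName(jobName, mapTaskNumber, r)` helper, and the `mrtmp.` naming format is an assumption.

```go
package main

import "fmt"

// intermediateName is a hypothetical stand-in for the lab's reduceName helper.
// The exact file name format is an assumption.
func intermediateName(jobName string, mapTask, reduceTask int) string {
	return fmt.Sprintf("mrtmp.%s-%d-%d", jobName, mapTask, reduceTask)
}

// intermediateFiles lists all M*R intermediate files; files[m][r] is the file
// that map task m writes for reduce task r.
func intermediateFiles(jobName string, nMap, nReduce int) [][]string {
	files := make([][]string, nMap)
	for m := 0; m < nMap; m++ {
		files[m] = make([]string, nReduce)
		for r := 0; r < nReduce; r++ {
			files[m][r] = intermediateName(jobName, m, r)
		}
	}
	return files
}

// inputsForReducer collects the one file per map task that reducer r reads.
func inputsForReducer(files [][]string, r int) []string {
	inputs := make([]string, 0, len(files))
	for _, row := range files {
		inputs = append(inputs, row[r])
	}
	return inputs
}

func main() {
	const nMap, nReduce = 3, 2
	files := intermediateFiles("wc", nMap, nReduce)
	for m, row := range files {
		fmt.Println("map", m, "writes", row)
	}
	for r := 0; r < nReduce; r++ {
		fmt.Println("reduce", r, "reads", inputsForReducer(files, r))
	}
}
```

Each Mapper owns one row of R files, and each Reducer reads one column of M files, which is why the intermediate file names need both task numbers.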

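  1. common_map.go
    I keep the output file handles in an array, read the whole input file in one go, and use the mapF function to generate the key/value pairs. I then iterate over them and hash each key/value pair into one of the output files.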
// doMap does the job of a map worker: it reads one of the input files
// (inFile), calls the user-defined map function (mapF) for that file's
// contents, and partitions the output into nReduce intermediate files.
func doMap(
    jobName string, // the name of the MapReduce job
    mapTaskNumber int, // which map task this is
    inFile string,
    nReduce int, // the number of reduce task that will be run ("R" in the paper)
    mapF func(file string, contents string) []KeyValue,
) {
    // TODO:
    // You will need to write this function.
    // You can find the filename for this map task's input to reduce task number
    // r using reduceName(jobName, mapTaskNumber, r). The ihash function (given
    // below doMap) should be used to decide which file a given key belongs into.
    //
    // The intermediate output of a map task is stored in the file
    // system as multiple files whose name indicates which map task produced
    // them, as well as which reduce task they are for. Coming up with a
    // scheme for how to store the key/value pairs on disk can be tricky,
    // especially when taking into account that both keys and values could
    // contain newlines, quotes, and any other character you can think of.
    //
    // One format often used for serializing data to a byte stream that the
    // other end can correctly reconstruct is JSON. You are not required to
    // use JSON, but as the output of the reduce tasks *must* be JSON,
    // familiarizing yourself with it here may prove useful. You can write
    // out a data structure as a JSON string to a file using the commented
    