MIT 6.824 Lab1

Statistics

time spent: 12 h
lines added: 397
github: https://github.com/ztzhu1/MIT6.824

Data Structure

type Coordinator struct {
	nReduce          int
	tasks_unassigned chan *Task
	map_tasks        []*Task
	reduce_tasks     []*Task
	mu               sync.Mutex
	mapDone          bool
	cleanDone        bool
}

type Task struct {
	Type       TaskType
	ID         int
	InputName  string
	OutputName string
	processing bool
	procStart  time.Time // time when the worker starts processing the task
}

const (
	MAP    TaskType = 0
	REDUCE TaskType = 1
	REREQ  TaskType = 2 // re-request
	QUIT   TaskType = 3
)

Naming style

map task:

official: mr-X-Y
temp:     mr-X-Y-ID

X denotes the Xth map task and Y the Yth reduce task. ID is a random value generated by os.CreateTemp.

reduce task:

official: mr-out-Y
temp:     mr-out-Y-ID

Y and ID have the same meaning as above.

Idea

  • Store the unassigned tasks in a channel, which avoids most race conditions naturally because a channel is essentially a thread-safe queue. A channel is not always enough, though, so some shared state is still protected by a mutex.
  • Two additional slices track the status of the tasks. When a task is done, it is removed from its slice. When both slices are empty, all the tasks are finished.
  • When a worker finishes its task, it notifies the coordinator, which then renames the temp files, cleans up, and updates the tasks’ status.
  • The loop in mrcoordinator.go runs until the coordinator says “Done!”. On every iteration it calls a method named Tick(), which checks each task’s processing time against wall-clock time. If a task has been running for too long, something may be wrong with its worker, so the coordinator pushes the task back into tasks_unassigned for another worker to pick up.
  • Reduce tasks must not be assigned to workers until all the map tasks are done, so I added the field mapDone.
  • When all the tasks are done and the coordinator is about to quit, it removes all the temp files unless cleanDone is already set.

Bugs I met

  • RPC requires the field names in args and reply to start with an upper-case letter. Otherwise the fields are not transmitted, and the values the other side sees may be wrong.
  • The naming style of map tasks is a little confusing. When a worker runs a reduce task, it collects the files named mr-*-Y. That seems fine, but a temp map file named mr-X-Y-ID also matches this pattern when its ID happens to equal the reduce task’s Y. It’s subtle.
  • At the very beginning, I misunderstood the map phase and wrote all the map results into a single file. The point is to let every reduce task process a different set of words, so each map task has to split its output across several files, one per reduce task, chosen by the hash value of the word.
  • All the workers should quit together. If the coordinator is still alive and hasn’t told the worker to quit, the worker should keep asking for more tasks every few seconds. My bug was that the worker quit on its own once it had finished its work and received no new task for a while.

TODO

  • For simplicity, I assigned one map task per input file (although a single task can produce many files, one per reduce task). If a file is large, we should schedule more map tasks for it or split it into smaller files in advance.
  • The input name matching pattern is not very elegant.
  • Some locks are unnecessary.
  • I still haven’t made full use of the channel mechanism.

Test Result

To make the test results easier to read, I added color to PASS and FAIL. The rest of the test script is unchanged.
