Statistics

- time spent: 12 h
- lines added: 397
- GitHub: https://github.com/ztzhu1/MIT6.824
Data Structure

```go
type Coordinator struct {
	nReduce          int
	tasks_unassigned chan *Task // tasks waiting to be handed out to workers
	map_tasks        []*Task    // unfinished map tasks
	reduce_tasks     []*Task    // unfinished reduce tasks
	mu               sync.Mutex
	mapDone          bool // all map tasks finished
	cleanDone        bool // temp files already cleaned up
}
```
```go
type Task struct {
	Type       TaskType
	ID         int
	InputName  string
	OutputName string
	processing bool      // currently assigned to a worker
	procStart  time.Time // time when the worker started processing the task
}
```
```go
type TaskType int

const (
	MAP    TaskType = 0
	REDUCE TaskType = 1
	REREQ  TaskType = 2 // re-request: no task available right now, ask again later
	QUIT   TaskType = 3
)
```
Naming style

map task:

| official | temp      |
| -------- | --------- |
| mr-X-Y   | mr-X-Y-ID |

X denotes the Xth map task and Y the Yth reduce task. ID is a random value generated by `os.CreateTemp`.
reduce task:

| official | temp        |
| -------- | ----------- |
| mr-out-Y | mr-out-Y-ID |

Same as above.
Idea

- Tasks are stored in a channel, which naturally prevents race conditions, since a channel is essentially a thread-safe queue. That is often useful and convenient, but it is not always enough, so a mutex is still needed in some places.
- Two additional slices maintain the status of the tasks. When a task is done, it is removed from its slice; when both slices are empty, all tasks are finished.
- When a worker finishes its task, it notifies the coordinator, so the latter can rename the temp files, do the cleanup work, and update the tasks' status.
- There is a loop in `mrcoordinator.go` which won't stop until the coordinator says "Done!". On every iteration it invokes a method called `Tick()`. `Tick()` checks, against wall-clock time, how long each task has been processing. If a task takes too long, there may be something wrong with the corresponding worker, so the coordinator pushes the task into `tasks_unassigned` again and another worker can adopt it.
- Reduce tasks should not be assigned to workers until all the map tasks are done, so I added the field `mapDone`.
- When all the tasks are done and the coordinator wants to quit, it cleans up all the temp files if `cleanDone` is not yet set.
Bugs I met

- RPC requires the fields in `args` and `reply` to start with an upper-case letter (i.e. to be exported), or the values of the variables may be wrong.
- The naming scheme for map output is a little confusing. When a worker runs a reduce task, it collects the files matching `mr-*-Y`. Seems fine, right? Actually, a temp map file named `mr-X-Y-ID` can also match this pattern when `Y == ID`. It's subtle.
- At the very beginning, I misunderstood the meaning of `map` and wrote all the map results into a single file. But the point is to let every reduce task process a different set of words, so the map output must be partitioned into different files, one per reduce task, according to the hash value of each word.
- All the workers should quit together. As long as the coordinator is alive and hasn't told a worker to quit, the worker should keep asking for tasks every few seconds. But I made the worker quit directly when it finished its work and didn't receive a new task for a while.
TODO

- For simplicity, I assigned one map task per input file (although that one task can produce many output files, one per reduce task). If a file is large, we should schedule more map tasks for it, or split it into smaller files in advance.
- The input-name matching pattern is not very elegant.
- Some locks are unnecessary.
- I still haven't made full use of the channel mechanism.
Test Result

To make the test result easy to recognize, I added color for PASS and FAIL. All other code in the test script remains unchanged.