Map
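- Read the input data from disk
- Run the map function
- Partition: assign each output record to a partition (held in memory)
- Sort: sort the records in memory
- Combine: pre-aggregate the results in memory
- Write the results to local disk
- Merge: merge the files on disk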
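
The partition, sort, and combine steps above can be sketched in a few lines of Python. This is a minimal illustration, not Hadoop's implementation: the names `partition` and `map_side_shuffle` are hypothetical, the byte-sum partitioner is an assumption chosen only for determinism, and the combiner assumes the values are counts that can be summed. Writing to disk and merging spill files are left out.

```python
from collections import defaultdict


def partition(key, num_partitions):
    # Hypothetical partitioner: a deterministic byte-sum hash, used here
    # because Python's built-in hash() of str varies between processes.
    return sum(key.encode("utf-8")) % num_partitions


def map_side_shuffle(records, num_partitions):
    """Partition, sort, and combine (key, count) pairs from one map task."""
    # Partition: group records by target reducer in an in-memory buffer.
    buffers = defaultdict(list)
    for key, value in records:
        buffers[partition(key, num_partitions)].append((key, value))

    output = {}
    for part, pairs in buffers.items():
        # Sort: order the records of each partition by key.
        pairs.sort(key=lambda kv: kv[0])
        # Combine: sum the values of adjacent records with equal keys.
        combined = []
        for key, value in pairs:
            if combined and combined[-1][0] == key:
                combined[-1] = (key, combined[-1][1] + value)
            else:
                combined.append((key, value))
        # In a real job this list would be written to local disk.
        output[part] = combined
    return output
```

Sorting before combining means equal keys are adjacent, so the combiner needs only one pass and the output of each partition is already ordered for the later merge.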
Reduce
- Copy: fetch the map output and place it directly in memory
- Merge (memory -> disk)
- Merge (disk -> disk)
- Run the reduce function
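
The reduce side can be sketched in the same spirit. The example below assumes each map task has already delivered a list of `(key, value)` pairs sorted by key, and it collapses the memory and disk merge passes into a single in-memory merge. The name `reduce_side` is hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def reduce_side(map_outputs, reduce_fn):
    """Merge sorted map outputs, then call reduce_fn once per key."""
    # Merge: combine the already-sorted runs into one sorted stream.
    merged = heapq.merge(*map_outputs, key=itemgetter(0))
    results = []
    # Group consecutive records that share a key and reduce each group.
    for key, group in groupby(merged, key=itemgetter(0)):
        values = [value for _, value in group]
        results.append((key, reduce_fn(key, values)))
    return results
```

Because every input run is sorted, the merge never has to re-sort the data, which is why the map side sorts before writing its output.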
Word count example
Suppose a file is divided into two splits (so there are two map tasks)
split 0:
My name is Tony
My company is Pivotal
split 1:
My name is Lisa
My company is EMC
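
A word count map function for these splits might look like the sketch below. The name `word_count_map` is hypothetical, and splitting on whitespace is an assumption about how words are tokenized.

```python
def word_count_map(line):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    return [(word, 1) for word in line.split()]


# Apply the map function to every line of split 0.
split_0 = ["My name is Tony", "My company is Pivotal"]
for line in split_0:
    for word, count in word_count_map(line):
        print(word, count)
```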
Run the map function
split 0:
My 1
name 1
is 1
Tony 1
My 1
company 1
is 1
Pivotal 1
split 1:
My 1
name 1
is 1