I. Verifying that MapReduce tasks run as multiple processes
1. Implement MyMapper as shown below. A Reducer can be instrumented in the same way.
package com.mapreduce;

import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Global counter: static, so it is shared by every Mapper instance inside one JVM
    private static int map_index = 0;

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        System.out.println("map_index: " + (++map_index));

        // Process info: this code assumes the runtime name has the form "pid@hostname"
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        String name = runtime.getName();
        System.out.println("Current Process: " + name + "--" + name.substring(0, name.indexOf("@")));

        // Thread info: id and name of the thread that calls map()
        System.out.println("Current Thread: " + Thread.currentThread().getId() + "-" + Thread.currentThread().getName());

        // Identity of the current Mapper instance (class name plus hash code)
        System.out.println("Current Mapper: " + this.toString());

        context.write(new Text(""), new Text(""));
    }
}
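The mapper above takes the process ID from the part of the runtime name before "@". That "pid@hostname" layout is what common JVMs return, but the RuntimeMXBean documentation does not guarantee it, and `substring(0, indexOf("@"))` throws if there is no "@". The following is a minimal standalone sketch of a more defensive version. The class name `ProcessIdDemo` and the helper `parsePid` are hypothetical names introduced here, and `ProcessHandle` assumes Java 9 or later, which may not match the JVM on an older Hadoop cluster.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper class, not part of the original job
class ProcessIdDemo {

    // Returns the PID from a "pid@hostname" runtime name, or -1 if the name has another shape
    static long parsePid(String runtimeName) {
        int at = runtimeName.indexOf('@');
        if (at <= 0) {
            return -1L;
        }
        try {
            return Long.parseLong(runtimeName.substring(0, at));
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    public static void main(String[] args) {
        String name = ManagementFactory.getRuntimeMXBean().getName();
        System.out.println("Runtime name: " + name);
        System.out.println("Parsed PID: " + parsePid(name));
        // Java 9+ only: asks the platform for the PID directly, without parsing
        System.out.println("ProcessHandle PID: " + ProcessHandle.current().pid());
    }
}
```

On a JVM that uses the "pid@hostname" layout, the parsed PID and the `ProcessHandle` PID should agree.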
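2. Put two files in the map input directory so that two map tasks are started (each file becomes its own input split). Each file contains 3 lines of data, 6 lines in total.
3. Hypothesis: if map tasks ran inside a single process, the 2 map tasks would be threads of that process and would share its memory. The output would then show the same process name, different threads, different Mapper objects, and the global counter map_index would run 1,2,3,4,5,6. The actual result is different, as shown below: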
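Output log of the first map task (the class is printed as com.etl.mapreduce.ClickStreamMapper rather than MyMapper, presumably because the log was captured from a run under that class name):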
map_index: 1
Current Process: 6717@slave1--6717
Current Thread: 1-main
Current Mapper: com.etl.mapreduce.ClickStreamMapper@5f3b9c57
map_index: 2
Current Process: 6717@slave1--6717
Current Thread: 1-main
Current Mapper: com.etl.mapreduce.ClickStreamMapper@5f3b9c57
map_index: 3
Current Process: 6717@slave1--6717
Current Thread: 1-main
Current Mapper: com.etl.mapreduce.ClickStreamMapper@5f3b9c57
Output log of the second map task:
map_index: 1
Current Process: 6728@slave1--6728
Current Thread: 1-main
Current Mapper: com.etl.mapreduce.ClickStreamMapper@1e044120
map_index: 2
Current Process: 6728@slave1--6728
Current Thread: 1-main
Current Mapper: com.etl.mapreduce.ClickStreamMapper@1e044120
map_index: 3
Current Process: 6728@slave1--6728
Current Thread: 1-main
Current Mapper: com.etl.mapreduce.ClickStreamMapper@1e044120
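For contrast, the single-process hypothesis from step 3 can be reproduced outside Hadoop. The sketch below is only an illustration under stated assumptions: `SharedCounterDemo`, `runTask` and `runTwoTasks` are hypothetical names, the two "tasks" are plain Java threads rather than real map tasks, and an `AtomicInteger` replaces the plain `static int` so that the count is reliable when two threads increment it concurrently.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: two "tasks" as threads inside ONE JVM
class SharedCounterDemo {

    // Static, so every thread in this process sees the same counter
    static final AtomicInteger mapIndex = new AtomicInteger();

    // Simulates one task that handles `records` records on the calling thread
    static void runTask(int records) {
        String process = ManagementFactory.getRuntimeMXBean().getName();
        for (int i = 0; i < records; i++) {
            System.out.println("map_index: " + mapIndex.incrementAndGet()
                    + " | process: " + process
                    + " | thread: " + Thread.currentThread().getName());
        }
    }

    // Runs two 3-record tasks concurrently and returns the final counter value
    static int runTwoTasks() {
        mapIndex.set(0);
        Thread t1 = new Thread(() -> runTask(3), "task-1");
        Thread t2 = new Thread(() -> runTask(3), "task-2");
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return mapIndex.get();
    }

    public static void main(String[] args) {
        System.out.println("final map_index: " + runTwoTasks());
    }
}
```

Here every line reports the same process name, the thread names differ, and the counter ends at 6. The order in which the two threads interleave varies between runs. This is the pattern the hypothesis predicts, and it is not the pattern in the two task logs above.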
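4. Fact: the first map task runs in process ID 6717 and the second in process ID 6728, so the two tasks clearly run in different processes. Both tasks execute on the main thread, i.e. each one is single-threaded. The two Mapper instances are also different objects. Finally, map_index counts 1,2,3 in each task and never reaches the 4,5,6 that the hypothesis predicts. Conclusion: MapReduce tasks run in a multi-process, single-threaded mode.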
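The comparison in step 4 can also be done mechanically. The following sketch assumes the log lines use exactly the "map_index: " and "Current Process: " prefixes printed by the mapper. The class `TaskLogSummary` and its methods are hypothetical helpers, and the sample lines in `main` are copied from the two task logs above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper that summarizes the mapper's diagnostic output
class TaskLogSummary {

    static final String PROCESS_PREFIX = "Current Process: ";
    static final String INDEX_PREFIX = "map_index: ";

    // Collects the distinct "pid@host" names found in the log lines
    static Set<String> distinctProcesses(List<String> lines) {
        Set<String> names = new TreeSet<>();
        for (String line : lines) {
            if (line.startsWith(PROCESS_PREFIX)) {
                String rest = line.substring(PROCESS_PREFIX.length());
                int sep = rest.indexOf("--");
                names.add(sep >= 0 ? rest.substring(0, sep) : rest);
            }
        }
        return names;
    }

    // Returns the largest map_index value seen, or 0 if there is none
    static int maxMapIndex(List<String> lines) {
        int max = 0;
        for (String line : lines) {
            if (line.startsWith(INDEX_PREFIX)) {
                int value = Integer.parseInt(line.substring(INDEX_PREFIX.length()).trim());
                max = Math.max(max, value);
            }
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "map_index: 3",
                "Current Process: 6717@slave1--6717",
                "map_index: 3",
                "Current Process: 6728@slave1--6728");
        System.out.println("processes: " + distinctProcesses(lines));
        System.out.println("max map_index: " + maxMapIndex(lines));
    }
}
```

Two distinct process names together with a maximum map_index of 3 is the multi-process signature. A single process name with a maximum of 6 would have supported the hypothesis instead.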
II. Further reading