Preface
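In the previous chapters, we covered how to run a simple WordCount job, how to define custom business types, and how to perform Join operations. This chapter, a supplement to the series, briefly covers two topics that were not introduced earlier: counters and job groups.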
The code for this article can be found in my GitHub project at https://github.com/SeanYanxml/bigdata/. PS: if you find the project useful, feel free to give it a Star.
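Main Text
Counters
Business background: suppose we want to count the lines in a text file, or the number of times a method runs. Every record that arrives increments the count by one.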
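import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Mapper;

public class CounterDemo {
    static class CounterMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, IntWritable, Text>.Context context)
                throws IOException, InterruptedException {
            // Look up the counter by <group name, counter name>; its final value is the total count.
            Counter counter = context.getCounter("hello", "hello");
            // Add 1 for every input record.
            counter.increment(1);
        }
    }
}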
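To make the <group name, counter name> addressing concrete, here is a minimal plain-Java sketch of how per-task counts roll up into one total. It is a simplified model, not Hadoop's implementation: the class CounterModel and its methods increment, get, mergeFrom, and runTask are hypothetical names invented for illustration, and the sketch assumes that each task's counters are summed by group and name.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of counter aggregation; not Hadoop's implementation.
class CounterModel {
    // group name -> (counter name -> value)
    private final Map<String, Map<String, Long>> groups = new TreeMap<>();

    // Add `amount` to the counter identified by (group, name), creating it if absent.
    void increment(String group, String name, long amount) {
        groups.computeIfAbsent(group, g -> new TreeMap<>())
              .merge(name, amount, Long::sum);
    }

    // Current value of a counter, or 0 if it was never incremented.
    long get(String group, String name) {
        Map<String, Long> counters = groups.getOrDefault(group, Collections.emptyMap());
        return counters.getOrDefault(name, 0L);
    }

    // Fold another task's counters into this one, summing values that share a group and name.
    void mergeFrom(CounterModel other) {
        other.groups.forEach((group, counters) ->
            counters.forEach((name, value) -> increment(group, name, value)));
    }

    // Simulate one map task: one increment per input record, as in CounterMapper.
    static CounterModel runTask(String[] lines) {
        CounterModel task = new CounterModel();
        for (String line : lines) {
            task.increment("hello", "hello", 1);
        }
        return task;
    }

    public static void main(String[] args) {
        CounterModel total = new CounterModel();
        total.mergeFrom(runTask(new String[] {"a", "b", "c"}));
        total.mergeFrom(runTask(new String[] {"d", "e"}));
        System.out.println(total.get("hello", "hello")); // prints 5
    }
}
```

In a real driver, the aggregated value can be read once the job has completed, for example with job.getCounters().findCounter("hello", "hello").getValue().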
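Job Groups
Java has thread pools and the ThreadGroup concept. MapReduce likewise needs a way to manage Jobs, along with the ordering and dependencies between them; the JobControl and ControlledJob classes fill that role.
The relevant code is shown below: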
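import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

/**
 * Runs several Jobs chained one after another.
 */
public class MoreJobsDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        Configuration conf = new Configuration();
        conf.set("mapreduce.framework.name", "yarn");
        // conf.set("mapreduce.framework.name", "local");
        conf.set("yarn.resourcemanager.hostname", "localhost");
        conf.set("fs.defaultFS", "hdfs://localhost:9000/");
        // Each Job still needs its mapper/reducer classes and input/output paths set before it can run.
        Job job1 = Job.getInstance(conf);
        Job job2 = Job.getInstance(conf);
        Job job3 = Job.getInstance(conf);
        ControlledJob controlJob1 = new ControlledJob(conf);
        ControlledJob controlJob2 = new ControlledJob(conf);
        ControlledJob controlJob3 = new ControlledJob(conf);
        // Attach each Job to its ControlledJob wrapper.
        controlJob1.setJob(job1);
        controlJob2.setJob(job2);
        controlJob3.setJob(job3);
        // Set the order: job2 waits for job1, and job3 waits for job2.
        controlJob2.addDependingJob(controlJob1);
        controlJob3.addDependingJob(controlJob2);
        // Register the jobs with the controller.
        JobControl jobControl = new JobControl("[Sean-JobControl-2019]");
        jobControl.addJob(controlJob1);
        jobControl.addJob(controlJob2);
        jobControl.addJob(controlJob3);
        // JobControl is a Runnable, so run it on its own thread.
        Thread jobControlThread = new Thread(jobControl);
        jobControlThread.start();
        // Poll until every job has finished.
        while (!jobControl.allFinished()) {
            Thread.sleep(2000);
        }
        // Stop the JobControl thread so the JVM can exit.
        jobControl.stop();
    }
}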
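The dependency rule used above (a job starts only after every job it depends on has finished) can be sketched in plain Java without a cluster. This is a hypothetical, simplified model rather than the JobControl source: the class DependencyOrderSketch and its methods are invented for illustration, jobs are represented only by their names, and every job is assumed to succeed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of dependency-ordered scheduling; not the JobControl source.
class DependencyOrderSketch {
    // job name -> names of the jobs it must wait for
    private final Map<String, List<String>> dependsOn = new LinkedHashMap<>();

    void addJob(String name) {
        dependsOn.putIfAbsent(name, new ArrayList<>());
    }

    // Same direction as controlJob2.addDependingJob(controlJob1): `job` waits for `prerequisite`.
    void addDependingJob(String job, String prerequisite) {
        addJob(job);
        addJob(prerequisite);
        dependsOn.get(job).add(prerequisite);
    }

    // Repeatedly "run" every job whose prerequisites have all finished, until none is left.
    List<String> runOrder() {
        List<String> finished = new ArrayList<>();
        while (finished.size() < dependsOn.size()) {
            boolean progressed = false;
            for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
                boolean alreadyDone = finished.contains(entry.getKey());
                if (!alreadyDone && finished.containsAll(entry.getValue())) {
                    finished.add(entry.getKey());
                    progressed = true;
                }
            }
            if (!progressed) {
                // No job became runnable in a full pass, so the dependencies form a cycle.
                throw new IllegalStateException("circular dependency among jobs");
            }
        }
        return finished;
    }

    public static void main(String[] args) {
        DependencyOrderSketch sketch = new DependencyOrderSketch();
        // Registered in reverse on purpose: registration order does not decide the run order.
        sketch.addDependingJob("job3", "job2");
        sketch.addDependingJob("job2", "job1");
        System.out.println(sketch.runOrder()); // prints [job1, job2, job3]
    }
}
```

The run order follows from the declared dependencies alone, which is why the three jobs above can be added to the controller in any order.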